Vocabulary flashcards covering key reliability concepts, statistics, error sources, measurement models, and related research issues from Chapter 5 of the lecture notes. These cards aid in mastering terminology essential for understanding and applying reliability theory in psychological testing.
Reliability (Psychometrics)
The consistency of measurement results; the proportion of total score variance that is true variance.
Reliability Coefficient
A statistic (0–1) that quantifies reliability; higher values indicate greater consistency.
Measurement Error
The inherent uncertainty in any measurement, comprising preventable mistakes and inevitable imprecision.
True Score (T)
The long-term average score that would be obtained with infinite error-free measurements on the same instrument.
Construct Score
A person’s standing on the theoretical attribute being measured, independent of any specific test.
Observed Score (X)
The actual obtained test score, composed of true score plus error (X = T + E).
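The X = T + E decomposition can be illustrated with a minimal simulation sketch (all names and the chosen standard deviations are illustrative, not from the notes): when true scores and errors are generated independently, the ratio of true variance to observed variance recovers the reliability.

```python
import random

# Illustrative simulation: observed scores as true score plus random error (X = T + E).
# Reliability is estimated as the share of observed-score variance that is true variance.
random.seed(0)

def variance(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

true_scores = [random.gauss(100, 15) for _ in range(10_000)]  # T
errors      = [random.gauss(0, 5)    for _ in range(10_000)]  # E, mean 0
observed    = [t + e for t, e in zip(true_scores, errors)]    # X = T + E

# With sd(T) = 15 and sd(E) = 5, the expected ratio is 225 / (225 + 25) = 0.90.
reliability = variance(true_scores) / variance(observed)
print(round(reliability, 2))
```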
Error Score (E)
The component of an observed score attributable to measurement error.
True Variance
Portion of score variance due to real differences among individuals on the measured attribute.
Error Variance
Portion of score variance produced by random or systematic measurement error.
Systematic Error
Consistent, directional measurement error that inflates or deflates scores (produces bias).
Random Error
Unpredictable, nonsystematic fluctuations that raise or lower scores without pattern.
Bias (Statistics)
The degree to which systematic error distorts measurement results.
Carryover Effects
Influences of prior testing on later scores (e.g., practice and fatigue effects).
Practice Effects
Performance gains on a test due to familiarity from earlier administrations.
Fatigue Effects
Performance declines on a test due to reduced energy or motivation from repeated testing.
Item / Content Sampling
Error source stemming from the specific items chosen for a test or their wording.
Sources of Error Variance
Test construction, administration, scoring, interpretation, and examinee-related factors.
Test-Retest Reliability
Consistency of scores when the same test is administered to the same group on two occasions.
Coefficient of Stability
Test-retest reliability obtained with a retest interval of six months or more.
Alternate-Forms Reliability
Correlation of scores between two different, but equivalent, versions of a test.
Parallel-Forms Reliability
Reliability of two test forms with equal means and variances; also called the coefficient of equivalence.
Split-Half Reliability
Internal consistency estimate from correlating scores on two halves of one test administration.
Odd-Even Reliability
Split-half reliability obtained by correlating odd- and even-numbered items.
Spearman–Brown Formula
Equation used to adjust split-half reliability or predict reliability after changing test length.
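The Spearman–Brown adjustment is simple enough to sketch directly (the function name is illustrative): predicted reliability after multiplying test length by a factor k is kr / (1 + (k − 1)r).

```python
def spearman_brown(r, k):
    """Predicted reliability when test length is multiplied by factor k.

    r: reliability of the current test (e.g., a split-half correlation).
    k: length factor (k = 2 corrects a split-half estimate to full length).
    """
    return k * r / (1 + (k - 1) * r)

# Correcting a half-test correlation of 0.60 to full test length (k = 2):
print(spearman_brown(0.60, 2))  # 0.75
```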
Internal Consistency
The degree to which test items measure the same construct; assessed via split-half, KR-20, or alpha.
Coefficient Alpha (Cronbach’s α)
Average of all possible split-half correlations; common index of internal consistency.
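Coefficient alpha is usually computed not by averaging split-halves but from the equivalent variance formula α = k/(k − 1) · (1 − Σ item variances / total-score variance). A minimal sketch (function name and data are illustrative):

```python
def cronbach_alpha(items):
    """Coefficient alpha from a list of item-score columns (one list per item)."""
    k = len(items)            # number of items
    n = len(items[0])         # number of respondents

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Total score for each respondent across all items.
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(variance(col) for col in items) / variance(totals))

# Three perfectly parallel items (identical scores) yield alpha = 1.0:
print(round(cronbach_alpha([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]]), 6))
```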
Kuder–Richardson Formulas
KR-20/KR-21 coefficients for internal consistency of dichotomous items.
McDonald’s Omega
Reliability coefficient that accommodates unequal item loadings; generally more accurate than alpha when items are not tau-equivalent.
Inter-Scorer (Inter-Rater) Reliability
Agreement among independent scorers, raters, or judges on a measurement.
Coefficient of Inter-Scorer Reliability
Statistic (e.g., Pearson r or Cohen's κ) expressing the level of agreement among scorers.
Dynamic Characteristic
A trait or state expected to change over time (e.g., mood), complicating test-retest reliability.
Static Characteristic
Relatively stable attribute (e.g., intelligence) suitable for test-retest reliability assessment.
Homogeneous Test
Instrument whose items all tap a single construct; should show high internal consistency.
Heterogeneous Test
Instrument measuring multiple constructs; may show lower internal consistency but still be reliable overall.
Restriction of Range
Reduced variability of scores, which lowers observed correlations and reliability estimates.
Speed Test
Test with easy items and strict time limit; score differences reflect speed, not item difficulty.
Power Test
Test with sufficient time and items varying in difficulty; score differences reflect knowledge/ability.
Criterion-Referenced Test
Assessment interpreted against predefined mastery criteria, not relative to norms; traditional reliability indices may mislead.
Classical Test Theory (CTT)
Traditional measurement model defining observed score as true score plus error.
Domain Sampling Theory
Model viewing a test as a sample from a universe of possible items; precursor to generalizability theory.
Generalizability Theory
Extension of domain sampling that estimates multiple error sources (facets) and produces generalizability coefficients.
Universe Score
Analog to true score within generalizability theory for a specified set of measurement conditions.
Item Response Theory (IRT)
Family of models describing probability of a response as a function of latent trait level and item parameters.
Latent-Trait Theory
Another term for IRT emphasizing underlying unobservable traits.
Rasch Model
Specific IRT model assuming equal item discrimination and logistic item characteristic curves.
Item Difficulty (IRT)
Trait level at which a respondent has a 50% chance of answering or endorsing an item correctly/positively.
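The Rasch model's item characteristic curve can be sketched in a few lines (the function name is illustrative): the response probability is a logistic function of the gap between trait level θ and item difficulty b.

```python
import math

def rasch_prob(theta, b):
    """Probability of a correct/keyed response under the Rasch model.

    theta: person's latent trait level; b: item difficulty.
    """
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# When trait level equals item difficulty, the probability is exactly 0.5,
# which is how item difficulty is defined in IRT.
print(rasch_prob(0.0, 0.0))  # 0.5
```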
Item Discrimination (IRT)
Degree to which an item differentiates among individuals at different trait levels.
Standard Error of Measurement (SEM)
Estimated standard deviation of an individual's observed scores around their true score; SEM = σ√(1 − r), where σ is the test's standard deviation and r its reliability coefficient.
Confidence Interval (Score)
Range around an observed score within which the true score is expected to fall with specified probability.
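The SEM and the confidence interval built from it can be sketched together (function names and the example scale are illustrative; an IQ-style scale with σ = 15 and r = 0.91 is a common textbook case):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: SEM = sd * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

def score_ci(observed, sd, reliability, z=1.96):
    """Approximate 95% confidence interval around an observed score."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# sd = 15 and reliability = 0.91 give SEM = 15 * sqrt(0.09) = 4.5.
print(round(sem(15, 0.91), 2))
lo, hi = score_ci(100, 15, 0.91)
print(round(lo, 1), round(hi, 1))  # roughly 91.2 to 108.8
```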
Standard Error of the Difference
Statistic used to determine whether two test scores differ significantly, incorporating each score’s SEM.
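The standard error of the difference combines the two scores' SEMs in quadrature. A minimal sketch (function name is illustrative):

```python
import math

def se_difference(sem1, sem2):
    """Standard error of the difference between two scores:
    sqrt(SEM1**2 + SEM2**2)."""
    return math.sqrt(sem1 ** 2 + sem2 ** 2)

# With SEM = 4.5 for each score, a difference must exceed roughly
# 1.96 * 6.36 ≈ 12.5 points to be significant at the .05 level.
print(round(se_difference(4.5, 4.5), 2))  # 6.36
```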
Coefficient of Generalizability
Reliability-like index from generalizability studies indicating dependability across specified facets.
Coefficient of Equivalence
Alternate- or parallel-forms reliability coefficient denoting score consistency across forms.
Replicability Crisis
Widespread concern that many published psychological findings fail to replicate in independent studies.
Questionable Research Practices (QRPs)
Methodological shortcuts (e.g., data peeking, selective reporting) that inflate false-positive results.
Preregistration
Publicly recording study hypotheses and methods before data collection to curb QRPs and enhance replicability.
Differential Item Functioning (DIF)
Item bias where individuals from different groups with equal ability have different probabilities of endorsing an item.
Coefficient of Dependability
Generalizability-theory index expressing score consistency for a specific decision purpose.