A comprehensive vocabulary review of psychometric reliability concepts, measurement error types, reliability estimation methods, and statistical measures of error precision based on Chapter 5 lecture notes.
Reliability
In the psychometric sense, this refers to consistency in measurement; specifically, how consistently and accurately a psychological test measures what it purports to measure.
Classical Test Theory (CTT) Model
The framework assuming that an individual's score on a test is composed of a true component and an error component, represented by the formula $X = T + E$.
Observed Score (X)
The actual, raw score earned by a testtaker on a given instrument, such as getting 45 out of 50 questions correct.
True Score (T)
A theoretical value representing the actual, genuine amount of an attribute possessed by the testtaker, completely free of any measurement error.
Error Score (E)
The component of the observed score attributed to irrelevant, random, or extraneous factors that have nothing to do with the actual construct being measured.
Reliability Coefficient
The proportion of total variance in test scores that is attributed to true variance, typically ranging from 0 to 1.00.
Variance ($\sigma^2$)
The standard deviation squared, serving as a crucial index of test score variability and describing how much individual scores spread out from the arithmetic mean.
True Variance ($\sigma^2_{tr}$)
Variations in test scores resulting from real, authentic, and genuine differences among testtakers regarding the attribute or construct being measured.
Error Variance ($\sigma^2_e$)
Variations in test scores resulting from irrelevant, chance, or random sources that contaminate the measurement process. Total variance is the sum of true variance and error variance: $\sigma^2 = \sigma^2_{tr} + \sigma^2_e$.
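A minimal Python sketch of how this decomposition yields the reliability coefficient (true variance divided by total variance). The function name and the variance values are hypothetical, chosen only for illustration.

```python
def reliability_from_variances(true_var: float, error_var: float) -> float:
    """Reliability coefficient = true variance / total observed variance (CTT)."""
    if true_var < 0 or error_var < 0:
        raise ValueError("variances must be non-negative")
    total_var = true_var + error_var  # sigma^2 = sigma^2_tr + sigma^2_e
    if total_var == 0:
        raise ValueError("total variance must be positive")
    return true_var / total_var

# Hypothetical values: true variance 80, error variance 20
print(reliability_from_variances(80.0, 20.0))  # 0.8
```

With no error variance at all the coefficient reaches its ceiling of 1.00; as error variance grows relative to true variance, the coefficient falls toward 0.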
Random Error
Error caused by unpredictable, transient fluctuations, such as sudden external noise or a temporary drop in attention, that affect testtakers uniquely and unsystematically.
Systematic Error
A source of error that is predictable, constant, and fixed, affecting all scores uniformly and thus not changing the variability or reliability coefficient.
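The contrast between the two error types can be sketched with hypothetical scores: a constant shift (systematic error) leaves the spread of scores and their correlation with the error-free scores unchanged, while person-specific fluctuations (random error) lower that correlation. The data and the `pearson_r` helper are illustrative assumptions, not taken from the notes.

```python
from math import sqrt
from statistics import pvariance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical error-free (true) scores for eight testtakers
true_scores = [40, 42, 45, 47, 50, 53, 55, 58]

# Systematic error: every score is shifted by the same constant (+3)
systematic = [t + 3 for t in true_scores]

# Random error: a different, unpredictable deviation for each testtaker
noise = [2, -3, 1, 4, -2, -4, 3, -1]
with_random = [t + e for t, e in zip(true_scores, noise)]

print(pvariance(systematic) == pvariance(true_scores))  # True: spread unchanged
print(round(pearson_r(true_scores, systematic), 6))     # 1.0
print(round(pearson_r(true_scores, with_random), 2))    # 0.9
```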
Item Sampling / Content Sampling
The variation in scores occurring because of the specific items chosen for inclusion in a test compared to the entire universe or domain of potential content.
Test-Retest Reliability
An estimate obtained by administering the exact same measurement instrument to the same sample of individuals at two distinct points in time.
Coefficient of Stability
The correlation coefficient obtained when the time interval between two test-retest administrations is greater than 6 months.
Parallel Forms
Versions of a test for which the means and variances of observed test scores are equal.
Alternate Forms
Different versions of a test designed to be equivalent in content coverage and difficulty but containing entirely distinct, non-overlapping items.
Coefficient of Equivalence
The correlation between the scores on two forms of a test, reflecting how equivalent the two item samples are.
Split-Half Reliability
An internal consistency estimate obtained by administering a test once and splitting the items into two equal halves to calculate a correlation coefficient.
Odd-Even Reliability
A method of creating test halves by assigning odd-numbered items to one half and even-numbered items to the other.
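An odd-even split can be sketched as follows: sum each testtaker's odd-numbered and even-numbered items separately, then correlate the two half scores. The 0/1 response matrix and the helper names are hypothetical; note that this half-test correlation still needs to be stepped up to full length (see the Spearman-Brown entry).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def odd_even_halves(responses):
    """Split each testtaker's item scores into odd- and even-numbered item totals.
    Items are numbered from 1, so item 1 sits at index 0."""
    odd = [sum(row[0::2]) for row in responses]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in responses]  # items 2, 4, 6, ...
    return odd, even

# Hypothetical right/wrong (1/0) scores: 5 testtakers x 6 items
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
]
odd, even = odd_even_halves(responses)
print(odd, even)  # [3, 3, 2, 1, 1] [3, 2, 1, 1, 0]
r_halves = pearson_r(odd, even)
print(f"half-test correlation: {r_halves:.3f}")
```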
Spearman-Brown Formula
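A formula used to estimate the internal consistency reliability of a test that has been lengthened or shortened by a factor of $n$: $r_{SB} = \frac{n\,r_{xy}}{1 + (n - 1)\,r_{xy}}$, where $r_{xy}$ is the reliability of the original-length test (for a split-half estimate, the correlation between the two halves, with $n = 2$).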
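The formula translates directly into a small function. This is a minimal sketch; the function name and the example reliabilities are hypothetical.

```python
def spearman_brown(r_xy: float, n: float) -> float:
    """Spearman-Brown estimate for a test whose length is multiplied by n.
    n > 1 lengthens the test; 0 < n < 1 shortens it."""
    if n <= 0:
        raise ValueError("n must be positive")
    return (n * r_xy) / (1 + (n - 1) * r_xy)

# Stepping up a hypothetical half-test correlation of .70 to full length (n = 2)
print(round(spearman_brown(0.70, 2), 3))    # 0.824
# Cutting a hypothetical test with reliability .90 to half its length (n = 0.5)
print(round(spearman_brown(0.90, 0.5), 3))  # 0.818
```

Lengthening with comparable items raises the estimate and shortening lowers it; with $n = 1$ the formula returns $r_{xy}$ unchanged.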
Inter-Item Consistency
The degree of correlation and consistency among all individual items on a scale, requiring only a single administration.
Homogeneity
The degree to which individual items on a test measure a single, unifactorial trait or construct, resulting in items that are tightly inter-correlated.
Heterogeneity
The degree to which a test measures more than one factor or trait, as in a multi-construct test or test battery whose subscales deliberately measure different traits.
Kuder-Richardson Reliability (KR-20)
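A formula, $r_{KR20} = \left(\frac{k}{k-1}\right)\left(1 - \frac{\sum pq}{\sigma^2}\right)$, used for highly homogeneous tests with dichotomously scored items (right/wrong), where $k$ is the number of items, $p$ and $q$ are the proportions of testtakers passing and failing each item, and $\sigma^2$ is the variance of total test scores.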
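A minimal sketch of KR-20 on a hypothetical 0/1 response matrix (rows are testtakers, columns are items). It assumes the population variance (dividing by the number of testtakers) for $\sigma^2$, to match the proportions $p$ and $q$; some texts use the sample variance instead.

```python
def kr20(responses):
    """KR-20 for dichotomously scored items (1 = correct, 0 = incorrect).
    Rows are testtakers, columns are items; uses the population variance of totals."""
    n_people = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n_people
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_people
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_people  # proportion passing item j
        sum_pq += p * (1 - p)                             # q = 1 - p
    return (k / (k - 1)) * (1 - sum_pq / var_total)

# Hypothetical right/wrong scores: 5 testtakers x 6 items
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 0],
]
print(f"KR-20 = {kr20(responses):.3f}")
```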
Coefficient Alpha (α)
The mean of all possible split-half correlations corrected by the Spearman-Brown formula, designed for non-dichotomous items like Likert scales.
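In practice, alpha is usually computed with the variance-based formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma^2_i}{\sigma^2_X}\right)$ (item variances $\sigma^2_i$, total-score variance $\sigma^2_X$) rather than by literally averaging split-half coefficients; that formula is an addition not stated in the notes. The sketch below assumes population variances, and the Likert data and function name are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(responses):
    """Coefficient alpha for item scores (rows = testtakers, columns = items),
    using population variances of each item and of the total scores."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_vars = [pvariance([row[j] for row in responses]) for j in range(k)]
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical 5-point Likert responses: 5 testtakers x 4 items
likert = [
    [4, 5, 4, 5],
    [3, 4, 3, 4],
    [2, 2, 3, 2],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
]
print(f"alpha = {cronbach_alpha(likert):.3f}")
```

These deliberately similar items produce a value above .90, which, as the Redundancy Myth entry cautions, may reflect overlapping items rather than a better scale.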
Redundancy Myth
The misconception that a Coefficient Alpha above .90 is always better, when such a high value may instead indicate redundant items that ask essentially the same narrow question.
Inter-Scorer Reliability
The degree of consistency, consensus, and agreement between two or more independent scorers, raters, or observers.
Dynamic Characteristics
Psychological traits, states, or processes that are fluid and shift rapidly in response to situational factors, such as state anxiety.
Static Characteristics
Psychological traits that are deeply embedded, highly durable, and do not fluctuate rapidly over time, such as core personality traits.
Restricted Variance
Occurs when a sample is highly uniform, creating a narrow range of scores that mathematically suppresses the correlation coefficient and deflates test reliability.
Inflated Variance
Occurs when a sample is highly diverse, creating an exceptionally wide range of scores that artificially boosts the reliability index.
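The effect of sample range on a reliability correlation can be sketched with hypothetical test-retest scores: keeping only a homogeneous middle band shrinks the spread of scores while the error stays about the same size, so the correlation drops. The data, the 30 to 70 cutoff, and the helper name are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical first and second administration scores for a diverse sample
first = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
second = [15, 15, 34, 36, 53, 57, 75, 75, 94, 96]

# Restrict to a uniform middle band of first-administration scores
pairs = [(a, b) for a, b in zip(first, second) if 30 <= a <= 70]
first_restricted = [a for a, _ in pairs]
second_restricted = [b for _, b in pairs]

r_full = pearson_r(first, second)
r_restricted = pearson_r(first_restricted, second_restricted)
print(f"full range r = {r_full:.3f}, restricted range r = {r_restricted:.3f}")
```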
Power Test
A test containing items arranged in increasing difficulty with generous time limits so testtakers can attempt every item.
Speed Test
A test containing items of uniform, generally low difficulty with a time limit so strict that few, if any, testtakers can finish.
Criterion-Referenced Test
A test where performance is compared directly against an absolute, pre-established standard or mastery level rather than a normative peer group.
Standard Error of Measurement (SEM)
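A statistic representing the standard deviation of a theoretically normal distribution of test scores obtained by one person on equivalent tests: $\sigma_{meas} = \sigma\sqrt{1 - r_{xx}}$, where $\sigma$ is the standard deviation of test scores and $r_{xx}$ is the reliability coefficient.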
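A one-line implementation of the SEM formula, sketched with a hypothetical standard deviation and reliability coefficient; the function name is illustrative.

```python
from math import sqrt

def standard_error_of_measurement(sd: float, r_xx: float) -> float:
    """SEM = sigma * sqrt(1 - r_xx)."""
    if not 0 <= r_xx <= 1:
        raise ValueError("reliability must be between 0 and 1")
    return sd * sqrt(1 - r_xx)

# Hypothetical test with SD = 15 and reliability .91
print(round(standard_error_of_measurement(15, 0.91), 2))  # 4.5
```

A perfectly reliable test ($r_{xx} = 1$) has an SEM of 0, while a reliability of 0 makes the SEM equal to the full standard deviation.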
Confidence Interval
A range or band of scores, calculated using the SEM, that is likely to contain the testtaker's true score at a stated level of confidence.
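A minimal sketch of a confidence interval built from the SEM, assuming normally distributed measurement error and a band centered on the observed score (one common convention). The observed score and SEM values are hypothetical; `statistics.NormalDist` supplies the two-tailed critical value (about 1.96 for 95%).

```python
from statistics import NormalDist

def confidence_interval(observed: float, sem: float, level: float = 0.95):
    """Band of observed +/- z * SEM expected to contain the true score."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # two-tailed critical value
    half_width = z * sem
    return observed - half_width, observed + half_width

# Hypothetical observed score of 100 on a test with SEM = 4.5
low, high = confidence_interval(100, 4.5)
print(f"95% CI: {low:.2f} to {high:.2f}")
```

Raising the confidence level or the SEM widens the band; a more reliable test (smaller SEM) gives a narrower one.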
Standard Error of the Difference ($\sigma_{diff}$)
A statistical measure used to determine how large a difference between two scores must be before it can be considered statistically significant.
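A hedged sketch of the standard error of the difference for two scores reported on the same scale (same standard deviation $\sigma$), using $\sigma_{diff} = \sigma\sqrt{2 - r_1 - r_2}$, which equals $\sqrt{\sigma^2_{meas1} + \sigma^2_{meas2}}$; these formulas are not stated on the card. The reliabilities, scores, 1.96 cutoff (95% two-tailed), and function names are hypothetical.

```python
from math import sqrt

def se_difference(sd: float, r1: float, r2: float) -> float:
    """sigma_diff = sigma * sqrt(2 - r1 - r2), for two scores on the same scale."""
    return sd * sqrt(2 - r1 - r2)

def difference_is_significant(score1, score2, sd, r1, r2, z_crit=1.96):
    """True if the score difference is at least z_crit standard errors of the difference."""
    return abs(score1 - score2) >= z_crit * se_difference(sd, r1, r2)

# Hypothetical: SD = 10, reliabilities .92 and .88
sd_diff = se_difference(10, 0.92, 0.88)
print(round(sd_diff, 2))                                    # 4.47
print(difference_is_significant(58, 50, 10, 0.92, 0.88))   # False: 8 < about 8.77
print(difference_is_significant(60, 50, 10, 0.92, 0.88))   # True: 10 > about 8.77
```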