Flashcards covering key concepts of reliability in psychological assessment, including types of error, reliability estimates (test-retest, parallel/alternate forms, split-half, inter-item consistency, inter-scorer), and measurement models (Classical Test Theory, Domain Sampling Theory, Generalizability Theory, Item Response Theory), along with the Standard Error of Measurement.
Error
The component of an observed test score that does not reflect the test taker's true ability or trait level; in the true score model, X = T + E (observed score = true score + error).
Measurement error
All of the factors associated with the process of measuring some variable.
Random error
Source of error in measuring a targeted variable caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process.
Systematic error
Source of error in measuring a variable that is typically constant or proportionate to what is presumed to be the true value of the variable being measured; once known, it can be predicted.
Item sampling (content sampling)
Variation among items within a test, identified as a source of error in test construction.
Test-Retest Reliability
Estimate of reliability obtained by correlating pairs of scores from the same people on two different administrations of the same test.
Coefficient of stability
The coefficient yielded by a test-retest reliability estimate; so named because it reflects the stability of scores over time.
Parallel-Forms Reliability
A type of reliability estimated by the degree of relationship between various forms of a test, where for each form, the means and variances of observed test scores are equal.
Alternate-Forms Reliability
A type of reliability estimated by the degree of relationship between two different forms of a test that are similar in difficulty.
Coefficient of equivalence
The coefficient of reliability used to evaluate the relationship between alternate or parallel forms of a test.
Split-Half Reliability
Estimates reliability by correlating the scores obtained on two equivalent halves of a single test administered once.
Spearman-Brown formula
A formula for estimating how reliability changes when a test is lengthened or shortened; commonly used to adjust a split-half correlation upward so that it reflects the reliability of the full-length test.
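As a quick illustration (the function name and numbers are hypothetical), the general Spearman-Brown formula is r_new = n·r / (1 + (n − 1)·r), where n is the factor by which test length changes:

```python
def spearman_brown(r, n=2):
    """Projected reliability of a test whose length changes by a factor of n.

    For a split-half estimate, n=2 adjusts the half-test correlation r
    up to an estimate of the full-length test's reliability.
    """
    return n * r / (1 + (n - 1) * r)

# A half-test correlation of .70 adjusts to roughly .82 for the full test.
print(round(spearman_brown(0.70), 2))  # 0.82
```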
Odd-even technique
An acceptable way to split a test into equivalent halves for split-half reliability estimation by using odd-numbered items for one half and even-numbered items for the other.
Inter-item consistency
The degree of correlation among all the items on a scale, calculated from a single administration of a single test form.
Homogeneous test
A test whose items measure a single trait; the more homogeneous a test is, the more inter-item consistency it can be expected to have.
Kuder-Richardson formulas
Formulas used for determining the inter-item consistency of dichotomous items, typically those scored right or wrong.
KR21
A Kuder-Richardson formula that may be used if there is reason to assume that all test items have approximately the same degree of difficulty.
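A minimal sketch of both formulas on made-up data (variable names are illustrative): KR-20 uses each item's difficulty, while KR-21 needs only the test length, mean, and variance.

```python
def kr20(scores):
    """KR-20 inter-item consistency for dichotomous (0/1) item scores.

    scores: one list of 0/1 item responses per examinee.
    """
    n, k = len(scores), len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]  # item difficulties
    pq = sum(pi * (1 - pi) for pi in p)                        # sum of item variances
    totals = [sum(row) for row in scores]
    mean = sum(totals) / n
    var = sum((t - mean) ** 2 for t in totals) / n             # total-score variance
    return (k / (k - 1)) * (1 - pq / var)

def kr21(k, mean, var):
    """KR-21 shortcut: assumes every item has the same difficulty."""
    return (k / (k - 1)) * (1 - mean * (k - mean) / (k * var))

# Hypothetical data: four examinees answering three right/wrong items.
data = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
print(round(kr20(data), 2))           # 0.75
print(round(kr21(3, 1.5, 1.25), 2))   # 0.6
```

KR-21 trades accuracy for convenience: when item difficulties actually vary, it tends to underestimate the KR-20 value (0.6 vs. 0.75 here).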
Coefficient alpha (Cronbach's alpha)
The mean of all possible split-half correlations, corrected by the Spearman-Brown formula; it is appropriate for nondichotomous items and is the preferred statistic for estimating internal consistency reliability.
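For nondichotomous items, coefficient alpha can be computed directly as α = (k/(k − 1))(1 − Σσ²ᵢ / σ²ₓ); the ratings below are made up for illustration:

```python
def cronbach_alpha(scores):
    """Coefficient alpha from a matrix of item scores (one row per examinee)."""
    n, k = len(scores), len(scores[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var = sum(variance([row[i] for row in scores]) for i in range(k))
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_var / total_var)

# Hypothetical data: four examinees rating three 5-point Likert items.
ratings = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4]]
print(round(cronbach_alpha(ratings), 2))  # 0.97
```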
Inter-Scorer Reliability (Scorer reliability, Judge reliability, Observer reliability, Inter-rater reliability)
The degree of agreement or consistency between two or more scorers (or judges or raters) with regard to a particular measure.
Coefficient of inter-scorer reliability
A coefficient used to quantify the degree of agreement or consistency between two or more scorers.
Classical Test Theory (CTT) (True score model)
A measurement model where an observed score is conceptualized as comprising a true score and error, and all test items are presumed to contribute equally to the score total.
True score
A value that, according to classical test theory, genuinely reflects an individual's ability (or trait) level as measured by a particular test.
Domain sampling theory
A theory conceiving of test reliability as an objective measure of how precisely a test score assesses the domain from which the test samples; it seeks to estimate the extent to which specific sources of variation under defined conditions contribute to the test score.
Generalizability theory
A theory suggesting a person's test scores vary from testing to testing because of variables in the testing situation, describing the 'universe' of testing in terms of its 'facets'.
Facets (Generalizability theory)
Components of the testing situation or 'universe' in generalizability theory, such as the number of items, amount of scorer training, and purpose of administration.
Item response theory (IRT)
A measurement theory that models the probability that a person with a particular ability or trait level will successfully respond to a test item, focusing on item difficulty and item discrimination.
Standard Error of Measurement (SEM)
A measure of the precision of an observed test score, which has an inverse relationship with the reliability of a test and is frequently used in interpreting individual scores.
Confidence interval
A range or band of test scores that is likely to contain the true score.
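The two ideas above combine in a short sketch: SEM = SD·√(1 − r), and a 95% confidence interval is the observed score ± 1.96·SEM (the test statistics below are hypothetical):

```python
import math

def sem(sd, reliability):
    """Standard error of measurement: rises as reliability falls."""
    return sd * math.sqrt(1 - reliability)

def true_score_band(observed, sd, reliability, z=1.96):
    """Score band likely (95% by default) to contain the true score."""
    margin = z * sem(sd, reliability)
    return observed - margin, observed + margin

# Hypothetical IQ-style scale: SD = 15, reliability = .91, observed score 100.
print(round(sem(15, 0.91), 2))     # 4.5
lo, hi = true_score_band(100, 15, 0.91)
print(round(lo, 2), round(hi, 2))  # 91.18 108.82
```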