Flashcards covering the essential vocabulary of test construction, including reliability types, validity methods, and statistical measurements from the lecture notes.
Test
A systematic procedure for measuring a sample of an individual's behavior.
Standardized Administration
A systematic procedure where the test developer provides specific guidelines for administering the test.
Systematic Scoring
A systematic procedure where the test developer specifies clear rules or steps for evaluating an examinee's responses.
Validity
The degree to which a test accurately and adequately measures what it was designed to measure.
Reliability
The degree of consistency with which the test measures whatever it is measuring; the more reliable the test, the closer an examinee's obtained score is to their true score.
True Score
The score a person would get on a test if it were perfectly reliable and performance was not affected by error.
Measurement Error
Random or unpredictable factors irrelevant to the behavior measured that affect examinee scores on a test.
X=T+E
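The classical test theory equation in which an obtained score (X) is the sum of a true score component (T) and a measurement error component (E); variability in obtained scores is correspondingly split into true score variability and error variability.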
Reliability Coefficient
Symbolized as rXX or rYY, it ranges from 0 to +1 and is interpreted directly, without squaring, as the proportion of variability in obtained scores due to true score variability.
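As a rough illustration of the "proportion of variability" interpretation, the sketch below computes reliability as true score variance divided by obtained score variance. True scores cannot be observed in practice, so the numbers and the function name `reliability_from_components` are made up for this example, and the errors are chosen to be uncorrelated with the true scores.

```python
from statistics import pvariance

def reliability_from_components(true_scores, errors):
    """Proportion of obtained-score variance that is true-score variance."""
    # X = T + E for each examinee
    obtained = [t + e for t, e in zip(true_scores, errors)]
    return pvariance(true_scores) / pvariance(obtained)

# Hypothetical data: errors average zero and are uncorrelated with true scores
true_scores = [10, 20, 30, 40]
errors = [2, -2, -2, 2]

r_xx = reliability_from_components(true_scores, errors)
print(round(r_xx, 3))  # 0.969, i.e. about 97% of the variability is true score variability
```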
Test-Retest Reliability
Evaluated by administering the test to the same group on two occasions; also known as the coefficient of stability.
Alternate Forms Reliability
Evaluated by administering two equivalent forms of the test to the same sample; also known as the coefficient of equivalence.
Internal Consistency Reliability
A measure of how consistent examinees' responses are to different items within a single test administration.
Split-Half Method
A method of assessing internal consistency by dividing a test into two halves and correlating the scores.
Spearman-Brown Prophecy Formula
A formula used to correct the split-half reliability coefficient or to estimate the effect of lengthening or shortening a test on reliability.
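A minimal sketch of the Spearman-Brown formula, assuming the standard form r_new = n·r / (1 + (n − 1)·r), where n is the factor by which the test length changes. The function and parameter names are illustrative only.

```python
def spearman_brown(r, length_factor=2.0):
    """Estimate reliability after changing test length by `length_factor`.

    length_factor=2 corrects a split-half coefficient to full-test length;
    values below 1 estimate the effect of shortening the test.
    """
    return (length_factor * r) / (1 + (length_factor - 1) * r)

# Correcting a split-half correlation of .60 to full-test length
print(round(spearman_brown(0.60), 3))        # 0.75
# Estimating the effect of cutting a test with r = .80 in half
print(round(spearman_brown(0.80, 0.5), 3))   # 0.667
```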
Cronbach's Coefficient Alpha
An internal consistency measure conceptualized as the average of all possible split-half reliability coefficients corrected by the Spearman-Brown formula.
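In practice alpha is usually computed from item and total score variances rather than by averaging split halves. The sketch below assumes the common computational formula α = (k / (k − 1)) · (1 − Σ item variances / total score variance); the data and the name `cronbach_alpha` are hypothetical.

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """scores: one row per examinee, each row a list of item scores."""
    k = len(scores[0])  # number of items
    item_variances = [pvariance([row[i] for row in scores]) for i in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical responses from four examinees to three items
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 2],
    [5, 5, 4],
]
print(round(cronbach_alpha(scores), 3))  # 0.944
```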
KR-20 (Kuder-Richardson Formula 20)
An internal consistency method used specifically when test items are scored dichotomously, such as right or wrong.
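A short sketch of KR-20 for items scored 1 (right) or 0 (wrong), assuming the usual formula KR-20 = (k / (k − 1)) · (1 − Σ p·q / total score variance), where p is the proportion passing an item and q = 1 − p. The response data and function name are invented for illustration.

```python
from statistics import pvariance

def kr20(responses):
    """responses: one row per examinee, each row a list of 0/1 item scores."""
    n = len(responses)
    k = len(responses[0])
    # p = proportion of examinees answering each item correctly
    p = [sum(row[i] for row in responses) / n for i in range(k)]
    sum_pq = sum(p_i * (1 - p_i) for p_i in p)
    total_variance = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_variance)

# Hypothetical right/wrong responses from five examinees to four items
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(responses), 3))  # 0.8
```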
Inter-Rater Reliability
The degree of consistency across different raters, assessed when tests are subjectively scored.
Kappa Statistic
A chance-corrected index of agreement used to assess inter-rater reliability when the data are nominal; Cohen's kappa applies to two raters, and extensions such as Fleiss' kappa handle more than two.
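One way to see what kappa adds over simple percent agreement is to compute it by hand. The sketch below implements Cohen's kappa for the two-rater case only, using κ = (observed agreement − chance agreement) / (1 − chance agreement); the ratings and the name `cohens_kappa` are hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on nominal categories."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Agreement expected by chance from each rater's category proportions
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical yes/no classifications of ten cases by two raters
rater_a = ["y", "y", "y", "y", "y", "n", "n", "n", "n", "n"]
rater_b = ["y", "y", "y", "y", "n", "n", "n", "n", "n", "y"]
print(round(cohens_kappa(rater_a, rater_b), 3))  # 0.6 (raw agreement is 0.8)
```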
Coefficient of Concordance
A correlation coefficient used to evaluate inter-rater reliability for two or more raters when data are in the form of ranks.
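The coefficient of concordance is commonly computed as Kendall's W. The sketch below assumes the no-ties formula W = 12·S / (m²·(n³ − n)), where m is the number of raters, n the number of objects ranked, and S the sum of squared deviations of the rank totals from their mean. The rankings and function name are illustrative.

```python
def kendalls_w(rankings):
    """rankings: one list per rater, giving that rater's rank for each object.

    Assumes every rater ranks all objects 1..n with no tied ranks.
    """
    m = len(rankings)        # number of raters
    n = len(rankings[0])     # number of objects ranked
    rank_totals = [sum(r[i] for r in rankings) for i in range(n)]
    mean_total = sum(rank_totals) / n
    s = sum((t - mean_total) ** 2 for t in rank_totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))

# Hypothetical rankings of four essays by three raters
rankings = [
    [1, 2, 3, 4],
    [2, 1, 3, 4],
    [1, 3, 2, 4],
]
print(round(kendalls_w(rankings), 3))  # 0.778
```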
Standard Error of Measurement (SEM)
The standard deviation of a distribution of obtained scores that would result from testing an individual an infinite number of times; it measures variability due to error.
Confidence Interval
The range within which an examinee's true score is likely to fall given their obtained score, constructed using the standard error of measurement.
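A brief worked sketch of how the two cards above fit together, assuming the standard formulas SEM = SD·√(1 − r_xx) and CI = obtained score ± z·SEM. The test values (SD of 15, reliability of .91, obtained score of 100) are made up, and the function names are illustrative.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability coefficient)."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(obtained_score, sem, z=1.96):
    """Interval around an obtained score; z=1.96 gives roughly 95% confidence."""
    margin = z * sem
    return obtained_score - margin, obtained_score + margin

sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(100, sem)
print(round(sem, 2))                   # 4.5
print(round(low, 2), round(high, 2))   # 91.18 108.82
```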
Content Validity
The degree to which test items adequately represent the content or behavior domain the test was designed to measure.
Construct Validity
The degree to which a test measures a hypothetical or intangible trait, such as intelligence or self-esteem, evaluated through the accumulation of evidence.
Multi-Trait Multi-Method Matrix
A table of correlation coefficients used to evaluate construct validity by measuring two or more traits using two or more methods.
Convergent Validity
Evidence of construct validity shown when scores on a test correlate highly with scores on measures of the same or similar traits.
Divergent Validity
Evidence of construct validity shown when scores on a test do not correlate highly with scores on measures of unrelated traits; also called discriminant validity.
Monotrait-Monomethod Coefficient
A reliability coefficient in the multi-trait multi-method matrix indicating the correlation of a test with itself.
Monotrait-Heteromethod Coefficient
A correlation between two different measures of the same trait, providing information on convergent validity.
Heterotrait-Monomethod Coefficient
A correlation between two different traits measured by the same method, providing information on divergent validity.
Heterotrait-Heteromethod Coefficient
A correlation between two different traits measured by two different methods, providing information on divergent validity.
Criterion-Related Validity
The degree of association between a predictor (the test) and an external measure of performance (the criterion).
Predictive Validity
A type of criterion-related validity where the test is used to predict an examinee's future performance on the criterion.
Concurrent Validity
A type of criterion-related validity where the test is used to estimate an examinee's current status on the criterion.
Coefficient of Determination
The squared criterion-related validity coefficient (r²), indicating the proportion of variability in the criterion that is explained by the predictor.
Standard Error of Estimate
A value used to construct a confidence interval around an examinee's predicted criterion score.
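A small sketch relating the validity coefficient to these two quantities, assuming the standard formula SEE = SD_y·√(1 − r_xy²), where SD_y is the standard deviation of criterion scores. The input values and function names are hypothetical.

```python
import math

def coefficient_of_determination(r_xy):
    """Proportion of criterion variability explained by the predictor."""
    return r_xy ** 2

def standard_error_of_estimate(sd_criterion, r_xy):
    """SEE = SD of the criterion * sqrt(1 - validity coefficient squared)."""
    return sd_criterion * math.sqrt(1 - r_xy ** 2)

r_xy = 0.60
see = standard_error_of_estimate(sd_criterion=10, r_xy=r_xy)
print(round(coefficient_of_determination(r_xy), 2))  # 0.36
print(round(see, 2))                                 # 8.0

# Rough 95% interval around a hypothetical predicted criterion score of 50
predicted = 50
print(round(predicted - 1.96 * see, 2), round(predicted + 1.96 * see, 2))  # 34.32 65.68
```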
Incremental Validity
The increase in decision-making accuracy achieved by using a predictor to make selection decisions.
Base Rate
The proportion of people who succeed on the criterion (score above the criterion cutoff) when they are selected without the new predictor.
Positive Hit Rate
The proportion of people selected by the predictor who are successful on the criterion, that is, true positives divided by all positives (true plus false positives).
True Positives
Individuals who scored high on both the predictor and the criterion.
False Positives
Individuals who scored high on the predictor but low on the criterion.
False Negatives
Individuals who scored low on the predictor but high on the criterion.
True Negatives
Individuals who scored low on both the predictor and the criterion.
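The four outcome categories above can be combined into the base rate, positive hit rate, and incremental validity. The sketch below assumes the usual definitions: base rate = (true positives + false negatives) / total, positive hit rate = true positives / (true positives + false positives), and incremental validity = positive hit rate − base rate. The counts and the name `selection_summary` are invented for illustration.

```python
def selection_summary(tp, fp, fn, tn):
    """Summarize decision accuracy from the four outcome counts."""
    total = tp + fp + fn + tn
    base_rate = (tp + fn) / total          # successful on the criterion overall
    positive_hit_rate = tp / (tp + fp)     # successful among those the predictor selects
    return {
        "base_rate": base_rate,
        "positive_hit_rate": positive_hit_rate,
        "incremental_validity": positive_hit_rate - base_rate,
    }

# Hypothetical counts for 100 examinees
summary = selection_summary(tp=30, fp=10, fn=20, tn=40)
print(summary)
# {'base_rate': 0.5, 'positive_hit_rate': 0.75, 'incremental_validity': 0.25}
```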