Measurement Error
Variability in scores due to random factors
Examples of Measurement Error
ambiguous items, fatigue, or distractions that affect performance unpredictably.
Reliability Coefficient
indicates the proportion of true score variability in a test’s scores, ranging from 0 to 1.
Minimally Acceptable Reliability Coefficient
0.70
Minimally Acceptable Reliability Coefficient for high-stakes tests
0.90 or higher is required
Alternate Forms Reliability Evaluates
the consistency of scores between two equivalent forms of the test
Alternate Forms Reliability is useful for
tests with multiple versions
What does Internal Consistency Reliability measure
the consistency of scores across different test items
When is internal consistency reliability useful?
tests measuring a single content domain
Coefficient Alpha
A measure of internal consistency reliability based on the number of test items and their average inter-item correlation
What kind of data is used for coefficient alpha?
items scored on a continuum or multi-point scale (e.g., Likert-type ratings)
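As a concrete illustration (not part of the original cards), here is a minimal Python sketch of coefficient alpha; the function name and the "scores" matrix are hypothetical, with one row per examinee and one numeric entry per item.

# Minimal sketch of coefficient (Cronbach's) alpha for an examinee-by-item matrix.
def coefficient_alpha(scores):
    k = len(scores[0])                                    # number of items
    def variance(values):                                 # sample variance
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])    # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)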
Kuder-Richardson 20 (KR-20)
used for tests with dichotomous items (e.g., correct/incorrect).
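A companion sketch for KR-20, which is essentially coefficient alpha for 0/1 items; again the function name and data layout are assumptions, with "scores" holding one row of dichotomous responses per examinee.

# Minimal KR-20 sketch for a matrix of 0/1 item responses.
def kr20(scores):
    n, k = len(scores), len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]     # item difficulties
    sum_pq = sum(pi * (1 - pi) for pi in p)                       # sum of item variances
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n    # total-score variance
    return (k / (k - 1)) * (1 - sum_pq / var_total)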
Split-Half Reliability
Splits a test into two halves (e.g., even and odd items) and correlates scores on both halves
What formula corrects split-half reliability for the shortened test length?
Spearman-Brown formula
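Because each half is only half as long as the full test, the split-half correlation underestimates full-test reliability; the Spearman-Brown formula projects it to full length: corrected r = 2 × r(half) / (1 + r(half)). As a worked example with an assumed half-test correlation of .60, the corrected reliability is 2(.60) / 1.60 = .75.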
What does Inter-Rater Reliability assess?
the consistency of scores assigned by different raters.
What is inter-rater reliability used for?
Important for subjectively scored measures like essays or interviews
What does Cohen’s Kappa Coefficient correct for?
the chance agreement between raters
What is cohen’s kappa coefficient used for?
when ratings represent unranked categories (nominal scale)
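A minimal Python sketch of Cohen's kappa; the function name is hypothetical, and rater_a and rater_b are assumed lists of nominal category labels assigned to the same cases by two raters.

# Minimal Cohen's kappa sketch: observed agreement corrected for chance agreement.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_chance = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                   for c in categories)                   # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)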
Consensual Observer Drift
when raters communicate with one another while assigning ratings, causing their ratings to become more similar
What is the effect of consensual observer drift?
increasing consistency but reducing accuracy.
Homogeneous content
tends to produce higher reliability coefficients
heterogeneous content
tends to produce lower reliability coefficients
unrestricted range
Reliability coefficients are larger
restricted range causes
smaller reliability coefficients
The easier it is to guess an answer on a test
the lower the test’s reliability
True/false tests
less reliable, because each item can be answered correctly by guessing half the time
Multiple-choice tests
more reliable, because correct guessing is less likely
Reliability Index
correlation between observed scores and true scores
Calculating the reliability index
taking the square root of the reliability coefficient
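A worked example with an assumed value: if a test's reliability coefficient is .81, its reliability index is √.81 = .90.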
Item Analysis
process to determine which items to include in a test by analyzing item difficulty and item discrimination.
Item Difficulty
percentage of examinees who answered an item correctly
Moderately difficult items
(p = .30 to .70)
What is the preferred item difficulty?
moderately difficult items (p = .30 to .70)
Item Discrimination
ability of an item to differentiate between examinees with high and low scores.
Discrimination Index Range
-1.0 to +1.0.
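One common way to compute the discrimination index (an assumption here, since the cards do not give the formula): D = proportion of high scorers answering the item correctly minus the proportion of low scorers answering it correctly. For example, .80 − .30 = .50, while a negative D means low scorers outperform high scorers on that item.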
Definition of Standard Error of Measurement (SEM)
how much an obtained score is expected to differ from the true score.
What is Standard Error of Measurement (SEM) used for?
construct confidence intervals
Confidence Intervals
Ranges around a test score that indicate where the true score likely lies
68% CI
±1 SEM
95% CI.
±2 SEM
99% CI
±3 SEM
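A minimal Python sketch tying SEM to the confidence intervals above, using the standard formula SEM = SD × √(1 − reliability); the SD, reliability, and observed score below are assumed values for illustration.

import math

def sem(sd, reliability):
    # standard error of measurement
    return sd * math.sqrt(1 - reliability)

sd, reliability, observed = 15.0, 0.91, 110.0           # assumed values
error = sem(sd, reliability)                            # 15 * sqrt(0.09) = 4.5
ci_68 = (observed - 1 * error, observed + 1 * error)    # 68% CI: 105.5 to 114.5
ci_95 = (observed - 2 * error, observed + 2 * error)    # 95% CI: 101.0 to 119.0
ci_99 = (observed - 3 * error, observed + 3 * error)    # 99% CI: 96.5 to 123.5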
Item Response Theory (IRT)
an approach to test construction that focuses on examinee responses to individual items, allowing tests to be tailored to specific traits and populations
Item Characteristic Curve (ICC)
graph that shows the probability of answering an item correctly based on the examinee’s trait level
Item Characteristic Curve (ICC) x axis
examinee’s trait level
Item Characteristic Curve (ICC) y axis
probability of answering an item correctly
x axis
horizontal
y axis
vertical
Difficulty Parameter
the level of the trait needed for a 50% probability of answering an item correctly
Probability of Guessing
the point where the curve crosses the Y-axis
Lower values of the Probability of Guessing
harder to guess correctly
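A minimal Python sketch of an item characteristic curve under the common three-parameter logistic model; the function name and parameter values are assumptions for illustration.

import math

def icc(theta, a=1.0, b=0.0, c=0.20):
    # a = discrimination (slope), b = difficulty, c = probability of guessing
    # (the curve's lower asymptote)
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

print(icc(0.0))     # theta at the difficulty level: 0.6, halfway between c and 1.0
print(icc(-4.0))    # very low trait level: approaches the guessing parameter (~0.21)
print(icc(4.0))     # very high trait level: approaches 1.0 (~0.99)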
Classical Test Theory
Based on true score theory: X (Obtained Score) = T (True Score) + E (Measurement Error).
Test-Retest Reliability definition
Consistency of scores over time; the test is administered twice to the same examinees and the two sets of scores are correlated
What is test-retest reliability useful for?
measuring stable traits
Internal Consistency Reliability
Consistency of scores across test items
Internal Consistency Reliability is useful for
tests measuring a single content domain
Factors Affecting Reliability
Content Homogeneity, Range of Scores, Guessing
Item Difficulty (p)
p = correct responses divided by total responses
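A worked example with assumed numbers: if 40 of 50 examinees answer an item correctly, p = 40 / 50 = .80, which falls outside the preferred moderate range of .30 to .70.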