Reliability
The consistency or dependability of test scores.
Variance (σ²)
The degree to which scores differ from the mean; it shows score variability.
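As a worked formula (the standard population variance, not specific to this deck), for N scores $X_i$ with mean $\bar{X}$:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \bar{X})^2}{N}$$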
Main sources of test score variability
True variance and error variance.
true variance
Variance caused by actual differences in the trait being measured.
error variance
Variance caused by irrelevant or random factors not related to the construct.
Classical Test Theory
Each person has a true score (T) that would be obtained if there were no errors in measurement.
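In the usual classical test theory notation, the observed score X is the sum of the true score and error, and the variances add:

$$X = T + E, \qquad \sigma^2_X = \sigma^2_T + \sigma^2_E$$

Reliability is then the proportion of observed-score variance that is true variance, $r_{XX} = \sigma^2_T / \sigma^2_X$.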
Error
The portion of the observed score unrelated to the construct being measured.
Types of error
Systematic error and random error.
systematic error
Consistent, predictable error that occurs in the same direction each time; it can be corrected once its source is identified.
random error
Errors of measurement that occur unpredictably and vary from one measurement to another.
Produces a distribution of scores around the true score.
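A minimal simulation of this idea (illustrative only; the true score of 100 and the two error spreads are invented numbers):

```python
import numpy as np

rng = np.random.default_rng(42)

true_score = 100                       # hypothetical true score T
small_error = rng.normal(0, 2, 1000)   # low error variance
large_error = rng.normal(0, 15, 1000)  # high error variance

precise_obs = true_score + small_error
noisy_obs = true_score + large_error

# Observed scores cluster tightly around T only when error variance is small.
print(f"small dispersion: mean={precise_obs.mean():.1f}, sd={precise_obs.std():.1f}")
print(f"large dispersion: mean={noisy_obs.mean():.1f}, sd={noisy_obs.std():.1f}")
```

The two standard deviations printed correspond to the small- and large-dispersion cases described in the next two cards.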
Large dispersion around the true score
Single observations might fall far from the true score → less dependable.
Small dispersion around the true score
Observations are extremely close to the true score → fewer errors, more dependable.
SOURCES OF ERROR VARIANCE
Test Construction
Test Administration
Test Scoring and Interpretation
Surveys and Polls as Assessment Tools
SYSTEMATIC AND NONSYSTEMATIC ERRORS IN SENSITIVE ASSESSMENTS
test–retest reliability
Administer the same test twice to the same group, then correlate the scores.
Stability of test scores over time
For stable traits (e.g., intelligence, personality traits).
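A quick sketch of the computation, using invented scores for five examinees; the Pearson r between the two administrations serves as the coefficient of stability:

```python
import numpy as np

# Hypothetical scores for the same five examinees at time 1 and time 2.
time1 = np.array([88, 92, 75, 83, 95])
time2 = np.array([90, 89, 78, 80, 96])

# Pearson correlation between the two administrations.
r_test_retest = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest reliability: {r_test_retest:.2f}")
```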
alternate-forms reliability
Consistency of scores between two equivalent versions of the same test.
parallel forms
Test versions with equal means, variances, and difficulty.
alternate forms
Similar but not identical versions of a test that measure the same construct.
item-sampling error
Score differences caused by different items in each form, not by true ability.
split-half reliability
Reliability estimated by correlating scores from two halves of a single test, then adjusted to full length with the Spearman–Brown formula (see the sketch below).
Spearman–Brown formula
It allows a test developer or user to estimate internal consistency reliability based on test length.
Whether adding or removing items will strengthen or weaken overall test reliability.
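The standard form of the formula, where n is the factor by which test length changes and r is the reliability of the existing test:

$$r_{SB} = \frac{n\,r}{1 + (n - 1)\,r}$$

For split-half reliability, n = 2: the half-test correlation is stepped up to estimate full-length reliability.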
What happens when tests are lengthened by adding items?
Reliability usually increases, but only if the new items are equivalent in content and difficulty.
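A minimal sketch tying split-half reliability and the Spearman–Brown correction together, using an invented 6-item right/wrong score matrix (rows are examinees, columns are items):

```python
import numpy as np

# Hypothetical item scores: 5 examinees x 6 items (1 = right, 0 = wrong).
items = np.array([
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 1],
    [0, 1, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 1],
])

# Odd-even split: total score on each half.
odd_half = items[:, ::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

r_half = np.corrcoef(odd_half, even_half)[0, 1]  # half-test correlation
r_full = (2 * r_half) / (1 + r_half)             # Spearman-Brown with n = 2
print(f"half-test r = {r_half:.2f}, corrected full-test r = {r_full:.2f}")
```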
inter-item consistency
How strongly the items within a test correlate with one another.
homogeneous test
All items measure a single trait or factor.
Heterogeneous test
Items measure multiple traits or factors.
Kuder–Richardson Formulas
Internal consistency reliability for dichotomous items (scored 0/1, e.g., true/false, right/wrong).
KR-21
A simplified version of KR-20 that assumes all items have equal difficulty.
Less accurate but easier to compute.
KR-20
For tests with items scored as correct/incorrect (like multiple-choice or true/false tests)
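In standard notation, with k items, $p_j$ the proportion passing item j, $q_j = 1 - p_j$, $\sigma^2$ the total-score variance, and $\mu$ the mean total score:

$$KR\text{-}20 = \frac{k}{k-1}\left(1 - \frac{\sum_{j=1}^{k} p_j q_j}{\sigma^2}\right) \qquad KR\text{-}21 = \frac{k}{k-1}\left(1 - \frac{\mu(k - \mu)}{k\,\sigma^2}\right)$$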
Cronbach’s Alpha (α)
A measure of internal consistency; conceptually, the mean of all possible split-half correlations, corrected by the Spearman–Brown formula.
Cronbach’s Alpha
It can be used for items with multiple scoring formats (not just dichotomous), such as Likert scales.
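A small sketch of the usual computation, $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum \sigma^2_{\text{item}}}{\sigma^2_{\text{total}}}\right)$, applied to an invented matrix of 1–5 Likert responses:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for a (respondents x items) score matrix."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 1-5 Likert responses: 5 respondents x 4 items.
likert = np.array([
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
])
print(f"alpha = {cronbach_alpha(likert):.2f}")
```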
α ≥ 0.90
Excellent internal consistency.
α < 0.70
May be problematic (depends on the test's purpose).
Very high α (> 0.95)
Item redundancy: items may be too similar and add little new information.
What kind of reliability is best for homogeneous tests?
Internal consistency methods (e.g., KR-20, Cronbach's alpha).
Which reliability method suits heterogeneous tests better?
Test–retest reliability (since internal consistency may not apply).
dynamic characteristics
Traits, states, or abilities that change over time (e.g., mood, anxiety).
What reliability methods are better for dynamic characteristics?
Internal consistency methods, since a test–retest interval would confound true change with measurement error.
Static Characteristics
Traits or abilities that remain relatively stable (e.g., intelligence)
What methods are suitable for static characteristics?
Test–retest or alternate-forms reliability.
Restriction of range
When the sample used has limited variability (e.g., only high scorers), which deflates the correlation and underestimates reliability (see the sketch after these cards).
Inflation of range
Artificially increased variability in scores, which may overestimate reliability.
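A small simulation of restriction of range (all numbers invented): correlating simulated test–retest scores over the full sample versus only the high scorers shows how the restricted correlation drops:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate correlated test-retest scores for 2,000 examinees.
true_ability = rng.normal(100, 15, 2000)
time1 = true_ability + rng.normal(0, 5, 2000)
time2 = true_ability + rng.normal(0, 5, 2000)

full_r = np.corrcoef(time1, time2)[0, 1]

# Restrict the sample to high scorers only (time1 above 115).
mask = time1 > 115
restricted_r = np.corrcoef(time1[mask], time2[mask])[0, 1]

print(f"full range r = {full_r:.2f}, restricted range r = {restricted_r:.2f}")
```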
Power test
Tests with generous time limits and varying difficulty levels; scores reflect ability, not speed.
Speed test
Tests with easy items but strict time limits; scores reflect speed, not ability.
What reliability methods are best for power tests?
Internal consistency methods (e.g., KR-20, Cronbach's alpha).
What reliability methods are suitable for speed tests?
Test–retest, alternate-forms, or split-half based on two separately timed half-tests (corrected with Spearman–Brown).
Criterion-Referenced test
Tests that measure mastery of specific skills or objectives (e.g., pass/fail).
Norm-Referenced Tests
Tests that compare an individual’s score to others’.
Which reliability methods suit norm-referenced tests?
Traditional methods such as test–retest, split-half, and KR-20.