Rating Scale: A grouping of words, statements, or symbols on which judgments of the strength of a particular trait, attitude, or emotion are indicated by the test-taker.
Summative Scale: A rating scale on which the final score is obtained by summing the ratings across all items.
Likert Scale: A summative rating scale on which each item presents five (sometimes seven) alternative responses, typically ranging from "strongly agree" to "strongly disagree."
Method of Paired Comparisons: Test-takers are presented with pairs of stimuli and asked to select one according to a rule (e.g., the statement they agree with more).
Guttman Scale: Items range sequentially from weaker to stronger expressions of the variable being measured; endorsing a stronger statement implies endorsement of all milder statements.
Method of Equal-Appearing Intervals (Thurstone Scaling): A direct estimation scaling method used to obtain data that are presumed to be interval in nature; judges rate the favorableness of statements, and items are selected on the basis of those ratings.
Item Bank: A relatively large and easily accessible collection of test questions.
Computerized Adaptive Testing (CAT): An interactive, computer-administered testing process in which the items presented are based, in part, on the test-taker's performance on previous items.
Item Branching: The ability of the computer to tailor the content and order of presentation of items on the basis of responses to previous items.
Item Difficulty Index: The proportion of test-takers who answered the item correctly.
p_1 (p with a subscript identifying the item) denotes the item difficulty index for item 1.
A larger index indicates an easier item.
Calculated as the number of examinees who answered item 1 correctly divided by the total number of examinees.
Index of difficulty of the average test item: sum of item difficulty indices for all items divided by the total number of items.
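The two calculations above can be sketched in a few lines of Python; the response matrix here is hypothetical tryout data, not from the source.

```python
# Hypothetical 0/1 response matrix: rows = examinees, columns = items.
responses = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [1, 1, 1, 1],
    [0, 1, 0, 1],
    [1, 0, 0, 1],
]

n_examinees = len(responses)
n_items = len(responses[0])

# p[i]: proportion of examinees answering item i correctly (item difficulty index).
p = [sum(row[i] for row in responses) / n_examinees for i in range(n_items)]

# Average item difficulty: sum of the item difficulty indices / number of items.
average_difficulty = sum(p) / n_items

print(p)                   # [0.8, 0.6, 0.2, 1.0]
print(average_difficulty)  # 0.65
```

Note that item 4 (p = 1.0) was answered correctly by everyone, so it cannot discriminate among these examinees.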
Optimal Average Item Difficulty: For maximum discrimination among test-takers, an average item difficulty of about .5 is optimal, with individual item difficulties ranging from roughly .3 to .8.
Adjusting for Chance Success: For selected-response items, the optimal difficulty is the midpoint between 1.00 and the chance success proportion; e.g., for a four-option multiple-choice item, chance success is .25, so the optimal difficulty is (.25 + 1.00) / 2 = .625.
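A minimal sketch of the chance-adjustment formula, assuming chance success is 1 divided by the number of response options:

```python
def optimal_difficulty(n_options: int) -> float:
    """Optimal difficulty for a selected-response item, adjusted for
    chance success: midway between the chance success proportion
    (1 / n_options) and 1.00."""
    chance = 1 / n_options
    return (chance + 1.00) / 2

# Four-option multiple-choice: chance = .25, optimal p = .625
print(optimal_difficulty(4))  # 0.625
# True/false item: chance = .50, optimal p = .75
print(optimal_difficulty(2))  # 0.75
```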
Item Discrimination Index: Denoted by d.
Indicates how well an item separates high scorers from low scorers.
Compares performance on an item with performance in the upper and lower regions of a distribution of continuous test scores.
In a normal distribution, the upper and lower 27% of scores are used.
In a platykurtic (relatively flat) distribution, the upper and lower 33% of scores are used.
The higher the value of d, the better the discrimination.
A negative d value indicates that low-scoring examinees are more likely to answer the item correctly.
Calculated as d = (U - L) / n, where:
U = number of examinees in the upper group who answered the item correctly.
L = number of examinees in the lower group who answered the item correctly.
n = number of examinees in each group (or in the larger group if the group sizes differ).
Interpretation:
d can range from -1.00 to +1.00.
d = +1.00 means all examinees in the upper group, and none in the lower group, answered the item correctly; d = 0 means equal proportions of both groups answered correctly (no discrimination).
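The d calculation can be sketched as follows; the tryout data and the 27% group fraction are illustrative assumptions.

```python
def discrimination_index(scores_and_item, top_fraction=0.27):
    """d = (U - L) / n for a single item.
    scores_and_item: list of (total_test_score, item_correct 0/1) pairs.
    Uses the upper and lower `top_fraction` of the score distribution
    (27% for a roughly normal distribution, 33% for a platykurtic one)."""
    ranked = sorted(scores_and_item, key=lambda pair: pair[0])
    n = max(1, round(len(ranked) * top_fraction))
    lower = ranked[:n]    # lowest-scoring examinees
    upper = ranked[-n:]   # highest-scoring examinees
    U = sum(item for _, item in upper)  # correct answers in upper group
    L = sum(item for _, item in lower)  # correct answers in lower group
    return (U - L) / n

# Hypothetical tryout data: (total score, response to the item under study)
data = [(95, 1), (90, 1), (88, 1), (75, 0), (70, 1),
        (65, 0), (60, 1), (50, 0), (45, 0), (40, 0)]
print(discrimination_index(data))  # 1.0: all of the upper group and
                                   # none of the lower group answered correctly
```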
Item Characteristic Curve (ICC): A graphic representation of item difficulty and discrimination.
Horizontal axis: ability.
Vertical axis: probability of correct response (PCR).
Item discrimination: slope (steeper slope = greater discrimination).
Item difficulty: location of the curve along the ability axis (a curve shifted toward the low-ability end = easy item; a curve shifted toward the high-ability end = difficult item).
Examples: a flat curve indicates an item that does not discriminate at any ability level; a curve that rises steeply at the low end and then flattens indicates an easy item that discriminates only among low-ability test-takers.
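One common way to model an ICC is the two-parameter logistic function from item response theory, where a is the slope (discrimination) and b is the location on the ability axis (difficulty); this particular model and the parameter values below are illustrative, not from the source.

```python
import math

def icc(theta, a, b):
    """Two-parameter logistic item characteristic curve:
    probability of a correct response at ability level theta.
    a = discrimination (slope); b = difficulty (location on ability axis)."""
    return 1 / (1 + math.exp(-a * (theta - b)))

# Lower b shifts the curve toward the low-ability end = easier item;
# larger a = steeper slope = sharper discrimination around b.
easy_item = [round(icc(t, a=1.5, b=-1.0), 2) for t in (-2, -1, 0, 1, 2)]
hard_item = [round(icc(t, a=1.5, b=1.0), 2) for t in (-2, -1, 0, 1, 2)]
print(easy_item)  # [0.18, 0.5, 0.82, 0.95, 0.99]
print(hard_item)  # [0.01, 0.05, 0.18, 0.5, 0.82]
```

At every ability level the easy item has the higher probability of a correct response, and each curve crosses .5 exactly at its own b.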
Item analyses of speed tests yield misleading or uninterpretable results.
Items near the end of the test may appear more difficult because test-takers may not reach them before time runs out.
May show high item discrimination/positive item-total correlations in late-appearing items due to the select group of examinees reaching those items.
Recommended approach: restrict the item analysis to the items completed by all test-takers, or administer the test to a comparable sample under generous (unspeeded) time limits during the tryout.
Characterize each item by strengths and weaknesses.
Items with many weaknesses are prime candidates for deletion or revision.
Very difficult items may lack reliability and validity.
Test developers may purposefully include some more difficult items on a test that has good items but is somewhat easy.
Revision priorities based on test purpose: a test designed to select the strongest candidates may favor retaining difficult, highly discriminating items, whereas a mastery or screening test may favor easier items that most qualified test-takers can answer.
A large item pool facilitates the elimination of poor items.
Poor items can be eliminated in favor of those that were shown in the test tryout to be good items.
After balancing concerns, the revised test is administered to a second sample under standardized conditions.
If item analysis indicates the test is not yet in finished form, the cycle of revision, tryout, and item analysis is repeated until the test is satisfactory and standardization can occur.