Lecture 5: Item Analysis & Item Response Theory
Lecture 5 Overview
Item Analysis
- Developed in the context of ability/achievement testing (primarily for multiple-choice items).
Two General Approaches to Test Construction:
Effectiveness of Distractors
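- A distractor (incorrect option) is considered effective if more students select it than the criterion value (N - n) / (2c).
Where: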
- N = total number of students who completed the test
- n = total number of students who got the item correct
- c = number of choices
Example Calculation for Effectiveness of Distractors:
Scenario: 100 students completed a test with 4 options (A, B, C, D).
- Correct answer = C; 45 students got it right. The remaining 55 students chose distractors:
  - 34 selected A
  - 17 selected B
  - 4 selected D
Criteria Calculation: (N - n) / (2c) = (100 - 45) / (2 × 4) = 55 / 8 = 6.875, rounded to 6.88.
Evaluation of Distractors:
- A: 34 selected (34 > 6.88) = Good
- B: 17 selected (17 > 6.88) = Good
- D: 4 selected (4 < 6.88) = Poor
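The evaluation above can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the function names are hypothetical, and the criterion formula (N - n) / (2c) is inferred from the worked numbers (55 / 8 = 6.875). Treating a count exactly equal to the criterion as Poor is also an assumption, since the notes only cover the strictly greater and strictly smaller cases.

```python
def distractor_criterion(n_total, n_correct, n_choices):
    """Minimum selections for a distractor to count as effective.

    Assumed formula, inferred from the worked example: (N - n) / (2c).
    """
    return (n_total - n_correct) / (2 * n_choices)


def evaluate_distractors(counts, correct_option):
    """Label each distractor 'Good' or 'Poor' against the criterion."""
    n_total = sum(counts.values())
    criterion = distractor_criterion(n_total, counts[correct_option], len(counts))
    verdicts = {}
    for option, count in counts.items():
        if option == correct_option:
            continue  # only the incorrect options are distractors
        # Assumption: a count exactly equal to the criterion is treated as Poor.
        verdicts[option] = "Good" if count > criterion else "Poor"
    return criterion, verdicts


# Option counts from the scenario: 100 students, correct answer C.
counts = {"A": 34, "B": 17, "C": 45, "D": 4}
criterion, verdicts = evaluate_distractors(counts, "C")
print(criterion)  # 6.875
print(verdicts)   # {'A': 'Good', 'B': 'Good', 'D': 'Poor'}
```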
Item Difficulty Index
- Definition: The proportion of people answering an item correctly. Because higher values mean an easier item, it is sometimes called the item ease index.
- Ranges from 0 (nobody got it right) to 1 (everyone got it right).
- On measures without correct answers (e.g., personality or attitude scales), the same proportion is called the item endorsement index.
Optimal Difficulty:
- A rule of thumb is that optimal item difficulty lies between 0.3 and 0.7, although some suggest 0.2 to 0.8.
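As a small illustration, the difficulty index is just the mean of 0/1 item scores. The sketch below assumes responses are coded 1 for correct and 0 for incorrect; the function names are hypothetical, and treating the rule-of-thumb bounds as inclusive is an assumption.

```python
def item_difficulty(responses):
    """Proportion of correct answers for one item (responses coded 1 or 0)."""
    if not responses:
        raise ValueError("responses must not be empty")
    return sum(responses) / len(responses)


def within_rule_of_thumb(p, low=0.3, high=0.7):
    """Check the difficulty against the rule-of-thumb range (bounds inclusive)."""
    return low <= p <= high


# Example: 45 of 100 students answered the item correctly.
responses = [1] * 45 + [0] * 55
p = item_difficulty(responses)
print(p)                        # 0.45
print(within_rule_of_thumb(p))  # True
```

Passing low=0.2 and high=0.8 applies the more lenient range instead.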
Item Discriminability
- Definition: How well an item discriminates between high and low scorers overall.
Methods:
- Extreme Group Method:
  - Index (d) calculated as the difference between the proportion of correct responses in the upper-scoring group and the proportion in the lower-scoring group.
- Point-Biserial Correlation:
  - Correlation between the item score (correct/incorrect) and the total test score. Indicates how well an item correlates with overall test performance.
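The point-biserial correlation is numerically the Pearson correlation between a 0/1 item score and the total score, so it can be sketched without any special library. The data below are made up for illustration, and the function name is hypothetical. This sketch also assumes the total score includes the item itself (an uncorrected item-total correlation).

```python
from math import sqrt


def point_biserial(item, totals):
    """Pearson correlation between 0/1 item scores and total test scores."""
    n = len(item)
    if n != len(totals) or n == 0:
        raise ValueError("item and totals must be non-empty and equal in length")
    mean_x = sum(item) / n
    mean_y = sum(totals) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(item, totals))
    var_x = sum((x - mean_x) ** 2 for x in item)
    var_y = sum((y - mean_y) ** 2 for y in totals)
    if var_x == 0 or var_y == 0:
        # Undefined when everyone gets the same item score or the same total.
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: six test-takers, item score and total test score.
item = [1, 1, 1, 0, 0, 0]
totals = [9, 8, 7, 4, 3, 2]
r = point_biserial(item, totals)
print(round(r, 2))  # 0.95
```

A value near 1 means high scorers tend to get the item right; a negative value would flag an item that low scorers answer correctly more often.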
Example: Calculating Discriminability
- Extreme Group Method Example:
  - Identify upper and lower groups; calculate correct responses.
  - Values can range from -1 to 1; higher values indicate better discrimination.
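The extreme group steps can be sketched as follows. The notes do not say how large the upper and lower groups should be; the default of 27% used here is a common convention and is an assumption, as are the function names and the example data. Ties in total score are broken by input order, which a real analysis might handle differently.

```python
def split_extreme_groups(item, totals, fraction=0.27):
    """Return item scores for the top and bottom `fraction` of total scorers."""
    n = len(totals)
    k = max(1, int(n * fraction))  # group size, at least one person
    order = sorted(range(n), key=lambda i: totals[i])  # ascending by total
    lower = [item[i] for i in order[:k]]
    upper = [item[i] for i in order[-k:]]
    return upper, lower


def discrimination_index(upper, lower):
    """d = proportion correct in upper group minus proportion in lower group."""
    return sum(upper) / len(upper) - sum(lower) / len(lower)


# Hypothetical data: ten test-takers, 0/1 item scores and total scores.
item = [1, 1, 1, 0, 1, 0, 0, 0, 1, 0]
totals = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
upper, lower = split_extreme_groups(item, totals)
d = discrimination_index(upper, lower)
print(upper, lower)  # [1, 1] [0, 1]
print(d)             # 0.5
```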
Item Response Theory (IRT)
- Definition: Family of mathematical models for designing, analyzing, and scoring tests.
- IRT Complexity: More complex than classical test theory (CTT), but it addresses several of CTT's limitations.
- Key Features of IRT:
  - Item characteristic curves (ICC) show the relationship between latent traits and probabilities of item endorsement.
  - Parameters involved: Difficulty, discrimination, and guessing.
IRT Key Assumptions
- Monotonicity: As latent trait increases, the probability of a correct answer also increases.
- Unidimensionality: One dominant trait is measured.
- Local Independence: Responses are independent given the trait level.
- Invariance: Item parameters should remain constant across different groups.
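To make local independence concrete: once the trait level is fixed, the probability of a whole response pattern is the product of the individual item probabilities. The sketch below assumes a two-parameter logistic form for each item, with a as discrimination and b as difficulty; the notes do not give a formula, so this model choice, the function names, and the item values are illustrative assumptions.

```python
from math import exp, prod


def p_correct(theta, a, b):
    """Assumed two-parameter logistic model: P(correct | theta)."""
    return 1 / (1 + exp(-a * (theta - b)))


def pattern_likelihood(theta, items, responses):
    """Probability of a 0/1 response pattern given theta.

    Local independence is what justifies multiplying the item probabilities.
    """
    terms = []
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        terms.append(p if u == 1 else 1 - p)
    return prod(terms)


# Hypothetical items as (discrimination, difficulty) pairs.
items = [(1.0, 0.0), (1.0, 0.0)]
likelihood = pattern_likelihood(0.0, items, [1, 0])
print(likelihood)  # 0.25
```

The function p_correct is also monotonic in theta whenever a is positive, which matches the monotonicity assumption.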
Item Parameters in IRT
- For dichotomous items, parameters include:
  - Discrimination: how well the item differentiates between ability levels; values typically range from 0 to 2.
  - Difficulty: the trait level at which the probability of a correct response is 50% (in models without a guessing parameter); values typically range from -3.0 to +3.0.
  - Pseudo-guessing: the probability that a test-taker with very low ability answers correctly by guessing.
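The three parameters combine into an item characteristic curve. The notes do not state the formula, so the sketch below assumes the widely used three-parameter logistic (3PL) form, with a for discrimination, b for difficulty, and c for pseudo-guessing (this c is unrelated to the number of choices used earlier). It also assumes no scaling constant is applied; the function name is hypothetical.

```python
from math import exp


def icc_3pl(theta, a, b, c=0.0):
    """Assumed 3PL curve: P(correct | theta) = c + (1 - c) / (1 + e^(-a(theta - b)))."""
    return c + (1 - c) / (1 + exp(-a * (theta - b)))


# Without guessing, the probability at theta == b is exactly 0.5.
print(icc_3pl(0.0, a=1.0, b=0.0))          # 0.5

# With c = 0.25, the probability at theta == b rises to (1 + c) / 2.
print(icc_3pl(0.0, a=1.0, b=0.0, c=0.25))  # 0.625

# Very low trait levels approach the guessing floor c.
print(icc_3pl(-10.0, a=1.0, b=0.0, c=0.25) > 0.25)  # True
```

Under this form, the 50% point coincides with the difficulty parameter only when c is 0; with guessing, the probability at that trait level is halfway between c and 1.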
IRT Advantages
- Independence from sample characteristics.
- Better handling of guessing.
- Adaptable scoring across different items.
- Better identification and ranking of items based on ability.
- Improved methods for detecting test bias.
Conclusion
- Understanding item analysis is crucial for effective test construction and revision.
- Knowledge of both CTT and IRT frameworks enables better assessment design and validity.