
Lecture 5: Item Analysis & Item Response Theory

Lecture 5 Overview

  • Focus on the Last Steps of Test Development:

    1. Test Conceptualisation
    2. Test Construction
    3. Test Tryout
    4. Item Analysis
    5. Test Revision
  • Contents of this Lecture:

    • Effectiveness of distractor items
    • Item difficulty
    • Item discrimination
    • Item Response Theory (IRT)

Item Analysis

  • Developed in the context of ability/achievement testing (primarily for multiple-choice items).

Two General Approaches to Test Construction:

  • Classical Test Theory (CTT):

    • Focus on distractor effectiveness, item difficulty index, and item discriminability.
  • Item Response Theory (IRT):

    • Utilizes parameters such as:
      • Slope (discrimination) parameter
      • Difficulty parameter
      • Guessing parameter
    • Analyzes items for test bias (differential item functioning).

Effectiveness of Distractors

  • Definition:

    • Distractors are the incorrect answer choices in multiple-choice questions.
  • Formula for Effectiveness:

    • A distractor is considered effective if the number of students selecting it exceeds the threshold E = (N - n) / (2c).

Where:

  • N = total number of students who completed the test
  • n = number of students who answered the item correctly
  • c = number of answer choices (including the correct option)

Example Calculation for Effectiveness of Distractors:

  • Scenario: 100 students completed a test with 4 options (A, B, C, D).

    • Correct answer = C; 45 students got it right. Remaining students:
      • 34 selected A
      • 17 selected B
      • 4 selected D
  • Criteria Calculation:

    • E = (N - n) / (2c) = (100 - 45) / (2 × 4) = 6.88
  • Evaluation of Distractors:

    • A: 34 selected (34 > 6.88) = Good
    • B: 17 selected (17 > 6.88) = Good
    • D: 4 selected (4 < 6.88) = Poor
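
A minimal Python sketch of this check, reusing the counts from the scenario above; the function and variable names are illustrative, and the threshold follows the formula E = (N - n)/(2c) given earlier:

```python
def distractor_threshold(n_total, n_correct, n_choices):
    """Minimum number of selections for a distractor to count as effective: (N - n) / (2c)."""
    return (n_total - n_correct) / (2 * n_choices)

# Counts from the worked example: 100 students, correct answer C chosen by 45.
counts = {"A": 34, "B": 17, "C": 45, "D": 4}
correct = "C"

E = distractor_threshold(n_total=sum(counts.values()),
                         n_correct=counts[correct],
                         n_choices=len(counts))

for option, n_selected in counts.items():
    if option == correct:
        continue
    verdict = "Good" if n_selected > E else "Poor"
    print(f"Distractor {option}: {n_selected} selections (threshold {E:.2f}) -> {verdict}")
```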

Item Difficulty Index

  • Definition: The proportion of test takers who answer an item correctly (strictly an index of item ease, since higher values mean easier items).
    • Ranges from 0 (nobody got it right) to 1 (everyone got it right).
    • In non-ability contexts (e.g., personality testing), the same statistic is called the item-endorsement index.

Optimal Difficulty:

  • A rule of thumb for optimal item difficulty is between 0.3 and 0.7, although some suggest 0.2 to 0.8.
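
As a quick illustration, the index is simply the mean of dichotomously scored (0/1) responses; the sketch below uses made-up response vectors and flags items outside the 0.3–0.7 rule of thumb:

```python
def item_difficulty(scored_responses):
    """Proportion of respondents answering the item correctly (0 = nobody, 1 = everybody)."""
    return sum(scored_responses) / len(scored_responses)

# Hypothetical scored responses (1 = correct, 0 = incorrect) for three items.
items = {
    "item_1": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],   # p = 0.8 (easy)
    "item_2": [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],   # p = 0.5 (near optimal)
    "item_3": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0],   # p = 0.2 (hard)
}

for name, responses in items.items():
    p = item_difficulty(responses)
    flag = "within 0.3-0.7" if 0.3 <= p <= 0.7 else "outside 0.3-0.7"
    print(f"{name}: p = {p:.2f} ({flag})")
```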

Item Discriminability

  • Definition: How well an item discriminates between high and low scorers overall.

Methods:

  1. Extreme Group Method:
    • Index (d) is calculated as the proportion of correct responses in the upper-scoring group minus the proportion in the lower-scoring group.
  2. Point Biserial Correlation:
    • The correlation between the dichotomous item score (correct/incorrect) and the total test score; higher values indicate the item is more consistent with overall test performance.

Example: Calculating Discriminability

  • Extreme Group Method Example:
    • Identify the upper- and lower-scoring groups (commonly the top and bottom portions of total scores), then compute the proportion of correct responses in each and subtract.
    • d can range from -1 to +1; higher positive values indicate better discrimination, and negative values flag a problematic item.
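
A short Python sketch of both methods on hypothetical data; the three-person group split and the scores are illustrative assumptions, not a prescribed procedure:

```python
import numpy as np

# Hypothetical 0/1 scores on one item and total test scores for 10 examinees.
item_scores  = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 0])
total_scores = np.array([48, 45, 44, 40, 38, 35, 30, 28, 25, 20])

# Extreme group method: proportion correct in the upper group minus the lower group
# (here the top and bottom 3 of 10 examinees by total score).
order = np.argsort(total_scores)[::-1]   # indices from highest to lowest total score
k = 3
upper, lower = order[:k], order[-k:]
d = item_scores[upper].mean() - item_scores[lower].mean()

# Point-biserial: Pearson correlation between the 0/1 item score and the total score.
r_pb = np.corrcoef(item_scores, total_scores)[0, 1]

print(f"Extreme group index d = {d:.2f}")   # ranges from -1 to +1
print(f"Point-biserial r = {r_pb:.2f}")
```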

Item Response Theory (IRT)

  • Definition: Family of mathematical models for designing, analyzing, and scoring tests.
  • IRT Complexity: More complex than CTT, but offers advantages by addressing several of CTT's limitations.
  • Key Features of IRT:
    • Item characteristic curves (ICC) show the relationship between latent traits and probabilities of item endorsement.
    • Parameters involved: Difficulty, discrimination, and guessing.

IRT Key Assumptions

  1. Monotonicity: As the latent trait increases, the probability of a correct response also increases.
  2. Unidimensionality: One dominant trait is measured.
  3. Local Independence: Responses are independent given the trait level.
  4. Invariance: Item parameters should remain constant across different groups.

Item Parameters in IRT

  • For dichotomous items, parameters include:
    • Discrimination (a): how sharply the item differentiates between nearby trait levels (the slope of the ICC); values typically range from 0 to 2.
    • Difficulty (b): the trait level at which the probability of a correct response reaches 50% (assuming no guessing); typically ranges from -3.0 to +3.0.
    • Pseudo-guessing (c): the probability that a respondent with a very low trait level answers correctly by chance (the lower asymptote of the ICC).
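
A minimal sketch of how the three parameters combine in the three-parameter logistic (3PL) item characteristic curve; the parameter values below are illustrative assumptions:

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve: probability of a correct response at trait level
    theta, given discrimination a, difficulty b, and pseudo-guessing c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# An item with moderate discrimination, average difficulty, and some guessing.
a, b, c = 1.2, 0.0, 0.20

for theta in (-3, -1, 0, 1, 3):
    p = icc_3pl(theta, a, b, c)
    print(f"theta = {theta:+d}: P(correct) = {p:.2f}")
# Note: with a nonzero guessing parameter, P(correct) at theta = b is (1 + c) / 2
# rather than exactly 0.5.
```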

IRT Advantages

  • Item parameters are independent of the particular examinee sample (invariance).
  • Better handling of guessing.
  • Adaptable scoring across different items.
  • Better identification and ranking of items based on ability.
  • Improved methods for detecting test bias.

Conclusion

  • Understanding item analysis is crucial for effective test construction and revision.
  • Knowledge of both CTT and IRT frameworks enables better assessment design and validity.