
Lecture 5: Item Analysis & Item Response Theory

Lecture 5 Overview

  • Focus on the Last Steps of Test Development:

    1. Test Conceptualisation
    2. Test Construction
    3. Test Tryout
    4. Item Analysis
    5. Test Revision
  • Contents of this Lecture:

    • Effectiveness of distractor items
    • Item difficulty
    • Item discrimination
    • Item Response Theory (IRT)

Item Analysis

  • Developed in the context of ability/achievement testing (primarily for multiple-choice items).

Two General Approaches to Test Construction:

  • Classical Test Theory (CTT):

    • Focus on distractor effectiveness, item difficulty index, and item discriminability.
  • Item Response Theory (IRT):

    • Utilizes parameters such as:
      • Slope (discrimination) parameter
      • Difficulty parameter
      • Guessing parameter
    • Analyzes items for test bias (differential item functioning).

Effectiveness of Distractors

  • Definition:

    • Distractors are the incorrect answer choices in multiple-choice questions.
  • Formula for Effectiveness:

    • A distractor is considered effective if the number of students selecting it exceeds the threshold E = (N - n) / (2c).

Where:

  • N = total number of students who completed the test
  • n = number of students who answered the item correctly
  • c = number of answer choices (including the correct option)

Example Calculation for Effectiveness of Distractors:

  • Scenario: 100 students completed a test with 4 options (A, B, C, D).

    • Correct answer = C; 45 students got it right. Remaining students:
      • 34 selected A
      • 17 selected B
      • 4 selected D
  • Criteria Calculation:

    • E = (N - n) / (2c) = (100 - 45) / (2 × 4) = 6.88
  • Evaluation of Distractors:

    • A: 34 selected (34 > 6.88) = Good
    • B: 17 selected (17 > 6.88) = Good
    • D: 4 selected (4 < 6.88) = Poor
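
A minimal Python sketch of this check, reusing the counts from the scenario above; the function and variable names are illustrative, and the threshold follows the formula E = (N - n)/(2c) given earlier:

```python
def distractor_threshold(n_total, n_correct, n_choices):
    """Minimum number of selections for a distractor to count as effective: (N - n) / (2c)."""
    return (n_total - n_correct) / (2 * n_choices)

# Counts from the worked example: 100 students, correct answer C chosen by 45.
counts = {"A": 34, "B": 17, "C": 45, "D": 4}
correct = "C"

E = distractor_threshold(n_total=sum(counts.values()),
                         n_correct=counts[correct],
                         n_choices=len(counts))

for option, n_selected in counts.items():
    if option == correct:
        continue
    verdict = "Good" if n_selected > E else "Poor"
    print(f"Distractor {option}: {n_selected} selections (threshold {E:.2f}) -> {verdict}")
```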

Item Difficulty Index

  • Definition: The proportion of test takers who answer an item correctly (strictly an index of item ease, since higher values mean easier items).
    • Ranges from 0 (nobody got it right) to 1 (everyone got it right).
    • In non-ability contexts (e.g., personality testing), the same statistic is called the item-endorsement index.

Optimal Difficulty:

  • A rule of thumb for optimal item difficulty is between 0.3 and 0.7, although some suggest 0.2 to 0.8.
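
As a quick illustration, the index is simply the mean of dichotomously scored (0/1) responses; the sketch below uses made-up response vectors and flags items outside the 0.3–0.7 rule of thumb:

```python
def item_difficulty(scored_responses):
    """Proportion of respondents answering the item correctly (0 = nobody, 1 = everybody)."""
    return sum(scored_responses) / len(scored_responses)

# Hypothetical scored responses (1 = correct, 0 = incorrect) for three items.
items = {
    "item_1": [1, 1, 1, 0, 1, 1, 1, 1, 0, 1],   # p = 0.8 (easy)
    "item_2": [1, 0, 1, 0, 1, 0, 1, 0, 0, 1],   # p = 0.5 (near optimal)
    "item_3": [0, 0, 0, 1, 0, 0, 0, 0, 1, 0],   # p = 0.2 (hard)
}

for name, responses in items.items():
    p = item_difficulty(responses)
    flag = "within 0.3-0.7" if 0.3 <= p <= 0.7 else "outside 0.3-0.7"
    print(f"{name}: p = {p:.2f} ({flag})")
```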

Item Discriminability

  • Definition: How well an item discriminates between high and low scorers overall.

Methods:

  1. Extreme Group Method:
    • Index (d) is calculated as the proportion of correct responses in the upper-scoring group minus the proportion in the lower-scoring group.
  2. Point Biserial Correlation:
    • The correlation between the dichotomous item score (correct/incorrect) and the total test score; higher values indicate the item is more consistent with overall test performance.

Example: Calculating Discriminability

  • Extreme Group Method Example:
    • Identify the upper- and lower-scoring groups (commonly the top and bottom portions of total scores), then compute the proportion of correct responses in each and subtract.
    • d can range from -1 to +1; higher positive values indicate better discrimination, and negative values flag a problematic item.
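
A short Python sketch of both methods on hypothetical data; the three-person group split and the scores are illustrative assumptions, not a prescribed procedure:

```python
import numpy as np

# Hypothetical 0/1 scores on one item and total test scores for 10 examinees.
item_scores  = np.array([1, 1, 1, 1, 0, 1, 0, 0, 0, 0])
total_scores = np.array([48, 45, 44, 40, 38, 35, 30, 28, 25, 20])

# Extreme group method: proportion correct in the upper group minus the lower group
# (here the top and bottom 3 of 10 examinees by total score).
order = np.argsort(total_scores)[::-1]   # indices from highest to lowest total score
k = 3
upper, lower = order[:k], order[-k:]
d = item_scores[upper].mean() - item_scores[lower].mean()

# Point-biserial: Pearson correlation between the 0/1 item score and the total score.
r_pb = np.corrcoef(item_scores, total_scores)[0, 1]

print(f"Extreme group index d = {d:.2f}")   # ranges from -1 to +1
print(f"Point-biserial r = {r_pb:.2f}")
```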

Item Response Theory (IRT)

  • Definition: Family of mathematical models for designing, analyzing, and scoring tests.
  • IRT Complexity: More complex than CTT, but offers advantages by addressing several of CTT's limitations.
  • Key Features of IRT:
    • Item characteristic curves (ICC) show the relationship between latent traits and probabilities of item endorsement.
    • Parameters involved: Difficulty, discrimination, and guessing.

IRT Key Assumptions

  1. Monotonicity: As the latent trait increases, the probability of a correct response also increases.
  2. Unidimensionality: One dominant trait is measured.
  3. Local Independence: Responses are independent given the trait level.
  4. Invariance: Item parameters should remain constant across different groups.

Item Parameters in IRT

  • For dichotomous items, parameters include:
    • Discrimination (a): how sharply the item differentiates between nearby trait levels (the slope of the ICC); values typically range from 0 to 2.
    • Difficulty (b): the trait level at which the probability of a correct response reaches 50% (assuming no guessing); typically ranges from -3.0 to +3.0.
    • Pseudo-guessing (c): the probability that a respondent with a very low trait level answers correctly by chance (the lower asymptote of the ICC).
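
A minimal sketch of how the three parameters combine in the three-parameter logistic (3PL) item characteristic curve; the parameter values below are illustrative assumptions:

```python
import math

def icc_3pl(theta, a, b, c):
    """3PL item characteristic curve: probability of a correct response at trait level
    theta, given discrimination a, difficulty b, and pseudo-guessing c."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# An item with moderate discrimination, average difficulty, and some guessing.
a, b, c = 1.2, 0.0, 0.20

for theta in (-3, -1, 0, 1, 3):
    p = icc_3pl(theta, a, b, c)
    print(f"theta = {theta:+d}: P(correct) = {p:.2f}")
# Note: with a nonzero guessing parameter, P(correct) at theta = b is (1 + c) / 2
# rather than exactly 0.5.
```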

IRT Advantages

  • Item parameters are independent of the particular examinee sample (invariance).
  • Better handling of guessing.
  • Adaptable scoring across different items.
  • Better identification and ranking of items based on ability.
  • Improved methods for detecting test bias.

Conclusion

  • Understanding item analysis is crucial for effective test construction and revision.
  • Knowledge of both CTT and IRT frameworks enables better assessment design and validity.