Item Analysis and Test Reliability

57 Terms

1
New cards

Measurement Error

Variability in scores due to random factors

2
New cards

Examples of Measurement Error

ambiguous items, fatigue, or distractions that affect performance unpredictably.

3
New cards

Reliability Coefficient

indicates the proportion of true score variability in a test’s scores, ranging from 0 to 1.
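
Interpretation example (hypothetical value): a reliability coefficient of .80 means that 80% of the variability in obtained scores reflects true score differences, while the remaining 20% reflects measurement error.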

4
New cards

Minimally Acceptable Reliability Coefficient

0.70

5
New cards

Minimally Acceptable Reliability Coefficient for a high-stakes test

0.90 or higher is required

6
New cards

Alternate Forms Reliability Evaluates

the consistency of scores between two equivalent forms of the test

7
New cards

Alternate Forms Reliability is useful for

when tests have multiple versions

8
New cards

What does Internal Consistency Reliability measure

the consistency of scores across different test items

9
New cards

When is internal consistency reliability useful?

tests measuring a single content domain

10
New cards

Coefficient Alpha

A measure of internal consistency reliability based on the average degree of consistency among all test items

11
New cards

What kind of data is used for coefficient alpha?

tests whose items are scored on a multi-point or continuous scale (e.g., Likert-type items)

12
New cards

Kuder-Richardson 20 (KR-20)

used for tests with dichotomous items (e.g., correct/incorrect).
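
To make the two preceding cards concrete, here is a minimal Python sketch of coefficient alpha computed from a made-up item-score matrix; with 0/1 (dichotomous) scoring like this, the same formula gives KR-20. The data values are hypothetical.

import numpy as np

# Hypothetical item-score matrix: rows = examinees, columns = items (0/1 scoring).
scores = np.array([
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of examinees' total scores

# Coefficient alpha (equals KR-20 for dichotomous items):
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 2))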

13
New cards

Split-Half Reliability

Splits a test into two halves (e.g., even and odd items) and correlates scores on both halves

14
New cards

What corrects the split-half reliability?

The Spearman-Brown formula, which corrects for the shortened length of each half
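
The Spearman-Brown correction estimates full-length reliability from the half-test correlation: corrected r = (2 × half-test r) / (1 + half-test r). Worked example with a hypothetical half-test correlation of .60: corrected r = (2 × .60) / (1 + .60) = 1.20 / 1.60 = .75.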

15
New cards

What does Inter-Rater Reliability assess?

the consistency of scores assigned by different raters.

16
New cards

What is inter-rater reliability used for?

Important for subjectively scored measures like essays or interviews

17
New cards

What does Cohen’s Kappa Coefficient correct for?

the chance agreement between raters

18
New cards

What is cohen’s kappa coefficient used for?

when ratings represent unranked categories (nominal scale)
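
Cohen's kappa is computed as kappa = (observed agreement − chance agreement) / (1 − chance agreement). Worked example with hypothetical values: if two raters agree on 80% of ratings and 50% agreement is expected by chance, kappa = (.80 − .50) / (1 − .50) = .60.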

19
New cards

Consensual Observer Drift

occurs when raters communicate with each other while assigning ratings, causing their ratings to drift toward one another

20
New cards

What is the effect of consensual observer drift?

increasing consistency but reducing accuracy.

21
New cards

Homogeneous content

Tests with homogeneous content tend to have higher reliability coefficients

22
New cards

Heterogeneous content

Tests with heterogeneous content tend to have lower reliability coefficients

23
New cards

Unrestricted range

Reliability coefficients are larger when the range of scores is unrestricted

24
New cards

Restricted range

Reliability coefficients are smaller when the range of scores is restricted

25
New cards

The easier it is to guess an answer on a test

the lower the test’s reliability

26
New cards

True/false tests

less reliable, because examinees have a 50% chance of guessing an item correctly

27
New cards

Multiple-choice tests

more reliable, because the chance of guessing an item correctly is lower (e.g., 25% with four options)

28
New cards

Reliability Index

correlation between observed scores and true scores

29
New cards

Calculating the reliability index

taking the square root of the reliability coefficient
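
Worked example (hypothetical value): if a test's reliability coefficient is .81, its reliability index is the square root of .81, which is .90.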

30
New cards

Item Analysis

process to determine which items to include in a test by analyzing item difficulty and item discrimination.

31
New cards

Item Difficulty

percentage of examinees who answered an item correctly

32
New cards

Moderately difficult items

(p = .30 to .70)

33
New cards

What is the preferred item difficulty?

moderately difficult items (p = .30 to .70)

34
New cards

Item Discrimination

ability of an item to differentiate between examinees with high and low scores.

35
New cards

Discrimination Index Range

-1.0 to +1.0.
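
A common way to compute the discrimination index (D) is the proportion of high scorers who answered the item correctly minus the proportion of low scorers who did so. Worked example with hypothetical values: if .80 of the upper group and .30 of the lower group answer correctly, D = .80 − .30 = .50.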

36
New cards

Definition of Standard Error of Measurement (SEM)

how much an obtained score is expected to differ from the true score.

37
New cards

What is Standard Error of Measurement (SEM) used for?

construct confidence intervals

38
New cards

Confidence Intervals

Ranges around a test score that indicate where the true score likely lies

39
New cards

68% CI

±1 SEM

40
New cards

95% CI.

±2 SEM

41
New cards

99% CI

±3 SEM
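
Worked example with hypothetical numbers, using the standard SEM formula (SEM = SD × square root of 1 − reliability coefficient): for a test with SD = 15 and a reliability coefficient of .84, SEM = 15 × √(1 − .84) = 15 × .40 = 6. For an obtained score of 100, the 68% CI is 94 to 106 (±1 SEM), the 95% CI is 88 to 112 (±2 SEM), and the 99% CI is 82 to 118 (±3 SEM).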

42
New cards

Item Response Theory (IRT)

focusing on examinee responses to individual items to design tests tailored to specific traits and populations

43
New cards

Item Characteristic Curve (ICC)

graph that shows the probability of answering an item correctly based on the examinee’s trait level

44
New cards

Item Characteristic Curve (ICC) x axis

examinee’s trait level

45
New cards

Item Characteristic Curve (ICC) y axis

probability of answering an item correctly

46
New cards

x axis

horizontal

47
New cards

y axis

vertical

48
New cards

Difficulty Parameter

the level of the trait needed for a 50% probability of answering an item correctly

49
New cards

Probability of Guessing

the point where the curve crosses the Y-axis

50
New cards

Lower values of the Probability of Guessing

harder to guess correctly
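
To tie the ICC cards together, here is a minimal Python sketch of the standard three-parameter logistic (3PL) item characteristic curve, where a is the discrimination parameter, b is the difficulty parameter, and c is the guessing parameter; all parameter values below are hypothetical. Note that when c = 0, the probability at theta = b is exactly 50%, matching the difficulty-parameter card above; when c > 0, it is halfway between c and 1.

import math

def icc_3pl(theta, a, b, c):
    # Probability of a correct response at trait level theta: the guessing
    # floor c plus the remaining (1 - c) scaled by a logistic curve centered
    # at difficulty b with slope proportional to discrimination a.
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# Hypothetical item: discrimination a = 1.2, difficulty b = 0.5, guessing c = 0.20.
for theta in (-2, -1, 0, 0.5, 1, 2):
    print(theta, round(icc_3pl(theta, a=1.2, b=0.5, c=0.20), 2))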

51
New cards

Classical Test Theory

Based on true score theory: X (Obtained Score) = T (True Score) + E (Measurement Error).
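
Worked example with hypothetical numbers: an examinee whose true score is 100 might obtain a score of 105 on a given administration, with the extra 5 points attributable to measurement error (X = 100 + 5 = 105).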

52
New cards

Test-Retest Reliability definition

Consistency of scores over time; the test is administered twice to the same examinees and the two sets of scores are correlated

53
New cards

What is test-retest reliability useful for?

Useful for stable traits

54
New cards

Internal Consistency Reliability

Consistency of scores across test items

55
New cards

Internal Consistency Reliability is useful for

measuring a single content domain

56
New cards

Factors Affecting Reliability

Content Homogeneity, Range of Scores, Guessing

57
New cards

Item Difficulty (p)

p = correct responses divided by total responses
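
Worked example with hypothetical numbers: if 40 of 50 examinees answer an item correctly, p = 40 / 50 = .80, which falls above the preferred moderate difficulty range of .30 to .70 (i.e., the item is relatively easy).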