Item Analysis and Test Reliability

57 Terms

1

Measurement Error

Variability in scores due to random factors

2

Examples of Measurement Error

ambiguous items, fatigue, or distractions that affect performance unpredictably.

3

Reliability Coefficient

indicates the proportion of true score variability in a test’s scores, ranging from 0 to 1.

4

Minimally Acceptable Reliability Coefficient

0.70

5

Minimally Acceptable Reliability Coefficient for high stakes test

0.90 or higher is required

6

Alternate Forms Reliability Evaluates

the consistency of scores between two equivalent forms of the test

7

Alternate Forms Reliability is useful for

when tests have multiple versions

8

What does Internal Consistency Reliability measure

the consistency of scores across different test items

9

When is internal consistency reliability useful?

tests measuring a single content domain

10

Coefficient Alpha

A measure of internal consistency reliability that calculates the average inter-item correlation

11

What kind of data is used for coefficient alpha?

continuous test items
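
For illustration, a minimal Python sketch of how coefficient alpha could be computed from a small examinee-by-item matrix of continuous item scores; the data and variable names are made up for the example.

import numpy as np

# Rows = examinees, columns = items; illustrative Likert-type (continuous) scores.
scores = np.array([
    [4, 5, 4, 3],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [1, 2, 2, 1],
])

k = scores.shape[1]                          # number of items
item_vars = scores.var(axis=0, ddof=1)       # variance of each item
total_var = scores.sum(axis=1).var(ddof=1)   # variance of examinees' total scores

# Coefficient alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
alpha = (k / (k - 1)) * (1 - item_vars.sum() / total_var)
print(round(alpha, 3))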

12

Kuder-Richardson 20 (KR-20)

used for tests with dichotomous items (e.g., correct/incorrect).
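
A similar sketch for KR-20, assuming dichotomous (1 = correct, 0 = incorrect) responses; the response matrix is illustrative.

import numpy as np

# Rows = examinees, columns = items; 1 = correct, 0 = incorrect.
responses = np.array([
    [1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 0, 1, 0],
    [1, 1, 0, 0, 1],
])

k = responses.shape[1]
p = responses.mean(axis=0)                      # proportion correct for each item
q = 1 - p                                       # proportion incorrect
total_var = responses.sum(axis=1).var(ddof=1)   # variance of total scores

# KR-20 = k/(k-1) * (1 - sum(p*q) / total-score variance)
kr20 = (k / (k - 1)) * (1 - (p * q).sum() / total_var)
print(round(kr20, 3))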

13

Split-Half Reliability

Splits a test into two halves (e.g., even and odd items) and correlates scores on both halves

14

What corrects the split-half reliability?

Spearman-Brown formula
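
A small sketch of the Spearman-Brown correction applied to a split-half correlation; the half-test correlation of .60 is an illustrative value.

# Spearman-Brown correction: full-test reliability = (2 * r_half) / (1 + r_half)
def spearman_brown(r_half):
    return (2 * r_half) / (1 + r_half)

# Illustrative correlation between odd-item and even-item half scores.
r_half = 0.60
print(round(spearman_brown(r_half), 3))   # 0.75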

15

What does Inter-Rater Reliability assess?

the consistency of scores assigned by different raters.

16

What is inter-rater reliability used for?

Important for subjectively scored measures like essays or interviews

17

What does Cohen’s Kappa Coefficient correct for?

the chance agreement between raters

18

What is cohen’s kappa coefficient used for?

when ratings represent unranked categories (nominal scale)
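
For illustration, a sketch of Cohen's kappa computed from a two-rater agreement table over nominal categories; the counts are made up.

import numpy as np

# 50 cases rated by two raters into 3 nominal categories:
# rows = Rater A's category, columns = Rater B's category.
table = np.array([
    [15,  3,  2],
    [ 2, 12,  1],
    [ 1,  4, 10],
])

n = table.sum()
p_observed = np.trace(table) / n   # proportion of cases the raters agree on
p_expected = (table.sum(axis=1) * table.sum(axis=0)).sum() / n**2   # chance agreement

# Kappa corrects observed agreement for agreement expected by chance alone.
kappa = (p_observed - p_expected) / (1 - p_expected)
print(round(kappa, 3))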

19

Consensual Observer Drift

when raters communicate while assigning ratings

20

What is the effect of consensual observer drift?

increasing consistency but reducing accuracy.

21

Homogeneous content

tends to have higher reliability coefficients

22

Heterogeneous content

tends to have lower reliability coefficients

23

unrestricted range

Reliability coefficients are larger

24

restricted range causes

reliability coefficients are smaller

25

The easier it is to guess an answer on a test

the lower the test’s reliability

26

True or false tests

less reliable, because correct answers are easier to guess

27

Multiple-choice tests

more reliable, because correct answers are harder to guess

28

Reliability Index

correlation between observed scores and true scores

29

Calculating the reliability index

taking the square root of the reliability coefficient
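
A tiny worked example, using an illustrative reliability coefficient of .81:

import math

reliability_coefficient = 0.81            # illustrative value
reliability_index = math.sqrt(reliability_coefficient)
print(reliability_index)                   # 0.9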

30

Item Analysis

process to determine which items to include in a test by analyzing item difficulty and item discrimination.

31

Item Difficulty

percentage of examinees who answered an item correctly

32

Moderately difficult items

(p = .30 to .70)

33

What is the preferred item difficulty?

moderately difficult items (p = .30 to .70)
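
An illustrative sketch of computing item difficulty (p) and flagging items in the preferred .30 to .70 range; the counts are made up.

# Item difficulty p = number of correct responses / total number of responses.
def item_difficulty(num_correct, num_examinees):
    return num_correct / num_examinees

# Three illustrative items answered by 40 examinees.
for correct in (36, 22, 8):
    p = item_difficulty(correct, 40)
    moderate = 0.30 <= p <= 0.70           # moderately difficult items are preferred
    print(f"p = {p:.2f}, moderately difficult: {moderate}")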

34

Item Discrimination

ability of an item to differentiate between examinees with high and low scores.

35

Discrimination Index Range

-1.0 to +1.0.
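
One common discrimination index is the difference in item difficulty between high-scoring and low-scoring examinee groups (D = p for the upper group minus p for the lower group); the sketch below assumes those groups have already been formed and uses made-up counts.

# Discrimination index D = p(upper group) - p(lower group); ranges from -1.0 to +1.0.
def discrimination_index(correct_upper, n_upper, correct_lower, n_lower):
    return correct_upper / n_upper - correct_lower / n_lower

# Illustrative item: 18 of 20 high scorers vs. 6 of 20 low scorers answered correctly.
print(round(discrimination_index(18, 20, 6, 20), 2))   # 0.6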

36

Definition of Standard Error of Measurement (SEM)

how much an obtained score is expected to differ from the true score.

37

What is Standard Error of Measurement (SEM) used for?

construct confidence intervals

38

Confidence Intervals

Ranges around a test score that indicate where the true score likely lies

39

68% CI

±1 SEM

40

95% CI.

±2 SEM

41

99% CI

±3 SEM
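
A sketch that builds these intervals, assuming the standard classical-test-theory formula SEM = SD x sqrt(1 - reliability), which is not stated on the cards above; the obtained score, SD, and reliability values are illustrative.

import math

sd, reliability, obtained_score = 15, 0.91, 110   # illustrative values
sem = sd * math.sqrt(1 - reliability)             # SEM = SD * sqrt(1 - reliability)

# 68% CI = +/-1 SEM, 95% CI = +/-2 SEM, 99% CI = +/-3 SEM around the obtained score.
for z, level in ((1, "68%"), (2, "95%"), (3, "99%")):
    low, high = obtained_score - z * sem, obtained_score + z * sem
    print(f"{level} CI: {low:.1f} to {high:.1f}")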

42

Item Response Theory (IRT)

An approach that focuses on examinee responses to individual items in order to design tests tailored to specific traits and populations

43

Item Characteristic Curve (ICC)

graph that shows the probability of answering an item correctly based on the examinee’s trait level

44

Item Characteristic Curve (ICC) x axis

examinee’s trait level

45

Item Characteristic Curve (ICC) y axis

probability of answering an item correctly

46

x axis

horizontal

47

y axis

vertical

48

Difficulty Parameter

the level of the trait needed for a 50% probability of answering an item correctly

49

Probability of Guessing

the point where the curve crosses the Y-axis

50

Lower values of the Probability of Guessing

harder to guess correctly
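
For illustration, a sketch of an item characteristic curve using an assumed three-parameter logistic (3PL) form; the parameter values are made up.

import math

# Assumed 3PL form: P(theta) = c + (1 - c) / (1 + exp(-a * (theta - b))),
# where b is the difficulty parameter, c reflects the probability of guessing
# (the curve's lower end), and a controls how steeply the curve rises.
def icc(theta, a=1.2, b=0.5, c=0.20):
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

# X-axis: examinee trait level (theta); Y-axis: probability of a correct answer.
for theta in (-3.0, -1.0, 0.5, 2.0, 3.0):
    print(f"theta = {theta:+.1f}  P(correct) = {icc(theta):.2f}")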

51

Classical Test Theory

Based on true score theory: X (Obtained Score) = T (True Score) + E (Measurement Error).
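
A small simulation sketch of this model, with made-up variances, showing that the reliability coefficient is the proportion of obtained-score variance that is true-score variance.

import numpy as np

rng = np.random.default_rng(0)

# Classical test theory: each obtained score X = T (true score) + E (random error).
true_scores = rng.normal(loc=100, scale=10, size=5000)   # T
errors = rng.normal(loc=0, scale=5, size=5000)           # E
obtained = true_scores + errors                          # X

# Reliability = true-score variance / obtained-score variance.
reliability = true_scores.var() / obtained.var()
print(round(reliability, 2))   # close to 100 / (100 + 25) = 0.80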

52

Test-Retest Reliability definition

Consistency of scores over time; the test is administered twice and the two sets of scores are correlated

53

What is test-retest reliability useful for?

Useful for stable traits

54

Internal Consistency Reliability

Consistency of scores across test items

55

Internal Consistency Reliability is useful for

measuring a single content domain

56

Factors Affecting Reliability

Content Homogeneity, Range of Scores, Guessing

57

Item Difficulty (p)

p = correct responses divided by total responses
