Measurement Error
Variability in scores due to random factors
Examples of Measurement Error
ambiguous items, fatigue, or distractions that affect performance unpredictably.
Reliability Coefficient
indicates the proportion of true score variability in a test’s scores, ranging from 0 to 1.
Minimally Acceptable Reliability Coefficient
0.70
Minimally Acceptable Reliability Coefficient for high-stakes tests
0.90 or higher is required
Alternate Forms Reliability Evaluates
the consistency of scores between two equivalent forms of the test
Alternate Forms Reliability is useful for
tests with multiple versions
What does Internal Consistency Reliability measure
the consistency of scores across different test items
When is internal consistency reliability useful?
tests measuring a single content domain
Coefficient Alpha
A measure of internal consistency reliability based on the number of test items and their average inter-item correlation
What kind of data is used for coefficient alpha?
items scored on a continuum or multi-point scale (e.g., Likert-type ratings)
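As a concrete illustration (not part of the original cards), here is a minimal Python sketch of coefficient alpha; the function name and the "scores" matrix are hypothetical, with one row per examinee and one numeric entry per item.

# Minimal sketch of coefficient (Cronbach's) alpha for an examinee-by-item matrix.
def coefficient_alpha(scores):
    k = len(scores[0])                                    # number of items
    def variance(values):                                 # sample variance
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    total_var = variance([sum(row) for row in scores])    # variance of total scores
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)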
Kuder-Richardson 20 (KR-20)
used for tests with dichotomous items (e.g., correct/incorrect).
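A companion sketch for KR-20, which is essentially coefficient alpha for 0/1 items; again the function name and data layout are assumptions, with "scores" holding one row of dichotomous responses per examinee.

# Minimal KR-20 sketch for a matrix of 0/1 item responses.
def kr20(scores):
    n, k = len(scores), len(scores[0])
    p = [sum(row[i] for row in scores) / n for i in range(k)]     # item difficulties
    sum_pq = sum(pi * (1 - pi) for pi in p)                       # sum of item variances
    totals = [sum(row) for row in scores]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n    # total-score variance
    return (k / (k - 1)) * (1 - sum_pq / var_total)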
Split-Half Reliability
Splits a test into two halves (e.g., even and odd items) and correlates scores on both halves
What formula corrects split-half reliability for the shortened test length?
Spearman-Brown formula
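Because each half is only half as long as the full test, the split-half correlation underestimates full-test reliability; the Spearman-Brown formula projects it to full length: corrected r = 2 × r(half) / (1 + r(half)). As a worked example with an assumed half-test correlation of .60, the corrected reliability is 2(.60) / 1.60 = .75.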
What does Inter-Rater Reliability assess?
the consistency of scores assigned by different raters.
What is inter-rater reliability used for?
Important for subjectively scored measures like essays or interviews
What does Cohen’s Kappa Coefficient correct for?
the chance agreement between raters
What is cohen’s kappa coefficient used for?
when ratings represent unranked categories (nominal scale)
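A minimal Python sketch of Cohen's kappa; the function name is hypothetical, and rater_a and rater_b are assumed lists of nominal category labels assigned to the same cases by two raters.

# Minimal Cohen's kappa sketch: observed agreement corrected for chance agreement.
def cohens_kappa(rater_a, rater_b):
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    p_chance = sum((rater_a.count(c) / n) * (rater_b.count(c) / n)
                   for c in categories)                   # agreement expected by chance
    return (p_observed - p_chance) / (1 - p_chance)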
Consensual Observer Drift
when raters communicate with one another while assigning ratings, causing their ratings to become more similar
What is the effect of consensual observer drift?
increasing consistency but reducing accuracy.
Homogeneous content
tends to produce higher reliability coefficients
heterogeneous content
tends to produce lower reliability coefficients
unrestricted range
Reliability coefficients are larger
restricted range causes
smaller reliability coefficients
The easier it is to guess an answer on a test
the lower the test’s reliability
True/false tests
less reliable, because each item can be answered correctly by guessing half the time
Multiple-choice tests
more reliable, because correct guessing is less likely
Reliability Index
correlation between observed scores and true scores
Calculating the reliability index
taking the square root of the reliability coefficient
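A worked example with an assumed value: if a test's reliability coefficient is .81, its reliability index is √.81 = .90.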
Item Analysis
process to determine which items to include in a test by analyzing item difficulty and item discrimination.
Item Difficulty
percentage of examinees who answered an item correctly
Moderately difficult items
(p = .30 to .70)
What is the preferred item difficulty?
moderately difficult items (p = .30 to .70)
Item Discrimination
ability of an item to differentiate between examinees with high and low scores.
Discrimination Index Range
-1.0 to +1.0.
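One common way to compute the discrimination index (an assumption here, since the cards do not give the formula): D = proportion of high scorers answering the item correctly minus the proportion of low scorers answering it correctly. For example, .80 − .30 = .50, while a negative D means low scorers outperform high scorers on that item.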
Definition of Standard Error of Measurement (SEM)
how much an obtained score is expected to differ from the true score.
What is Standard Error of Measurement (SEM) used for?
construct confidence intervals
Confidence Intervals
Ranges around a test score that indicate where the true score likely lies
68% CI
±1 SEM
95% CI.
±2 SEM
99% CI
±3 SEM
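A minimal Python sketch tying SEM to the confidence intervals above, using the standard formula SEM = SD × √(1 − reliability); the SD, reliability, and observed score below are assumed values for illustration.

import math

def sem(sd, reliability):
    # standard error of measurement
    return sd * math.sqrt(1 - reliability)

sd, reliability, observed = 15.0, 0.91, 110.0           # assumed values
error = sem(sd, reliability)                            # 15 * sqrt(0.09) = 4.5
ci_68 = (observed - 1 * error, observed + 1 * error)    # 68% CI: 105.5 to 114.5
ci_95 = (observed - 2 * error, observed + 2 * error)    # 95% CI: 101.0 to 119.0
ci_99 = (observed - 3 * error, observed + 3 * error)    # 99% CI: 96.5 to 123.5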
Item Response Theory (IRT)
an approach to test construction that focuses on examinee responses to individual items, allowing tests to be tailored to specific traits and populations
Item Characteristic Curve (ICC)
graph that shows the probability of answering an item correctly based on the examinee’s trait level
Item Characteristic Curve (ICC) x axis
examinee’s trait level
Item Characteristic Curve (ICC) y axis
probability of answering an item correctly
x axis
horizontal
y axis
vertical
Difficulty Parameter
the level of the trait needed for a 50% probability of answering an item correctly
Probability of Guessing
the point where the curve crosses the Y-axis
Lower values of the Probability of Guessing
harder to guess correctly
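A minimal Python sketch of an item characteristic curve under the common three-parameter logistic model; the function name and parameter values are assumptions for illustration.

import math

def icc(theta, a=1.0, b=0.0, c=0.20):
    # a = discrimination (slope), b = difficulty, c = probability of guessing
    # (the curve's lower asymptote)
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))

print(icc(0.0))     # theta at the difficulty level: 0.6, halfway between c and 1.0
print(icc(-4.0))    # very low trait level: approaches the guessing parameter (~0.21)
print(icc(4.0))     # very high trait level: approaches 1.0 (~0.99)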
Classical Test Theory
Based on true score theory: X (Obtained Score) = T (True Score) + E (Measurement Error).
Test-Retest Reliability definition
Consistency of scores over time; the test is administered twice to the same examinees and the two sets of scores are correlated
What is test-retest reliability useful for?
measuring stable traits
Internal Consistency Reliability
Consistency of scores across test items
Internal Consistency Reliability is useful for
tests measuring a single content domain
Factors Affecting Reliability
Content Homogeneity, Range of Scores, Guessing
Item Difficulty (p)
p = correct responses divided by total responses
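A worked example with assumed numbers: if 40 of 50 examinees answer an item correctly, p = 40 / 50 = .80, which falls outside the preferred moderate range of .30 to .70.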