Psychological Measurement – Chapter 5: Reliability

Description and Tags

Vocabulary flashcards covering key reliability concepts, statistics, error sources, measurement models, and related research issues from Chapter 5 of the lecture notes. These cards aid in mastering terminology essential for understanding and applying reliability theory in psychological testing.

58 Terms

1

Reliability (Psychometrics)

The consistency of measurement results; the proportion of total score variance that is true variance.

2

Reliability Coefficient

A statistic (0–1) that quantifies reliability; higher values indicate greater consistency.

3

Measurement Error

The inherent uncertainty in any measurement, comprising preventable mistakes and inevitable imprecision.

4

True Score (T)

The hypothetical average score that would be obtained over infinitely many repeated administrations of the same instrument, with random errors canceling out.

5

Construct Score

A person’s standing on the theoretical attribute being measured, independent of any specific test.

6

Observed Score (X)

The actual obtained test score, composed of true score plus error (X = T + E).

7

Error Score (E)

The component of an observed score attributable to measurement error.

8

True Variance

Portion of score variance due to real differences among individuals on the measured attribute.

9

Error Variance

Portion of score variance produced by random or systematic measurement error.
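To make cards 6–9 concrete, here is a minimal simulation sketch in Python (NumPy; the sample size and variances are arbitrary illustrative choices). With error independent of true scores, observed variance is approximately true variance plus error variance, and reliability is the true-variance share of the total.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                      # simulated examinees

true = rng.normal(50, 8, n)      # true scores T (variance = 64)
error = rng.normal(0, 6, n)      # random error E (variance = 36), independent of T
observed = true + error          # classical test theory: X = T + E

# Observed variance decomposes into true plus error variance (~64 + 36 = 100)
print(observed.var(), true.var() + error.var())

# Reliability = true variance / observed variance (~64/100 = .64)
print(true.var() / observed.var())
```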

10

Systematic Error

Consistent, directional measurement error that inflates or deflates scores (produces bias).

11

Random Error

Unpredictable, nonsystematic fluctuations that raise or lower scores without pattern.

12

Bias (Statistics)

The degree to which systematic error distorts measurement results.

13

Carryover Effects

Influences of prior testing on later scores (e.g., practice and fatigue effects).

14

Practice Effects

Performance gains on a test due to familiarity from earlier administrations.

15

Fatigue Effects

Performance declines on a test due to reduced energy or motivation from repeated testing.

16

Item / Content Sampling

Error source stemming from the specific items chosen for a test or their wording.

17

Sources of Error Variance

Test construction, administration, scoring, interpretation, and examinee-related factors.

18

Test-Retest Reliability

Consistency of scores when the same test is administered to the same group on two occasions.

19

Coefficient of Stability

Test-retest reliability obtained with a retest interval of six months or more.

20

Alternate-Forms Reliability

Correlation of scores between two different, but equivalent, versions of a test.

21

Parallel-Forms Reliability

Reliability of two test forms with equal means and variances; also called the coefficient of equivalence.

22

Split-Half Reliability

Internal consistency estimate from correlating scores on two halves of one test administration.

23

Odd-Even Reliability

Split-half reliability obtained by correlating odd- and even-numbered items.

24

Spearman–Brown Formula

Equation used to adjust split-half reliability or predict reliability after changing test length.
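The prophecy formula itself is r_new = k·r / (1 + (k − 1)·r), where k is the factor by which test length changes; correcting a split-half correlation (cards 22–23) uses k = 2. A minimal sketch in Python (function name and values illustrative):

```python
def spearman_brown(r: float, k: float) -> float:
    """Predicted reliability when test length is multiplied by k."""
    return (k * r) / (1 + (k - 1) * r)

# Correct a half-test correlation of .70 to full-length reliability (k = 2)
print(spearman_brown(0.70, 2))   # ~0.82

# Predict reliability if the test were tripled in length
print(spearman_brown(0.70, 3))   # ~0.88
```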

25

Internal Consistency

The degree to which test items measure the same construct; assessed via split-half, KR-20, or alpha.

26

Coefficient Alpha (Cronbach’s α)

Average of all possible split-half correlations; common index of internal consistency.
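The standard computing formula is α = k/(k − 1) · (1 − Σσ²ᵢ / σ²_X), where the σ²ᵢ are item variances and σ²_X is the total-score variance. A minimal sketch in Python (NumPy; function name and toy data illustrative):

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """scores: persons x items matrix of item scores."""
    k = scores.shape[1]                         # number of items
    item_vars = scores.var(axis=0, ddof=1)      # variance of each item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Toy data: 4 persons x 3 items
data = np.array([[3, 4, 3],
                 [2, 2, 1],
                 [5, 4, 5],
                 [1, 2, 2]])
print(cronbach_alpha(data))   # ~0.93
```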

27

Kuder–Richardson Formulas

KR-20/KR-21 coefficients for internal consistency of dichotomous items.
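KR-20 is alpha specialized to 0/1 items: KR-20 = k/(k − 1) · (1 − Σpᵢqᵢ / σ²_X), where pᵢ is the proportion passing item i and qᵢ = 1 − pᵢ. A minimal sketch (illustrative toy data):

```python
import numpy as np

def kr20(scores: np.ndarray) -> float:
    """scores: persons x items matrix of dichotomous (0/1) item scores."""
    k = scores.shape[1]
    p = scores.mean(axis=0)                     # proportion correct per item
    total_var = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - (p * (1 - p)).sum() / total_var)

# Toy data: 5 persons x 4 dichotomous items
data = np.array([[1, 1, 1, 0],
                 [1, 0, 0, 0],
                 [1, 1, 1, 1],
                 [0, 0, 1, 0],
                 [1, 1, 0, 1]])
print(kr20(data))   # ~0.68
```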

28

McDonald’s Omega

Reliability coefficient that accommodates unequal item loadings, often outperforming alpha.

29

Inter-Scorer (Inter-Rater) Reliability

Agreement among independent scorers, raters, or judges on a measurement.

30

Coefficient of Inter-Scorer Reliability

An agreement index (e.g., Pearson r or Cohen's κ) expressing consistency among scorers.
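For categorical scoring decisions, Cohen's κ corrects raw agreement for chance: κ = (p_o − p_e) / (1 − p_e), where p_o is observed agreement and p_e is the agreement expected by chance from the raters' marginals. A minimal sketch (illustrative; scikit-learn's cohen_kappa_score computes the same statistic):

```python
import numpy as np

def cohen_kappa(rater1, rater2) -> float:
    """Chance-corrected agreement between two raters' category labels."""
    r1, r2 = np.asarray(rater1), np.asarray(rater2)
    p_obs = (r1 == r2).mean()           # raw proportion of agreement
    # Chance agreement: product of each rater's marginal proportions
    p_exp = sum((r1 == c).mean() * (r2 == c).mean()
                for c in np.union1d(r1, r2))
    return (p_obs - p_exp) / (1 - p_exp)

# Two raters classifying eight responses as pass (1) or fail (0)
a = [1, 0, 1, 1, 0, 1, 0, 1]
b = [1, 0, 1, 0, 0, 1, 1, 1]
print(cohen_kappa(a, b))   # ~0.47
```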

31

Dynamic Characteristic

A trait or state expected to change over time (e.g., mood), complicating test-retest reliability.

32

Static Characteristic

Relatively stable attribute (e.g., intelligence) suitable for test-retest reliability assessment.

33

Homogeneous Test

Instrument whose items all tap a single construct; should show high internal consistency.

34

Heterogeneous Test

Instrument measuring multiple constructs; may show lower internal consistency but still be reliable overall.

35

Restriction of Range

Reduced variability of scores, which lowers observed correlations and reliability estimates.

36

Speed Test

Test with easy items and strict time limit; score differences reflect speed, not item difficulty.

37

Power Test

Test with sufficient time and items varying in difficulty; score differences reflect knowledge/ability.

38

Criterion-Referenced Test

Assessment interpreted against predefined mastery criteria, not relative to norms; traditional reliability indices may mislead.

39

Classical Test Theory (CTT)

Traditional measurement model defining observed score as true score plus error.

40

Domain Sampling Theory

Model viewing a test as a sample from a universe of possible items; precursor to generalizability theory.

41

Generalizability Theory

Extension of domain sampling that estimates multiple error sources (facets) and produces generalizability coefficients.

42

Universe Score

Analog to true score within generalizability theory for a specified set of measurement conditions.

43

Item Response Theory (IRT)

Family of models describing probability of a response as a function of latent trait level and item parameters.

44

Latent-Trait Theory

Another term for IRT emphasizing underlying unobservable traits.

45

Rasch Model

Specific IRT model assuming equal item discrimination and logistic item characteristic curves.
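Under the Rasch model, the probability of a correct response depends only on the gap between person ability θ and item difficulty b: P(correct) = 1 / (1 + e^−(θ − b)). A minimal sketch (function name illustrative); note that when θ = b the probability is exactly .50, which is the difficulty definition in card 46:

```python
import math

def rasch_p(theta: float, b: float) -> float:
    """Probability of a correct response under the Rasch model."""
    return 1 / (1 + math.exp(-(theta - b)))

print(rasch_p(0.0, 0.0))    # ability equals difficulty -> 0.50
print(rasch_p(1.0, 0.0))    # ability above difficulty -> ~0.73
print(rasch_p(-1.0, 0.0))   # ability below difficulty -> ~0.27
```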

46

Item Difficulty (IRT)

Trait level at which a respondent has a 50% chance of answering an item correctly (or endorsing it).

47

Item Discrimination (IRT)

Degree to which an item differentiates among individuals at different trait levels.

48

Standard Error of Measurement (SEM)

Estimated standard deviation of an individual’s observed scores around their true score; SEM = σ√(1 − r).
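A minimal sketch of the formula in the card above (values illustrative): for an IQ-style scale with SD = 15 and reliability .91, SEM = 15·√(1 − .91) = 4.5.

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: sd * sqrt(1 - r)."""
    return sd * math.sqrt(1 - reliability)

print(sem(15, 0.91))   # 4.5
```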

49

Confidence Interval (Score)

Range around an observed score within which the true score is expected to fall with specified probability.
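Building on the SEM sketch above, a 95% band around an observed score is commonly formed as X ± 1.96·SEM (illustrative values):

```python
observed = 110
sem_value = 4.5                  # from the SEM sketch above
half_width = 1.96 * sem_value    # 95% two-sided z-multiplier

print(observed - half_width, observed + half_width)   # ~101.2 to ~118.8
```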

50

Standard Error of the Difference

Statistic used to determine whether two test scores differ significantly, incorporating each score’s SEM.
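The statistic pools the two scores' error: SE_diff = √(SEM₁² + SEM₂²), which equals σ√(2 − r₁ − r₂) when both scores share the same SD. A minimal sketch (values illustrative):

```python
import math

def se_diff(sem1: float, sem2: float) -> float:
    """Standard error of the difference between two scores."""
    return math.sqrt(sem1**2 + sem2**2)

# Two subtest scores, each with SEM = 4.5: the difference must exceed
# roughly 1.96 * SE_diff (about 12.5 points) to be significant at p < .05
print(1.96 * se_diff(4.5, 4.5))   # ~12.47
```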

51

Coefficient of Generalizability

Reliability-like index from generalizability studies indicating dependability across specified facets.

52

Coefficient of Equivalence

Alternate- or parallel-forms reliability coefficient denoting score consistency across forms.

53

Replicability Crisis

Widespread concern that many published psychological findings fail to replicate in independent studies.

54

Questionable Research Practices (QRPs)

Methodological shortcuts (e.g., data peeking, selective reporting) that inflate false-positive results.

55

Preregistration

Publicly recording study hypotheses and methods before data collection to curb QRPs and enhance replicability.

56

Differential Item Functioning (DIF)

Item bias where individuals from different groups with equal ability have different probabilities of endorsing an item.

57

Coefficient of Stability

Alternate term for long-interval test-retest reliability coefficient.

58

Coefficient of Dependability

Generalizability-theory index expressing score consistency for a specific decision purpose.