Reliability
Refers to precision in measurement
Is determined by the consistency of scores obtained by the same persons on equivalent/parallel tests
Error
Is inevitable; it is the difference between the observed and true scores due to test limitations
Psychological Traits
Are abstract; measured using imperfect tools that may over- or underestimate them
Psychologists must
Evaluate how much error exists in their measurement tools
Using unreliable tools
Risks flawed understandings of behaviour
Spearman (1904)
Combined sampling error & correlation to develop reliability theory
Classical Test Theory
Observed score (X) = true score (T) + random error (E)
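As a compact sketch in standard CTT notation (the zero-mean, uncorrelated-error assumptions come from the Error (E) card later in this set):

```latex
% Classical Test Theory decomposition of an observed score:
X = T + E, \qquad \mathbb{E}[E] = 0, \qquad \operatorname{Cov}(T, E) = 0
% Hence observed-score variance splits into true-score and error variance:
\sigma_X^2 = \sigma_T^2 + \sigma_E^2
```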
Random Error
Causes variability in repeated test scores, producing a normal distribution around the true score
Greater dispersion
Less reliability
Narrow dispersion
more accurate representation of ‘true’ ability
Domain Sampling Method
A technique used in test construction where multiple items are drawn from a larger domain to better estimate a person's true ability.
As test length (number of items) increases
sampling error decreases and reliability increases
Repeated random sampling
yields normally distributed estimates of the true score
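A quick simulation sketch of the domain sampling idea (all numbers hypothetical): longer tests yield tighter, roughly normal distributions of score estimates around the true score.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical item domain: one person's outcomes on 100,000 possible
# items, with a true proportion-correct of 0.70.
domain = rng.binomial(1, 0.70, size=100_000)
true_score = domain.mean()

for k in (10, 40, 160):
    # Draw many random k-item tests from the domain and score each one.
    estimates = [rng.choice(domain, size=k, replace=False).mean()
                 for _ in range(2_000)]
    print(f"{k:>3} items: mean estimate = {np.mean(estimates):.3f}, "
          f"SD of estimates = {np.std(estimates):.3f}")

# The SD of the estimates (the sampling error) shrinks as k grows,
# while the mean stays near the true score.
```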
Item Response Theory
Modern alternative to Classical Test Theory that improves measurement precision. Adapts item difficulty based on the individual's responses (adaptive testing), focusing testing around the individual's actual ability level for greater accuracy. Leads to shorter tests with higher reliability than classical methods.
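One common IRT model (the specific model isn't named on the card) is the 2-parameter logistic, shown here as an illustrative sketch:

```latex
% 2PL model: probability that a person with ability \theta answers
% item i correctly, given difficulty b_i and discrimination a_i:
P_i(\theta) = \frac{1}{1 + e^{-a_i(\theta - b_i)}}
```

Roughly, adaptive testing repeatedly picks the next item whose difficulty is near the current estimate of the test-taker's ability.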
Reliability Coefficient
Ratio of variance of true scores to variance of observed scores. Tells us what proportion of the test variance is non-error.
Reliability coefficient of .75
75% of variance in test scores is due to true differences in ability & 25% of the variance in test scores is due to error.
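In symbols (standard notation, following the variance decomposition above):

```latex
% Reliability as the proportion of observed variance that is true variance:
r_{XX} = \frac{\sigma_T^2}{\sigma_X^2} = \frac{\sigma_T^2}{\sigma_T^2 + \sigma_E^2}
% e.g. r_{XX} = .75 means \sigma_T^2 = .75\,\sigma_X^2 and \sigma_E^2 = .25\,\sigma_X^2
```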
Time Sampling Error
The error that occurs when test scores are influenced by the particular moment in time when the testing is conducted, potentially affecting the measurement of a person’s true ability.
Test-Retest Reliability
Extent to which scores can be generalised and remain unchanged over time when measuring stable constructs (e.g., personality).
Test-Retest Reliability Coefficient
Correlation between scores obtained on identical tests administered on separate occasions.
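A minimal sketch of the computation (the scores below are made up for illustration):

```python
import numpy as np

# Hypothetical scores for the same eight people on two administrations
# of the same test, several weeks apart.
time1 = np.array([23, 31, 27, 35, 19, 28, 33, 25])
time2 = np.array([25, 30, 26, 36, 21, 27, 31, 26])

# Test-retest reliability = Pearson correlation across the two occasions.
r_tt = np.corrcoef(time1, time2)[0, 1]
print(f"test-retest r = {r_tt:.2f}")
```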
Test-retest correlations _______________ as inter-test interval lengthens.
tend to decrease
Inter-test Interval for Test-Retest Reliability
Should not be too long, or trait being measured is likely to undergo real change
Longer Inter-test Intervals
Introduce other influencing factors
Test-Retest Reliability is only applicable to ____________
stable traits (e.g., intelligence) not changing states (e.g., mood)
Test-Retest Reliability primarily addresses error due to __________
temporary changes in the test taker, for example illness, tiredness, emotional problems, effects of medication, etc.
Test-Retest Reliability can also be influenced by
Error due to test administration and scoring/interpretation
Test-retest Reliability does not
account for error due to variation in test content, since the same test is used
A test-retest reliability limitation
It is often a nuisance to obtain test-retest data, since the same test-takers must be tested twice
In test-retest reliability, performance on the first test may ____________
influence performance on the second test; for example, practice may produce different degrees of improvement in retest scores.
Alternate Forms Reliability
Two equivalent forms with different items but same selection rules are used to calculate reliability.
Alternate Forms Reliability ensures
test scores aren’t dependent on a specific set of items from the domain (reducing item sampling error)
Alternate Forms Reliability Coefficient
Correlations between scores obtained on two equivalent test forms
Immediate Succession Alternate Forms Reliability
Primarily addresses unreliability due to content sampling
Inter-test Interval Alternate Forms Reliability
Addresses unreliability due to content sampling & variations due to temporary changes in test-taker.
Alternate forms are __________
Not used frequently because most tests don’t have alternate forms
Inter-Scorer Reliability
Degree of agreement or consistency between two or more scorers or raters
Inter-Scorer Reliability provides information about ___________
unsystematic error arising from variation in scoring & interpretation BUT not any other source of error
Inter-Scorer Reliability is important when _________
judgement enters the scoring process for example in projective personality tests
Internal Consistency
Extent to which items measure the same underlying construct.
Internal Consistency determined by
examining the relationships among items on one test at a single point in time. If the items measure the same construct, they should correlate with each other.
Split-Half Method
Measure of internal consistency involving correlating one half of a test with the other half (random split or odd-even).
Split-Half Method correlation
Underestimates reliability, as each half is shorter and thus less reliable
Spearman-Brown Formula
Takes the split-half correlation as input and converts it to an estimate of the reliability of the full-length test.
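The standard Spearman-Brown formulas (half-length correction and the general prophecy form):

```latex
% Correct a split-half correlation up to full test length:
r_{\text{full}} = \frac{2\,r_{\text{half}}}{1 + r_{\text{half}}}
% General form for a test lengthened by a factor of n:
r_n = \frac{n\,r}{1 + (n-1)\,r}
```

For example, a split-half correlation of .60 corrects to 2(.60)/(1 + .60) = .75 for the full-length test.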
Kuder-Richardson Formula
Method used to assess the internal consistency of a measure based on dichotomous data (right or wrong).
Kuder-Richardson (KR-20) Formula
Gives a coefficient for any test equal to the average of all possible split-half coefficients, which makes it robust; high item covariance increases reliability.
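The KR-20 formula itself (standard form, not spelled out on the cards):

```latex
% k = number of items, p_i = proportion passing item i, q_i = 1 - p_i,
% \sigma_X^2 = variance of total test scores:
r_{KR20} = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k} p_i q_i}{\sigma_X^2}\right)
```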
Cronbach’s Alpha
Method used to assess internal consistency of a measure — generalises KR20 to apply to non-dichotomous items (e.g., Likert scales)
For alpha to be meaningful
Tests should be built to assess a single domain/trait
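A minimal sketch of computing alpha from raw item scores (the data and function name are illustrative, not from any particular library):

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for a (respondents x items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)      # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 5-point Likert responses: 6 respondents x 4 items.
scores = np.array([
    [4, 5, 4, 4],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [4, 4, 4, 5],
    [1, 2, 2, 1],
])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```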
Various measures of internal consistency assess unreliability due to ______
content sampling
Tests can be developed to have high internal consistency by
having items with highly similar content; however, item sampling may then be so narrow that the test becomes trivial
.90s
high reliability (any higher may indicate items are too similar!)
.80s
moderate to high reliability
.70s
low to moderate reliability (must be at least this value!)
.60s
unacceptably low reliability
___________ have higher reliabilities (.90s)
Cognitive ability tests
______________ have second highest reliabilities (.80s)
Self-report tests of personality
Research requires ___________ alpha level
.70–.80 is acceptable
Clinical decision-making settings require _______ alpha level
equal to or greater than .90 (a must)
Reliability can be improved by _________ number of items
increasing
Reliability can be improved by _______ items that reduce reliability
discarding
Reliability problems can also be addressed by estimating what the correlation would be without __________
measurement error (correction for attenuation)
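Assuming the card refers to the classic correction for attenuation, the formula is:

```latex
% Estimated correlation between true scores, given the observed
% correlation r_{xy} and the two tests' reliabilities r_{xx}, r_{yy}:
\hat{r} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}
```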
A test may yield scores that can be reliably used in some situations ______________
but not in others
Homogenous items
Measure one factor and are appropriate for internal consistency estimates
Heterogenous items
Measure a range of factors and are therefore not appropriate for internal consistency
Dynamic traits
Traits such as mood fluctuate, so test-retest reliability is not appropriate
Static traits
Traits such as intelligence remain stable over time and are thus appropriate for test-retest reliability
Range Restriction
Reliability decreases when variance of true scores decreases
Range Inflation
Reliability increases when variance of true scores increases
Criterion-referenced test
Evaluates whether a specific criterion (e.g., pass/fail) has been achieved.
Reduces true score variability, thereby lowering reliability—even if individual performance is stable.
Reliability becomes less critical when the test is used for prediction.
Speed test
time limited; focus is on speed rather than difficulty
Items are interdependent → Internal consistency inappropriate.
Test-retest reliability is appropriate.
Power test
Focuses on difficulty (untimed).
Treated like regular tests for reliability (internal consistency, test-retest, etc.).
Standard Error of Measurement (SEM)
Estimates how repeated measures of a person on the same instrument tend to be distributed around their “true” score
Purpose of Standard Error of Measurement (SEM)
Evaluates the precision of an individual’s observed score as an estimate of their true score (vs. reliability coefficients, which assess overall test quality).
how much an observed score might deviate from the true score due to unsystematic error.
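The usual computing formula (standard CTT result, assuming the test's SD and reliability are known):

```latex
% SEM from the test's standard deviation and reliability coefficient:
SEM = \sigma_X \sqrt{1 - r_{XX}}
```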
Reliability
Quality of a ruler
SEM
How precise one measurement is with that ruler
Small SEM
High precision → Observed scores are close to the true score (high confidence).
Large SEM
Low precision → More error, less confidence in observed scores.
±1 SEM
~68% of scores
±1.96 SEM
~95% of scores
±2.58 SEM
~99% of scores
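A worked example with hypothetical numbers (an IQ-style test with SD 15 and reliability .89):

```latex
SEM = 15\sqrt{1 - .89} \approx 4.97 \approx 5
% 95\% band around an observed score of 110:
110 \pm 1.96 \times 5 \;\approx\; [100.2,\ 119.8]
```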
True Score (T)
The theoretical, unchanging value of a trait (e.g., IQ, extroversion) for an individual. Never directly known but inferred through repeated testing (mean of observed scores).
If someone takes a test infinitely, their average score = true score.
Error (E)
Unsystematic factors (e.g., mood, guessing, distractions). Causes observed scores to vary around the true score in repeated testing. It is random (mean = 0) and uncorrelated with true scores.
Observed Score Variability
Each test is a "sample" of possible items/trials; scores naturally fluctuate. Repeated testing produces a range of observed scores around T (like a sampling distribution).
Practical Implications of CTT
The goal is to minimize error (E) so that observed scores (X) are closer to true scores (T).
Use reliability coefficients (e.g., Cronbach’s α) and SEM to quantify error
Confidence intervals (e.g., "IQ = 110 ± 5") reflect CTT’s error model.
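Putting the pieces together, a small sketch (hypothetical numbers; the function names are mine, not from any library):

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def score_band(observed: float, sd: float, reliability: float,
               z: float = 1.96) -> tuple[float, float]:
    """Observed score +/- z SEMs (95% band when z = 1.96)."""
    e = z * sem(sd, reliability)
    return observed - e, observed + e

# IQ-style test: observed score 110, SD 15, reliability .89
print(score_band(110, 15, 0.89))         # ~ (100.2, 119.8), 95% band
print(score_band(110, 15, 0.89, z=1.0))  # ~ (105.0, 115.0), i.e. 110 +/- 5
```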