Reliability

Measurement error = all factors associated with the process of measuring a variable, other than the variable being measured
Random error = a source of measurement error caused by unpredictable fluctuations and inconsistencies of other variables in the measurement process
Systematic error = a source of measurement error that is typically constant or proportionate to what is presumed to be the true value of the variable measured
Reliability

Psychological measures = assess psychological differences between people; their usefulness hinges on the ability to do so accurately
Classical test theory

Measurement theory that defines the conceptual basis for reliability
Holds that differences in test scores reflect actual differences in true levels of attitudes instead of measurement error
Came from the physical sciences: every measurement is made up of two components, true score and error
Assumptions = observed score is the sum of true score and measurement error, true scores and errors are uncorrelated, error occurs as if it were random, errors cancel out over repeated measurements
With measurement comes unreliability
Reliability is a test property that derives from observed scores (values obtained from the measurement of some characteristic in a person), true scores (real amounts of that characteristic) and measurement error
Error = true score is impossible to observe directly (it is a hypothetical construct: the mean score from an infinite number of administrations of the test), true scores are obscured by errors, error occurs as if random
Observed score = true score + error
Error occurs as if random, so effects are independent of true values, no correlation between true and error scores

Observed score can be higher or lower than the true score on any given administration, so errors are expected to cancel out in the long run (mean error of 0)
Sources of random error = candidate-related, procedural, environmental, test-related
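
The classical test theory assumptions above can be checked with a small simulation. This is only a sketch under assumed values: the true-score mean of 100, the true-score SD of 15 and the error SD of 5 are hypothetical, chosen for illustration. The last line uses the standard classical test theory definition of reliability as true-score variance divided by observed-score variance, which these notes do not state explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable

n_people = 10_000
# Hypothetical true scores (never observable in practice)
true_scores = rng.normal(loc=100, scale=15, size=n_people)
# Random error: mean 0, generated independently of the true scores
errors = rng.normal(loc=0, scale=5, size=n_people)
# Observed score = true score + error
observed = true_scores + errors

mean_error = errors.mean()                                # should be close to 0
true_error_corr = np.corrcoef(true_scores, errors)[0, 1]  # should be close to 0
# Standard CTT definition (an assumption here, not taken from the notes)
reliability = true_scores.var() / observed.var()

print(f"mean error: {mean_error:.3f}")
print(f"corr(true, error): {true_error_corr:.3f}")
print(f"reliability: {reliability:.3f}")
```

With these assumed values the expected reliability is 15^2 / (15^2 + 5^2) = 0.9, and the simulated figure should land near it.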

Reliability = testing consistency

Multiple administrations = test-retest reliability, alternate forms reliability
Single administration = split half reliability, Cronbach's alpha
Multiple judges = interrater reliability

Test-retest reliability = same group is tested twice and the two sets of results are correlated using Pearson's r (a measure of correlation)
Individuals will differ from one another in their scores
If the scale is reliable, each individual (and the group as a whole) should get similar results at time 1 and time 2
Assesses consistency over time
Problems = practice effects, participant attrition, maturation effects and participant memory
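
A minimal sketch of a test-retest calculation, correlating time 1 and time 2 scores with Pearson's r. The scores are made up for illustration, and `pearson_r` is a hypothetical helper name wrapping NumPy's `corrcoef`.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.corrcoef(x, y)[0, 1])

# Made-up scores for the same eight people on two occasions
time1 = [12, 15, 9, 20, 17, 11, 14, 18]
time2 = [13, 14, 10, 19, 18, 10, 15, 17]

r = pearson_r(time1, time2)
print(f"test-retest r = {r:.2f}")
```

The same calculation applies to alternate forms reliability, with scores on form A and form B in place of time 1 and time 2.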

Alternate forms reliability = two independent versions of the measure are used
Both versions are given consecutively to the same group and the scores are compared using Pearson's r
Gives a measure of consistency across versions of the test
Alternate forms are less vulnerable to practice effects, because the items differ between versions
But can we ever know whether two alternate tests are truly parallel?

Interrater reliability = degree of agreement between two or more scorers regarding a measure
Used in observational studies, where it measures the degree of consensus between observers
Affected by the degree of objectivity in the measurement system (it shouldn't matter who gives the measure)
Low interrater reliability can be due to a defective scale or insufficient rater training
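
One way to quantify interrater agreement for categorical codes is sketched below. The notes do not name a statistic, so two common choices are assumed here: simple percent agreement, and Cohen's kappa, which corrects agreement for chance. The ratings and the function names are hypothetical.

```python
from collections import Counter

def percent_agreement(rater1, rater2):
    """Proportion of cases on which the two raters gave the same code."""
    return sum(a == b for a, b in zip(rater1, rater2)) / len(rater1)

def cohens_kappa(rater1, rater2):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater1)
    p_observed = percent_agreement(rater1, rater2)
    counts1, counts2 = Counter(rater1), Counter(rater2)
    categories = set(rater1) | set(rater2)
    # Chance agreement: product of each rater's marginal proportions, summed
    p_chance = sum(counts1[c] * counts2[c] for c in categories) / n**2
    return (p_observed - p_chance) / (1 - p_chance)

# Made-up codes from two observers rating the same ten behaviour samples
rater_a = ["agg", "agg", "calm", "calm", "agg", "calm", "calm", "agg", "calm", "calm"]
rater_b = ["agg", "agg", "calm", "agg", "agg", "calm", "calm", "calm", "calm", "calm"]

agreement = percent_agreement(rater_a, rater_b)
kappa = cohens_kappa(rater_a, rater_b)
print(f"percent agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

Kappa comes out lower than raw agreement because some agreement would be expected even if both observers coded at random.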

Cronbach's alpha = conceptually, involves correlating every possible half of the items with the other half and taking the average of those correlations
A value of 0.7 or higher is considered acceptable
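
In practice Cronbach's alpha is usually computed from item and total-score variances rather than by enumerating every split half. The sketch below uses that standard variance-based formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), which is an assumption relative to these notes. The response data and the function name are made up for illustration.

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items array of scores."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]                              # number of items
    item_variances = scores.var(axis=0, ddof=1)      # variance of each item
    total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return float(k / (k - 1) * (1 - item_variances.sum() / total_variance))

# Made-up responses: 5 respondents (rows) x 4 items (columns), 1-5 scale
responses = [
    [4, 5, 4, 5],
    [2, 3, 2, 2],
    [5, 5, 4, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]

alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
print("acceptable" if alpha >= 0.7 else "below the 0.7 guideline")
```

A sample of five respondents is far too small for a real reliability estimate; it is used here only to keep the arithmetic easy to follow.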

Research measures don't have to be as reliable as those used in clinical practice (a large sample size evens out individually unreliable scores, whereas individuals' lives can be strongly affected by the results of clinical measures)
Sometimes it is better for a scale to have low internal reliability (example: Cleckley's key characteristics of psychopathy / antisocial personality disorder include unrelated features, so removing items to make the scale more reliable would make it less effective at identifying people with high psychopathy scores; the issue is internal consistency versus unidimensionality)

Test items are homogeneous or heterogeneous in nature
Characteristic, ability, or trait being measured is presumed to be dynamic or static
Range of scores is/is not restricted
Test is a speed or power test
Test is/is not criterion referenced

Reliability is a matter of degree
Place more reliance on reliability estimates based on large samples
Sample size for calculating reliability should not be below 100
Expect to see test-retest reliability for ability and reasoning measures
Ability, aptitude and IQ measures should have coefficients above 0.8
Personality and other measures usually have coefficients between 0.6 and 0.8, but 0.7 is recommended as a minimum
Always look for the reliability coefficient and the size of the sample used to calculate it

Standard error of measurement = measure of the precision of an observed test score; an estimate of the amount of error inherent in an observed score or measurement
High reliability = low standard error; the standard error can be used to estimate the extent to which an observed score deviates from a true score
Confidence interval = range of scores that is likely to contain the true score
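
The link between reliability, standard error and confidence intervals can be sketched with the standard formula SEM = SD * sqrt(1 - reliability), which is assumed here because the notes give no formula. The SD of 15, the reliability of 0.91, the observed score of 110 and the function names are all hypothetical.

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = SD * sqrt(1 - reliability); higher reliability gives a lower SEM."""
    return sd * math.sqrt(1 - reliability)

def confidence_interval(observed, sd, reliability, z=1.96):
    """Band around an observed score; z = 1.96 gives roughly a 95% interval."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem

# Hypothetical test: SD of 15, reliability coefficient of 0.91
sem = standard_error_of_measurement(sd=15, reliability=0.91)
low, high = confidence_interval(observed=110, sd=15, reliability=0.91)
print(f"SEM = {sem:.2f}")
print(f"95% CI for an observed score of 110: {low:.1f} to {high:.1f}")
```

Dropping the reliability to 0.75 in the same sketch raises the SEM from 4.5 to 7.5, which shows why clinical decisions about individuals call for more reliable measures.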