Reliability

  • Precision

  1.  Estimated reliability  

  2. Consistency of test scores across a sample  

  3. Consistency of test scores across time  

  • Trueness  

  1. Estimated validity

  2. Identify relations with other constructs (theoretical or empirical)

  3. Model fit (theoretical models and measurement models)

  • Measurement error = all factors associated with the process of measuring a variable, other than the variable being measured

  • Random error = a source of measurement error caused by unpredictable fluctuations and inconsistencies in other variables in the measurement process

  • Systematic error = a source of measurement error that is typically constant or proportionate to what is presumed to be the true value of the variable measured

  • Reliability  

  1. Measure of consistency  

  2. With the same measures and circumstances the results should be similar  

  3. Reliable measures result in replicable studies

  • Psychological measures = assess psychological differences between people; their value hinges on the ability to do so accurately

  • Classical test theory  

  1. Measurement theory that defines the conceptual basis for reliability 

  2. Reliability is the extent to which differences in test scores reflect actual differences in true levels of the attribute rather than measurement error

  3. Came from the physical sciences, where a measurement is split into two components (true score and error)

  4. Assumptions = observed score is the sum of true score and measurement error; true scores and errors are uncorrelated; error occurs as if it were random; errors cancel out in the long run

  5. With measurement comes unreliability

  6. Reliability is a test property that derives from observed scores (values obtained from the measurement of some characteristic in a person), true scores (real amounts of that characteristic) and measurement error

  7. True score = impossible to observe directly (a hypothetical construct: the mean score from an infinite number of administrations of the test); true scores are obscured by error, which occurs as if random

  8. Observed score = true score + error 

  9. Error occurs as if random, so effects are independent of true values, no correlation between true and error scores  
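The assumptions above can be written out as a short derivation. The symbols X (observed score), T (true score), E (error) and r_XX (reliability coefficient) are notation assumed here, not taken from these notes, and the variance-ratio definition of reliability is the standard classical test theory one rather than something stated above.

```latex
\begin{align}
X &= T + E \\
\mathbb{E}[E] &= 0, \qquad \operatorname{Cov}(T, E) = 0 \\
\sigma_X^2 &= \sigma_T^2 + \sigma_E^2 \\
r_{XX} &= \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}
\end{align}
```

Because true scores and errors are uncorrelated, the observed variance splits cleanly into a true part and an error part; reliability is then the share of observed variance that is true-score variance.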

  • Random error  

  1. Observed score can be higher or lower than the true score across multiple tests, so errors are expected to cancel out in the long run (mean error of 0)

  2. Sources of random error = candidate-related, procedural, environmental, test-related
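A minimal simulation sketch of random error, assuming normally distributed true scores (mean 100, SD 15) and errors (mean 0, SD 5). These distributions, the sample size, and all variable names are illustrative assumptions, not values from these notes.

```python
import random
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 5000

# Hypothetical true scores and purely random errors.
true_scores = [rng.gauss(100, 15) for _ in range(n)]
errors = [rng.gauss(0, 5) for _ in range(n)]

# Observed score = true score + error.
observed = [t + e for t, e in zip(true_scores, errors)]

# Random error should average out near 0 ...
mean_error = statistics.fmean(errors)
# ... and be (nearly) uncorrelated with the true scores.
r_true_error = pearson_r(true_scores, errors)
# Share of observed variance that is true-score variance
# (theoretical value under these assumptions: 225 / 250 = 0.9).
reliability = statistics.pvariance(true_scores) / statistics.pvariance(observed)

print(mean_error, r_true_error, reliability)
```

With a sample this large, the mean error and the true-error correlation should both sit close to 0, which is what "errors cancel out in the long run" means in practice.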

 Reliability = testing consistency  

  • Methods for assessing reliability =  

  1. Multiple administration = test-retest reliability, alternate forms reliability

  2. Single administration = split half reliability, Cronbach's alpha

  3. Multiple judges = inter-rater reliability

  • Test-retest reliability

  1. Same group tested twice and results are correlated using Pearson's r (measure of correlation)

  2. Individuals will differ from one another in their scores

  3. If the scale is reliable, each individual (and the group as a whole) should get similar results at time 1 and time 2

  4. Assesses consistency over time  

  5. Problems = practice effects, participant attrition, maturation effects and participant memory  
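The test-retest procedure above can be sketched as a Pearson correlation between two administrations. The scores below are made-up illustrative data for eight hypothetical participants, and `pearson_r` is a helper written for this sketch.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


# Hypothetical scores for the same eight people at time 1 and time 2.
time1 = [12, 15, 9, 20, 17, 11, 14, 18]
time2 = [13, 14, 10, 19, 18, 10, 15, 17]

# Test-retest reliability is the correlation between the two occasions.
test_retest_r = pearson_r(time1, time2)
print(round(test_retest_r, 3))
```

People keep roughly the same rank order across the two occasions here, so the coefficient comes out high. The same calculation applies to alternate forms reliability, with the two lists holding scores on form A and form B instead.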

  • Alternate forms reliability  

  1. Two independent versions of the measure are used

  2. Both versions are given consecutively to the same group and the scores are compared with Pearson's r

  3. Alternate forms reliability gives a measure of consistency across versions

  4. Alternate forms are less vulnerable to practice effects than repeating the same test

  5. But can we ever know whether two alternate tests are truly parallel?

  • Inter-scorer (inter-rater) reliability

  1. Degree of agreement between two or more scorers regarding a measure

  2. Used in observational studies and measures the degree of consensus between observers

  3. Affected by the degree of objectivity in the measurement system (it shouldn't matter who gives the measure)

  4. Low inter-rater reliability can be due to a defective scale or insufficient training
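One common way to quantify agreement between two scorers is Cohen's kappa, which corrects raw percentage agreement for the agreement expected by chance. These notes do not name a specific statistic, so kappa is an assumption here, and the ratings are made-up illustrative data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters on categorical codes.

    Undefined (division by zero) if chance agreement is exactly 1,
    i.e. both raters use a single identical category throughout.
    """
    n = len(rater_a)
    # Proportion of cases where the two raters gave the same code.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected if the raters coded independently at their own base rates.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    return (observed - expected) / (1 - expected)


# Hypothetical codes from two observers rating the same ten behaviours.
rater_a = ["yes", "yes", "no", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "yes", "no", "yes", "yes", "no", "yes", "yes"]

kappa = cohens_kappa(rater_a, rater_b)
print(round(kappa, 3))
```

The raters agree on 8 of 10 cases, but with a 60/40 split of codes about half of that agreement would be expected by chance, so kappa is noticeably lower than the raw 80% figure.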

  • Internal consistency  

  1. Measures how closely the items on a test relate to one another

  2. Items that measure the same attribute should correlate with each other, making the test internally consistent

  • Split half reliability  

  1. Scores on one half of the survey are correlated with scores on the other half

  2. Can compare the first half with the second half, or odd-numbered with even-numbered items

  3. Measures internal consistency

  4. Avoids the fatigue, memory, and time-lapse problems of repeated administration

  5. Reliability depends on the number of items
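A minimal sketch of an odd-even split, assuming six hypothetical respondents answering six items on a 1 to 5 scale (made-up data). Because each half is only half as long as the full test, the sketch also applies the Spearman-Brown correction, a standard adjustment that these notes do not mention by name.

```python
import statistics


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)


def split_half_reliability(item_scores):
    """Return (half-test correlation, Spearman-Brown corrected estimate)."""
    # Items 1, 3, 5, ... form one half; items 2, 4, 6, ... form the other.
    odd_totals = [sum(row[0::2]) for row in item_scores]
    even_totals = [sum(row[1::2]) for row in item_scores]
    r_half = pearson_r(odd_totals, even_totals)
    # Step the half-length correlation up to the full test length.
    r_full = 2 * r_half / (1 + r_half)
    return r_half, r_full


# Rows are respondents, columns are items (hypothetical data).
responses = [
    [4, 5, 4, 5, 4, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 1, 2, 2],
    [3, 4, 3, 4, 3, 3],
]

r_half, r_full = split_half_reliability(responses)
print(round(r_half, 3), round(r_full, 3))
```

The corrected value is higher than the raw half-test correlation, which illustrates why reliability depends on the number of items.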

  • Cronbach's alpha (α)

  1. Conceptually, involves correlating every possible half of the items with the other half and averaging the results

  2. A value of 0.7 or higher is generally considered acceptable
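In practice alpha is usually computed from item variances rather than by enumerating every split. The sketch below uses the standard variance formula, alpha = k/(k-1) × (1 − sum of item variances / variance of total scores), on made-up data for six hypothetical respondents and six items. The function name and the data are assumptions for illustration.

```python
import statistics


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(item_scores[0])  # number of items
    # Transpose so each tuple holds every respondent's score on one item.
    items = list(zip(*item_scores))
    item_variances = [statistics.variance(col) for col in items]
    total_variance = statistics.variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Rows are respondents, columns are items (hypothetical data).
responses = [
    [4, 5, 4, 5, 4, 4],
    [2, 1, 2, 2, 1, 2],
    [3, 3, 4, 3, 3, 3],
    [5, 4, 5, 5, 5, 4],
    [1, 2, 1, 1, 2, 2],
    [3, 4, 3, 4, 3, 3],
]

alpha = cronbach_alpha(responses)
print(round(alpha, 3))
```

The items in this toy data set rise and fall together across respondents, so alpha comes out well above the 0.7 rule of thumb. A sample this small is for illustration only.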

  • Reliability issues  

  1. Research measures don't have to be as reliable as those used in clinical practice (a large sample size evens out individually unreliable scores, whereas individuals' lives can be greatly affected by the results of clinical measures)

  2. Sometimes it is better for a scale to have low internal reliability (example: Cleckley's key characteristics of antisocial personality disorder (psychopathy) include unrelated features; removing items to make the scale more reliable would make it less effective at identifying people with high psychopathy scores; consistency versus unidimensionality)

  • Reliability estimate = the nature of the test determines which reliability metric is appropriate, for example whether:

  1. Test items are homogeneous or heterogeneous in nature

  2. Characteristic, ability, or trait being measured is presumed to be dynamic or static  

  3. Range of scores is/is not restricted  

  4. Test is a speed or power test  

  5. Test is/is not criterion referenced  

  • Reliability issues  

  1. Reliability is a matter of degree 

  2. Place more reliance on reliability estimates based on large samples

  3. Sample size for calculating reliability should not be below 100

  4. Expect to see test-retest reliability for ability and reasoning measures

  5. Ability, aptitude, and IQ measures should have coefficients above 0.8

  6. Personality and other measures usually have coefficients between 0.6 and 0.8, but 0.7 is recommended as a minimum

  7. Always look for the reliability coefficient and the size of the sample used to calculate it

  • Improve reliability 

  1. Clear conceptualization  

  2. Standardization  

  3. Increase number of items  

  4. Use more precise measurement  

  5. Use multiple indicators  

  6. Pilot testing and replication  
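The effect of increasing the number of items can be illustrated with the Spearman-Brown prophecy formula. This is a standard result that these notes do not cite by name, so treat it as an added assumption. The symbols are also assumed: r is the current reliability and n is the factor by which the test is lengthened with comparable items.

```latex
\begin{align}
r_{\text{new}} &= \frac{n\,r}{1 + (n - 1)\,r} \\
\text{Example } (r = 0.6,\; n = 2):\quad
r_{\text{new}} &= \frac{2 \times 0.6}{1 + 0.6} = \frac{1.2}{1.6} = 0.75
\end{align}
```

Doubling a test with reliability 0.6 would be predicted to raise it to 0.75, provided the added items are of the same quality as the originals.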

  • Standard error of measurement (SEM) 

  1. Measure of precision of an observed test score, estimate of amount of error inherent in an observed score or measurement  

  2. High reliability = low standard error; the SEM can be used to estimate the extent to which an observed score deviates from a true score

  3. Confidence interval = range of scores that is likely to contain the true score
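A minimal sketch of how the SEM and a confidence interval are typically computed, using the standard formula SEM = SD × sqrt(1 − reliability). The formula is not written out in these notes. The SD of 15, reliability of 0.91, observed score of 110, and the 95% multiplier of 1.96 are illustrative assumptions.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """SEM from the test's standard deviation and its reliability coefficient."""
    return sd * math.sqrt(1 - reliability)


def confidence_interval(observed_score, sem, z=1.96):
    """Band around an observed score; z = 1.96 gives roughly 95% coverage."""
    margin = z * sem
    return observed_score - margin, observed_score + margin


# Hypothetical IQ-style scale: SD = 15, reliability = 0.91.
sem = standard_error_of_measurement(15, 0.91)

# Interval around a hypothetical observed score of 110.
low, high = confidence_interval(110, sem)
print(round(sem, 2), round(low, 2), round(high, 2))
```

Raising the reliability shrinks the SEM and narrows the interval, which is the sense in which high reliability means low standard error.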