Course Title: PY2501 Research Methods and Data Analysis
Title Slide: How Good is Your Scale and What Does It Mean? by Andrew Schofield
Presentation by: Prof Adrian Burgess
Week 3: Psychometrics 1: Theory of Measurement
Week 8: Psychometrics 2: From Items to Scales
Week 9: Psychometrics 3: How Good is Your Scale and What Does It Mean?
Lecture B: Item Analysis – Refining Scales to Improve Reliability
Reliability
Validity
Biases and Systematic Errors
Interpretation of Scores
Reliability: The reproducibility of a measurement (precision of measurement).
Validity: The accuracy of measurement (measures what it claims to measure).
Accurate: Valid measurement.
Inaccurate: Systematic error (not valid).
Precise: Reliable measurement.
Imprecise: Reproducibility error (unreliable).
Reliability is defined as the consistency of scores obtained by the same persons under different measurement conditions.
Reliability Formula:
Reliability = True Variance / (True Variance + Error Variance)
True Variance: Variability due to real individual differences.
Error Variance: Variability due to measurement errors or inconsistent items.
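The ratio above can be illustrated with a short sketch; the variance figures are invented for illustration, not taken from the lecture:

```python
# Reliability as the proportion of observed-score variance that is true variance.
# The variance values below are hypothetical.

def reliability(true_variance: float, error_variance: float) -> float:
    """Reliability = True Variance / (True Variance + Error Variance)."""
    return true_variance / (true_variance + error_variance)

# A scale whose error variance is a quarter of its true variance:
print(reliability(true_variance=8.0, error_variance=2.0))  # 0.8
```

Note that reliability rises towards 1 as error variance shrinks relative to true individual differences, and falls towards 0 as measurement error dominates.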
Internal Consistency Reliability: Items should measure the same latent variable.
Test-Retest Reliability: Consistency across measurements over time.
Inter-Rater Reliability: Consistency between different raters' assessments.
Item-Total Correlation: Correlate scores of individual items with total scale score.
Split-Half Reliability: Divide items, score each half, and calculate their correlation.
Cronbach’s Alpha: A measure of internal consistency with acceptable thresholds:
α ≥ 0.9: Excellent
0.9 > α ≥ 0.8: Good
0.8 > α ≥ 0.7: Acceptable
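The three internal-consistency indices above can be computed directly from an item-by-respondent matrix. A minimal sketch with NumPy, using made-up Likert responses (the data, and the choice of an odd/even item split, are illustrative assumptions):

```python
import numpy as np

# Hypothetical responses: 8 respondents (rows) x 6 Likert items (columns).
X = np.array([
    [4, 5, 4, 5, 4, 5],
    [2, 3, 2, 2, 3, 2],
    [5, 5, 4, 5, 5, 4],
    [1, 2, 1, 2, 1, 1],
    [3, 3, 4, 3, 3, 4],
    [4, 4, 5, 4, 5, 5],
    [2, 2, 3, 2, 2, 3],
    [5, 4, 5, 5, 4, 5],
])

def item_total_correlations(items):
    """Corrected item-total correlation: each item vs. the sum of the rest."""
    total = items.sum(axis=1)
    return np.array([np.corrcoef(items[:, j], total - items[:, j])[0, 1]
                     for j in range(items.shape[1])])

def split_half(items):
    """Split-half reliability (odd vs. even items, Spearman-Brown corrected)."""
    half1 = items[:, ::2].sum(axis=1)
    half2 = items[:, 1::2].sum(axis=1)
    r = np.corrcoef(half1, half2)[0, 1]
    return 2 * r / (1 + r)

def cronbach_alpha(items):
    """alpha = (k / (k-1)) * (1 - sum of item variances / total-score variance)."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

print(item_total_correlations(X))
print(split_half(X))
print(cronbach_alpha(X))
```

Because these toy items were constructed to correlate highly, all three indices come out high; with real data, low item-total correlations flag items to consider dropping.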
Reliability is sample-dependent; different samples may show different reliability estimates.
Published scales often quote reliability estimates.
Validity Definition: Concerns what the test measures and how well it measures it.
Types of Validity:
Content Validity: Does the scale encompass relevant items?
Criterion Validity: Correlation with other established measures (concurrent & predictive).
Construct Validity: Does it measure the theoretical construct?
A clinical tool designed to detect anxiety and depression.
Reliability: Anxiety r=0.93; Depression r=0.90.
Construct validity established through factor analysis relating to clinical diagnoses.
Random Errors: Assumed to cancel out across measurements; they add variability but not bias.
Systematic Errors: Accumulate and introduce bias into measurements.
Random/inattentive responding; yea-saying (acquiescence: agreeing with items regardless of content).
Social desirability biases: Individuals presenting themselves more favorably.
Strategies include using reverse items, lie scales, and normative scoring to reduce biases.
Scale scores are sums of individual item responses, but reverse-coded items must be recoded before summing.
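Reverse-keyed items are typically recoded as (max + min - response) before summing. A small sketch, where the item names, responses, and 1-5 Likert range are all assumptions for illustration:

```python
# Summing a scale with reverse-keyed items on a 1-5 Likert scale.
# Item names and responses below are hypothetical.

LIKERT_MIN, LIKERT_MAX = 1, 5
REVERSE_KEYED = {"item3", "item5"}  # assumed reverse-worded items

responses = {"item1": 4, "item2": 5, "item3": 2, "item4": 4, "item5": 1}

def scale_score(responses, reverse_keyed):
    total = 0
    for item, value in responses.items():
        if item in reverse_keyed:
            value = LIKERT_MAX + LIKERT_MIN - value  # flip the response direction
        total += value
    return total

print(scale_score(responses, REVERSE_KEYED))  # 4 + 5 + (6-2) + 4 + (6-1) = 22
```

Forgetting this recoding lowers item-total correlations and Cronbach's alpha, since reverse-worded items then correlate negatively with the rest of the scale.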
Psychological scale scores are at best interval-level, so a raw score has little absolute meaning and must be interpreted by comparison with normative data.
Comparing scores against established 'Gold Standards' is vital for identifying clinical cases.
Sensitivity: Proportion of true cases correctly identified.
Specificity: Proportion of true non-cases correctly identified.
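Sensitivity and specificity can be computed by cross-tabulating scale classifications (score at or above a cut-off) against the gold-standard diagnosis. A minimal sketch; the scores, diagnoses, and cut-off of 10 are invented for illustration:

```python
# Sensitivity and specificity of a scale cut-off against a gold standard.
# All data below are hypothetical.

def sensitivity_specificity(scores, gold, cutoff):
    tp = sum(1 for s, g in zip(scores, gold) if s >= cutoff and g)      # hits
    fn = sum(1 for s, g in zip(scores, gold) if s < cutoff and g)       # misses
    tn = sum(1 for s, g in zip(scores, gold) if s < cutoff and not g)   # correct rejections
    fp = sum(1 for s, g in zip(scores, gold) if s >= cutoff and not g)  # false alarms
    sensitivity = tp / (tp + fn)  # proportion of true cases correctly identified
    specificity = tn / (tn + fp)  # proportion of true non-cases correctly identified
    return sensitivity, specificity

scores = [12, 3, 15, 8, 11, 2, 9, 14]
gold   = [True, False, True, True, False, False, False, True]
print(sensitivity_specificity(scores, gold, cutoff=10))  # (0.75, 0.75)
```

Raising the cut-off trades sensitivity for specificity, which is why clinical cut-offs are chosen against the gold standard rather than set arbitrarily.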
Psychological scales need diligent design and validation to ensure they effectively measure latent variables while remaining free of systematic biases.