Validity & Reliability – Comprehensive Lecture Notes
Introduction/Context
- Associate Professor Bashidjuk discusses the twin concepts of validity and reliability in measurement and research.
- Builds directly on earlier lectures covering:
- Measurement error
- Confounders
- Bias
- Central guiding question: “Are my results valid and reliable?”
Measurement Error & Bias (Foundational Review)
- Measurement error: Any inaccuracy introduced by the measuring instrument or process.
- Example: Tape measures stretching over time.
- Measurement bias: Systematic error linked to the instrument itself.
- Faulty or uncalibrated scales introduce constant over- or under-estimates.
- Key reminder: Calibration and maintenance of tools are mandatory to reduce error.
Reliability – "Consistency of Measurement"
- Definition: The degree to which an instrument yields accurate, stable, repeatable results every time it is used under identical conditions.
- Practical cues:
- Think "Does the thermometer give the same reading each time if the temperature hasn’t actually changed?"
- Re-calibration, battery checks, and routine maintenance sustain reliability.
- Illustration exercise: Compare measuring your forearm with
- A standard ruler (expected high consistency)
- Your hand/fingers (likely low consistency)
Two Primary Sub-types of Reliability
- Test–Retest Reliability
- Procedure: Administer the same test to the same participants on ≥2 occasions separated by time.
- Statistical check: Correlate score at Time 1 with score at Time 2 → higher r value ⇒ higher stability.
- Example scenarios:
- Re-measuring heart rate multiple times.
- Re-administering a learning assessment one week apart to the same class.
- Inter-Rater Reliability
- Focus: Agreement among different judges/raters on the same observation or submission.
- Critical whenever scoring is subjective rather than numeric.
- Methods to bolster it:
- Multiple assessors/panels (e.g., OSCE stations, interview boards).
- Standardized marking rubric.
- Example: Portfolio assessment rated by several faculty members.
Validity – "Accuracy & Legitimacy of the Conclusion"
- Definition: Extent to which a test or research study measures what it claims to measure and supports correct inferences.
- “A scale can be perfectly reliable yet invalid” (e.g., +5 kg calibration error).
- Three key forms addressed:
1. Construct Validity
- Query: Does the test truly represent the theoretical concept?
- Illustration:
- Using a “super-duper hard physics exam” to assess general intelligence is invalid because performance heavily depends on prior physics education, not innate intelligence.
- Teaching-gap example: Giving Group A a test on material they haven’t been taught while Group B has – results are meaningless comparisons.
2. Internal Validity
- Query: Are observed effects due solely to the experimental treatment, free of confounders?
- Example: Testing whether red vs. green font makes reading enjoyable.
- Confounding risk: Enjoyment may stem from article content rather than font colour.
3. External Validity
- Query: Can the findings be generalized beyond the study sample/context?
- Consider: A university-student experiment may not extrapolate to all age groups or real-world settings.
Reliability vs. Validity (Integrative Contrast)
- Both are essential; neither alone ensures useful data.
- Faulty scale: Consistently adds 5 kg ⇒ high reliability, low validity.
- Uncalibrated measurement with random error ⇒ low reliability and validity.
- Researchers must bake both concepts into:
- Hypothesis formulation
- Research design
- Instrument selection
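The faulty-scale contrast above can be simulated directly. A small sketch with a true weight of 70 kg; the noise levels are assumptions chosen to make the contrast visible:

```python
# Sketch: reliable-but-invalid vs unreliable measurement, simulated.
# True weight and noise magnitudes are illustrative assumptions.
import random

random.seed(0)
true_weight = 70.0

# Scale A: systematic +5 kg error, tiny random noise -> reliable, not valid.
scale_a = [true_weight + 5 + random.gauss(0, 0.1) for _ in range(100)]
# Scale B: unbiased but very noisy -> neither reliable nor trustworthy per reading.
scale_b = [true_weight + random.gauss(0, 5) for _ in range(100)]

def mean(xs):
    return sum(xs) / len(xs)

def spread(xs):  # standard deviation: low spread = high reliability
    m = mean(xs)
    return (sum((x - m) ** 2 for x in xs) / len(xs)) ** 0.5

print(f"Scale A: mean {mean(scale_a):.1f} kg, spread {spread(scale_a):.2f} kg")
print(f"Scale B: mean {mean(scale_b):.1f} kg, spread {spread(scale_b):.2f} kg")
```

Scale A's readings cluster tightly around the wrong value (high reliability, low validity); Scale B's scatter widely around the right one (low reliability).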
Case Study: Mobile Phones & Cancer (Meta-Analysis)
- Public concern: “Do mobile phones cause cancer?”
- Meta-analysis parameters:
- PubMed search (specified date range)
- 19 original studies (case-control + cohort)
- Exposure = mobile phone use; Outcomes = specific intracranial tumours.
- Statistical extraction: Quantitative association measures + confidence intervals.
- Tumour types defined
- Acoustic Neuroma: Slow-growing on vestibular nerve; requires long observation.
- Meningioma: Arises from meninges; long latency.
- Glioma: Originates in glial cells; can affect brain/spinal cord.
- Summary of findings
- Acoustic Neuroma – results vary; low incidence + long latency complicate inference.
- Meningioma – no clear risk increase; slow growth hampers definitive conclusions.
- Glioma – small positive association for >10 years heavy use, yet evidence still labeled "inconclusive".
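The "quantitative association measures + confidence intervals" extracted per study are typically combined by inverse-variance weighting on the log scale. A minimal fixed-effect sketch; the odds ratios and CIs below are invented for illustration, not the values from the lecture's meta-analysis:

```python
# Sketch: fixed-effect inverse-variance pooling of odds ratios.
# Study values are hypothetical placeholders, NOT the lecture's data.
import math

# (odds_ratio, lower_95ci, upper_95ci) for three hypothetical studies
studies = [(1.10, 0.85, 1.42), (1.35, 0.95, 1.92), (0.98, 0.80, 1.20)]

weights, weighted_logs = [], []
for or_, lo, hi in studies:
    log_or = math.log(or_)
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)  # SE recovered from the 95% CI
    w = 1 / se ** 2                                  # inverse-variance weight
    weights.append(w)
    weighted_logs.append(w * log_or)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = (1 / sum(weights)) ** 0.5
pooled_or = math.exp(pooled_log)
ci = (math.exp(pooled_log - 1.96 * pooled_se), math.exp(pooled_log + 1.96 * pooled_se))
print(f"pooled OR = {pooled_or:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

A pooled CI that straddles 1.0, as in the glioma findings above, is exactly why the evidence stays labeled "inconclusive".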
- Meta-analysis critique
- Poor quality & limited quantity of evidence → threatens internal & external validity.
- Public health perspective: 30-year, global, uncontrolled "experiment" on billions → essentially no formal informed consent.
- Personal note from lecturer:
- Owns two phones.
- Literature suggests no higher overall risk but tumour location correlates with favoured phone-holding side.
Ethical & Practical Take-aways
- Validity & reliability are not abstract—they drive trustworthy conclusions that impact public health recommendations.
- When designing studies of exposure vs. disease:
- Define exposure clearly (who, what, intensity, duration).
- Determine measurable, valid outcomes.
- Plan to mitigate measurement error, confounders, and bias.
- Homework reflection questions
- Identify daily exposures you have (chemicals, behaviours, technologies).
- Frame a researchable question linking exposure → potential health effect.
- Decide what study design, instruments, and validity/reliability checks you would employ.
Quick Reference – Key Terms & Concepts
- Reliability: Consistency (test–retest, inter-rater).
- Validity: Accuracy (construct, internal, external).
- Measurement error: Random or systematic discrepancies.
- Confounder: Variable linked to both exposure & outcome that can distort association.
- Bias: Systematic deviation from the truth.
- Correlation coefficient r: Numeric estimate of test–retest similarity (range −1 to +1).
- Latency period: Time from exposure to detectable disease.
Final Message
- "We need BOTH valid and reliable results."
- Robust research demands meticulous attention to tool calibration, study design, and logical inference.
- Next lecture: Deeper dive into study methods for exposure–outcome research.