Lecture 3 – Validity & Reliability – Comprehensive Lecture Notes

Introduction/Context

  • Associate Professor Bashidjuk discusses the twin concepts of validity and reliability in measurement and research.
  • Builds directly on earlier lectures covering:
    • Measurement error
    • Confounders
    • Bias
  • Central guiding question: “Are my results valid and reliable?”

Measurement Error & Bias (Foundational Review)

  • Measurement error: Any inaccuracy introduced by the measuring instrument or process.
    • Example: Tape measures stretching over time.
  • Measurement bias: Systematic error linked to the instrument itself.
    • Faulty or uncalibrated scales introduce constant over- or under-estimates.
  • Key reminder: Calibration and maintenance of tools are mandatory to reduce error.

Reliability – "Consistency of the Tool"

  • Definition: The degree to which an instrument yields stable, repeatable results every time it is used under identical conditions.
  • Practical cues:
    • Think "Does the thermometer give the same reading each time if the temperature hasn’t actually changed?"
    • Re-calibration, battery checks, and routine maintenance sustain reliability.
  • Illustration exercise: Compare measuring your forearm with
    • A standard ruler (expected high consistency)
    • Your hand/fingers (likely low consistency)

Two Primary Sub-types of Reliability

  1. Test–Retest Reliability
    • Procedure: Administer the same test to the same participants on ≥2 occasions separated by time.
    • Statistical check: Correlate score at Time 1 with score at Time 2 → higher r value ⇒ higher stability.
    • Example scenarios:
      • Re-measuring heart rate multiple times.
      • Re-administering a learning assessment one week apart to the same class.
  2. Inter-Rater Reliability
    • Focus: Agreement among different judges/raters on the same observation or submission.
    • Critical whenever scoring is subjective rather than objective.
    • Methods to bolster it:
      • Multiple assessors/panels (e.g., OSCE stations, interview boards).
      • Standardized marking rubric.
    • Example: Portfolio assessment rated by several faculty members.

Validity – "Accuracy & Legitimacy of the Conclusion"

  • Definition: Extent to which a test or research study measures what it claims to measure and supports correct inferences.
  • “A scale can be perfectly reliable yet invalid” (e.g., +5 kg calibration error).
  • Three key forms addressed:

1 Construct Validity

  • Query: Does the test truly represent the theoretical concept?
  • Illustration:
    • Using a “super-duper hard physics exam” to assess general intelligence is invalid because performance heavily depends on prior physics education, not innate intelligence.
    • Teaching-gap example: Giving Group A a test on material they haven’t been taught while Group B has been taught it – comparing the results is meaningless.

2 Internal Validity

  • Query: Are observed effects due solely to the experimental treatment, free of confounders?
  • Example: Testing whether red vs. green font makes reading enjoyable.
    • Confounding risk: Enjoyment may stem from article content rather than font colour.

3 External Validity

  • Query: Can the findings be generalized beyond the study sample/context?
  • Consider: A university-student experiment may not extrapolate to all age groups or real-world settings.

Reliability vs. Validity (Integrative Contrast)

  • Both are essential; neither alone ensures useful data.
    • Faulty scale: Consistently adds 5 kg ⇒ high reliability, low validity.
    • Uncalibrated measurement with random error ⇒ low reliability and low validity.
  • Researchers must bake both concepts into:
    • Hypothesis formulation
    • Research design
    • Instrument selection
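The faulty-scale contrast can be made concrete by simulation. The sketch below (illustrative numbers, assuming a true weight of 70 kg) shows that a miscalibrated but precise scale has low spread (reliable) yet a biased mean (invalid), while a noisy scale is neither.

```python
import random
from statistics import mean, pstdev

random.seed(0)
true_weight = 70.0  # kg, the quantity we want to measure

# Scale A: miscalibrated by +5 kg but very precise (tiny random error)
scale_a = [true_weight + 5 + random.gauss(0, 0.1) for _ in range(10)]
# Scale B: unbiased on average but noisy (large random error)
scale_b = [true_weight + random.gauss(0, 4) for _ in range(10)]

# Low spread = high reliability; mean close to truth = high validity
print(f"Scale A: mean={mean(scale_a):.1f} kg, spread={pstdev(scale_a):.2f}")
print(f"Scale B: mean={mean(scale_b):.1f} kg, spread={pstdev(scale_b):.2f}")
```

Scale A reproduces the lecture's point exactly: every reading clusters tightly around 75 kg, so the tool is highly reliable yet systematically wrong.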

Case Application – Mobile Phones & Brain Cancer Meta-analysis

  • Public concern: “Do mobile phones cause cancer?”
  • Meta-analysis parameters:
    • PubMed search (specified date range)
    • 19 original studies (case-control + cohort)
    • Exposure = mobile phone use; Outcomes = specific intracranial tumours.
    • Statistical extraction: Quantitative association measures + confidence intervals.
  • Tumour types defined
    • Acoustic Neuroma: Slow-growing on vestibular nerve; requires long observation.
    • Meningioma: Arises from meninges; long latency.
    • Glioma: Originates in glial cells; can affect brain/spinal cord.
  • Summary of findings
    1. Acoustic Neuroma – results vary; low incidence + long latency complicate inference.
    2. Meningioma – no clear risk increase; slow growth hampers definitive conclusions.
    3. Glioma – small positive association for >10 years heavy use, yet evidence still labeled "inconclusive".
  • Meta-analysis critique
    • Poor quality & limited quantity of evidence → threatens internal & external validity.
    • Public health perspective: 30-year, global, uncontrolled "experiment" on billions → essentially no formal informed consent.
  • Personal note from lecturer:
    • Owns two phones.
    • Literature suggests no higher overall risk but tumour location correlates with favoured phone-holding side.
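The "quantitative association measures + confidence intervals" extracted in such a meta-analysis are typically combined by inverse-variance weighting on the log scale. The sketch below uses made-up odds ratios and 95% CIs for three hypothetical studies (not the actual 19-study data) to show the fixed-effect pooling step.

```python
import math

# Hypothetical (OR, CI_low, CI_high) triples for three studies
studies = [(1.10, 0.90, 1.35), (0.95, 0.70, 1.29), (1.25, 1.00, 1.56)]

weights, weighted_logs = [], []
for or_, lo, hi in studies:
    log_or = math.log(or_)
    # Recover the standard error from the 95% CI width on the log scale
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    w = 1 / se ** 2  # inverse-variance weight: precise studies count more
    weights.append(w)
    weighted_logs.append(w * log_or)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
pooled_or = math.exp(pooled_log)
ci = (math.exp(pooled_log - 1.96 * pooled_se),
      math.exp(pooled_log + 1.96 * pooled_se))
print(f"pooled OR = {pooled_or:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```

A pooled CI that straddles 1.0, as the lecture's "inconclusive" glioma findings suggest, means the combined evidence cannot rule out no association.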

Ethical & Practical Take-aways

  • Validity & reliability are not abstract—they drive trustworthy conclusions that impact public health recommendations.
  • When designing studies of exposure vs. disease:
    • Define exposure clearly (who, what, intensity, duration).
    • Determine measurable, valid outcomes.
    • Plan to mitigate measurement error, confounders, and bias.
  • Homework reflection questions
    1. Identify daily exposures you have (chemicals, behaviours, technologies).
    2. Frame a researchable question linking exposure → potential health effect.
    3. Decide what study design, instruments, and validity/reliability checks you would employ.

Quick Reference – Key Terms & Concepts

  • Reliability: Consistency (test–retest, inter-rater).
  • Validity: Accuracy (construct, internal, external).
  • Measurement error: Random or systematic discrepancies.
  • Confounder: Variable linked to both exposure & outcome that can distort association.
  • Bias: Systematic deviation from the truth.
  • Correlation coefficient r: Numeric estimate of test–retest similarity (range −1 to 1).
  • Latency period: Time from exposure to detectable disease.

Final Message

  • "We need BOTH valid and reliable results."
  • Robust research demands meticulous attention to tool calibration, study design, and logical inference.
  • Next lecture: Deeper dive into study methods for exposure–outcome research.