W1 L1 Data & Evidence – Statistical Thinking (Comprehensive Notes)

Statistical Thinking

  • Definition: A mindset for interpreting a complex world through simplified summaries that capture essential structure/function while explicitly acknowledging uncertainty.

  • Core idea: Any estimate or description is accompanied by a degree of doubt; certainty is never 100%.

  • Everyday illustration – Australian dog-ownership estimate:

    • Council registration records might suggest a total number of dogs, but unregistered pets introduce unknown error.

    • The gap between registered and actual dogs shows that even official counts carry uncertainty.


Example of Statistical Thinking in Practice – ABS Learning Graph

  • Data source: Australian Bureau of Statistics (ABS) survey on formal & non-formal learning among individuals aged 15–74 years.

  • Key percentages:

    • Major-city residents in formal study: 21.8%.

    • Regional/remote residents in formal study: 17.3%.

  • Interpretation:

    • The bar/column graph converts millions of raw records into two clear comparative statistics.

    • Demonstrates communication power of statistical summaries for decision-makers and the public.


Three Fundamental Tasks of Statistics

  1. Describe – Convert raw complexity into concise metrics.

    • Example: "Australia’s population in 2017 was 24,600,000."

  2. Decide – Make evidence-based choices/comparisons under uncertainty.

    • Example: "South Australia (SA) with 1,700,000 people is projected to grow 0.1–0.9% per year, slower than every state except Tasmania."
      • The comparison component (SA vs. others) constitutes a statistical decision.

  3. Predict – Forecast future outcomes from past data.

    • Example: "National population projected to reach between 37.4 and 49.2 million by 2066."
      • Interval conveys uncertainty built into prediction models.
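
To see how a range of growth rates turns into a range of predictions, here is a minimal sketch. It assumes the SA growth figures read as 0.1% to 0.9% per year, that growth compounds at a constant annual rate, and a 10-year horizon; the horizon and the `project` helper are illustrative choices, not part of the original projections.

```python
def project(population, annual_rate, years):
    """Compound a population forward at a constant annual growth rate."""
    return population * (1 + annual_rate) ** years

start = 1_700_000  # SA population quoted in the notes
years = 10         # hypothetical horizon, chosen only for illustration

low = project(start, 0.001, years)   # assumed lower rate: 0.1% per year
high = project(start, 0.009, years)  # assumed upper rate: 0.9% per year

print(f"After {years} years: between {low:,.0f} and {high:,.0f} people")
```

The longer the horizon, the further apart the two ends drift, which is one reason long-range projections are reported as wide intervals.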


Learning From Data – The Empirical Cycle

  • Previous Research ➔ Hypothesis ➔ Test / Collect New Data ➔ Compare.

  • Take-away-food anecdote:

    • Prior reviews ("previous research") rated the restaurant highly.

    • Formed hypothesis: food would be excellent.

    • Actual tasting ("data") either supports or contradicts the hypothesis.

  • Importance: Mirrors formal scientific method used throughout psychological science.


Aggregation

  • Raw ratings example: [4, 5, 3, 5, 2, 4, 5, 1, 4] are hard to interpret.

  • Aggregated view (e.g., frequency table or average rating) instantly reveals overall sentiment.

  • Principle: Summarise observations into meaningful categories/levels while retaining key information about distribution.
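
As a small sketch of aggregation, the nine raw ratings above can be summarised with a frequency table and a mean. The helper name `frequency_table` is illustrative, and only the Python standard library is used.

```python
from collections import Counter
from statistics import mean

ratings = [4, 5, 3, 5, 2, 4, 5, 1, 4]  # raw ratings from the notes

def frequency_table(values):
    """Return {rating: count}, ordered from lowest to highest rating."""
    counts = Counter(values)
    return dict(sorted(counts.items()))

table = frequency_table(ratings)
average = mean(ratings)

# Text histogram: one '#' per observation at each rating level.
for rating, count in table.items():
    print(f"{rating} stars: {'#' * count} ({count})")
print(f"mean rating = {average:.2f}")
```

The table keeps the shape of the distribution (most ratings are 4 or 5), while the mean compresses it to a single number; reporting both retains more information than either alone.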


Uncertainty & Risk Illustration – Diabetes Risk Calculator

  • Fictional individual scored 6 on AUSDRISK.

    • Falls into "low" risk band.

    • Interpretation: Approximately 1 in 50 people with this score develop type 2 diabetes within 5 years.

  • Range of plausible true risk depends on sampling error & model uncertainty:

    • Low-uncertainty scenario: 0–2 per 50.

    • High-uncertainty scenario: 0–8 per 50.

  • Conceptual preview: later coursework will formalise confidence intervals & error margins.
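
As a preview of how such ranges can be quantified, the sketch below computes an approximate 95% Wilson score interval for a 1-in-50 rate observed in two hypothetical studies of different size. The study sizes are assumptions made for illustration, and the output is not meant to reproduce the scenario figures above or AUSDRISK's own method; it only shows that the same observed rate is less certain when it rests on fewer people.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion (z = 1.96)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width

# Hypothetical study sizes; both observe the same 1-in-50 (2%) rate.
small_low, small_high = wilson_interval(successes=2, n=100)
large_low, large_high = wilson_interval(successes=200, n=10_000)

for label, (low, high) in [
    ("n = 100", (small_low, small_high)),
    ("n = 10,000", (large_low, large_high)),
]:
    # Rescale the proportion to "cases per 50 people" to match the notes.
    print(f"{label}: about {low * 50:.1f} to {high * 50:.1f} per 50")
```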


Sampling Approaches

Population Illustration
  • Hypothetical population composition: more red, fewer blue, fewest yellow individuals.

Representative (Probability-Based) Sample
  • Draw individuals randomly such that sample colour proportions mirror true population.

  • Supports valid generalisation; sampling error is purely random and quantifiable.

Convenience (Biased) Sample
  • Participants self-select via ads, social media, campus flyers, etc.

  • Over-represents certain demographics (e.g., WEIRD — Western, Educated, Industrialised, Rich, Democratic).

  • Leads to systematic bias; limits external validity of psychological findings.
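
The contrast between the two sampling approaches can be simulated. In this sketch the population mix (60% red, 30% blue, 10% yellow) and the volunteering probabilities are invented for illustration; the notes only say red is most common and yellow least.

```python
import random
from collections import Counter

COLOURS = ("red", "blue", "yellow")

def proportions(sample):
    """Share of each colour in a sample."""
    counts = Counter(sample)
    return {c: counts[c] / len(sample) for c in COLOURS}

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population: more red, fewer blue, fewest yellow.
population = ["red"] * 60_000 + ["blue"] * 30_000 + ["yellow"] * 10_000
n = 2_000

# Probability-based sample: every individual is equally likely to be drawn.
random_sample = rng.sample(population, n)

# Convenience sample: people self-select, and (by assumption) yellow
# individuals are far more likely to respond to the ad than red ones.
volunteer_prob = {"red": 0.05, "blue": 0.10, "yellow": 0.40}
volunteers = [person for person in population if rng.random() < volunteer_prob[person]]
convenience_sample = rng.sample(volunteers, n)

print("population :", proportions(population))
print("random     :", proportions(random_sample))
print("convenience:", proportions(convenience_sample))
```

Drawing a larger convenience sample does not fix the distortion: the error comes from who volunteers, not from how many are drawn.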


Causality vs. Correlation

  • Motto: "Correlation ≠ Causation."

  • Spurious-Correlation website example:

    • r ≈ 0.95 between U.S. per-capita cheese consumption and deaths by bedsheet entanglement.

    • Absurd explanatory leap ("cheese ➔ nightmares ➔ fatal tangling") highlights danger.

  • Guidelines for researchers:

    • Use language of "association" or "relationship" for observational designs.

    • Inferring causation typically requires experimental manipulation + rigorous controls, yet still warrants caution.
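
A short sketch of why unrelated quantities can correlate strongly: any two series that both drift upward over time will show a high Pearson r. The numbers below are invented for illustration (they are not the cheese or bedsheet figures), and `pearson_r` is a hand-written helper rather than a library call.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented yearly values for two unrelated quantities that both trend upward.
years = list(range(10))
series_a = [10, 12, 13, 15, 18, 19, 22, 23, 26, 27]
series_b = [101, 108, 118, 121, 135, 138, 150, 158, 161, 175]

r_ab = pearson_r(series_a, series_b)
r_a_time = pearson_r(years, series_a)
r_b_time = pearson_r(years, series_b)

print(f"r(a, b)    = {r_ab:.2f}")
print(f"r(a, year) = {r_a_time:.2f}")
print(f"r(b, year) = {r_b_time:.2f}")
```

Both series correlate with the year as strongly as they do with each other, so a shared time trend is enough to account for the association without any causal link between them.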


Ethical, Philosophical & Practical Implications (threaded throughout)

  • Transparent reporting of uncertainty fosters honest science & informed policy.

  • Awareness of sampling bias encourages equitable, inclusive research that truly represents populations.

  • Misinterpreting correlation as causation can fuel misinformation, poor interventions, or harmful stereotypes.

  • Statistical literacy empowers citizens to scrutinise media claims and make better personal decisions (e.g., health risk calculators, education enrolment rates, population projections).