W1 L1 Data & Evidence – Statistical Thinking (Comprehensive Notes)
Statistical Thinking
Definition: A mindset for interpreting a complex world through simplified summaries that capture essential structure/function while explicitly acknowledging uncertainty.
Core idea: Any estimate or description is accompanied by a degree of doubt—never certainty.
Everyday illustration – Australian dog-ownership estimate:
Council registration records might suggest a total number of dogs, but unregistered pets introduce unknown error.
Emphasises the constant presence of uncertainty.
Example of Statistical Thinking in Practice – ABS Learning Graph
Data source: Australian Bureau of Statistics (ABS) survey on formal & non-formal learning among individuals aged – years.
Key percentages:
Major-city residents in formal study: .
Regional/remote residents in formal study: .
Interpretation:
The bar/column graph converts millions of raw records into two clear comparative statistics.
Demonstrates communication power of statistical summaries for decision-makers and the public.
Three Fundamental Tasks of Statistics
Describe – Convert raw complexity into concise metrics.
Example: "Australia’s population in was ."
Decide – Make evidence-based choices/comparisons under uncertainty.
Example: "South Australia (SA) with people is projected to grow – per year, slower than every state except Tasmania."
• The comparison component (SA vs. others) constitutes a statistical decision.
Predict – Forecast future outcomes from past data.
Example: "National population projected to reach between and by ."
• Interval conveys uncertainty built into prediction models.
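A minimal sketch of how a projection interval can arise: the same starting population is compounded forward under a low and a high growth assumption, and the two results bound the forecast. The starting population, growth rates, horizon, and the helper name project are all hypothetical and are not the figures quoted in the lecture.

```python
def project(population, annual_growth, years):
    """Compound a starting population forward at a fixed annual growth rate."""
    return population * (1 + annual_growth) ** years

# Hypothetical inputs, not the lecture's figures.
start_population = 1_000_000
low_rate, high_rate = 0.005, 0.015  # assumed plausible range of annual growth
horizon_years = 10

low = project(start_population, low_rate, horizon_years)
high = project(start_population, high_rate, horizon_years)
print(f"Projected population in {horizon_years} years: {low:,.0f} to {high:,.0f}")
```

The wider the assumed range of growth rates, or the longer the horizon, the wider the resulting interval becomes.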
Learning From Data – The Empirical Cycle
Previous Research ➔ Hypothesis ➔ Test / Collect New Data ➔ Compare.
Take-away-food anecdote:
Prior reviews ("previous research") rated the restaurant highly.
Formed hypothesis: food would be excellent.
Actual tasting ("data") either supports or contradicts the hypothesis.
Importance: Mirrors formal scientific method used throughout psychological science.
Aggregation
Raw ratings example: are hard to interpret.
Aggregated view (e.g., frequency table or average rating) instantly reveals overall sentiment.
Principle: Summarise observations into meaningful categories/levels while retaining key information about distribution.
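As an illustration of the principle, the sketch below turns a list of individual ratings into a frequency table and an average. The ratings are invented for this example (the lecture's raw ratings are not reproduced), and a 1 to 5 star scale is assumed.

```python
from collections import Counter
from statistics import mean

# Hypothetical 1-5 star ratings, invented for illustration.
ratings = [5, 4, 4, 3, 5, 2, 4, 5, 1, 4, 3, 5]

frequency = Counter(ratings)  # how many times each rating occurs
average = mean(ratings)       # single-number summary of overall sentiment

for stars in sorted(frequency):
    print(f"{stars} stars: {'#' * frequency[stars]} ({frequency[stars]})")
print(f"Average rating: {average:.2f}")
```

The frequency table keeps the shape of the distribution, while the average compresses it to one number, so reporting both retains more information than the average alone.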
Uncertainty & Risk Illustration – Diabetes Risk Calculator
Fictional individual scored on AUSDRISK.
Falls into "low" risk band.
Interpretation: Approximately in develop Type-2 diabetes within years.
Range of plausible true risk depends on sampling error & model uncertainty:
Low-uncertainty scenario: – per .
High-uncertainty scenario: – per .
Conceptual preview: later coursework will formalise confidence intervals & error margins.
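One way to see why the plausible range can be narrow or wide is a normal-approximation (Wald) interval for a proportion, sketched below. This is a common textbook method and an assumption here, not the method used by AUSDRISK; the estimated risk and the two sample sizes are hypothetical. It captures sampling error only, not model uncertainty.

```python
import math

def wald_interval(p_hat, n, z=1.96):
    """Normal-approximation (Wald) interval for a proportion, clipped to [0, 1]."""
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return max(0.0, p_hat - margin), min(1.0, p_hat + margin)

p_hat = 0.05  # hypothetical estimated risk, not the lecture's figure
for n in (20_000, 200):  # assumed large and small sample sizes
    low, high = wald_interval(p_hat, n)
    print(f"n = {n:>6}: {low * 100:.1f} to {high * 100:.1f} per 100")
```

With the same estimated risk, the larger sample gives the narrower range (the low-uncertainty scenario) and the smaller sample gives the wider one.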
Sampling Approaches
Population Illustration
Hypothetical population composition: more red, fewer blue, fewest yellow individuals.
Representative (Probability-Based) Sample
Draw individuals randomly such that sample colour proportions mirror true population.
Supports valid generalisation; sampling error is purely random and quantifiable.
Convenience (Biased) Sample
Participants self-select via ads, social media, campus flyers, etc.
Over-represents certain demographics (e.g., WEIRD — Western, Educated, Industrialised, Rich, Democratic).
Leads to systematic bias; limits external validity of psychological findings.
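The contrast between the two approaches can be simulated. In the sketch below, the population sizes and the per-colour response rates are hypothetical assumptions chosen only to make the bias visible: the random sample gives every individual the same chance of selection, while the convenience sample lets some groups volunteer far more often than others.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: more red, fewer blue, fewest yellow.
population = ["red"] * 6000 + ["blue"] * 3000 + ["yellow"] * 1000

def proportions(sample):
    """Share of each colour in a sample."""
    counts = Counter(sample)
    return {c: counts[c] / len(sample) for c in ("red", "blue", "yellow")}

# Representative sample: every individual is equally likely to be drawn.
random_sample = rng.sample(population, 500)

# Convenience sample: assumed volunteering rates differ by colour,
# so yellow individuals are much more likely to take part.
response_rate = {"red": 0.02, "blue": 0.05, "yellow": 0.30}
convenience_sample = [p for p in population if rng.random() < response_rate[p]]

print("Population: ", proportions(population))
print("Random:     ", proportions(random_sample))
print("Convenience:", proportions(convenience_sample))
```

The random sample's proportions differ from the population only by chance, whereas the convenience sample is skewed in a consistent direction that a larger sample would not correct.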
Causality vs. Correlation
Motto: "Correlation ≠ Causation."
Spurious-Correlation website example:
Reported correlation between U.S. per-capita cheese consumption and deaths by bedsheet entanglement.
Absurd explanatory leap ("cheese ➔ nightmares ➔ fatal tangling") highlights danger.
Guidelines for researchers:
Use language of "association" or "relationship" for observational designs.
Inferring causation typically requires experimental manipulation plus rigorous controls, and even then conclusions warrant caution.
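A small sketch of how a high correlation can appear without any causal link: two invented series that both happen to rise over the same years produce a Pearson coefficient close to 1. The data are made up for illustration and are not taken from the Spurious-Correlation website; pearson_r is a hand-written helper, not a library function.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Invented yearly figures for two unrelated quantities that both trend upward.
series_a = [10, 11, 13, 14, 16, 17, 19, 20]
series_b = [200, 230, 220, 260, 280, 270, 310, 330]

r = pearson_r(series_a, series_b)
print(f"r = {r:.2f}")
```

Here the shared upward trend over time is enough to produce the strong association, which is why an observational correlation on its own supports the language of "association" rather than "cause".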
Ethical, Philosophical & Practical Implications (threaded throughout)
Transparent reporting of uncertainty fosters honest science & informed policy.
Awareness of sampling bias encourages equitable, inclusive research that truly represents populations.
Misinterpreting correlation as causation can fuel misinformation, poor interventions, or harmful stereotypes.
Statistical literacy empowers citizens to scrutinise media claims and make better personal decisions (e.g., health risk calculators, education enrolment rates, population projections).