W1 L1 Data & Evidence – Statistical Thinking (Comprehensive Notes)

Definition: A mindset for interpreting a complex world through simplified summaries that capture essential structure/function while explicitly acknowledging uncertainty.
Core idea: Any estimate or description is accompanied by a degree of doubt—never $100\%$ certainty.
Everyday illustration – Australian dog-ownership estimate:
- Council registration records might suggest a total number of dogs, but unregistered pets introduce unknown error.
- Emphasises the constant presence of uncertainty.

Data source: Australian Bureau of Statistics (ABS) survey on formal & non-formal learning among individuals aged $15$ – $74$ years.
Key percentages:
- Major-city residents in formal study: $21.8\%$.
- Regional/remote residents in formal study: $17.3\%$.
Interpretation:
- The bar/column graph condenses a large body of raw survey responses into two clear comparative statistics (a gap of $4.5$ percentage points).
- Demonstrates communication power of statistical summaries for decision-makers and the public.

Describe – Convert raw complexity into concise metrics.
- Example: "Australia’s population in $2017$ was $24\,600\,000$."
Decide – Make evidence-based choices/comparisons under uncertainty.
- Example: "South Australia (SA) with $1\,700\,000$ people is projected to grow $0.1$ – $0.9\%$ per year, slower than every state except Tasmania."
  • The comparison component (SA vs. others) constitutes a statistical decision.
Predict – Forecast future outcomes from past data.
- Example: "National population projected to reach between $37.4$ and $49.2\text{ million}$ by $2066$."
  • Interval conveys uncertainty built into prediction models.
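As a rough check on what the projection interval implies, the sketch below back-calculates the constant annual growth rate that would take the 2017 population ($24.6$ million) to each end of the 2066 range. It assumes simple compound growth, $P_t = P_0 (1 + r)^t$, which is a simplification and not the method behind the quoted projection; the function name `implied_annual_growth` is made up for illustration.

```python
def implied_annual_growth(start_pop, end_pop, years):
    """Constant yearly rate r such that start_pop * (1 + r) ** years == end_pop."""
    return (end_pop / start_pop) ** (1 / years) - 1

START_YEAR, END_YEAR = 2017, 2066
POP_2017 = 24.6e6  # population figure quoted in the notes

# Lower and upper ends of the projected 2066 range quoted in the notes.
low_rate = implied_annual_growth(POP_2017, 37.4e6, END_YEAR - START_YEAR)
high_rate = implied_annual_growth(POP_2017, 49.2e6, END_YEAR - START_YEAR)

print(f"Implied growth: {low_rate:.2%} to {high_rate:.2%} per year")
# Implied growth: 0.86% to 1.42% per year
```

Under this assumption, a difference of roughly half a percentage point in yearly growth separates the two ends of the interval, which shows how small differences in assumptions compound into a wide range over 49 years.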

Previous Research ➔ Hypothesis ➔ Test / Collect New Data ➔ Compare.
Take-away-food anecdote:
- Prior reviews ("previous research") rated the restaurant highly.
- Formed hypothesis: food would be excellent.
- Actual tasting ("data") either supports or contradicts the hypothesis.
Importance: Mirrors the formal scientific method used throughout psychological science.

Raw ratings example: $[4,5,3,5,2,4,5,1,4]$ are hard to interpret.
Aggregated view (e.g., frequency table or average rating) instantly reveals overall sentiment.
Principle: Summarise observations into meaningful categories/levels while retaining key information about distribution.
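A minimal sketch of the aggregation step, using the nine ratings listed above: it builds a frequency table and two summary values (mean and median). The text-bar layout is just one possible presentation.

```python
from collections import Counter
from statistics import mean, median

ratings = [4, 5, 3, 5, 2, 4, 5, 1, 4]  # raw ratings from the notes

# Frequency table: how many times each rating occurs.
frequency = Counter(ratings)
for score in sorted(frequency):
    count = frequency[score]
    print(f"{score} stars: {'#' * count} ({count})")

# Single-number summaries of the same data.
average = mean(ratings)
middle = median(ratings)
print(f"mean = {average:.2f}, median = {middle}")
# mean = 3.67, median = 4
```

The table keeps the shape of the distribution (six of the nine ratings are 4 or 5), while the mean alone would hide the two low ratings.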

Fictional individual scored $6$ on AUSDRISK (the Australian Type 2 Diabetes Risk Assessment Tool).
- Falls into the "intermediate" risk band (scores of $6$ – $11$); the "low" band covers scores of $5$ or less.
- Interpretation: Approximately $1$ in $50$ people with this score develop type 2 diabetes within $5$ years.
Range of plausible true risk depends on sampling error & model uncertainty:
- Low-uncertainty scenario: $0$ – $2$ per $50$.
- High-uncertainty scenario: $0$ – $8$ per $50$.
Conceptual preview: later coursework will formalise confidence intervals & error margins.
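To preview how such ranges arise, the sketch below computes an approximate $95\%$ interval for an observed rate of $1$ in $50$ at two sample sizes. It uses the Wilson score interval as one common choice; the study sizes are hypothetical, and the output is not meant to reproduce the ranges above or the way AUSDRISK was actually validated.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a proportion."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)

# Hypothetical studies that both observe the same rate: 1 in 50 (2%).
for label, cases, n in [("small study", 2, 100), ("large study", 100, 5000)]:
    low, high = wilson_interval(cases, n)
    # Rescale the proportion to "cases per 50 people" to match the notes.
    print(f"{label} (n={n}): {low * 50:.1f} to {high * 50:.1f} per 50")
```

Both studies give the same point estimate, but the smaller one yields a much wider interval: less data means more sampling error, which is the "high-uncertainty" situation.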

Hypothetical population composition: mostly red individuals, fewer blue, and fewest yellow.

Random sampling: draw individuals at random so that, on average, sample colour proportions mirror those of the true population.
Supports valid generalisation; sampling error is purely random and quantifiable.

Convenience sampling: participants self-select via ads, social media, campus flyers, etc.
Over-represents certain demographics (e.g., WEIRD — Western, Educated, Industrialised, Rich, Democratic).
Leads to systematic bias; limits external validity of psychological findings.
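The contrast between the two sampling approaches can be sketched with a small simulation. Everything numeric here is an assumption for illustration: the $60/30/10$ colour split, the sample size, and the colour-specific response rates used to mimic self-selection.

```python
import random
from collections import Counter

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 60% red, 30% blue, 10% yellow.
population = ["red"] * 6000 + ["blue"] * 3000 + ["yellow"] * 1000

def proportions(sample):
    """Share of each colour in a list of individuals."""
    counts = Counter(sample)
    return {colour: counts[colour] / len(sample) for colour in ("red", "blue", "yellow")}

# Random sampling: every individual has the same chance of selection.
random_sample = rng.sample(population, 500)

# Self-selected sample: assumed response rates differ by colour,
# so yellow individuals are far more likely to volunteer.
response_rate = {"red": 0.02, "blue": 0.05, "yellow": 0.30}
convenience_sample = [person for person in population
                      if rng.random() < response_rate[person]]

print("population :", proportions(population))
print("random     :", proportions(random_sample))
print("convenience:", proportions(convenience_sample))
```

The random sample misses the true proportions only by chance, and that error shrinks as the sample grows. The self-selected sample is off in a consistent direction, and collecting more volunteers would not fix it.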

Motto: "Correlation $\neq$ Causation."
Spurious Correlations website example:
- $r \approx 0.95$ between U.S. per-capita cheese consumption and deaths by bedsheet entanglement.
- Absurd explanatory leap ("cheese ➔ nightmares ➔ fatal tangling") highlights danger.
Guidelines for researchers:
- Use language of "association" or "relationship" for observational designs.
- Inferring causation typically requires experimental manipulation + rigorous controls, yet still warrants caution.
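One way such correlations arise is that two unrelated quantities both happen to drift upward over the same years. The sketch below computes Pearson's $r$ for two made-up series (not the real cheese or bedsheet figures) whose only shared feature is an upward trend over time.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

years = range(10)

# Made-up data: each series is a steady upward drift plus its own unrelated wiggle.
wiggle_a = [0.2, -0.1, 0.3, 0.0, -0.2, 0.1, 0.3, -0.3, 0.1, 0.0]
wiggle_b = [-15, 20, 5, -10, 25, -20, 10, 0, -5, 15]
series_a = [30.0 + 0.5 * t + w for t, w in zip(years, wiggle_a)]
series_b = [400 + 40 * t + w for t, w in zip(years, wiggle_b)]

r = pearson_r(series_a, series_b)
print(f"r = {r:.2f}")
```

The coefficient comes out very high even though neither series was built from the other; time acts as a shared third variable. A large $r$ therefore describes an association and says nothing by itself about what causes what.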

Transparent reporting of uncertainty fosters honest science & informed policy.
Awareness of sampling bias encourages equitable, inclusive research that truly represents populations.
Misinterpreting correlation as causation can fuel misinformation, poor interventions, or harmful stereotypes.
Statistical literacy empowers citizens to scrutinise media claims and make better personal decisions (e.g., health risk calculators, education enrolment rates, population projections).