Foundations of Statistical Inference

Course Orientation & Ultimate Goal

  • Statistics viewed as both science and art for working with data.
  • Fundamental mission:
    • Collect, organize, summarize, describe, and draw inferences from data.
    • Build a conceptual foundation before diving into computational methods.
  • Guiding question: How do we use information from a sample to learn about a population?

Key Definitions

  • Statistics (discipline)
    • Complete toolkit for handling data life-cycle: collection → organization → summarization → analysis → decision-making.
  • Data / Datum
    • Plural "data" (singular "datum"); any pieces of information, not necessarily numerical.
  • Population
    • “Big group” of interest; could be people, objects, events, voters, etc.
    • Often very large (size denoted $N$, typically unknown).
  • Sample
    • Subset of the population actually observed (size denoted $n$, always known).
    • Goal: sample should represent the population to ensure valid inference.
  • Parameter
    • Fixed (constant) numerical summary of a population (e.g., true mean, true proportion).
    • Unknown and unchanging for the population.
  • Statistic
    • Numerical summary calculated from a sample (e.g., sample mean $\bar{x}$, sample proportion $\hat{p}$).
    • Random/variable because it changes from sample to sample.
    • Mnemonic: Parameter ↔ Population; Statistic ↔ Sample.
  • Variable
    • Measured characteristic that can vary across subjects (e.g., height, vote choice).
  • Constant
    • Quantity that does not vary (e.g., a specific parameter value, a universal physical constant).
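
The parameter/statistic distinction above can be sketched with a small simulation. Everything here is an assumption made for illustration: the population is made-up height data, and the names `population`, `mu`, and `sample_means` are hypothetical. In practice the population is not fully observable, so `mu` could not be computed this way.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: N = 10,000 made-up heights in cm.
N = 10_000
population = [random.gauss(170, 10) for _ in range(N)]

# Parameter: one fixed number summarizing the whole population.
mu = statistics.mean(population)

# Statistic: the same summary computed on a sample of size n.
# Drawing five separate samples gives five different values.
n = 50
sample_means = [
    statistics.mean(random.sample(population, n)) for _ in range(5)
]

print(f"parameter mu = {mu:.2f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i}: x-bar = {xbar:.2f}")
```

The printed sample means differ from one another while `mu` stays put, which is the sense in which a statistic is a variable and a parameter is a constant.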

Illustrative Study: Toxin & Time-to-Conception

  • Research hypothesis: Women exposed to a certain toxin require longer to conceive.
  • Data collected: “Time needed to conceive” for women with differing toxin exposures.
  • Observed result: Greater exposure → longer conception time.
  • Cautionary tale:
    • Possible confounder: mothers’ smoking habits during pregnancy may influence daughters’ fertility.
    • If unmeasured, we may falsely attribute longer conception time to toxin instead of prenatal smoke exposure.
  • Historical parallel: a 1980s study linked coffee to heart disease; the observed association was later attributed to the higher prevalence of smoking among coffee drinkers.
  • Moral: Failure to control external factors can lead to wrong conclusions even when statistical procedures are applied correctly to the measured variables.
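
A minimal simulation sketch of the coffee/smoking confounding pattern described above. All probabilities are invented for illustration and are not taken from any study; the model assumes disease risk depends on smoking only, so any coffee/disease association in the output is produced entirely by the confounder.

```python
import random

random.seed(1)  # fixed seed so the sketch is reproducible

def simulate_person():
    """Return (smoker, coffee, disease) for one made-up person."""
    smoker = random.random() < 0.3
    # Assumption: smokers are more likely to drink coffee.
    coffee = random.random() < (0.8 if smoker else 0.4)
    # Assumption: disease risk depends on smoking only, not on coffee.
    disease = random.random() < (0.20 if smoker else 0.05)
    return smoker, coffee, disease

people = [simulate_person() for _ in range(100_000)]

def disease_rate(rows):
    """Fraction of the given people who have the disease."""
    rows = list(rows)
    return sum(d for _, _, d in rows) / len(rows)

# Crude comparison: ignores smoking entirely.
crude_coffee = disease_rate(p for p in people if p[1])
crude_no_coffee = disease_rate(p for p in people if not p[1])

# Stratified comparison: coffee vs. no coffee within each smoking group.
smk_coffee = disease_rate(p for p in people if p[0] and p[1])
smk_no_coffee = disease_rate(p for p in people if p[0] and not p[1])
non_coffee = disease_rate(p for p in people if not p[0] and p[1])
non_no_coffee = disease_rate(p for p in people if not p[0] and not p[1])

print(f"crude:       coffee {crude_coffee:.3f} vs none {crude_no_coffee:.3f}")
print(f"smokers:     coffee {smk_coffee:.3f} vs none {smk_no_coffee:.3f}")
print(f"non-smokers: coffee {non_coffee:.3f} vs none {non_no_coffee:.3f}")
```

The crude rates differ noticeably, while the rates within each smoking group are close to each other. If smoking had not been recorded, only the crude comparison would be available.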

Uncertainty & Probability

  • Inference based on a subset introduces uncertainty ⇒ decisions can be right or wrong.
  • Need to quantify uncertainty via probability.
    • Probability measures the likelihood of outcomes; values fall in $[0,1]$.
    • 0 ⇒ impossible; 1 ⇒ certain.
    • Probability statements apply only to variables, not constants (parameters).

Visual Framework (Textbook Diagram)

  • Population (unknown “reality”) → sampling (size $n$) → compute sample summaries/histogram → infer about a population model (e.g., Normal, Binomial, Poisson) → attach probability-based certainty measure.
  • Assumptions about underlying distribution critical for valid inference.
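
One possible reading of the pipeline above, sketched under stated assumptions: the population is made-up data, the Normal model is assumed rather than checked, and the threshold 130 is arbitrary. `NormalDist` comes from Python's standard `statistics` module; the names `model`, `p_above_130`, and `true_share` are hypothetical.

```python
import random
from statistics import NormalDist

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical population ("reality"): 50,000 made-up measurements.
population = [random.gauss(100, 15) for _ in range(50_000)]

# Sampling step: only n units are actually observed.
n = 200
sample = random.sample(population, n)

# Model step: fit an assumed Normal model using the sample mean and SD.
model = NormalDist.from_samples(sample)

# Probability step: a model-based statement about the population.
p_above_130 = 1 - model.cdf(130)

# For comparison only: the same share computed on the full population,
# which would normally be unobservable.
true_share = sum(x > 130 for x in population) / len(population)

print(f"fitted model: mean = {model.mean:.1f}, sd = {model.stdev:.1f}")
print(f"model-based P(X > 130) = {p_above_130:.3f}")
print(f"population share > 130 = {true_share:.3f}")
```

The model-based probability is only as trustworthy as the Normal assumption; with a skewed population the same code would run but the estimate could be far off.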

Two Branches of Statistics

  • Descriptive Statistics
    • Collection, organization, visualization (charts, tables), and summarization (mean, median, SD) of data.
    • Core of high-school level statistics.
  • Inferential Statistics
    • Uses sample statistics to draw conclusions about population parameters and quantify uncertainty.
    • Driving force of modern scientific research.
  • Course plan: Master descriptive tools first, then develop inferential methods.
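
The descriptive summaries named above (mean, median, SD, and a frequency table for categorical data) can be sketched with Python's standard library. The data values are made up for illustration.

```python
import statistics
from collections import Counter

# Made-up quantitative data.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
population_sd = statistics.pstdev(data)  # divides by the number of values
sample_sd = statistics.stdev(data)       # divides by (number of values - 1)

print(f"mean = {mean}, median = {median}")
print(f"population SD = {population_sd:.3f}, sample SD = {sample_sd:.3f}")

# Made-up categorical data, summarized as a frequency table.
votes = ["A", "B", "A", "C", "A", "B"]
table = Counter(votes)
for category, count in table.most_common():
    print(f"{category}: {count}")
```

The two SD functions differ only in the divisor; which one applies depends on whether the data are treated as the whole population or as a sample from it.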

Notation Recap

  • Population size: $N$ (often large, unknown).
  • Sample size: $n$ (known, chosen by researcher).

Variable Taxonomy

  • By measurement scale:
    • Quantitative (numeric values make arithmetic sense).
    • Qualitative / Categorical (labels, categories, levels).
  • By numerical nature (for quantitative variables):
    1. Discrete Random Variable
    • Takes countable, separated values (gaps exist).
    • Example: number of people in line; ring sizes (9, 9.5, 10…).
    2. Continuous Random Variable
    • In theory can take any real value within an interval (no gaps).
    • Example: precise finger circumference in millimetres; human height.
  • Random Variable
    • Numerical outcome whose exact value is unknown until observation.
    • If qualitative, we often convert to numbers via counts (still yields a discrete random variable).
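
A rough sketch of the discrete/continuous contrast above. The circumference range and the half-step grid are invented for illustration and do not correspond to a real jeweler's sizing scale. Floating-point numbers only approximate real values, so the "continuous" list is itself an idealization.

```python
import random

random.seed(4)  # fixed seed so the sketch is reproducible

# Continuous: hypothetical finger circumferences in mm, anywhere in [50, 60].
circumferences = [random.uniform(50, 60) for _ in range(1_000)]

# Discrete: the same measurements snapped to a made-up half-step grid
# (50.0, 50.5, 51.0, ...), leaving gaps between the allowed values.
half_steps = [round(c * 2) / 2 for c in circumferences]

print("distinct continuous values:", len(set(circumferences)))
print("distinct discrete values:  ", len(set(half_steps)))
```

The snapped values can land on at most 21 grid points, whereas the raw measurements almost never repeat.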

Examples & Clarifications

  • Ring size (standard jeweler increments) ⇒ discrete.
  • Measuring pinky circumference with a tape measure ⇒ continuous (any fraction of a millimetre possible in theory).
  • Rolling a fair die: outcome before roll is a discrete random variable with values $\{1,2,3,4,5,6\}$.
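
The die example can be simulated as a small sketch. The number of rolls is arbitrary, and treating relative frequencies as approximations to probabilities is an assumption of this illustration rather than something defined in these notes.

```python
import random
from collections import Counter

random.seed(3)  # fixed seed so the sketch is reproducible

faces = [1, 2, 3, 4, 5, 6]

# Each roll is unknown until observed; afterwards it is one of six values.
rolls = [random.choice(faces) for _ in range(60_000)]

counts = Counter(rolls)
rel_freq = {face: counts[face] / len(rolls) for face in faces}

for face in faces:
    print(f"face {face}: relative frequency {rel_freq[face]:.3f}")
print(f"sum of relative frequencies: {sum(rel_freq.values()):.3f}")
```

Each relative frequency lies between 0 and 1 and, for a fair die, should be close to 1/6.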

Concept Checks

  • Parameter: constant (no variability).
  • Statistic: variable (changes with each sample).
  • Probability only meaningful for variables.

Reading & Next Steps

  • Review textbook Section 1 for reinforcement; attempt "Check-Your-Understanding" questions (solutions in appendix).
  • Independently read Sections 2 & 3 (no accompanying videos).
  • Next lecture/video will begin with Section 4.