Foundations of Statistical Inference
Course Orientation & Ultimate Goal
- Statistics viewed as both science and art for working with data.
- Fundamental mission:
- Collect, organize, summarize, describe, and draw inferences from data.
- Build a conceptual foundation before diving into computational methods.
- Guiding question: How do we use information from a sample to learn about a population?
Key Definitions
- Statistics (discipline)
- Complete toolkit for handling the data life cycle: collection → organization → summarization → analysis → decision-making.
- Data / Datum
- Plural "data" (singular "datum"); any pieces of information, not necessarily numerical.
- Population
- “Big group” of interest; could be people, objects, events, voters, etc.
- Often very large (size denoted N, typically unknown).
- Sample
- Subset of the population actually observed (size denoted n, always known).
- Goal: sample should represent the population to ensure valid inference.
- Parameter
- Fixed (constant) numerical summary of a population (e.g., true mean, true proportion).
- Unknown and unchanging for the population.
- Statistic
- Numerical summary calculated from a sample (e.g., sample mean x̄, sample proportion p̂).
- Random/variable because it changes from sample to sample.
- Mnemonic: Parameter ↔ Population; Statistic ↔ Sample.
- Variable
- Measured characteristic that can vary across subjects (e.g., height, vote choice).
- Constant
- Quantity that does not vary (e.g., a specific parameter value, a universal physical constant).
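
The parameter/statistic distinction above can be seen in a small simulation. This is a minimal sketch on a synthetic population: the heights, the seed, and the names `population`, `mu`, and `sample_means` are assumptions made for illustration, not course data.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical population: N = 10,000 synthetic heights in cm.
N = 10_000
population = [rng.gauss(170, 10) for _ in range(N)]

# Parameter: a fixed summary of the whole population.
mu = statistics.fmean(population)

# Statistic: a summary of one sample of size n; it changes from sample to sample.
n = 50
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(5)]

print(f"parameter (population mean): {mu:.2f}")
for i, xbar in enumerate(sample_means, start=1):
    print(f"sample {i}: statistic (sample mean) = {xbar:.2f}")
```

The population mean is computed once and never moves, while the five sample means differ from one another. In practice the population is not fully observed, so only the sample means would be available.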
Illustrative Study: Toxin & Time-to-Conception
- Research hypothesis: Women exposed to a certain toxin require longer to conceive.
- Data collected: “Time needed to conceive” for women with differing toxin exposures.
- Observed result: Greater exposure → longer conception time.
- Cautionary tale:
- Possible confounder: mothers’ smoking habits during pregnancy may influence daughters’ fertility.
- If unmeasured, we may falsely attribute longer conception time to toxin instead of prenatal smoke exposure.
- Historical parallel: a 1980s study linked coffee to heart disease; the observed association was later attributed to the higher prevalence of smoking among coffee drinkers.
- Moral: Failure to control external factors leads to wrong decisions even when statistical procedures are applied correctly to the measured variables.
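
The confounding problem described above can be sketched with invented numbers. In this simulation the toxin has no effect at all on conception time; prenatal smoke exposure is assumed to raise both the chance of toxin exposure and the time to conceive. All probabilities, means, and names (`records`, `mean_gap`) are hypothetical.

```python
import random
import statistics

rng = random.Random(1)

# Synthetic data, invented purely for illustration: prenatal smoke exposure
# raises both the chance of toxin exposure and the time to conceive,
# while the toxin itself has NO effect on conception time here.
records = []
for _ in range(20_000):
    smoke = rng.random() < 0.5
    toxin = rng.random() < (0.8 if smoke else 0.2)
    months = rng.gauss(9.0 if smoke else 6.0, 2.0)
    records.append((smoke, toxin, months))

def mean_gap(rows):
    """Mean conception time of toxin-exposed minus unexposed women."""
    exposed = [m for _, t, m in rows if t]
    unexposed = [m for _, t, m in rows if not t]
    return statistics.fmean(exposed) - statistics.fmean(unexposed)

naive_gap = mean_gap(records)                               # confounder ignored
gap_smoke = mean_gap([r for r in records if r[0]])          # smoke-exposed only
gap_no_smoke = mean_gap([r for r in records if not r[0]])   # not smoke-exposed

print(f"ignoring smoke exposure:   {naive_gap:+.2f} months")
print(f"within smoke-exposed:      {gap_smoke:+.2f} months")
print(f"within not smoke-exposed:  {gap_no_smoke:+.2f} months")
```

The naive comparison shows a clear gap, yet the gap nearly vanishes once women are compared within the same smoke-exposure group. The arithmetic is correct in both cases; only the second comparison accounts for the external factor.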
Uncertainty & Probability
- Inference based on a subset introduces uncertainty ⇒ decisions can be right or wrong.
- Need to quantify uncertainty via probability.
- Probability measures the likelihood of outcomes; values fall in [0,1].
- 0 ⇒ impossible; 1 ⇒ certain.
- In the framework used here, probability statements apply only to variables (e.g., statistics), not to constants (parameters).
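
One way to picture "quantifying uncertainty via probability" is to simulate many samples and count how often the sample proportion lands close to the population proportion. This is a rough sketch: the value `p = 0.40`, the sample size, and the 0.10 tolerance are arbitrary assumptions.

```python
import random

rng = random.Random(2)

p = 0.40         # hypothetical population proportion (a parameter, fixed)
n = 100          # sample size
target = 40      # expected count of successes, p * n
repeats = 2_000  # number of simulated samples

hits = 0
for _ in range(repeats):
    # Count of "successes" in one simulated sample of size n.
    count = sum(rng.random() < p for _ in range(n))
    # Sample proportion within 0.10 of p, checked on integer counts
    # to avoid floating-point rounding at the boundary.
    if abs(count - target) <= 10:
        hits += 1

prob_close = hits / repeats
print(f"estimated P(|p_hat - p| <= 0.10) = {prob_close:.3f}")
```

The probability here is attached to the sample proportion, which varies from sample to sample, not to the fixed value `p`.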
Visual Framework (Textbook Diagram)
- Population (unknown “reality”) → sampling (size n) → compute sample summaries/histogram → infer about a population model (e.g., Normal, Binomial, Poisson) → attach probability-based certainty measure.
- Assumptions about the underlying distribution are critical for valid inference.
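
The flow in the diagram can be traced end to end in a few lines. This sketch assumes a synthetic population and a Normal model; the numbers, the bin width, and the threshold of 60 are illustrative choices, and the final probability is only as trustworthy as the Normal assumption.

```python
import random
import statistics
from collections import Counter

rng = random.Random(3)

# "Reality": a synthetic population that the analyst normally cannot see in full.
population = [rng.gauss(50, 8) for _ in range(50_000)]

# Step 1: draw a sample of size n.
n = 200
sample = rng.sample(population, n)

# Step 2: sample summaries and a crude text histogram (bins of width 5).
xbar = statistics.fmean(sample)
s = statistics.stdev(sample)
bins = Counter(int(x // 5) * 5 for x in sample)
for left in sorted(bins):
    print(f"{left:>3}-{left + 5:<3} {'#' * bins[left]}")

# Step 3: assume a Normal model for the population, fitted with the sample summaries.
model = statistics.NormalDist(mu=xbar, sigma=s)

# Step 4: a probability-based statement under that assumed model.
p_above_60 = 1 - model.cdf(60)
print(f"xbar = {xbar:.2f}, s = {s:.2f}, model P(X > 60) = {p_above_60:.3f}")
```

If the histogram looked strongly skewed or had several peaks, the Normal model in step 3 would be a poor choice and the probability in step 4 would be misleading.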
Two Branches of Statistics
- Descriptive Statistics
- Collection, organization, visualization (charts, tables), and summarization (mean, median, SD) of data.
- Core of high-school level statistics.
- Inferential Statistics
- Uses sample statistics to draw conclusions about population parameters and quantify uncertainty.
- Driving force of modern scientific research.
- Course plan: Master descriptive tools first, then develop inferential methods.
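
As a small illustration of the descriptive summaries named above (mean, median, SD), the following sketch uses Python's standard `statistics` module on a made-up data set of eight values.

```python
import statistics

# Hypothetical data set of eight observations (invented for illustration).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
sd = statistics.stdev(data)       # sample standard deviation (divides by n - 1)

print(f"n = {len(data)}, mean = {mean}, median = {median}, SD = {sd:.3f}")
```

These numbers describe only the eight observations at hand. Saying anything about a larger population from them is the job of inferential statistics.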
Notation Recap
- Population size: N (often large, unknown).
- Sample size: n (known, chosen by researcher).
Variable Taxonomy
- By measurement scale:
- Quantitative (numeric values make arithmetic sense).
- Qualitative / Categorical (labels, categories, levels).
- By numerical nature (for quantitative variables):
- Discrete Random Variable
- Takes countable, separated values (gaps exist).
- Example: number of people in line; ring sizes (9, 9.5, 10…).
- Continuous Random Variable
- In theory can take any real value within an interval (no gaps).
- Example: precise finger circumference in millimetres; human height.
- Random Variable
- Numerical outcome whose exact value is unknown until observation.
- If the underlying variable is qualitative, we often convert it to numbers via counts (e.g., the number of subjects in each category); a count is still a discrete random variable.
Examples & Clarifications
- Ring size (standard jeweler increments) ⇒ discrete.
- Measuring pinky circumference with a tape measure ⇒ continuous (any fraction of a millimetre possible in theory).
- Rolling a fair die: outcome before roll is a discrete random variable with values 1,2,3,4,5,6.
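
The discrete/continuous contrast in these examples can be sketched in code. The die is simulated with a pseudo-random generator, and the 50 to 60 mm circumference range and the 0.5 mm recording grid are arbitrary assumptions, not jeweler standards.

```python
import random
from collections import Counter

rng = random.Random(4)

# Discrete: a fair die can only land on 1..6.
rolls = [rng.randint(1, 6) for _ in range(6_000)]
freq = Counter(rolls)
for face in sorted(freq):
    print(f"face {face}: relative frequency {freq[face] / len(rolls):.3f}")

# Continuous (in theory): a circumference in mm can be any value in an interval.
# The 50-60 mm range is an arbitrary choice for this sketch.
circumferences = [rng.uniform(50.0, 60.0) for _ in range(1_000)]

# Recording on a fixed grid (here, the nearest 0.5 mm) gives a discrete variable,
# much like standard ring-size increments.
recorded = [round(c * 2) / 2 for c in circumferences]

print(f"distinct raw values:      {len(set(circumferences))}")
print(f"distinct recorded values: {len(set(recorded))}")
```

The raw measurements are almost all distinct, while the recorded values fall on at most 21 grid points. The same physical quantity is treated as continuous or discrete depending on how it is measured and recorded.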
Concept Checks
- Parameter: constant (no variability).
- Statistic: variable (changes with each sample).
- Probability only meaningful for variables.
Reading & Next Steps
- Review textbook Section 1 for reinforcement; attempt "Check-Your-Understanding" questions (solutions in appendix).
- Independently read Sections 2 & 3 (no accompanying videos).
- Next lecture/video will begin with Section 4.