Introduction to Statistics and Statistical Thinking
What is statistics?
Statistics is the science of collecting, organizing, summarizing, and analyzing information (data) to draw conclusions, answer questions, and provide a measure of confidence.
Four-part breakdown:
Collect (data).
Organize and summarize.
Analyze to draw conclusions.
Report results with confidence.
Data describe characteristics of individuals or things (e.g., age, weight).
Data can be numeric (quantitative) or non-numeric (categorical).
Data can counter anecdotal claims when properly collected and analyzed (e.g., cell phone usage and brain cancer: the data showed no link).
Data can be dangerous if misused (e.g., biased talk-show polls – non-representative samples lead to meaningless results).
Caution: Results from one group (e.g., high school students) may not generalize to others (e.g., college students) without careful consideration.
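The talk-show poll problem above can be illustrated with a small simulation. This is only a sketch under made-up assumptions: the population size, the 40% true share, and the 5% versus 1% call-in rates are hypothetical values chosen for illustration, not figures from any study.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: exactly 40% hold the opinion "yes".
population = [True] * 40_000 + [False] * 60_000
true_share = sum(population) / len(population)

# Simple random sample: every individual is equally likely to be chosen.
random_sample = rng.sample(population, 1_000)
random_estimate = sum(random_sample) / len(random_sample)

# Self-selected "call-in" sample: assume "yes" holders are five times
# as likely to respond (5% vs 1%). These response rates are invented.
call_in_sample = [
    opinion
    for opinion in population
    if rng.random() < (0.05 if opinion else 0.01)
]
call_in_estimate = sum(call_in_sample) / len(call_in_sample)

print(f"true share:          {true_share:.3f}")
print(f"random sample:       {random_estimate:.3f}")
print(f"call-in (biased):    {call_in_estimate:.3f}")
```

Under these assumptions the random sample should land near the true share, while the call-in sample overstates it badly even though it contains more respondents. A larger sample does not repair a non-representative one.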
Data interpretation pitfalls and causal inference:
Observing an association (e.g., cola consumption and lower bone mineral density) does not imply causation.
Lurking variables can explain associations: a factor not accounted for that influences both supposed cause and effect (e.g., exercise level or calcium intake in the cola example).
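A minimal simulation sketch of a lurking variable, loosely modeled on the cola example. Everything numeric here is an assumption: exercise level is treated as a standardized score, and the coefficients (-0.8 and 0.8) are invented so that exercise lowers cola consumption and raises bone density. By construction, cola has no direct effect on bone density in this toy model.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
n = 20_000

# Lurking variable: exercise level (hypothetical standardized score).
exercise = [rng.gauss(0, 1) for _ in range(n)]

# Both variables depend on exercise plus independent noise.
# Cola never appears in the bone-density formula.
cola = [-0.8 * e + rng.gauss(0, 1) for e in exercise]
bone_density = [0.8 * e + rng.gauss(0, 1) for e in exercise]

# Association across everyone: clearly negative.
overall_r = pearson(cola, bone_density)

# Compare only individuals with nearly the same exercise level.
similar = [
    (c, b)
    for e, c, b in zip(exercise, cola, bone_density)
    if abs(e) < 0.1
]
within_r = pearson([c for c, _ in similar], [b for _, b in similar])

print(f"correlation, everyone:           {overall_r:.2f}")
print(f"correlation, similar exercisers: {within_r:.2f}")
```

Once exercise is held roughly constant, the association between cola and bone density should nearly vanish in this model, which is what one would expect when the lurking variable accounts for the whole relationship.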
Variation is inherent in data:
Individuals within groups differ.
Even repeated measurements on the same individual show variation.
Different studies on the same question often yield different results due to this variation.
Goals of describing variation:
Describe and understand sources of variation using graphs and numerical summaries.
Explain why different polls report different results.
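Why polls disagree can be sketched by simulating several polls drawn from the same population. The 52% true support level, the poll size of 1,000, and the helper name `run_poll` are all hypothetical choices for illustration.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is repeatable

TRUE_SUPPORT = 0.52  # assumed true share of supporters (made up)
POLL_SIZE = 1_000    # assumed respondents per poll (made up)


def run_poll(rng, size, true_support):
    """Ask `size` randomly chosen people; return the share saying yes."""
    yes_votes = sum(rng.random() < true_support for _ in range(size))
    return yes_votes / size


# Five independent polls of the same population.
estimates = [run_poll(rng, POLL_SIZE, TRUE_SUPPORT) for _ in range(5)]

for i, estimate in enumerate(estimates, start=1):
    print(f"poll {i}: {estimate:.1%} support")
print(f"spread between polls: {max(estimates) - min(estimates):.1%}")
```

Nothing about the population changes between polls, yet the reported percentages differ. The only source of disagreement in this sketch is which individuals happened to be sampled.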
Key contrast between mathematics and statistics:
Mathematics: Results are 100% certain (e.g., solving a given equation correctly always yields the same answer).
Statistics: Results contain uncertainty (e.g., "We are 95% confident that the average commute time in Dallas is between L and U minutes," meaning the 95% confidence interval for μ is (L, U) minutes, where L and U are the lower and upper bounds estimated from the sample).
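A confidence statement like the one above can be sketched numerically. The commute times below are simulated, not real Dallas data, and the mean of 25 and standard deviation of 8 minutes are arbitrary assumptions. The interval uses the large-sample normal approximation (sample mean plus or minus z times s/√n); for small samples a t-based interval would be the more usual choice.

```python
import math
import random
from statistics import NormalDist, mean, stdev

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Simulated sample of 200 commute times in minutes (made-up parameters).
commute_minutes = [rng.gauss(25, 8) for _ in range(200)]

n = len(commute_minutes)
x_bar = mean(commute_minutes)  # sample mean, the point estimate of μ
s = stdev(commute_minutes)     # sample standard deviation

confidence = 0.95
# Critical value that leaves 2.5% in each tail of the standard normal.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z * s / math.sqrt(n)  # margin of error
lower, upper = x_bar - margin, x_bar + margin

print(f"sample mean: {x_bar:.1f} minutes")
print(f"95% confidence interval for μ: ({lower:.1f}, {upper:.1f}) minutes")
```

The interval reports a range of plausible values for μ together with a confidence level, rather than a single certain answer. A different sample would give a somewhat different interval.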
Why statistics is useful:
Provides understanding of the world by explaining and controlling variability.
Helps analyze media reports, make informed decisions, and become a critical thinker.
Real-world implications and ethical considerations:
Data are powerful but dangerous if misused.
Always consider sampling methods, representativeness, and potential lurking variables.
Be cautious when generalizing results beyond the studied group.
Quick recap of key ideas:
Data are information for answering questions.
Data vary; understanding variation is central.
Associations do not imply causation; lurking variables exist.
Statistics emphasizes uncertainty and confidence in estimates.
Final takeaway:
Statistics provides tools to collect, organize, analyze, and report data with quantified confidence, accounting for variation, lurking variables, and limits of generalization.