UNIT 1 STATS PLUS
Statistics Basics
Statistics: Science of data → turns data into knowledge.
Population: Entire group being studied.
Sample: Subset of the population that data is actually collected from.
Variable: A characteristic measured or recorded for each individual.
Types of Data
Quantitative (Numerical): Numbers (height, weight, test scores).
Qualitative (Categorical): Groups/categories (gender, colors, brands).
Parameters vs. Statistics
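Parameter: A number that describes a population (e.g., the population mean μ).
Statistic: A number that describes a sample (e.g., the sample mean x̄); used to estimate a parameter.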
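A rough sketch of the parameter/statistic distinction. The population here is simulated (hypothetical heights in cm), and the names parameter_mean and statistic_mean are made up for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: heights (cm) of 1,000 people, simulated for illustration.
population = [rng.gauss(170, 10) for _ in range(1000)]

# Parameter: computed from the whole population (usually unknown in practice).
parameter_mean = statistics.mean(population)

# Statistic: computed from a sample and used to estimate the parameter.
sample = rng.sample(population, 50)
statistic_mean = statistics.mean(sample)

print(f"population mean (parameter): {parameter_mean:.1f}")
print(f"sample mean (statistic):     {statistic_mean:.1f}")
```

The two numbers are usually close but not equal; a different sample would give a different statistic, while the parameter stays fixed.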
Sampling Methods
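Simple Random Sample (SRS) – Every individual, and every possible group of size n, has an equal chance of being chosen.
Stratified Sample – Divide population into similar groups (strata), take a random sample from each group.
Cluster Sample – Divide population into groups (clusters), randomly pick some groups, sample everyone in them.
Systematic Sample – Pick a random starting point, then take every nth person from a list.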
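The four methods above might be sketched like this. The student IDs, grade strata, and homeroom clusters are hypothetical, and the function names are made up for these notes.

```python
import random

def simple_random_sample(population, n, rng):
    """SRS: every group of n individuals is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n_per_stratum, rng):
    """Stratified: a separate random sample from every stratum."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Cluster: randomly pick whole clusters and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}

def systematic_sample(population, step, rng):
    """Systematic: random start, then every step-th individual."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(1)  # fixed seed so results are repeatable
students = list(range(1, 41))  # hypothetical student IDs 1..40
grades = {"grade 9": students[:20], "grade 10": students[20:]}
homerooms = {f"room {i}": students[i * 10:(i + 1) * 10] for i in range(4)}

srs = simple_random_sample(students, 8, rng)
strat = stratified_sample(grades, 4, rng)
clust = cluster_sample(homerooms, 2, rng)
syst = systematic_sample(students, 5, rng)

print("SRS:", srs)
print("Stratified:", strat)
print("Cluster:", clust)
print("Systematic:", syst)
```

Note the difference: stratified takes some people from every group, cluster takes all people from some groups.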
Types of Studies
Anecdote – Personal story (unreliable).
Observational Study – Researcher only observes, assigns no treatments; can show association, not causation.
Controlled Experiment – Researcher assigns treatments; a well-designed one can support cause-and-effect conclusions.
Good Experiment Features
Large sample size (30 or more is a common rule of thumb, not a guarantee).
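Random assignment (chance decides who gets which treatment; reduces bias).
Blinding (subjects and/or researchers don't know who got which treatment; prevents bias).
Placebo (fake treatment given to the control group for comparison).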
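Random assignment can be sketched as a shuffle followed by a split. The 60 subject labels and the function name randomly_assign are hypothetical.

```python
import random

def randomly_assign(subjects, rng):
    """Shuffle a copy of the subjects, then split into two equal-sized groups."""
    shuffled = subjects[:]          # copy so the original list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

rng = random.Random(42)  # fixed seed so the assignment is repeatable
subjects = [f"subject_{i}" for i in range(1, 61)]  # hypothetical volunteers

treatment_group, placebo_group = randomly_assign(subjects, rng)
print(len(treatment_group), "get the treatment")
print(len(placebo_group), "get the placebo")
```

Because chance alone decides the groups, the two groups should be similar on average in everything except the treatment.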
Two-Way Tables
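Show counts for two categorical variables at once (e.g., commuters vs. breakfast habits).
Percentage calculations: part ÷ total × 100%. The total can be a row total, a column total, or the grand total, depending on the question.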
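A small worked example of part ÷ total × 100% with different totals. The counts in the table are invented for illustration.

```python
# Hypothetical two-way table: rows = commuter status, columns = breakfast habit.
table = {
    "commuter":     {"eats breakfast": 30, "skips breakfast": 20},
    "non-commuter": {"eats breakfast": 35, "skips breakfast": 15},
}

def percent(part, total):
    """part / total x 100%."""
    return 100 * part / total

grand_total = sum(sum(row.values()) for row in table.values())
commuter_total = sum(table["commuter"].values())
breakfast_total = sum(row["eats breakfast"] for row in table.values())

# Share of everyone who commutes (grand total as the denominator).
pct_commuters = percent(commuter_total, grand_total)

# Share of commuters who eat breakfast (row total as the denominator).
pct_commuters_eat = percent(table["commuter"]["eats breakfast"], commuter_total)

# Share of everyone who eats breakfast (grand total as the denominator).
pct_all_eat = percent(breakfast_total, grand_total)

print(pct_commuters, pct_commuters_eat, pct_all_eat)
```

With these made-up counts, 50% of people commute, 60% of commuters eat breakfast, and 65% of everyone eats breakfast. Same table, different totals, different answers.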
Graphs for Numerical Data
Dotplot – Each data point is a dot.
Histogram – Data grouped into intervals (bars).
Stemplot – Splits each value into a stem (leading digits) and a leaf (last digit).
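A minimal text stemplot, assuming non-negative whole numbers where the last digit is the leaf. The scores and the function name stemplot are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines: tens (and above) are the stem, ones digit is the leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems too, so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

scores = [62, 75, 78, 81, 84, 84, 90, 97]  # hypothetical test scores
for line in stemplot(scores):
    print(line)
```

Here the row "8 | 144" stands for the values 81, 84, 84.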
Describe Distributions
Shape: Symmetric or skewed (left or right)? How many peaks?
Center: Typical value (mean or median).
Spread: How spread out the data is (range or standard deviation).
Graphs for Categorical Data
Bar Graph – Separate bars for each category.
Pie Chart – Shows each category's share of the whole (slices add up to 100%).
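Both graphs start from the same summary: a count (bar graph) or percentage (pie chart) per category. A sketch with made-up survey answers:

```python
from collections import Counter

# Hypothetical survey answers: favorite color of 10 people.
answers = ["red"] * 5 + ["blue"] * 3 + ["green"] * 2

counts = Counter(answers)                 # bar heights for a bar graph
total = sum(counts.values())
percentages = {category: 100 * count / total
               for category, count in counts.items()}  # slices of a pie chart

for category, count in counts.most_common():
    bar = "#" * count                     # crude text version of a bar
    print(f"{category:<6} {bar:<6} {percentages[category]:.0f}%")
```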
Measures of Center (Averages)
Mean (x̄) = sum of data ÷ number of values.
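Median = Middle value (after sorting); with an even number of values, the average of the two middle values.
Mode = Most frequent value (there can be more than one, or none if no value repeats).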
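The three measures can be checked with Python's standard statistics module. The scores are made up; multimode needs Python 3.8 or later.

```python
import statistics

scores = [70, 80, 80, 90, 100]  # hypothetical test scores

mean_score = statistics.mean(scores)        # (70+80+80+90+100) / 5
median_score = statistics.median(scores)    # middle of the sorted list
modes = statistics.multimode(scores)        # list of all most-frequent values

# Even number of values: median is the average of the two middle values.
even_scores = [70, 80, 90, 100]
median_even = statistics.median(even_scores)

print(mean_score, median_score, modes, median_even)
```

For these scores the mean is 84, the median is 80, the only mode is 80, and the median of the four-value list is 85.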
Measures of Dispersion (Spread of Data)
Range = Highest value - Lowest value.
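Standard Deviation (σ for a population, s for a sample) = Typical distance of values from the mean; the square root of the average squared deviation (s divides by n − 1 instead of n).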
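A worked example of range and standard deviation on a small invented data set, using the standard statistics module (pstdev treats the data as a whole population, stdev as a sample).

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

data_range = max(data) - min(data)   # highest value - lowest value

# Population standard deviation: sqrt(sum of squared deviations / N).
sigma = statistics.pstdev(data)

# Sample standard deviation: sqrt(sum of squared deviations / (n - 1)).
s = statistics.stdev(data)

print(f"range = {data_range}, sigma = {sigma:.3f}, s = {s:.3f}")
```

The squared deviations from the mean add up to 32, so σ = sqrt(32 / 8) = 2, while s = sqrt(32 / 7), about 2.14. The sample version is always a little larger.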
Empirical Rule (68-95-99.7 Rule)
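For approximately normal (bell-shaped) distributions; symmetry alone is not enough:
About 68% of data within 1 standard deviation of the mean.
About 95% of data within 2 standard deviations of the mean.
About 99.7% of data within 3 standard deviations of the mean.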
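The three percentages can be checked against an exact normal distribution with the standard library's NormalDist (Python 3.8+). The mean of 100 and standard deviation of 15 in the second half are made-up numbers for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")

# Applying the rule: hypothetical scores with mean 100 and standard deviation 15.
mean, sd = 100, 15
interval_95 = (mean - 2 * sd, mean + 2 * sd)  # about 95% of scores fall here
print("about 95% of scores between", interval_95)
```

The exact shares are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds to 68-95-99.7. For the made-up scores, about 95% fall between 70 and 130.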