AP Statistics
Understanding Statistics
Introduction to Statistics
Statistics helps answer important real-world questions using data that vary.
Key questions to consider in statistical analysis:
How do we identify the question to be answered or problem in a given context?
How can statistics provide insights?
Case Study - Flint, Michigan Water Crisis
Location: Flint, Michigan
Date: April 2014
Reason for Crisis: Switching the water supply to save money.
Impact on Residents:
Complaints about water quality (looks, smell, taste).
Health issues reported such as rashes, hair loss, itchy skin.
Conclusion: Data analysis revealed the water was unsafe to drink despite claims from officials.
Understanding Data
Variables
Individuals: The people, animals, or things described by the data.
Examples include survey participants, who are often identified by ID numbers.
Variables: Characteristics that can change from one individual to another.
Types of variables:
Categorical Variables: Values that are category labels. A label may be written with digits, but arithmetic on it is not meaningful.
Examples: Zip codes, grade levels.
Quantitative Variables: Numerical values representing counted or measured quantities.
Always report the units of measurement with quantitative values.
Classifying Variables
Categorical Data: Values of a categorical variable in a dataset.
Quantitative Data: Values of a quantitative variable.
Organizing Categorical Data
Categorical Tables
Frequency Table: Shows the number of individuals in each category.
Relative Frequency Table: Shows the proportion or percentage of individuals in each category.
Display: Categorical data can be presented graphically with bar graphs and pie charts.
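The two tables described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the grade-level data and the function names frequency_table and relative_frequency_table are made up for the example.

```python
from collections import Counter

# Hypothetical grade levels for ten students (made-up data).
grade_levels = ["9", "10", "10", "11", "12", "9", "10", "11", "10", "12"]

def frequency_table(values):
    """Count how many individuals fall in each category."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Convert each category count to a proportion of the total."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

freq = frequency_table(grade_levels)
rel_freq = relative_frequency_table(grade_levels)

# Print one row per category, ordered by grade level.
for category in sorted(freq, key=int):
    print(f"Grade {category}: count={freq[category]}, share={rel_freq[category]:.0%}")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no individual was dropped or counted twice.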
Creating Bar Graphs
Labels:
Axes (X-axis: Categories, Y-axis: Frequency)
Equally spaced bars.
Height represents frequency.
Pie Charts: Show each category as a share of the whole; include a legend or labels for clarity.
Quantitative Data
Types of Quantitative Variables
Discrete Variables: Countable number of values (e.g., number of siblings).
Continuous Variables: Can take any value within an interval, so the possible values cannot be counted (e.g., height).
Graphs for Quantitative Data
Dot Plots: Show individual values and distribution.
Stem and Leaf Plots: Offer the same benefits as dot plots but become cumbersome for larger datasets.
Histograms: Easier for larger datasets; show the shape of the distribution but do not display individual values.
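As a rough sketch of how a stem and leaf plot keeps every individual value visible, the Python below splits each value into a tens stem and a ones-digit leaf. The ages are made up, the function name stem_and_leaf is hypothetical, and the code assumes non-negative whole numbers.

```python
def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative whole numbers.

    The stem is the tens part of a value and the leaf is its ones digit.
    """
    data = sorted(values)
    # Include every stem between the smallest and largest, even empty ones,
    # so gaps in the distribution remain visible.
    stems = {stem: [] for stem in range(data[0] // 10, data[-1] // 10 + 1)}
    for value in data:
        stems[value // 10].append(value % 10)
    return stems

# Hypothetical ages, made up for illustration.
ages = [23, 12, 40, 21, 15, 23]
plot = stem_and_leaf(ages)

# Print one row per stem, e.g. "2 | 1 3 3" for the values 21, 23, 23.
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

Every original value can be read back from a row, which is the property a histogram gives up in exchange for handling large datasets.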
Describing Data Distribution
Shape:
Symmetric, Skewed (left/right), Unimodal, Bimodal, Uniform.
Center: A typical value of the dataset, usually reported as the mean or median.
Variability: Spread of the data; can be assessed through the range, interquartile range (IQR), and standard deviation.
Unusual Features: Outliers, gaps, and clusters; outliers can strongly affect the mean and standard deviation.
Statistical Summary
Measures of Central Tendency
Mean: Average value = (sum of values) / (number of values).
Median: Middle value of an ordered set.
Quartiles:
Q1: Median of the lower half of the ordered data.
Q3: Median of the upper half of the ordered data.
Variability:
Range: Difference between the maximum and minimum values.
IQR: Q3 - Q1, the spread of the middle 50% of the data.
Standard Deviation: Measures the typical distance of values from the mean.
Outlier Detection
Methods to Identify Outliers
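1.5 × IQR rule: A value is an outlier if it is less than Q1 - 1.5 × IQR or greater than Q3 + 1.5 × IQR.
2 standard deviation rule: A value is an outlier if it lies more than 2 standard deviations from the mean.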
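Both outlier checks can be sketched in Python as follows. The data are made up, the function names iqr_outliers and sd_outliers are hypothetical, and the quartiles are assumed to come from the median-of-halves method with the sample standard deviation used for the second rule.

```python
import statistics

def iqr_outliers(values):
    """Flag values beyond 1.5 * IQR below Q1 or above Q3."""
    data = sorted(values)
    n = len(data)
    q1 = statistics.median(data[: n // 2])
    q3 = statistics.median(data[n // 2 + (n % 2):])  # skip middle value if n is odd
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [v for v in data if v < low_fence or v > high_fence]

def sd_outliers(values):
    """Flag values more than 2 sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [v for v in sorted(values) if abs(v - mean) > 2 * sd]

# Hypothetical data with one suspiciously large value.
data = [2, 4, 4, 5, 7, 8, 9, 10, 30]

print("1.5 x IQR rule:", iqr_outliers(data))
print("2 SD rule:", sd_outliers(data))
```

The two rules agree on this dataset, but they need not agree in general, because an extreme value inflates the standard deviation used to judge it.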
Impact of Outliers on Statistics
Outliers can distort summary statistics; how much depends on whether the measure is resistant:
Resistant: Median, IQR
Non-resistant: Mean, standard deviation, range
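A small experiment makes the distinction concrete: replace the largest value in a dataset with an extreme one and compare how far each statistic moves. The numbers below are made up for illustration.

```python
import statistics

# Hypothetical data; the second list replaces the largest value with an outlier.
original = [2, 4, 4, 5, 7, 8, 9, 10, 12]
with_outlier = [2, 4, 4, 5, 7, 8, 9, 10, 120]

# Resistant measure: the median depends only on the middle of the ordered data.
median_shift = statistics.median(with_outlier) - statistics.median(original)

# Non-resistant measures: every value, including the outlier, feeds into these.
mean_shift = statistics.mean(with_outlier) - statistics.mean(original)
range_shift = (max(with_outlier) - min(with_outlier)) - (max(original) - min(original))
sd_shift = statistics.stdev(with_outlier) - statistics.stdev(original)

print(f"median moved by {median_shift}")
print(f"mean moved by {mean_shift:.1f}, range by {range_shift}, SD by {sd_shift:.1f}")
```

The median does not move at all, while the mean, range, and standard deviation all increase.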
Comparing Distributions
Characteristics to analyze: Shape, Center, Variability, Unusual Features.
State each comparison in the context of the data, using explicit comparative language (for example, "higher center" or "more variable").
Understanding Normal Distribution
A key model for the distribution of quantitative data; its graph is a symmetric, bell-shaped curve.
Empirical Rule: Approximately 68% of values lie within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD.
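The Empirical Rule percentages can be checked against an exact normal model using NormalDist from Python's standard statistics module (available in Python 3.8 and later). The helper name share_within is hypothetical, and the mean of 100 with SD of 15 in the last line is an arbitrary choice for illustration.

```python
from statistics import NormalDist

def share_within(k, mean=0.0, sd=1.0):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(mean + k * sd) - dist.cdf(mean - k * sd)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")

# The proportions do not depend on which mean and SD the model uses.
print(f"within 2 SD (mean 100, SD 15): {share_within(2, mean=100, sd=15):.4f}")
```

The computed proportions round to 68%, 95%, and 99.7%, which is why the rule is stated as an approximation.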
Exploring 2-Variable Data
Related Variables
Relationships between two variables can be shown graphically: side-by-side or segmented bar graphs for two categorical variables, and scatter plots for two quantitative variables.
Correlation Coefficient (r): Measures strength and direction of a linear relationship between two variables.
Values range from -1 (perfect negative) to 1 (perfect positive).
Causation vs. Correlation: A strong correlation does not by itself show that one variable causes the other, because other variables may influence both.
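The correlation coefficient can be computed directly from deviations about the two means. The sketch below uses made-up paired data (hours studied and quiz score) and a hypothetical function name, correlation.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r for paired quantitative data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired data, made up for illustration.
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]

r = correlation(hours, quiz)
print(f"r = {r:.3f}")
```

Points that fall exactly on an upward-sloping line give r = 1, and points exactly on a downward-sloping line give r = -1.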
Regression Analysis
Linear Regression Model: Predicts the response variable y from the explanatory variable x; represented by the equation ŷ = a + bx, where a is the y-intercept and b is the slope.
Residuals: Residual = actual y - predicted ŷ. A residual plot with no clear pattern suggests that a linear model is appropriate.
Coefficient of Determination (r²): The percentage of the variation in the response variable that is explained by the linear relationship with the explanatory variable.
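The regression ideas above can be tied together in a short Python sketch that fits a least-squares line, computes residuals, and finds the coefficient of determination. The data are made up and the function name least_squares_line is hypothetical; residuals are taken as actual minus predicted values.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx                 # slope
    a = mean_y - b * mean_x       # intercept; the line passes through (mean_x, mean_y)
    return a, b

# Hypothetical paired data, made up for illustration.
hours = [1, 2, 3, 4, 5]
quiz = [2, 4, 5, 4, 5]

a, b = least_squares_line(hours, quiz)
predicted = [a + b * x for x in hours]
residuals = [y - y_hat for y, y_hat in zip(quiz, predicted)]

# r-squared = 1 - (unexplained variation) / (total variation).
mean_y = sum(quiz) / len(quiz)
sse = sum(e ** 2 for e in residuals)
sst = sum((y - mean_y) ** 2 for y in quiz)
r_squared = 1 - sse / sst

print(f"y-hat = {a:.1f} + {b:.1f}x, r^2 = {r_squared:.2f}")
```

For a least-squares line the residuals sum to zero, so a plot of them is judged by its pattern rather than by its average.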
Data Collection Considerations
Importance of proper sampling techniques to ensure representativeness:
Random Sampling vs. Non-Random Sampling.
Be aware of confounding variables: variables related to both the explanatory and response variables, which can make an association misleading.
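The contrast between random and non-random sampling can be illustrated with Python's standard random module. The population of 100 ID numbers is hypothetical, and the seed of 42 is an arbitrary choice made only so the example is repeatable.

```python
import random

# Hypothetical population: individuals identified by ID numbers 1 to 100.
population = list(range(1, 101))

# Simple random sample: every group of 10 individuals is equally likely.
rng = random.Random(42)           # fixed seed so the sketch is repeatable
random_sample = rng.sample(population, 10)

# Non-random (convenience) sample: just the first 10 individuals on the list.
convenience_sample = population[:10]

print("random sample:     ", sorted(random_sample))
print("convenience sample:", convenience_sample)
```

If the list is ordered in a meaningful way (for example, by sign-up date), the convenience sample represents only one part of the population, while the random sample gives every individual the same chance of selection.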