AP Statistics

Statistics helps answer important real-world questions using data that vary.
Key questions to consider in statistical analysis:
- How do we identify the question to be answered or problem in a given context?
- How can statistics provide insights?

Example: Flint Water Crisis

Location: Flint, Michigan
Date: April 2014
Reason for Crisis: Switching the water supply to save money.
Impact on Residents:
- Complaints about water quality (looks, smell, taste).
- Health issues reported such as rashes, hair loss, itchy skin.
Conclusion: Data analysis revealed the water was unsafe to drink despite claims from officials.

Variables

Individuals: The people, animals, or things described by the data.
- Examples include survey participants, often identified in a dataset by an ID number.
Variables: Characteristics that can change from one individual to another.
- Types of variables:
  - Categorical Variables: Values that place individuals into categories; the values may be words or numbers used only as labels.
    - Examples: Zip codes, grade levels.
  - Quantitative Variables: Numerical values representing counted or measured quantities.
    - Always include units of measurement (e.g., height in inches).

Classifying Variables

Categorical Tables

Frequency Table: Shows the number of individuals in each category.
Relative Frequency Table: Shows the proportion or percentage of individuals in each category.
Importance: Categorical data can be presented in graphical forms like bar graphs and pie charts.
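
The two tables above can be sketched in a few lines of Python. This is a minimal illustration, not a required method; the grade-level responses and the function names are made up for the example.

```python
from collections import Counter

# Hypothetical data: grade level of ten surveyed students (a categorical variable).
grade_levels = ["9", "10", "10", "11", "12", "10", "9", "11", "10", "12"]

def frequency_table(values):
    """Return {category: number of individuals in that category}."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Return {category: proportion of individuals in that category}."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

freq = frequency_table(grade_levels)
rel_freq = relative_frequency_table(grade_levels)

# Print both tables side by side, ordered by grade level.
for category in sorted(freq, key=int):
    print(f"Grade {category}: count = {freq[category]}, percent = {rel_freq[category]:.0%}")
```

The relative frequencies always add up to 1 (100%), which is a quick check that no individual was missed or counted twice.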

Creating Bar Graphs

Key features:
- Labeled axes (x-axis: categories, y-axis: frequency or relative frequency).
- Equally spaced bars of equal width.
- Height represents frequency.
Pie Charts: Used for categorical data with a legend for clarity.

Types of Quantitative Variables

Discrete Variables: Countable number of values (e.g., number of siblings).
Continuous Variables: Can take any value within an interval, so there are infinitely many possible values (e.g., height).

Graphs for Quantitative Data

Dot Plots: Show individual values and the shape of the distribution.
Stem-and-Leaf Plots: Like dot plots, show individual values and the shape of the distribution, but become cumbersome for large datasets.
Histograms: Better suited to large datasets; show the shape of the distribution but do not display individual values.

Describing Data Distribution

Shape:
- Symmetric, Skewed (left/right), Unimodal, Bimodal, Uniform.
Center: A typical value for the dataset, such as the mean or median.
Variability: Spread of the data; can be assessed through the range, interquartile range (IQR), or standard deviation.
Unusual Features: Outliers and gaps; outliers pull on the mean and standard deviation.

Measures of Center and Variability

Mean: Average value = (sum of values) / (number of values).
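Median: Middle value of an ordered set (the average of the two middle values when the count is even).
Quartiles:
- Q1: Median of the lower half of the ordered data.
- Q3: Median of the upper half of the ordered data.
Variability:
- Range: Difference between the max and min values.
- IQR: Q3 - Q1, the spread of the middle 50% of the data.
- Standard Deviation: Measures the typical distance of values from the mean.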

Methods to Identify Outliers

Impact of Outliers on Statistics

Outliers can skew summary statistics, with effects differing between resistant and non-resistant measures:
- Resistant: Median, IQR
- Non-resistant: Mean, standard deviation, range
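
A small sketch can make the difference concrete. The numbers below are hypothetical; the point is only to compare how much each statistic moves when a single extreme value is added.

```python
import statistics

# Hypothetical dataset, then the same data with one extreme value added.
original = [10, 12, 13, 15, 16]
with_outlier = original + [90]

mean_before = statistics.mean(original)
mean_after = statistics.mean(with_outlier)
median_before = statistics.median(original)
median_after = statistics.median(with_outlier)

# The mean (non-resistant) is pulled strongly toward the outlier;
# the median (resistant) barely changes.
print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

Adding the value 90 moves the mean from 13.2 to 26 but moves the median only from 13 to 14.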

Normal Distribution

A key model for the distribution of quantitative data; the normal curve is bell-shaped and symmetric about the mean.
Empirical Rule: In a normal distribution, approximately 68% of values fall within 1 SD of the mean, 95% within 2 SD, and 99.7% within 3 SD.
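
The percentages in the Empirical Rule can be checked against the normal model itself. This minimal sketch uses `statistics.NormalDist` from the Python standard library; the helper name `proportion_within` is made up for the example.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

The computed proportions round to the 68%, 95%, and 99.7% quoted in the rule, which are approximations rather than exact values.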

Related Variables

Relationships between variables can be shown graphically: side-by-side or segmented bar graphs for two categorical variables, and scatter plots for two quantitative variables.
Correlation Coefficient (r): Measures the strength and direction of a linear relationship between two quantitative variables.
- Values range from -1 (perfect negative) to 1 (perfect positive).
Causation vs. Correlation: A strong correlation does not by itself imply causation, because other (confounding) variables may influence both variables.
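
To show how r is computed, here is a minimal sketch that averages the products of standardized values (z-scores), dividing by n - 1. The paired data are hypothetical and the function name is chosen for this example only.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Correlation coefficient r: sum of z_x * z_y, divided by (n - 1)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(products) / (n - 1)

# Hypothetical paired data for five individuals.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = correlation(x, y)
print(f"r = {r:.3f}")
```

For these values r is about 0.775, a moderately strong positive linear relationship. A value this size describes association only; it says nothing about whether x causes y.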

Regression Analysis

Linear Regression Model: Predicts the response variable (y) from the explanatory variable (x); represented by the equation ŷ = a + bx, where a is the y-intercept and b is the slope.
Residuals: residual = actual y - predicted ŷ; a residual plot with no clear pattern suggests a linear model is appropriate.
Coefficient of Determination (r²): The percentage of variation in the response variable explained by its linear relationship with the explanatory variable.
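
The three ideas above fit together in one short calculation. The sketch below fits a least-squares line ŷ = a + bx to hypothetical data, then computes the residuals and r². The function name and data are assumptions made for illustration.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # y-intercept
    return a, b

# Hypothetical paired data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = least_squares_line(x, y)
predicted = [a + b * xi for xi in x]
residuals = [yi - y_hat for yi, y_hat in zip(y, predicted)]  # actual - predicted

# r^2 = 1 - (unexplained variation) / (total variation)
sse = sum(res ** 2 for res in residuals)
sst = sum((yi - mean(y)) ** 2 for yi in y)
r_squared = 1 - sse / sst

print(f"y_hat = {a:.2f} + {b:.2f}x")
print("residuals:", [round(res, 2) for res in residuals])
print(f"r^2 = {r_squared:.2f}")
```

For this dataset the line is ŷ = 2.2 + 0.6x and r² is 0.6, meaning about 60% of the variation in y is explained by the linear relationship with x. The residuals of a least-squares line always sum to zero, up to rounding.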

Sampling

Proper sampling techniques help ensure that a sample is representative of the population:
- Random sampling vs. non-random sampling: random selection helps avoid bias.
- Be aware of confounding variables that can affect a study's conclusions.
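
As one illustration of random sampling, the sketch below draws a simple random sample, in which every group of n individuals is equally likely to be chosen. The population of 100 household ID numbers and the seed value are hypothetical; the seed is fixed only so the example is reproducible.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every group of size n is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Hypothetical population: households numbered 1 to 100.
household_ids = list(range(1, 101))

sample = simple_random_sample(household_ids, 10, seed=2014)
print(sorted(sample))
```

Choosing the first ten households on a list, or only those that volunteer, would be non-random sampling and could give a sample that does not represent the population.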