AP Statistics

Understanding Statistics
Introduction to Statistics
  • Statistics helps answer important real-world questions using data that vary.

  • Key questions to consider in statistical analysis:

    • How do we identify the question to be answered or problem in a given context?

    • How can statistics provide insights?

Case Study - Flint, Michigan Water Crisis
  • Location: Flint, Michigan

  • Date: April 2014

  • Reason for Crisis: Switching the water supply to save money.

  • Impact on Residents:

    • Complaints about water quality (appearance, smell, taste).

    • Health issues reported such as rashes, hair loss, itchy skin.

  • Conclusion: Data analysis revealed the water was unsafe to drink despite claims from officials.

Understanding Data

Variables

  • Individuals: The people, animals, or things described by the data.

    • Example: survey participants, each often identified by an ID number.

  • Variables: Characteristics that can change from one individual to another.

    • Types of variables:

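      • Categorical Variables: Values are category names or group labels.

        • Examples: Zip codes, grade levels. These labels may be written as numerals, but arithmetic on them (such as averaging zip codes) is not meaningful.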

      • Quantitative Variables: Numerical values representing counted or measured quantities.

        • Always report the units of measurement (e.g., height in centimeters).

Classifying Variables

  • Categorical Data: Values of a categorical variable in a dataset.

  • Quantitative Data: Values of a quantitative variable.

Organizing Categorical Data

Categorical Tables

  • Frequency Table: Shows the number of individuals in each category.

  • Relative Frequency Table: Shows the proportion or percentage of individuals in each category.

  • Display: Categorical data can be shown graphically with bar graphs and pie charts.
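
The two tables above can be sketched in a few lines of Python. This is a minimal illustration, not a required method: the survey responses and the function names frequency_table and relative_frequency_table are made up for the example.

```python
from collections import Counter

# Hypothetical survey responses: each student's grade level (a categorical variable).
grade_levels = ["9", "10", "10", "11", "12", "10", "9", "11", "10", "12"]

def frequency_table(values):
    """Return {category: count} for a list of categorical values."""
    return dict(Counter(values))

def relative_frequency_table(values):
    """Return {category: proportion of all individuals} for a list of values."""
    total = len(values)
    return {category: count / total for category, count in Counter(values).items()}

freq = frequency_table(grade_levels)
rel_freq = relative_frequency_table(grade_levels)

# Print one row per category, ordered by grade level.
for category in sorted(freq, key=int):
    print(f"Grade {category}: {freq[category]} students ({rel_freq[category]:.0%})")
```

The relative frequencies always sum to 1 (100%), which is a quick check that no individual was dropped or counted twice.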

Creating Bar Graphs

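  • Bar graph essentials:

    • Label both axes (x-axis: categories; y-axis: frequency or relative frequency).

    • Draw bars of equal width that are equally spaced.

    • The height of each bar represents the frequency of its category.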

  • Pie Charts: Show each category as a slice of the whole; include a legend or slice labels so the categories can be identified.

Quantitative Data

Types of Quantitative Variables

  • Discrete Variables: Take a countable set of values, usually counts (e.g., number of siblings).

  • Continuous Variables: Can take any value within an interval, so infinitely many values are possible (e.g., height).

Graphs for Quantitative Data

  • Dot Plots: Show individual values and distribution.

  • Stem and Leaf Plots: Offer benefits similar to dot plots (individual values stay visible) but become cumbersome for larger datasets.

  • Histograms: Easier for larger datasets; show the shape of the distribution but do not display individual values.

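
As an illustration of how a stem and leaf plot keeps individual values visible, here is a minimal text-based sketch. It assumes two-digit whole numbers (stem = tens digit, leaf = ones digit); the test scores and the function name stem_and_leaf are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem) and ones digit (leaf)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    # Include empty stems so gaps in the distribution stay visible.
    return {stem: stems.get(stem, []) for stem in range(min(stems), max(stems) + 1)}

# Hypothetical test scores.
test_scores = [67, 72, 75, 78, 81, 83, 83, 88, 90, 94]

plot = stem_and_leaf(test_scores)
for stem, leaves in plot.items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
print("Key: 7 | 2 means 72")
```

A histogram of the same data would report only how many scores fall in each interval (for example, four scores in the 80s), which is why it scales better to large datasets but hides the individual values.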

Describing Data Distribution

  • Shape:

    • Symmetric, Skewed (left/right), Unimodal, Bimodal, Uniform.

  • Center: A typical value for the dataset, usually summarized by the mean or median.

  • Variability: Spread of the data; can be assessed through the range, interquartile range (IQR), or standard deviation.

  • Unusual Features: Outliers, gaps, and clusters. Outliers can pull the mean toward them and inflate the standard deviation.

Statistical Summary

Measures of Center and Variability

  • Mean: Average value = (sum of values) / (number of values).

  • Median: Middle value of an ordered dataset (the average of the two middle values when the number of values is even).

  • Quartiles:

    • Q1: Median of the lower half of the ordered data.

    • Q3: Median of the upper half of the ordered data.

  • Variability:

    • Range: Difference between the maximum and minimum values.

    • Interquartile Range (IQR): Q3 - Q1, the spread of the middle half of the data.

    • Standard Deviation: Measures the typical distance of values from the mean.
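
The summary statistics above can be computed as in the following sketch. Two choices here are assumptions rather than something these notes specify: the quartiles use the median-of-halves method and leave the overall median out of both halves when the count is odd, and the standard deviation is the sample version (dividing by n - 1). The quiz scores are hypothetical.

```python
import statistics

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method.

    When the number of values is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

# Hypothetical quiz scores.
scores = [4, 7, 8, 8, 9, 10, 10, 12, 13]

mean = statistics.mean(scores)            # (sum of values) / (number of values)
q1, median, q3 = quartiles(scores)
data_range = max(scores) - min(scores)    # max minus min
iqr = q3 - q1                             # spread of the middle half
sample_sd = statistics.stdev(scores)      # sample standard deviation (divides by n - 1)

print(f"mean={mean}, median={median}, Q1={q1}, Q3={q3}")
print(f"range={data_range}, IQR={iqr}, SD={sample_sd:.2f}")
```

Calculators and software sometimes use slightly different quartile conventions, so small differences in Q1 and Q3 between tools are expected.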

Outlier Detection

Methods to Identify Outliers

  1. 1.5 x IQR rule: A value is an outlier if it is less than Q1 - 1.5 x IQR or greater than Q3 + 1.5 x IQR.

  2. Standard deviation rule: A value is an outlier if it lies more than 2 standard deviations from the mean.
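
Both rules can be sketched as short functions. This is an illustrative sketch under stated assumptions: quartiles use the median-of-halves method, the standard deviation is the sample version, and the commute times and function names are hypothetical.

```python
import statistics

def iqr_outliers(values):
    """Return values below Q1 - 1.5*IQR or above Q3 + 1.5*IQR."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in ordered if x < low_fence or x > high_fence]

def sd_outliers(values):
    """Return values more than 2 sample standard deviations from the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [x for x in sorted(values) if abs(x - mean) > 2 * sd]

# Hypothetical commute times in minutes.
commute_minutes = [12, 15, 15, 18, 20, 22, 25, 27, 30, 75]

print(iqr_outliers(commute_minutes))  # fences are -3 and 45, so only 75 is flagged
print(sd_outliers(commute_minutes))
```

The two rules do not always agree. Because the mean and standard deviation are themselves affected by extreme values, the standard deviation rule can miss outliers that the 1.5 x IQR rule flags.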

Impact of Outliers on Statistics

  • Outliers can distort summary statistics. Resistant measures change little when an outlier is present; non-resistant measures can change substantially:

    • Resistant: Median, IQR

    • Non-resistant: Mean, standard deviation, range
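
To see the difference between resistant and non-resistant measures, the sketch below summarizes a small hypothetical dataset twice: once as is, and once with its largest value replaced by an extreme one. The data and the function name summarize are made up for illustration, and quartiles use the median-of-halves method.

```python
import statistics

def summarize(values):
    """Return the mean, SD, range, median, and IQR of a dataset."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = statistics.median(ordered[:half])
    q3 = statistics.median(ordered[half + 1:] if n % 2 else ordered[half:])
    return {
        "mean": statistics.mean(ordered),
        "sd": statistics.stdev(ordered),
        "range": ordered[-1] - ordered[0],
        "median": statistics.median(ordered),
        "iqr": q3 - q1,
    }

without_outlier = summarize([2, 3, 4, 5, 6, 7, 8])
with_outlier = summarize([2, 3, 4, 5, 6, 7, 80])  # 8 replaced by 80

for name in without_outlier:
    print(f"{name}: {without_outlier[name]:.2f} -> {with_outlier[name]:.2f}")
```

In this example the median and IQR are identical in both versions, while the mean, standard deviation, and range all increase sharply.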

Comparing Distributions
  • Characteristics to analyze: Shape, Center, Variability, Unusual Features.

  • Compare in context: use explicit comparative language (e.g., "higher center", "more variable") and refer to the variable and units being measured.

Understanding Normal Distribution
  • A key model for the distribution of quantitative data; its graph is a symmetric, bell-shaped curve.

  • Empirical Rule: In a normal distribution, about 68% of values lie within 1 SD of the mean, about 95% within 2 SD, and about 99.7% within 3 SD.
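
The Empirical Rule percentages can be checked against the normal model itself. The sketch below uses NormalDist from Python's standard statistics module (available in Python 3.8 and later); the helper name proportion_within is hypothetical.

```python
from statistics import NormalDist

# Standard normal model: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"Within {k} SD: {proportion_within(k):.1%}")
```

The exact proportions are close to, but not exactly, 68%, 95%, and 99.7%, which is why the rule is stated with approximate values.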

Exploring 2-Variable Data

Related Variables

  • Relationships between two variables can be displayed graphically: side-by-side or segmented bar graphs for two categorical variables, and scatter plots for two quantitative variables.

  • Correlation Coefficient (r): Measures strength and direction of a linear relationship between two variables.

    • Values range from -1 (perfect negative) to 1 (perfect positive); values near 0 indicate a weak linear relationship.

  • Causation vs. Correlation: A strong correlation does not by itself imply causation, because other variables may influence both variables.
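
One common way to compute r is to average the products of the z-scores of the two variables, dividing by n - 1. The sketch below follows that formula; the study-time data and the function name correlation are hypothetical.

```python
import math

def correlation(xs, ys):
    """Correlation coefficient r: sum of z-score products divided by (n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sample standard deviations of x and y.
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

# Hypothetical data: hours studied and exam score for five students.
hours_studied = [1, 2, 3, 4, 5]
exam_scores = [52, 60, 61, 70, 77]

r = correlation(hours_studied, exam_scores)
print(f"r = {r:.3f}")
```

A value of r close to 1 here describes a strong positive linear association in this made-up sample. It says nothing about whether studying causes higher scores.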

Regression Analysis

  • Linear Regression Model: Predicts the response variable (y) from the explanatory variable (x); represented by the equation ŷ = a + bx, where a is the y-intercept and b is the slope.

  • Residuals: residual = observed y - predicted ŷ. Residuals measure prediction error; a residual plot with no clear pattern suggests that a linear model is appropriate.

  • Coefficient of Determination (r²): The proportion of the variation in the response variable that is explained by the linear relationship with the explanatory variable.
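
The regression ideas above fit together as in this sketch, which assumes the least-squares line (the line that minimizes the sum of squared residuals). The data and the function name fit_line are hypothetical.

```python
def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx          # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x-bar, y-bar)
    return a, b

# Hypothetical data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
predictions = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predictions)]  # observed - predicted

# r-squared: 1 - (unexplained variation) / (total variation in y).
y_bar = sum(ys) / len(ys)
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((y - y_bar) ** 2 for y in ys)
r_squared = 1 - ss_residual / ss_total

print(f"y-hat = {a:.1f} + {b:.1f}x, r^2 = {r_squared:.2f}")
```

For a least-squares line the residuals sum to zero (up to rounding), so a residual plot is centered on the horizontal line at 0.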

Data Collection Considerations
  • Importance of proper sampling techniques to ensure representativeness:

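    • Random vs. non-random sampling: Random sampling uses chance to select individuals and tends to produce representative samples; non-random methods (such as convenience or voluntary response samples) can introduce bias.

    • Watch for confounding variables: variables related to both the explanatory and response variables, which can make conclusions about cause and effect misleading.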
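
As a small illustration of random versus non-random selection, the sketch below draws a simple random sample from a hypothetical population of 200 student ID numbers and contrasts it with a convenience sample (the first ten IDs on the list). The population, sample size, and seed are all made up for the example.

```python
import random

# Hypothetical population: student ID numbers 1 through 200.
population = list(range(1, 201))

# Simple random sample: every group of 10 students is equally likely to be chosen.
rng = random.Random(2024)  # seeded only so the sketch is repeatable
simple_random_sample = rng.sample(population, k=10)

# Convenience sample: whoever happens to be first on the list.
convenience_sample = population[:10]

print("Random sample:     ", sorted(simple_random_sample))
print("Convenience sample:", convenience_sample)
```

If ID numbers were assigned by grade level, for instance, the convenience sample would contain students from only one grade, while the random sample would tend to include students from across the school.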