AP-Stats

Collecting and Exploring Data

AP Review 2025


Individuals and Variables

  • Individuals: Objects described by a set of data. Can be people, animals, or things.
  • Variable: Any characteristic of an individual; can take different values for different individuals.

Categorical and Quantitative Variables

  • Categorical Variable: Places individuals into groups or categories.
  • Quantitative Variable: Takes numerical values where arithmetic operations (adding, averaging) make sense.

Distribution

  • Distribution Definition: Tells us the values a variable takes and how often these values occur.
  • Describing Distribution Using SOCS:
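    • Shape: Overall form of the graph, such as symmetric, skewed left, or skewed right.
    • Outliers: Unusual values that stand out from the overall pattern.
    • Center: A typical value, usually reported as the mean or the median.
    • Spread: How much the values vary, described by the range (highest value minus lowest value), the IQR, or the standard deviation.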

Describing the Shape of a Distribution

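  • Symmetric Distribution: The left and right sides of the graph are approximately mirror images; the mean and median are close together.
  • Skewed Left: The left tail (smaller values) is longer than the right; the mean is usually less than the median.
  • Skewed Right: The right tail (larger values) is longer than the left; the mean is usually greater than the median.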

Describing Distributions with Numbers

The Mean ($\bar{x}$)
  • Mean Calculation: Add all values and divide by the number of observations.
    $\bar{x} = \frac{\sum x_i}{n}$
The Median (M)
  • Median Calculation: Midpoint of the distribution once the observations are ordered from smallest to largest.
    • If n is odd: M is the center observation, at position $\frac{n + 1}{2}$.
    • If n is even: M is the average of the two center observations, at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.
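
The two calculations above can be sketched in a few lines of Python. This is only an illustration: the function names and the `scores` data are hypothetical and not part of the review.

```python
def mean(values):
    """Sum of the observations divided by the number of observations."""
    return sum(values) / len(values)


def median(values):
    """Midpoint of the observations after sorting them."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the center observation, position (n + 1) / 2 counting from 1.
        return ordered[mid]
    # Even n: average of positions n / 2 and n / 2 + 1 counting from 1.
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [2, 3, 5, 8, 32]  # hypothetical data with one large value
print(mean(scores))    # 10.0
print(median(scores))  # 5
```

The single large value pulls the mean well above the median, which is the pattern expected in a right-skewed distribution.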

The Five-Number Summary

  • Components:

    1. Minimum
    2. First Quartile (Q1)
    3. Median (M)
    4. Third Quartile (Q3)
    5. Maximum
  • Quartiles Calculation:

    • Q1: The median of values below M.
    • Q3: The median of values above M.
  • IQR Calculation:
    $\text{IQR} = Q_3 - Q_1$
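
A minimal sketch of the five-number summary and IQR, assuming the quartile rule stated above (Q1 and Q3 are the medians of the values on each side of M, so M itself is left out of both halves when n is odd). Statistical software often uses other quartile rules, so its answers can differ slightly. The function names and data are hypothetical.

```python
def median(values):
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, M, Q3, maximum); needs at least two values."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values to the left of the median position
    upper = ordered[half + (n % 2):]  # values to the right of the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])


data = [1, 3, 4, 6, 7, 9, 12, 15, 40]  # hypothetical data
minimum, q1, m, q3, maximum = five_number_summary(data)
iqr = q3 - q1
print(minimum, q1, m, q3, maximum)  # 1 3.5 7 13.5 40
print(iqr)                          # 10.0
```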


Outliers: The 1.5 × IQR Criterion

  • An observation is an outlier if it:
    • Falls more than 1.5 × IQR below Q1, or
    • Falls more than 1.5 × IQR above Q3.
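
The criterion can be sketched as a small Python check. The names `find_outliers`, `lower_fence`, and `upper_fence` and the data are hypothetical; the quartiles passed in are assumed to have been computed already.

```python
def find_outliers(values, q1, q3):
    """Return the observations that fall outside the 1.5 x IQR fences."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in values if x < lower_fence or x > upper_fence]


data = [1, 3, 4, 6, 7, 9, 12, 15, 40]  # hypothetical data
# For this data, Q1 = 3.5 and Q3 = 13.5, so the fences are -11.5 and 28.5.
print(find_outliers(data, q1=3.5, q3=13.5))  # [40]
```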

Boxplot

  • A graphical representation of the five-number summary.
  • Features:
    • Central box spans the quartiles.
    • Line in the box marks the median.
    • Outliers plotted individually.
    • Lines extend from the box to the smallest and largest observations (excluding outliers).

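The Standard Deviation (s or s_x)

  • Standard Deviation Definition: Measures the typical distance of the observations from their mean. It is the square root of the variance, and the variance is the sum of the squared deviations from the mean divided by n - 1.
  • Formula:
    $s_x = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$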
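
A short sketch of the formula, worked step by step in Python. The function name and data are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
import math


def sample_std(values):
    """Sample standard deviation: square root of the variance (n - 1 divisor)."""
    n = len(values)
    x_bar = sum(values) / n
    squared_deviations = [(x - x_bar) ** 2 for x in values]
    variance = sum(squared_deviations) / (n - 1)
    return math.sqrt(variance)


data = [3, 3, 5, 7, 7]   # hypothetical data with mean 5
# Squared deviations are 4, 4, 0, 4, 4, so the variance is 16 / 4 = 4.
print(sample_std(data))  # 2.0
```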

Scatterplots and Correlation

Explanatory and Response Variables
  • Response Variable: Measures the outcome of a study (dependent variable, y).
  • Explanatory Variable: Attempts to explain observed outcomes (independent variable, x).
Scatterplot Definition
  • Represents the relationship between two quantitative variables.
  • Axes:
    • Horizontal: Explanatory variable (x)
    • Vertical: Response variable (y)
Examining a Scatterplot
  • Characteristics:
    • Form: Linear or curved.
    • Direction: Positive or negative.
    • Strength: Weak, moderate, or strong.
  • An outlier in a scatterplot is a point that falls outside the overall pattern of the relationship.

Association and Correlation

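  • Association: Positive when larger values of one variable tend to occur with larger values of the other; negative when larger values of one tend to occur with smaller values of the other.
  • Correlation (r): Measures the strength and direction of the linear relationship between two quantitative variables.
Facts about Correlation:
  1. Swapping which variable is called x and which is called y does not change r.
  2. r is defined only when both variables are quantitative.
  3. Changing the units of measurement of either variable does not change r.
  4. r always lies between -1 and +1; values near 0 indicate a weak linear relationship, and values near -1 or +1 indicate a strong one.
  5. r is not resistant: a single outlier can change it substantially.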
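
The review does not give a formula for r. The sketch below assumes the standard one, the sum of the products of the standardized x and y values divided by n - 1, and uses it to check several of the facts above on hypothetical data.

```python
import math


def standard_deviation(values):
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = sum of (z-score of x times z-score of y), divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x, s_y = standard_deviation(xs), standard_deviation(ys)
    z_products = [((x - x_bar) / s_x) * ((y - y_bar) / s_y)
                  for x, y in zip(xs, ys)]
    return sum(z_products) / (n - 1)


hours = [1, 2, 3, 4, 5]          # hypothetical explanatory values
scores = [52, 60, 61, 70, 77]    # hypothetical response values

r = correlation(hours, scores)
r_swapped = correlation(scores, hours)             # fact 1: swap x and y
minutes = [60 * h for h in hours]
r_minutes = correlation(minutes, scores)           # fact 3: change units
r_with_outlier = correlation(hours + [6], scores + [20])  # fact 5: add one outlier

print(r, r_swapped, r_minutes, r_with_outlier)
```

The first three values agree up to rounding error, while the single added point is enough to turn a strong positive correlation into a negative one.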

Regression Line

  • Definition: A line that describes how a response variable changes with an explanatory variable.
  • Least-Squares Regression Line: Minimizes the sum of the squares of the vertical distances from observations to the line.
  • Equation:
    $\hat{y} = a + bx$
    Here $\hat{y}$ is the predicted value of y, a is the y-intercept, and b is the slope.
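
As an illustration of the least-squares line, the sketch below uses the standard closed-form solution, which the review does not state: the slope is the sum of the products of the x and y deviations divided by the sum of the squared x deviations, and the line passes through the point of means. The function name and data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope
    a = y_bar - b * x_bar    # intercept: the line passes through (x_bar, y_bar)
    return a, b


xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical response values
a, b = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2))  # 2.2 0.6
```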
Coefficient of Determination (r-squared)
  • Indicates the proportion of the variation in y that is explained by the least-squares regression line of y on x.
  • Example: If $r^2 = 0.73$, then the linear relationship with x explains 73% of the variation in y.
Residual Plot
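  • Displays residuals (actual y - predicted y) against x-values.
  • Used to check whether a linear model is appropriate: residuals scattered randomly around zero support a linear model, while a curved or other clear pattern suggests that a line is a poor fit.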
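
A minimal sketch of the residuals and of r-squared for a fitted line. It assumes the standard identity that r-squared equals 1 minus the sum of squared residuals divided by the total sum of squared deviations of y from its mean, which the review does not state. The data are hypothetical, and the intercept 2.2 and slope 0.6 are the least-squares values for that data.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y minus predicted y, for the line y-hat = a + b * x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def r_squared(xs, ys, a, b):
    """Proportion of the variation in y explained by the line."""
    y_bar = sum(ys) / len(ys)
    sse = sum(e ** 2 for e in residuals(xs, ys, a, b))  # unexplained variation
    sst = sum((y - y_bar) ** 2 for y in ys)             # total variation
    return 1 - sse / sst


xs = [1, 2, 3, 4, 5]   # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]   # hypothetical response values
a, b = 2.2, 0.6        # least-squares intercept and slope for this data

res = residuals(xs, ys, a, b)
print([round(e, 1) for e in res])         # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(round(r_squared(xs, ys, a, b), 2))  # 0.6
```

Plotting `res` against `xs` would give the residual plot; for a least-squares line the residuals always sum to zero, so what matters is whether they show a pattern.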

Outliers and Influential Points

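  • Outlier: An observation that lies outside the overall pattern of the other observations.
  • Influential Point: An observation that, if removed, would markedly change the regression line or the correlation. Points with extreme x-values are often influential.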

Surveys and Samples

Population, Census, and Sample
  • Population: Entire group of individuals of interest.
  • Census: Data collected from every individual.
  • Sample: Subset from the population used to collect data.
Bias in Sampling
  • Convenience Sampling: Chooses easily reachable individuals; likely biased.
  • Voluntary Response Sample: Individuals choose themselves to respond, often leading to bias.
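  • Simple Random Sample (SRS): A sample of size n chosen so that every group of n individuals in the population has an equal chance of being the sample selected. Letting chance pick the sample avoids the bias of the two methods above.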

Choosing SRS

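  • Step 1: Assign a distinct numerical label to every individual in the population.
  • Step 2: Use a random number table or a random number generator to select labels, skipping repeats, until the desired sample size is reached.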
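
The two steps can be sketched with Python's standard `random` module standing in for the random number table. The function name, the `students` list, and the seed are hypothetical; the seed is there only so the run can be repeated.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Choose n individuals so that every group of n is equally likely."""
    rng = random.Random(seed)
    # Step 1: give every individual a numerical label 1, 2, ..., N.
    labels = list(range(1, len(population) + 1))
    # Step 2: pick n distinct labels at random (sampling without replacement).
    chosen = rng.sample(labels, n)
    return [population[label - 1] for label in chosen]


students = [f"student_{i}" for i in range(1, 31)]  # hypothetical population of 30
sample = simple_random_sample(students, n=5, seed=2025)
print(sample)
```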
Stratified and Cluster Samples
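  • Stratified Random Sample: Divide the population into groups of similar individuals (strata), select an SRS from each stratum, and combine these into the full sample.
  • Cluster Sample: Divide the population into groups (clusters), often by location, select an SRS of clusters, and include every individual in the selected clusters.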
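
The difference between the two methods can be sketched as follows. Everything named here (the functions, the grade strata, the homeroom clusters, the sample sizes) is hypothetical and only meant to show where the randomness enters.

```python
import random


def stratified_sample(strata, n_per_stratum, rng):
    """Take an SRS of n_per_stratum individuals from every stratum."""
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample


def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of whole clusters and keep everyone in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(clusters[name])
    return sample


rng = random.Random(2025)  # fixed seed only so the run is repeatable

# Hypothetical strata: four grade levels with 20 students each.
by_grade = {g: [f"{g}_student_{i}" for i in range(1, 21)]
            for g in ("grade_09", "grade_10", "grade_11", "grade_12")}
# Hypothetical clusters: eight homerooms with 5 students each.
by_homeroom = {f"room_{k}": [f"room_{k}_student_{i}" for i in range(1, 6)]
               for k in range(1, 9)}

strat = stratified_sample(by_grade, n_per_stratum=3, rng=rng)
clust = cluster_sample(by_homeroom, n_clusters=2, rng=rng)
print(len(strat), len(clust))  # 12 10
```

In the stratified sample every grade is represented; in the cluster sample only two homerooms appear, but all of their students are included.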

Experiments

Observational Study vs. Experiment
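  • Observational Study: Observes individuals and measures variables of interest without attempting to influence the responses.
  • Experiment: Deliberately imposes a treatment on individuals in order to measure their responses.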
Confounding
  • Occurs when two variables are associated in such a way that their effects on the response variable cannot be distinguished from each other, which can create a misleading association.
Principles of Experimental Design
  1. Comparison: Compare two or more treatments.
  2. Random Assignment: Use chance to assign treatments.
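  3. Control: Keep other variables that might affect the response the same for all groups.
  4. Replication: Use enough experimental units in each group that real differences between treatments can be distinguished from chance differences between the groups.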
Statistical Significance
  • An observed effect is statistically significant if it is so large that it would rarely occur by chance alone.

Experimental Designs

Completely Randomized Design
  • All experimental units are assigned to the treatments completely at random, without first grouping them by any other variable.
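
A minimal sketch of a completely randomized assignment: shuffle all the units, then deal them out to the treatments in turn so the groups are as equal in size as possible. The function name, the unit labels, and the treatment names are hypothetical.

```python
import random


def completely_randomized(units, treatments, seed=None):
    """Randomly split all units among the treatments, ignoring other variables."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # chance alone decides the order
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups


subjects = [f"subject_{i}" for i in range(1, 21)]  # hypothetical units
groups = completely_randomized(subjects, ["new_drug", "placebo"], seed=2025)
print(len(groups["new_drug"]), len(groups["placebo"]))  # 10 10
```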
Block Design
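  • A block is a group of experimental units known before the experiment to be similar in some way that is expected to affect the response. Treatments are randomly assigned within each block.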
Matched Pairs Design
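  • A block design in which each block is a pair of similar experimental units; within each pair, chance decides which unit receives which treatment. In another version, each unit receives both treatments in a random order.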
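
Randomizing within blocks can be sketched as below; a matched pairs design is the special case where every block holds two units and there are two treatments. The function name, the twin labels, and the treatment names are hypothetical.

```python
import random


def randomize_within_blocks(blocks, treatments, seed=None):
    """Assign treatments at random separately inside each block."""
    rng = random.Random(seed)
    assignment = {}
    for block in blocks:
        shuffled = list(block)
        rng.shuffle(shuffled)  # randomization happens inside the block only
        for i, unit in enumerate(shuffled):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment


# Hypothetical matched pairs: each block is a pair of similar units.
pairs = [("twin_1a", "twin_1b"), ("twin_2a", "twin_2b"), ("twin_3a", "twin_3b")]
assignment = randomize_within_blocks(pairs, ["new_drug", "placebo"], seed=2025)
for first, second in pairs:
    print(first, assignment[first], "|", second, assignment[second])
```

Every pair ends up with one unit on each treatment, so the comparison between treatments is made between units that were similar to begin with.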