AP-Stats
Collecting and Exploring Data
AP Review 2025
Individuals and Variables
- Individuals: Objects described by a set of data. Can be people, animals, or things.
- Variable: Any characteristic of an individual; can take different values for different individuals.
Categorical and Quantitative Variables
- Categorical Variable: Places individuals into groups or categories.
- Quantitative Variable: Takes numerical values where arithmetic operations (adding, averaging) make sense.
Distribution
- Distribution Definition: Tells us the values a variable takes and how often these values occur.
- Describing a Distribution Using SOCS:
- Spread: How much the values vary, measured by the range (maximum minus minimum), the IQR, or the standard deviation.
- Outliers: Individual values that fall outside the overall pattern.
- Center: A typical or middle value, such as the mean or the median.
- Shape: The overall pattern of the graph, such as symmetric or skewed, and the number of peaks.
Describing the Shape of a Distribution
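- Symmetric Distribution: The left and right sides of the graph are approximately mirror images; the mean and median are close together.
- Skewed Left: The left tail is much longer than the right tail; the mean is usually less than the median.
- Skewed Right: The right tail is much longer than the left tail; the mean is usually greater than the median.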
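Comparing the mean with the median is a quick numerical check that agrees with the rules above. The sketch below uses Python's standard `statistics` module; the helper name `skew_hint` and the data values are illustrative assumptions, and the comparison is only a rough heuristic, so skewness should still be confirmed from a graph.

```python
from statistics import mean, median

def skew_hint(values):
    """Rough heuristic only: compare the mean with the median.

    This does not prove skewness; a graph of the data is still needed.
    """
    m, md = mean(values), median(values)
    if m > md:
        return "mean > median: consistent with right skew"
    if m < md:
        return "mean < median: consistent with left skew"
    return "mean = median: consistent with symmetry"

# The long right tail (the value 20) pulls the mean above the median.
data = [1, 2, 2, 3, 3, 3, 4, 20]
print(mean(data), median(data))  # 4.75 3.0
print(skew_hint(data))           # mean > median: consistent with right skew
```

The median resists the pull of the tail, which is why it is often preferred as the measure of center for strongly skewed data.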
Describing Distributions with Numbers
The Mean ($\bar{x}$)
- Mean Calculation: Add all values and divide by the number of observations $n$: $\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$.
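The mean formula translates directly into code. A minimal sketch in Python; the function name `mean` and the sample data are illustrative, not taken from any particular library:

```python
def mean(values):
    """Return x-bar: the sum of the observations divided by n."""
    if len(values) == 0:
        raise ValueError("the mean needs at least one observation")
    return sum(values) / len(values)

print(mean([2, 4, 4, 5, 10]))  # 5.0
```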
The Median (M)
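- Median Calculation: The midpoint of the distribution. Arrange the observations in increasing order first.
- If $n$ is odd: $M$ is the center observation, at position $\frac{n+1}{2}$ in the ordered list.
- If $n$ is even: $M$ is the average of the two center observations, at positions $\frac{n}{2}$ and $\frac{n}{2} + 1$.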
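The position rule can be sketched as follows, assuming a hypothetical helper named `median`. Python lists are indexed from 0, so position $p$ in the ordered list is index $p - 1$:

```python
def median(values):
    """Median M by the position rule: sort, then take the middle position(s)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    mid = n // 2  # 0-based index of position (n + 1)/2 when n is odd
    if n % 2 == 1:
        return xs[mid]
    # n even: average positions n/2 and n/2 + 1 (0-based indices mid - 1 and mid)
    return (xs[mid - 1] + xs[mid]) / 2

print(median([7, 1, 3]))     # 3
print(median([4, 1, 3, 2]))  # 2.5
```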
The Five-Number Summary
Components:
- Minimum
- First Quartile (Q1)
- Median (M)
- Third Quartile (Q3)
- Maximum
Quartiles Calculation:
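- Q1: The median of the observations below M in the ordered list (when n is odd, M itself is excluded).
- Q3: The median of the observations above M in the ordered list (when n is odd, M itself is excluded).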
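A sketch of the five-number summary using the "median of each half" rule described above. The function names are hypothetical; note that other software (for example NumPy's default percentile interpolation) can report slightly different quartiles because it uses a different rule.

```python
def median(sorted_xs):
    """Median of an already sorted, non-empty list."""
    n = len(sorted_xs)
    mid = n // 2
    if n % 2 == 1:
        return sorted_xs[mid]
    return (sorted_xs[mid - 1] + sorted_xs[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, M, Q3, max) using the 'median of each half' rule."""
    xs = sorted(values)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = xs[:half]          # observations below M (M excluded when n is odd)
    upper = xs[half + n % 2:]  # observations above M
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8, 9]))  # (1, 2.5, 5, 7.5, 9)
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))     # (1, 2.5, 4.5, 6.5, 8)
```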
IQR Calculation: $\mathrm{IQR} = Q_3 - Q_1$
Outliers: The 1.5 × IQR Criterion
- An observation is an outlier if it:
- Falls more than 1.5 × IQR below Q1, or
- Falls more than 1.5 × IQR above Q3
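The 1.5 × IQR rule can be sketched as two fences. In this minimal sketch Q1 and Q3 are passed in (computed separately with the quartile rule above); the function names and data are illustrative assumptions.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) for the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Observations strictly beyond a fence are flagged as outliers."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

# Q1 = 2.5 and Q3 = 7.5 come from the "median of each half" rule for this data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 30]
print(outlier_fences(2.5, 7.5))       # (-5.0, 15.0)
print(find_outliers(data, 2.5, 7.5))  # [30]
```

In a boxplot of this data, 30 would be plotted individually and the upper whisker would stop at 8, the largest non-outlier.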
Boxplot
- A graphical representation of the five-number summary.
- Features:
- Central box spans the quartiles.
- Line in the box marks the median.
- Outliers plotted individually.
- Lines extend from the box to the smallest and largest observations (excluding outliers).
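The Standard Deviation ($s$ or $s_x$)
- Standard Deviation Definition: Measures the typical distance of the observations from their mean. It is the square root of the variance $s_x^2$, which averages the squared deviations from the mean but divides by $n - 1$ instead of $n$.
- Formula: $s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$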
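The formula can be followed step by step: mean, deviations, squared deviations, divide by $n - 1$, square root. A minimal sketch with a hypothetical `sample_sd` helper; Python's standard `statistics.stdev` also uses $n - 1$ and should agree.

```python
import math

def sample_sd(values):
    """Sample standard deviation s_x (divides the sum of squares by n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    x_bar = sum(values) / n
    sum_sq = sum((x - x_bar) ** 2 for x in values)  # sum of squared deviations
    variance = sum_sq / (n - 1)                     # sample variance s_x^2
    return math.sqrt(variance)

print(sample_sd([1, 3, 5]))  # 2.0
```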
Scatterplots and Correlation
Explanatory and Response Variables
- Response Variable: Measures the outcome of a study (dependent variable, y).
- Explanatory Variable: Attempts to explain observed outcomes (independent variable, x).
Scatterplot Definition
- Represents the relationship between two quantitative variables.
- Axes:
- Horizontal: Explanatory variable (x)
- Vertical: Response variable (y)
Examining a Scatterplot
- Characteristics:
- Form: Linear or curved.
- Direction: Positive or negative.
- Strength: Weak, moderate, or strong.
- An outlier in a scatterplot is an individual point that falls outside the overall pattern of the relationship.
Association and Correlation
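- Association: Positive (as one variable increases, the other tends to increase) or negative (as one variable increases, the other tends to decrease).
- Correlation (r): Measures the strength and direction of the linear relationship between two quantitative variables.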
Facts about Correlation:
- The variable designation (x or y) does not affect correlation.
- Only valid for quantitative variables.
- Changing units of measurement does not affect r.
- Values range from -1 to +1; values near -1 or +1 indicate a strong linear relationship, and values near 0 indicate a weak one.
- Correlation is sensitive to outliers.
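A sketch of r using the standard z-score formula $r = \frac{1}{n-1}\sum z_x z_y$ (a formula these notes do not state explicitly). The function name and data are illustrative assumptions; the last two lines check two of the facts above: swapping x and y, and changing units, leave r unchanged. If either variable has no variation, its standard deviation is 0 and r is undefined (this sketch would raise `ZeroDivisionError`).

```python
import math

def correlation(xs, ys):
    """Correlation r as the average product of standardized values, using n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    z_products = (((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys))
    return sum(z_products) / (n - 1)

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
print(round(correlation(xs, ys), 6))                     # 0.8
print(round(correlation(ys, xs), 6))                     # 0.8 (x and y swapped)
print(round(correlation([100 * x for x in xs], ys), 6))  # 0.8 (units changed)
```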
Regression Line
- Definition: A line that describes how a response variable changes with an explanatory variable.
- Least-Squares Regression Line: Minimizes the sum of the squares of the vertical distances from observations to the line.
- Equation: $\hat{y} = a + bx$, with slope $b = r\frac{s_y}{s_x}$ and intercept $a = \bar{y} - b\bar{x}$.
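The slope and intercept formulas can be sketched directly from the summary statistics. The helper name `least_squares_line` and the data are illustrative; this sketch assumes x has some variation (otherwise $s_x = 0$ and the slope is undefined).

```python
import math

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x using b = r * s_y / s_x and a = y_bar - b * x_bar."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))
    s_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / (n - 1))
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept: the line passes through (x_bar, y_bar)
    return a, b

a, b = least_squares_line([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
print(round(a, 6), round(b, 6))  # 0.6 0.8
```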
Coefficient of Determination ($r^2$)
- Indicates the fraction of the variation in y that is explained by the least-squares regression of y on x.
- Example: If $r^2 = 0.73$, then the regression on x explains 73% of the variation in y.
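One way to see what $r^2$ measures is $r^2 = 1 - \frac{\text{SSE}}{\text{SST}}$: the share of the total variation in y that is not left over in the residuals. This identity holds for the least-squares line; the sketch below takes the line's intercept `a` and slope `b` as inputs, and the function name and data are illustrative assumptions.

```python
def r_squared(xs, ys, a, b):
    """1 - SSE/SST for the line y-hat = a + b*x (equals r^2 for the least-squares line)."""
    y_bar = sum(ys) / len(ys)
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # variation left unexplained
    sst = sum((y - y_bar) ** 2 for y in ys)                    # total variation in y
    return 1 - sse / sst

# a = 0.6 and b = 0.8 are the least-squares intercept and slope for this data, where r = 0.8.
print(round(r_squared([1, 2, 3, 4, 5], [2, 1, 4, 3, 5], 0.6, 0.8), 6))  # 0.64
```

Here $r^2 = 0.64 = 0.8^2$, so the line explains 64% of the variation in y.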
Residual Plot
- Displays residuals (actual y - predicted y) against x-values.
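- Used to check whether a linear model is appropriate: a random scatter around 0 with no pattern supports the linear fit, while a curved pattern suggests a nonlinear relationship.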
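The points on a residual plot are just (x, residual) pairs. A minimal sketch, assuming a hypothetical `residuals` helper and the least-squares line $\hat{y} = 0.6 + 0.8x$ for the illustrative data; drawing the actual plot is left to any graphing tool.

```python
def residuals(xs, ys, a, b):
    """Residual = actual y - predicted y, for the line y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 5]
res = residuals(xs, ys, 0.6, 0.8)  # least-squares line for this data
for x, e in zip(xs, res):
    # Each (x, residual) pair is one point on the residual plot.
    print(x, round(e, 2))
print(abs(sum(res)) < 1e-9)  # True: least-squares residuals sum to (about) 0
```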
Outliers and Influential Points
- Outlier: Lies outside the overall pattern.
- Influential Point: A point whose removal would markedly change the correlation or the regression line; points with extreme x-values are often influential.
Surveys and Samples
Population, Census, and Sample
- Population: Entire group of individuals of interest.
- Census: Data collected from every individual.
- Sample: Subset from the population used to collect data.
Bias in Sampling
- Convenience Sampling: Chooses easily reachable individuals; likely biased.
- Voluntary Response Sample: Individuals choose themselves to respond, often leading to bias.
- Simple Random Sample (SRS): Every group of n individuals in the population has an equal chance of being the sample selected.
Choosing SRS
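- Step 1: Assign a numerical label to every individual in the population.
- Step 2: Use a random number table (or a random number generator) to select labels, skipping repeated labels and labels that are not in use; the individuals with the chosen labels form the sample.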
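The two steps can be sketched with Python's standard `random` module, where `random.Random.sample` draws without replacement so every set of n labels is equally likely. The function name, the roster, and the seed are illustrative assumptions; the seed only makes the demonstration repeatable.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Label individuals 1..N, then draw n distinct labels at random."""
    rng = random.Random(seed)                      # seed only makes the demo repeatable
    labels = list(range(1, len(population) + 1))  # Step 1: labels 1, 2, ..., N
    chosen = rng.sample(labels, n)                 # Step 2: n labels without replacement
    return [population[label - 1] for label in chosen]

students = ["Ava", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus", "Hal"]
print(simple_random_sample(students, 3, seed=2025))
```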
Stratified and Cluster Samples
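- Stratified Random Sample: Divide the population into groups of similar individuals (strata), choose a separate SRS from each stratum, and combine these SRSs to form the full sample.
- Cluster Sample: Divide the population into groups (clusters), take an SRS of clusters, and include every individual in the selected clusters.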
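The contrast between the two designs shows up clearly in code: stratified sampling takes an SRS inside every group, while cluster sampling takes an SRS of whole groups. A minimal sketch; the function names, group names, and member labels are hypothetical.

```python
import random

def stratified_sample(strata, sizes, rng):
    """Take a separate SRS of sizes[name] individuals from each stratum, then combine."""
    sample = []
    for name, members in strata.items():
        sample.extend(rng.sample(members, sizes[name]))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Take an SRS of clusters, then keep every individual in the chosen clusters."""
    chosen = rng.sample(list(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(1)
grades = {"grade 11": ["A1", "A2", "A3", "A4"], "grade 12": ["B1", "B2", "B3"]}
print(stratified_sample(grades, {"grade 11": 2, "grade 12": 1}, rng))
homerooms = {"room 101": ["C1", "C2"], "room 102": ["D1", "D2"], "room 103": ["E1", "E2"]}
print(cluster_sample(homerooms, 2, rng))
```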
Experiments
Observational Study vs. Experiment
- Observational Study: Observes individuals and measures variables of interest without attempting to influence the responses.
- Experiment: Deliberately imposes treatment to measure responses.
Confounding
- Two variables (explanatory or lurking) are confounded when their effects on a response variable cannot be distinguished from each other, which can create misleading associations.
Principles of Experimental Design
- Comparison: Compare two or more treatments.
- Random Assignment: Use chance to assign treatments.
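- Control: Keep other variables that might affect the response the same for all groups.
- Replication: Use enough experimental units in each group so that differences in the effects of the treatments can be distinguished from chance differences between the groups.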
Statistical Significance
- An observed effect is statistically significant if it is so large that it would rarely occur by chance alone.
Experimental Designs
Completely Randomized Design
- All experimental units are assigned to treatments completely at random, without first grouping them by any other variable.
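A completely randomized design can be sketched as "shuffle every unit, then deal them out to the treatments." The function name, unit labels, and seed are illustrative assumptions; dealing in turn is one simple way to keep the group sizes balanced.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle all units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    k = len(treatments)
    # Dealing in turn keeps group sizes within one of each other.
    return {t: shuffled[i::k] for i, t in enumerate(treatments)}

plants = [f"plant {i}" for i in range(1, 11)]
print(completely_randomized(plants, ["fertilizer", "control"], seed=7))
```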
Block Design
- Group similar experimental units (blocks); randomization occurs within each block.
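The only change from a completely randomized design is that the shuffle happens separately inside each block. A minimal sketch; the block names, unit labels, and function name are hypothetical.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)  # randomization happens inside this block only
        for i, unit in enumerate(shuffled):
            assignment[unit] = (block_name, treatments[i % len(treatments)])
    return assignment

blocks = {"men": ["M1", "M2", "M3", "M4"], "women": ["W1", "W2", "W3", "W4"]}
for unit, (block, treatment) in sorted(randomized_block(blocks, ["A", "B"], seed=3).items()):
    print(block, unit, treatment)
```

Because each block receives every treatment equally often, differences between the blocks cannot be mistaken for treatment effects.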
Matched Pairs Design
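- A common form of block design: each block is a pair of closely matched units (or a single unit that receives both treatments), and treatments are assigned at random within each pair (or the order of the treatments is randomized).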
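For matched pairs, the randomization reduces to one fair coin flip per pair. A minimal sketch; the pair labels, function name, and seed are illustrative assumptions.

```python
import random

def matched_pairs(pairs, treatments=("A", "B"), seed=None):
    """For each matched pair, a fair coin decides which member gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:
            assignment[first], assignment[second] = treatments
        else:
            assignment[second], assignment[first] = treatments
    return assignment

pairs = [("twin 1a", "twin 1b"), ("twin 2a", "twin 2b"), ("twin 3a", "twin 3b")]
print(matched_pairs(pairs, seed=11))
```

When a single unit receives both treatments, the same coin flip can instead decide which treatment that unit gets first.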