AP Stats Unit 3
Vocab
Bivariate Data - Data with 2 variables
Explanatory Variables - The independent variable or x-axis (predicts, explains, or influences a trend).
Response Variable - The dependent variable or y-axis, the measured outcome that is a response to the trend.
Positive Correlation - As the x value increases, the y value also increases.
Negative Correlation - As the x value increases, the y value decreases.
Line of best fit - A straight line on a scatterplot that best represents the trend of data points.
Weak Correlation - Data is far from the line of best fit.
Strong Correlation - Data is close to the line of best fit.
Correlation Coefficient - How close the data is to the line of best fit (Strength).
Residual - The vertical distance from a data point to the line of best fit.
Least Squares Regression Line (LSRL) - A linear model that minimizes the sum of the squared residuals between the data and the model.
Standard Deviation of the Residuals (s) - Typical error between data points and their LSRL.
Influential Points - Points that, if removed, change the slope, y-intercept, or correlation substantially.
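The vocab above can be tied together in a short sketch. Using hypothetical sample data, this computes the LSRL slope and y-intercept from the least-squares formulas, the residuals (observed y - predicted y), and the standard deviation of the residuals s:

```python
from math import sqrt

# Hypothetical sample data (x = explanatory variable, y = response variable)
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 8.0, 9.8]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# LSRL slope and y-intercept from the least-squares formulas
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
slope = sxy / sxx
intercept = y_bar - slope * x_bar

# Residual = observed y - predicted y
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals (s), using n - 2 degrees of freedom
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(round(slope, 3), round(intercept, 3), round(s, 3))
```

Note that the residuals always sum to zero for an LSRL, which is why s is built from squared residuals.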
Notes
Scatterplot:
Two quantitative variables are visualized in a scatterplot
CDOFS:
Context - “The relationship between A and B appears to be…”
Direction - Positive or Negative.
Outliers - Unusual data points.
Form - linear or non-linear
Strength - Weak, moderate or strong correlation
Correlation Coefficient:
r close to 0 → weak correlation
r close to -1 or 1 → strong correlation
Negative r value → negative correlation
Positive r value → positive correlation
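The correlation coefficient r can be computed directly from the summation shortcuts. A minimal sketch with hypothetical data showing a positive trend:

```python
from math import sqrt

# Hypothetical data with a fairly strong positive linear trend
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# r = (sum of xy deviations) / sqrt((sum of x deviations²)(sum of y deviations²))
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
r = sxy / sqrt(sxx * syy)

print(round(r, 3))  # positive and close to 1 → fairly strong positive correlation
```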
Residual
Observed y - predicted y
Linear Model a Good Fit?:
Residuals randomly scattered around zero with no pattern → Good fit.
Residuals show a curved pattern → Not a good fit.
More variation in residuals as x increases → Not a good fit
Standard Deviation of the Residuals (s):
Smaller s = Stronger correlation (points are closer to the LSRL)
Larger s = Weaker correlation
The Coefficient of Determination (r²):
Gets rid of the negative
Emphasizes the differences of strength
r² = 1.00 = 100% → The linear model completely explains the data’s pattern.
r² = 0.72 = 72% → The linear model explains some of the data’s pattern, but not all of it. There is some error.
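The "% of the pattern explained" reading of r² can be checked numerically: squaring r gives the same number as 1 minus (squared residual error ÷ total variation in y). A sketch with the same kind of hypothetical data:

```python
from math import sqrt, isclose

# Hypothetical data
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 6]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)  # total variation in y

r = sxy / sqrt(sxx * syy)

# Fit the LSRL and compute the sum of squared residuals (the unexplained error)
slope = sxy / sxx
intercept = y_bar - slope * x_bar
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# r² two ways: squaring r, and as the fraction of y's variation explained
r_sq_from_r = r ** 2
r_sq_from_variation = 1 - sse / syy

print(round(r_sq_from_r, 4), round(r_sq_from_variation, 4))
assert isclose(r_sq_from_r, r_sq_from_variation)
```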
How Outliers affect r, r², and s:
r moves closer to 0 (weaker).
r² becomes smaller.
s becomes bigger.
Leverage and Influential Points:
Every LSRL will always go through the point (x mean, y mean)
Low leverage points are close to the x mean
High leverage points are far from the x mean
Low leverage points don’t have as much impact (leverage) on the slope compared to High leverage points which have a lot of impact (leverage) on the slope.
Influential Points:
Outliers (change the correlation or r).
High Leverage (change the slope and y-intercept).
Both - a single influential point can be both an outlier and a high leverage point.
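The effect of a high-leverage point can be seen by fitting the LSRL with and without it. This sketch uses hypothetical data on a clean line, then adds one point far from the x mean that doesn't follow the trend:

```python
def lsrl(xs, ys):
    """Least-squares slope and y-intercept (helper for this sketch)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical data exactly on the line y = 2x
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

# Add one high-leverage point: far from the x mean, off the trend
xs_infl = xs + [20]
ys_infl = ys + [10]

slope_before, _ = lsrl(xs, ys)
slope_after, _ = lsrl(xs_infl, ys_infl)
print(round(slope_before, 3), round(slope_after, 3))  # slope changes substantially
```

Because the added point sits far from the x mean, it pulls the line toward itself and drags the slope well below 2, which is exactly what makes it influential.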
Reading Computer Regression Tables:
The constant is the y-intercept.
The coefficient listed next to the explanatory variable's name is the slope.
Example: Constant = -14.7, Income = 0.001 → ŷ = -14.7 + 0.001x
The s and r² are also included, make sure to use the R-Sq for r² not R-Sq(adj).
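Turning the example table values above into a prediction function (the income input below is a hypothetical value, just to show the arithmetic):

```python
# From the example table in the notes:
# Constant (y-intercept) = -14.7, Income coefficient (slope) = 0.001
intercept = -14.7
slope = 0.001

def predict(income):
    """Predicted y from the regression equation ŷ = -14.7 + 0.001x."""
    return intercept + slope * income

print(round(predict(20000), 2))  # plug a hypothetical income into the equation
</```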
Algebra Flashback:
log_10 a = b → 10^b = a
ln a = b → e^b = a (ln is the log with base e)
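A quick numeric check of both inverse relationships (the values of a are arbitrary examples):

```python
from math import log10, log, e, isclose

# log_10(a) = b means 10^b = a
a = 1000.0
b = log10(a)
assert isclose(10 ** b, a)

# ln(a) = b means e^b = a (math.log defaults to base e)
a2 = 7.5
b2 = log(a2)
assert isclose(e ** b2, a2)
```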
Interpret Stems:
Interpret Slope Value:
For every 1 unit increase in explanatory variable, our model predicts an average increase/decrease of slope in response variable.
Interpret Y-Intercept:
When the explanatory variable is zero units, our model predicts that the response variable would be y-intercept.
Interpret S (Typical Residual Length):
When using the LSRL with explanatory variable to predict response variable, we will typically be off by about value of S with units of the response variable.
Interpreting r² (Coefficient of Determination):
r²% of the variation in response variable can be explained by the linear relationship with explanatory variable.
Interpret the Model’s Residual (Prediction Error):
The actual response variable is greater/less than predicted by residual units.