AP Stats Unit 3

Vocab

Bivariate Data - Data with 2 variables

Explanatory Variables - The independent variable or x-axis (predicts, explains, or influences a trend).

Response Variable - The dependent variable or y-axis; the measured outcome that responds to the explanatory variable.

Positive Correlation - As the x value increases, the y value also increases.

Negative Correlation - As the x value increases, the y value decreases.

Line of best fit - A straight line on a scatterplot that best represents the trend of data points.

Weak Correlation - Data is far from the line of best fit.

Strong Correlation - Data is close to the line of best fit.

Correlation Coefficient (r) - Measures the strength and direction of a linear relationship; how close the data lie to the line of best fit.

Residual - The vertical distance from a data point to the line of best fit.

Least Squares Regression Line (LSRL) - A linear model that minimizes the sum of the squared residuals between the data and the model.

Standard Deviation of the Residuals (s) - Typical error between data points and their LSRL.

Influential Points - Points that, if removed, change the slope, y-intercept, or correlation substantially.

Notes

Scatterplot:

  • Two quantitative variables are visualized in a scatterplot

CDOFS:

Context - “The relationship between A and B appears to be…”

Direction - Positive or Negative.

Outliers - Unusual data points.

Form - Linear or non-linear.

Strength - Weak, moderate, or strong correlation.

Correlation Coefficient:

  • r close to 0 → weak correlation

  • r close to -1 or 1 → strong correlation

  • Negative r value → negative correlation

  • Positive r value → positive correlation
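
These rules can be checked by computing r directly. A small sketch with made-up data, using the z-score form of the formula (r is the average product of the z-scores, with an n − 1 divisor):

```python
# Compute the correlation coefficient r for a small made-up dataset:
# r = sum(z_x * z_y) / (n - 1), using sample standard deviations.
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]   # loosely increasing, so r should be positive

n = len(x)
mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)

r = sum((xi - mx) / sx * (yi - my) / sy for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 3))    # -> 0.775: positive, moderately strong
```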

Residual

  • Observed y - predicted y
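
As a quick sketch (the line here is hypothetical, not fitted from data):

```python
# Residual = observed y - predicted y.
# A positive residual means the point lies above the line,
# a negative residual means it lies below.
intercept, slope = 1.0, 0.8            # hypothetical line of best fit

def predicted(x):
    return intercept + slope * x

observed_y = 5.0
residual = observed_y - predicted(4)   # 5.0 - 4.2
print(round(residual, 2))              # -> 0.8: point is above the line
```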

Linear Model a Good Fit?:

  • Residuals scattered randomly around zero with no pattern → Good fit.

  • Residuals show a curved pattern → Not a good fit.

  • More variation in residuals as x increases → Not a good fit.

Standard Deviation of the Residuals (s):

  • Smaller s = Stronger correlation

  • Larger s = Weaker correlation
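
One way to see s concretely. This sketch uses made-up data and a hypothetical (not actually fitted) line, with the usual n − 2 divisor:

```python
# s = sqrt(sum of squared residuals / (n - 2)):
# the typical distance between an observed y and the line's prediction.
import math

x = [1, 2, 3, 4, 5]
y = [2.0, 2.4, 3.6, 4.1, 5.1]

def predict(xi):
    return 1.0 + 0.8 * xi      # hypothetical fitted line

residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e * e for e in residuals) / (len(x) - 2))
print(round(s, 3))             # small s here -> points hug the line
```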

The Coefficient of Determination (r²):

  • Squaring removes the negative sign.

  • Emphasizes differences in strength.

  • r² = 1.00 = 100% → The linear model completely explains the data’s pattern.

  • r² = 0.72 = 72% → The linear model explains some of the data’s pattern, but not all of it. There is some error.
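
A tiny sketch tying r to r² (the correlation value is made up):

```python
# r^2 is just r squared: the sign disappears and what remains is the
# percent of variation in y explained by the linear model.
r = -0.85                       # made-up correlation (negative direction)
r_squared = r ** 2              # 0.7225
print(f"{r_squared:.0%} of the variation is explained")   # -> 72%
```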

How Outliers affect r, r², and s:

  • |r| becomes smaller (r moves toward 0).

  • r² becomes smaller.

  • s becomes bigger.

Leverage and Influential Points:

  • Every LSRL will always go through the point (x mean, y mean)

  • Low leverage points are close to the x mean

  • High leverage points are far from the x mean

  • Low leverage points have little impact (leverage) on the slope; high leverage points can pull the slope strongly.
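
The mean-point fact can be verified numerically. This sketch fits the LSRL by hand on made-up data (slope b = r·sy/sx, intercept a = ȳ − b·x̄):

```python
# Verify that the LSRL passes through (x-bar, y-bar).
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ((len(x) - 1) * sx * sy)

slope = r * sy / sx             # b = r * sy / sx
intercept = my - slope * mx     # a = y-bar - b * x-bar

# Plugging x-bar into the line returns y-bar:
print(abs((intercept + slope * mx) - my) < 1e-9)   # -> True
```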

Influential Points:

  • Outliers (change the correlation or r).

  • High Leverage (change the slope and y-intercept).

  • Both - An influential point can be both an outlier and a high leverage point.

Reading Computer Regression Tables:

  • The constant is the y-intercept.

  • The slope is the value in the row named after the explanatory variable.

  • Example: Constant = -14.7, Income = 0.001 → y(predicted) = -14.7 + 0.001x

  • The s and r² are also included; make sure to use R-Sq for r², not R-Sq(adj).
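
Translating a table like the example above into a prediction equation (the intercept and slope come from the example; the income value below is made up):

```python
# Constant row = y-intercept; the row named after the explanatory
# variable ("Income" here) = slope.
constant = -14.7        # y-intercept from the table
income_slope = 0.001    # slope from the "Income" row

def predict(income):
    return constant + income_slope * income

print(round(predict(50000), 1))   # -> 35.3
```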

Algebra Flashback:

  • log_10 a = b → 10^b = a

  • ln a = b → e^b = a
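
These identities can be sanity-checked with Python's math module:

```python
# A log answers "what exponent?", so exponentiating undoes it.
import math

a = 1000
b = math.log10(a)       # log_10(a) = b ...
print(10 ** b)          # ... so 10**b recovers a

c = math.log(7.5)       # ln(7.5) = c ...
print(math.e ** c)      # ... so e**c recovers 7.5 (up to rounding)
```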

Interpret Stems:

Interpret Slope Value:

  • For every 1 unit increase in explanatory variable, our model predicts an average increase/decrease of slope in response variable.

Interpret Y-Intercept:

  • When the explanatory variable is zero units, our model predicts that the response variable would be y-intercept.

Interpret S (Typical Residual Length):

  • When using the LSRL with explanatory variable to predict response variable, we will typically be off by about value of S with units of the response variable.

Interpreting r² (Coefficient of Determination):

  • % of the variation in response variable can be explained by the linear relationship with explanatory variable.

Interpret the Model’s Residual (Prediction Error):

  • The actual response variable is greater/less than predicted by residual units.