AP Stats Unit 3

Vocab

Bivariate Data - Data with 2 variables

Explanatory Variables - The independent variable or x-axis (predicts, explains, or influences a trend).

Response Variable - The dependent variable or y-axis; the measured outcome that responds to the explanatory variable.

Positive Correlation - As the x value increases, the y value also increases.

Negative Correlation - As the x value increases, the y value decreases.

Line of best fit - A straight line on a scatterplot that best represents the trend of data points.

Weak Correlation - Data is far from the line of best fit.

Strong Correlation - Data is close to the line of best fit.

Correlation Coefficient (r) - Measures the strength and direction of a linear relationship; how close the data lie to the line of best fit.

Residual - The vertical distance from a data point to the line of best fit.

Least Squares Regression Line (LSRL) - A linear model that minimizes the sum of the squared residuals between the data and the model.

Standard Deviation of the Residuals (s) - Typical error between data points and their LSRL.

Influential Points - Points that, if removed, change the slope, y-intercept, or correlation substantially.

Notes

Scatterplot:

  • Two quantitative variables are visualized in a scatterplot

CDOFS:

Context - “The relationship between A and B appears to be…”

Direction - Positive or Negative.

Outliers - Unusual data points.

Form - Linear or non-linear.

Strength - Weak, moderate, or strong correlation.

Correlation Coefficient:

  • r close to 0 → weak correlation

  • r close to -1 or 1 → strong correlation

  • Negative r value → negative correlation

  • Positive r value → positive correlation
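
These rules can be checked by computing r directly. A small sketch with made-up data, using the z-score form of the formula (r is the average product of the z-scores, with an n − 1 divisor):

```python
# Compute the correlation coefficient r for a small made-up dataset:
# r = sum(z_x * z_y) / (n - 1), using sample standard deviations.
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]   # loosely increasing, so r should be positive

n = len(x)
mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)

r = sum((xi - mx) / sx * (yi - my) / sy for xi, yi in zip(x, y)) / (n - 1)
print(round(r, 3))    # -> 0.775: positive, moderately strong
```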

Residual

  • Observed y - predicted y
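
As a quick sketch (the line here is hypothetical, not fitted from data):

```python
# Residual = observed y - predicted y.
# A positive residual means the point lies above the line,
# a negative residual means it lies below.
intercept, slope = 1.0, 0.8            # hypothetical line of best fit

def predicted(x):
    return intercept + slope * x

observed_y = 5.0
residual = observed_y - predicted(4)   # 5.0 - 4.2
print(round(residual, 2))              # -> 0.8: point is above the line
```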

Linear Model a Good Fit?:

  • Residuals scattered randomly around zero with no pattern → Good fit.

  • Residuals show a curved pattern → Not a good fit.

  • More variation in residuals as x increases → Not a good fit.

Standard Deviation of the Residuals (s):

  • Smaller s = Stronger correlation

  • Larger s = Weaker correlation
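
One way to see s concretely. This sketch uses made-up data and a hypothetical (not actually fitted) line, with the usual n − 2 divisor:

```python
# s = sqrt(sum of squared residuals / (n - 2)):
# the typical distance between an observed y and the line's prediction.
import math

x = [1, 2, 3, 4, 5]
y = [2.0, 2.4, 3.6, 4.1, 5.1]

def predict(xi):
    return 1.0 + 0.8 * xi      # hypothetical fitted line

residuals = [yi - predict(xi) for xi, yi in zip(x, y)]
s = math.sqrt(sum(e * e for e in residuals) / (len(x) - 2))
print(round(s, 3))             # small s here -> points hug the line
```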

The Coefficient of Determination (r²):

  • Squaring removes the negative sign.

  • Emphasizes differences in strength.

  • r² = 1.00 = 100% → The linear model completely explains the data’s pattern.

  • r² = 0.72 = 72% → The linear model explains some of the data’s pattern, but not all of it. There is some error.
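
A tiny sketch tying r to r² (the correlation value is made up):

```python
# r^2 is just r squared: the sign disappears and what remains is the
# percent of variation in y explained by the linear model.
r = -0.85                       # made-up correlation (negative direction)
r_squared = r ** 2              # 0.7225
print(f"{r_squared:.0%} of the variation is explained")   # -> 72%
```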

How Outliers affect r, r², and s:

  • |r| becomes smaller (r moves toward 0).

  • r² becomes smaller.

  • s becomes bigger.

Leverage and Influential Points:

  • Every LSRL will always go through the point (x mean, y mean)

  • Low leverage points are close to the x mean

  • High leverage points are far from the x mean

  • Low leverage points have little impact (leverage) on the slope; high leverage points can pull the slope strongly.
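
The mean-point fact can be verified numerically. This sketch fits the LSRL by hand on made-up data (slope b = r·sy/sx, intercept a = ȳ − b·x̄):

```python
# Verify that the LSRL passes through (x-bar, y-bar).
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

mx, my = mean(x), mean(y)
sx, sy = stdev(x), stdev(y)
r = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / ((len(x) - 1) * sx * sy)

slope = r * sy / sx             # b = r * sy / sx
intercept = my - slope * mx     # a = y-bar - b * x-bar

# Plugging x-bar into the line returns y-bar:
print(abs((intercept + slope * mx) - my) < 1e-9)   # -> True
```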

Influential Points:

  • Outliers (change the correlation or r).

  • High Leverage (change the slope and y-intercept).

  • Both - An influential point can be both an outlier and a high leverage point.

Reading Computer Regression Tables:

  • The constant is the y-intercept.

  • The slope is the value in the row named after the explanatory variable.

  • Example: Constant = -14.7, Income = 0.001 → y(predicted) = -14.7 + 0.001x

  • The s and r² are also included; make sure to use R-Sq for r², not R-Sq(adj).
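
Translating a table like the example above into a prediction equation (the intercept and slope come from the example; the income value below is made up):

```python
# Constant row = y-intercept; the row named after the explanatory
# variable ("Income" here) = slope.
constant = -14.7        # y-intercept from the table
income_slope = 0.001    # slope from the "Income" row

def predict(income):
    return constant + income_slope * income

print(round(predict(50000), 1))   # -> 35.3
```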

Algebra Flashback:

  • log_10 a = b → 10^b = a

  • ln a = b → e^b = a
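
These identities can be sanity-checked with Python's math module:

```python
# A log answers "what exponent?", so exponentiating undoes it.
import math

a = 1000
b = math.log10(a)       # log_10(a) = b ...
print(10 ** b)          # ... so 10**b recovers a

c = math.log(7.5)       # ln(7.5) = c ...
print(math.e ** c)      # ... so e**c recovers 7.5 (up to rounding)
```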

Interpret Stems:

Interpret Slope Value:

  • For every 1 unit increase in explanatory variable, our model predicts an average increase/decrease of slope in response variable.

Interpret Y-Intercept:

  • When the explanatory variable is zero units, our model predicts that the response variable would be y-intercept.

Interpret S (Typical Residual Length):

  • When using the LSRL with explanatory variable to predict response variable, we will typically be off by about value of S with units of the response variable.

Interpreting r² (Coefficient of Determination):

  • % of the variation in response variable can be explained by the linear relationship with explanatory variable.

Interpret the Model’s Residual (Prediction Error):

  • The actual response variable is greater/less than predicted by residual units.