Unit 3 AP Stats
how to interpret slope: “The predicted y goes up/down by about b for each increase of 1 unit in X.”
how to interpret the y-intercept of a regression line: “The predicted __ is about __ when __ is 0.”
How to interpret a residual: “The actual __ is __ more/less than the value predicted by the regression line using x = __.”
residual = Actual-predicted
residual= y-ŷ
residual = prediction error of the least-squares regression line
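A quick sketch of residual = actual - predicted. The line ŷ = 2.2 + 0.6x and the point (3, 5) are made-up numbers for illustration only.

```python
# Hypothetical fitted line (made-up numbers): y-hat = 2.2 + 0.6x
a, b = 2.2, 0.6

def predict(x):
    """Predicted value y-hat from the line y-hat = a + b*x."""
    return a + b * x

def residual(x, y):
    """Residual = actual - predicted."""
    return y - predict(x)

# Made-up observed point: x = 3, y = 5
res = residual(3, 5)
print(round(res, 2))  # positive, so the actual y is above the line
```

A positive residual means the line under-predicted; a negative residual means it over-predicted.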
Describing a scatterplot: Form - is a line an appropriate model? Direction (association) - positive or negative trend (reading left to right). Strength - how closely the points follow the form; correlation is useful for linear models. Unusual features - clusters, points far away from the others.
Correlation is not resistant to outliers because it uses mean and SD
Correlation: close to 1 or -1 = strong linear association. Close to 0 = weak linear association.
Correlation has no units.
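A rough sketch of the correlation facts above, assuming the usual formula r = (sum of z_x times z_y) / (n - 1). The data are made up. It checks that changing units leaves r alone and that one faraway point can change r a lot.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """r = sum of (z_x * z_y) / (n - 1), using the mean and sample SD."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    return sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
               for x, y in zip(xs, ys)) / (n - 1)

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)

# Rescaling x (for example inches to centimeters) should not change r
r_cm = correlation([2.54 * x for x in xs], ys)

# Adding one made-up point far from the rest: r is not resistant
r_outlier = correlation(xs + [10], ys + [0])

print(round(r, 3), round(r_cm, 3), round(r_outlier, 3))
```

Because r is built from z-scores, the units cancel out, and because z-scores use the mean and SD, a single outlier can pull r around.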
general form of a regression equation: ŷ=a+bx
ŷ=predicted value from the model
y=actual observed value
extrapolation is using a model to make predictions outside the range of observed inputs.
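A small sketch of checking for extrapolation before trusting a prediction. The x-values, the line ŷ = 2.2 + 0.6x, and the helper name is_extrapolation are all made up for illustration.

```python
# Made-up observed x-values and a hypothetical line y-hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
a, b = 2.2, 0.6

def is_extrapolation(x, observed_xs):
    """True when x lies outside the range of the observed x-values."""
    return x < min(observed_xs) or x > max(observed_xs)

for x_new in (3.5, 12):
    y_hat = a + b * x_new  # predicted value from the model
    if is_extrapolation(x_new, xs):
        note = "extrapolation, be cautious"
    else:
        note = "within observed range"
    print(x_new, round(y_hat, 2), note)
```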
The best-fit regression line for a set of data: minimizes the sum of the squared residuals. Added up, the residuals themselves equal 0.
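A sketch comparing the sum of squared residuals for two lines on made-up data. It assumes a = 2.2, b = 0.6 are the least-squares values for this data set, while the second line (a = 2.0, b = 0.7) is just an arbitrary nearby guess.

```python
# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def residuals(a, b):
    """Residuals (actual - predicted) for the line y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sse(a, b):
    """Sum of the squared residuals for the line y-hat = a + b*x."""
    return sum(e ** 2 for e in residuals(a, b))

sse_lsrl = sse(2.2, 0.6)    # assumed least-squares line for this data
sse_other = sse(2.0, 0.7)   # arbitrary nearby line, for comparison
sum_res = sum(residuals(2.2, 0.6))

print(round(sse_lsrl, 2), round(sse_other, 2), round(sum_res, 2))
```

The least-squares line should give the smaller sum of squares, and its residuals should add up to about 0 (up to rounding).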
to find the equation of the least-squares regression line: b = r(s_y/s_x). a = ȳ - b(x̄)
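A minimal sketch of building the line from summary statistics, using slope = r times (SD of y / SD of x) and intercept = mean of y minus slope times mean of x. The data are made up and r is computed from z-scores.

```python
from statistics import mean, stdev

# Made-up data for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
n = len(xs)

# Correlation as the sum of z-score products divided by n - 1
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)) / (n - 1)

b = r * s_y / s_x        # slope
a = y_bar - b * x_bar    # y-intercept

print(f"y-hat = {a:.2f} + {b:.2f}x")  # y-hat = 2.20 + 0.60x
```

The intercept formula forces the line through the point (x̄, ȳ).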
residual plot: plots each x-value against its residual (error). Helps determine whether a line is an appropriate model to use.
linear model appropriate?: no pattern in the residual plot - linear model is good. Clear pattern (such as a curve) - a line might not be the best fit.
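A sketch of what a pattern in the residuals looks like, without drawing the plot. The data are made up and deliberately curved (y = x²). The slope here is computed as the sum of (x - x̄)(y - ȳ) divided by the sum of (x - x̄)², which is algebraically the same as r times (SD of y / SD of x).

```python
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    return a, b

# Made-up curved data: y = x squared
xs = [1, 2, 3, 4, 5]
ys = [x ** 2 for x in xs]

a, b = fit_line(xs, ys)
resids = [y - (a + b * x) for x, y in zip(xs, ys)]

print(resids)  # [2.0, -1.0, -2.0, -1.0, 2.0]
```

The residuals go positive, negative, positive as x increases. That U shape is the kind of pattern suggesting a line is not the best model.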
s is the typical prediction error (the standard deviation of the residuals)
s: “Predictions made using this linear model typically vary by about s units from the actual y.”
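A sketch of computing s, assuming the standard formula s = square root of (sum of squared residuals / (n - 2)). The data and the line ŷ = 2.2 + 0.6x are made up for illustration.

```python
from math import sqrt

# Made-up data and a hypothetical fitted line y-hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

resids = [y - (a + b * x) for x, y in zip(xs, ys)]
sse = sum(e ** 2 for e in resids)  # sum of squared residuals

# Divide by n - 2 for a regression line, then take the square root
s = sqrt(sse / (len(xs) - 2))

print(round(s, 3))  # 0.894
```

Here predictions would typically be off by about 0.89 units of y.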
compare sums of squared residuals: 1. sum of squared errors from the line that just uses the mean (ŷ = ȳ). 2. sum of squared errors from the least-squares regression line
r² measures how much smaller the sum of squared errors from the least-squares regression line is than the sum of squared errors from the line that uses the mean, as a proportion.
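A sketch of that comparison, assuming the usual formula r² = 1 - (squared errors from the regression line) / (squared errors from the mean line). The data and the line ŷ = 2.2 + 0.6x are made up for illustration.

```python
from statistics import mean

# Made-up data and a hypothetical fitted line y-hat = 2.2 + 0.6x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

y_bar = mean(ys)

# 1. Squared errors when every prediction is just the mean of y
sse_mean = sum((y - y_bar) ** 2 for y in ys)

# 2. Squared errors when predictions come from the regression line
sse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse_line / sse_mean

print(round(r_squared, 2))  # 0.6
```

For this made-up data, about 60% of the variability in y is accounted for by the line.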
Interpret r²: “__% of the variability in __ is accounted for by the least-squares regression line with x = __.”
r = ±√(r²), with the same sign as the slope b
r² and s both measure how well a line fits the data, just in different ways.