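Least-squares line: used when a set of data has a scatter plot that appears to "fit" a straight line.
Least-squares regression line: the line of best fit, ŷ = a + bx, chosen so that the sum of the squared residuals (errors) is as small as possible.
ŷ (y hat): the estimated (predicted) value of y for a given value of x, read from the regression line.
Residual (error): ε0 = y0 – ŷ0, the observed value y0 minus the estimated value ŷ0 at the same x-value.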
Absolute value of a residual: measures the vertical distance between the actual value of y and the estimated value of y
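As a small illustration of residuals, the sketch below computes ε = y – ŷ and |ε| for a hypothetical line ŷ = 2 + 3x and a few made-up data points. The line, the data, and the helper name `predict` are assumptions chosen only to make the arithmetic easy to follow.

```python
# Hypothetical fitted line y-hat = a + b*x (values chosen for illustration only)
a, b = 2.0, 3.0

# Made-up observed data points (x, y)
points = [(1.0, 5.5), (2.0, 7.0), (3.0, 11.5)]

def predict(x: float) -> float:
    """Return the estimated value y-hat on the line for a given x."""
    return a + b * x

# Signed residuals: epsilon = observed y minus estimated y-hat
residuals = [y - predict(x) for x, y in points]
# Absolute residuals: vertical distance between each point and the line
distances = [abs(e) for e in residuals]

for (x, y), e, d in zip(points, residuals, distances):
    print(f"x={x}: y={y}, y_hat={predict(x)}, residual={e:+.1f}, |residual|={d:.1f}")
```

A positive residual means the point lies above the line; a negative one means it lies below.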
ε: the Greek letter epsilon
Slope equation: b = r (sy / sx), where sy and sx are the sample standard deviations of y and x.
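A minimal sketch of the slope formula, using made-up data. It also computes the intercept from a = ȳ – b·x̄, a standard least-squares result that these notes do not state explicitly, and checks that r (sy / sx) agrees with the direct least-squares slope Sxy / Sxx.

```python
from math import sqrt
from statistics import mean, stdev

# Made-up sample data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)
s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)

# Correlation coefficient r from sums of squares and cross-products
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
r = s_xy / sqrt(s_xx * s_yy)

# Slope from the glossary formula b = r * (s_y / s_x)
b = r * (s_y / s_x)
# Intercept: the least-squares line passes through (x_bar, y_bar)
a = y_bar - b * x_bar

print(f"r = {r:.4f}, b = {b:.4f}, a = {a:.4f}")
print(f"direct least-squares slope S_xy / S_xx = {s_xy / s_xx:.4f}")
```

Here b is about 0.6, so on average ŷ rises by about 0.6 for each one-unit increase in x.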
Interpretation of the Slope: “The slope of the best-fit line tells us how the dependent variable (y) changes for every one unit increase in the independent (x) variable, on average.”
Using the Linear Regression T Test
Correlation coefficient (r): a numerical measure of the strength and direction of the linear association between the independent variable x and the dependent variable y.
The value of r is always between –1 and +1: –1 ≤ r ≤ 1.
The magnitude of r (its absolute value) indicates the strength of the linear relationship between x and y. Values of r close to –1 or to +1 indicate a stronger linear relationship between x and y.
If r = 0, there is likely no linear correlation. It is still important to view the scatter plot, because data that exhibit a curved or horizontal pattern may have a correlation of 0.
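To see why the scatter plot matters, the sketch below computes r for made-up data that follow the curve y = x² exactly on x-values symmetric about 0. The relationship is perfect but not linear, and r comes out as 0. The helper `pearson_r` is a hypothetical name for a hand-written Pearson correlation.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired samples x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

# A perfect but curved (parabolic) relationship on x-values symmetric about 0
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]

r = pearson_r(x, y)
print(f"r = {r}")  # no linear association, even though y is fully determined by x
```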
If r = 1, there is perfect positive correlation. If r = –1, there is perfect negative correlation. In both these cases, all of the original data points lie on a straight line.
Positive correlation: A positive value of r means that when x increases, y tends to increase and when x decreases, y tends to decrease.
Negative correlation: A negative value of r means that when x increases, y tends to decrease and when x decreases, y tends to increase.
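The sketch below checks the sign rules and the perfect-correlation cases on made-up data: points exactly on an increasing line give r = +1, points exactly on a decreasing line give r = –1, and scattered data with an upward or downward trend give r between 0 and 1 or between –1 and 0. The helper `pearson_r` and the dataset labels are illustrative assumptions.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient r for paired samples x and y."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / sqrt(s_xx * s_yy)

x = [1, 2, 3, 4, 5]
datasets = {
    "perfect positive": [2 * xi + 1 for xi in x],   # y = 2x + 1
    "perfect negative": [10 - 3 * xi for xi in x],  # y = 10 - 3x
    "positive scatter": [2, 4, 5, 4, 5],            # upward trend with noise
    "negative scatter": [5, 4, 5, 4, 2],            # downward trend with noise
}

results = {label: pearson_r(x, y) for label, y in datasets.items()}
for label, r in results.items():
    print(f"{label}: r = {r:+.4f}")
```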
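Correlation does not imply causation: a strong correlation between x and y does not, by itself, show that changes in x cause changes in y.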
0 < r < 1: A scatter plot showing data with a positive correlation.
–1 < r < 0: A scatter plot showing data with a negative correlation.
r = 0: A scatter plot showing data with zero correlation.
Coefficient of determination (r^2): the square of the correlation coefficient r; a number between 0 and 1 that measures how well the regression line predicts the outcome y.
r^2 interpretation: when expressed as a percent, represents the percent of variation in the dependent (predicted) variable y that can be explained by variation in the independent (explanatory) variable x using the regression (best-fit) line.
1 - r^2 Interpretation: when expressed as a percentage, represents the percent of the variation in y that is NOT explained by variation in x using the regression line.
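A short sketch of both interpretations on made-up data. It also checks r^2 against 1 – SSE/SST, where SSE is the sum of squared residuals and SST is the total variation of y about its mean; that identity holds for a least-squares line with an intercept but is an addition, not something stated in these notes.

```python
from math import sqrt

# Made-up sample data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total variation in y (SST)

r = s_xy / sqrt(s_xx * s_yy)
b = s_xy / s_xx          # least-squares slope
a = y_bar - b * x_bar    # least-squares intercept

# Unexplained variation: sum of squared residuals (SSE)
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

r_squared = r ** 2
print(f"r^2 = {r_squared:.0%}: share of the variation in y explained by x")
print(f"1 - r^2 = {1 - r_squared:.0%}: share not explained by the line")
print(f"check: 1 - SSE/SST = {1 - sse / s_yy:.4f}")
```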
Outliers: observed data points that are far from the least-squares line, that is, points with unusually large residuals.
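One common way to make "far from the line" concrete is to flag any point whose residual exceeds two standard deviations of the residuals, s = √(SSE / (n – 2)). That cutoff is a rule of thumb assumed here, not a definition from these notes, and the data are made up so that one point stands out.

```python
from math import sqrt

# Made-up data: roughly linear except for the point at x = 6
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 20.0, 14.2, 15.9]
n = len(x)

# Least-squares slope and intercept
x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
sse = sum(e ** 2 for e in residuals)
s = sqrt(sse / (n - 2))  # standard deviation of the residuals

# Assumed rule of thumb: flag points whose |residual| exceeds 2s
outliers = [(xi, yi) for xi, yi, e in zip(x, y, residuals) if abs(e) > 2 * s]
print(f"line: y-hat = {a:.3f} + {b:.3f}x, s = {s:.3f}")
print("possible outliers:", outliers)
```

A flagged point is only a candidate; whether to keep or drop it depends on why it is unusual.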
Influential points: observed data points that are far from the other observed data points in the horizontal direction. These points may have a big effect on the slope of the regression line.
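The sketch below illustrates an influential point with made-up data: one point lies far to the right of the others, and removing it changes the least-squares slope from negative to positive. The helper `ls_slope` is a hypothetical name for the slope formula Sxy / Sxx.

```python
def ls_slope(x, y):
    """Least-squares slope b = S_xy / S_xx."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Made-up data; the last point (20, 2) is far from the others horizontally
x = [1, 2, 3, 4, 5, 20]
y = [2, 4, 5, 4, 5, 2]

b_all = ls_slope(x, y)
b_without = ls_slope(x[:-1], y[:-1])
print(f"slope with the point: {b_all:.3f}, without it: {b_without:.3f}")
```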
Degrees of freedom (linear regression t test): df = n – 2, where n is the number of data points (x, y pairs).
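For the linear regression t test, a standard test statistic for H0: ρ = 0 is t = r √(n – 2) / √(1 – r^2), compared against a t distribution with df = n – 2. That formula is an addition to these notes. The sketch computes it on made-up data and checks that it matches the equivalent form "slope divided by its standard error"; looking up a critical value or p-value (from a t table or a statistics library) is left out.

```python
from math import sqrt

# Made-up sample data (illustration only)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
df = n - 2  # degrees of freedom for the regression t test

x_bar, y_bar = sum(x) / n, sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
r = s_xy / sqrt(s_xx * s_yy)

# Test statistic for H0: rho = 0, computed from r
t_from_r = r * sqrt(df) / sqrt(1 - r ** 2)

# Equivalent form: least-squares slope divided by its standard error
b = s_xy / s_xx
sse = s_yy - b * s_xy               # sum of squared residuals
se_b = sqrt(sse / df) / sqrt(s_xx)  # standard error of the slope
t_from_b = b / se_b

print(f"df = {df}, t = {t_from_r:.4f} (from r), {t_from_b:.4f} (from b)")
```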