Least-Squares Line: Used when you have a set of data whose scatter plot appears to "fit" a straight line
Least-squares regression line: the line of best fit; it minimizes the sum of the squared residuals
y hat (ŷ): the estimated value of y for a given x, read from the regression line
y0 – ŷ0 = ε0: error or residual
Absolute value of a residual: measures the vertical distance between the actual value of y and the estimated value of y
ε: the Greek letter epsilon
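Residual example: a minimal Python sketch of the residual calculation above. The data and the line ŷ = 1.0 + 1.9x are made up for illustration and do not come from these notes.

```python
# Hypothetical data and a hypothetical fitted line y_hat = a + b*x.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]
a, b = 1.0, 1.9  # assumed intercept and slope (illustration only)


def predict(x):
    """Estimated value y_hat for a given x."""
    return a + b * x


# Residual: actual y minus estimated y (y0 - y_hat0).
residuals = [y - predict(x) for x, y in zip(xs, ys)]

# The absolute value is the vertical distance from the point to the line.
distances = [abs(e) for e in residuals]

for x, y, e in zip(xs, ys, residuals):
    print(f"x={x}  y={y}  y_hat={predict(x):.1f}  residual={e:+.1f}")
```

A positive residual means the point lies above the line; a negative residual means it lies below.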
Slope equation: b = r (sy / sx), where r is the correlation coefficient
sx = the standard deviation of the x values
sy = the standard deviation of the y values
Interpretation of the Slope: “The slope of the best-fit line tells us how the dependent variable (y) changes for every one unit increase in the independent (x) variable, on average.”
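Slope example: the slope equation can be checked with a short Python sketch that uses only the standard library. The data are made up, the helper name correlation is hypothetical, and the intercept formula a = ȳ – b·x̄ is the standard one, which these notes do not state.

```python
import math
import statistics

# Hypothetical paired data (illustration only).
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]


def correlation(xs, ys):
    """Sample correlation coefficient r."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = correlation(xs, ys)
s_x = statistics.stdev(xs)  # sample standard deviation of the x values
s_y = statistics.stdev(ys)  # sample standard deviation of the y values

b = r * (s_y / s_x)                                 # slope: b = r (sy / sx)
a = statistics.mean(ys) - b * statistics.mean(xs)   # intercept (assumed formula)

print(f"r = {r:.4f}, slope b = {b:.3f}, intercept a = {a:.3f}")
```

For this data the slope works out to 1.9, so y increases by about 1.9 units, on average, for every one-unit increase in x.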
Using the Linear Regression T Test
In the STAT list editor, enter the X data in list L1 and the Y data in list L2, paired so that the corresponding (x,y) values are next to each other in the lists.
On the STAT TESTS menu, scroll down with the cursor to select the LinRegTTest.
On the LinRegTTest input screen enter: Xlist: L1 ; Ylist: L2 ; Freq: 1
On the next line, at the prompt β or ρ, highlight "≠ 0" and press ENTER
Leave the line for "RegEq:" blank
Highlight Calculate and press ENTER.
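The same test can be sketched away from the calculator. This Python example computes only the test statistic and the degrees of freedom, using the standard formula t = r·sqrt(n – 2) / sqrt(1 – r^2), which these notes do not state. The data and function names are made up. The p-value needs a t distribution, which Python's standard library does not provide, so it is left out here.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient r."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def linreg_t_statistic(xs, ys):
    """Return (t, df) for the test of a nonzero slope/correlation.

    Assumes |r| < 1; a perfect correlation would divide by zero.
    """
    r = correlation(xs, ys)
    df = len(xs) - 2  # degrees of freedom: n - 2
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


# Hypothetical paired data, in the same roles as lists L1 and L2.
l1 = [1, 2, 3, 4]
l2 = [3, 5, 6, 9]
t, df = linreg_t_statistic(l1, l2)
print(f"t = {t:.3f}, df = {df}")
```

The t value would then be compared against a t distribution with n – 2 degrees of freedom, which is what the calculator does when it reports p.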
Correlation coefficient (r): a numerical measure of the strength and direction of the linear association between the independent variable x and the dependent variable y.
The value of r is always between –1 and +1: –1 ≤ r ≤ 1.
The size of the correlation r indicates the strength of the linear relationship between x and y. Values of r close to –1 or to +1 indicate a stronger linear relationship between x and y.
If r = 0, there is no linear correlation. It is still important to view the scatter plot, because data that exhibit a curved pattern (a nonlinear relationship) or a horizontal pattern may also have a correlation of 0.
If r = 1, there is perfect positive correlation. If r = –1, there is perfect negative correlation. In both these cases, all of the original data points lie on a straight line.
Positive correlation: A positive value of r means that when x increases, y tends to increase and when x decreases, y tends to decrease.
Negative correlation: A negative value of r means that when x increases, y tends to decrease and when x decreases, y tends to increase.
Correlation does not imply causation
0 < r < 1: A scatter plot showing data with a positive correlation.
–1 < r < 0: A scatter plot showing data with a negative correlation.
r = 0: A scatter plot showing data with zero correlation.
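Correlation example: a small Python sketch of the cases above, assuming made-up data sets and a hypothetical helper named correlation. It shows a perfect positive line, a perfect negative line, and a curved pattern whose correlation is 0 even though y depends entirely on x.

```python
import math


def correlation(xs, ys):
    """Sample correlation coefficient r (requires some spread in both lists)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


xs = [-2, -1, 0, 1, 2]

r_pos = correlation(xs, [2 * x + 1 for x in xs])   # points on a rising line
r_neg = correlation(xs, [-3 * x for x in xs])      # points on a falling line
r_curve = correlation(xs, [x ** 2 for x in xs])    # symmetric curved pattern

print(f"rising line:  r = {r_pos:.2f}")
print(f"falling line: r = {r_neg:.2f}")
print(f"curve y=x^2:  r = {r_curve:.2f}")
```

The curved case is why the scatter plot should be checked: r measures only the linear part of a relationship.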
Coefficient of determination (r^2): the square of the correlation coefficient; a number between 0 and 1 that measures how well the regression line predicts y
r^2 interpretation: when expressed as a percent, represents the percent of variation in the dependent (predicted) variable y that can be explained by variation in the independent (explanatory) variable x using the regression (best-fit) line.
1 - r^2 interpretation: when expressed as a percent, represents the percent of the variation in y that is NOT explained by variation in x using the regression line.
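r^2 example: a minimal Python sketch of the explained and unexplained percentages, assuming made-up data. It uses the sums-of-squares form r^2 = 1 – SSE/SST, which is not written in these notes but gives the same value as squaring r for a least-squares line.

```python
# Hypothetical paired data (illustration only).
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]

n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares slope and intercept; sxy / sxx equals r * (sy / sx).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b = sxy / sxx
a = y_bar - b * x_bar

# SSE: variation left over around the line. SST: total variation in y.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
sst = sum((y - y_bar) ** 2 for y in ys)

r_squared = 1 - sse / sst   # share of variation in y explained by x
unexplained = sse / sst     # share NOT explained, i.e. 1 - r^2

print(f"explained by the line:     {r_squared:.1%}")
print(f"not explained by the line: {unexplained:.1%}")
```

For this data roughly 96% of the variation in y is explained by the line and roughly 4% is not.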
Outliers: observed data points that are far from the least-squares line.
Influential points: observed data points that are far from the other observed data points in the horizontal direction. These points may have a big effect on the slope of the regression line.
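Influential point example: a short Python sketch, using made-up data and a hypothetical helper named slope, of how one point far away in the horizontal direction can change the slope of the least-squares line.

```python
def slope(xs, ys):
    """Least-squares slope, computed as sxy / sxx."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data that follows a clear upward trend.
xs = [1, 2, 3, 4]
ys = [3, 5, 6, 9]

# One extra point, far to the right of the others (x = 20).
slope_without = slope(xs, ys)
slope_with = slope(xs + [20], ys + [5])

print(f"slope without the extra point: {slope_without:.3f}")
print(f"slope with the extra point:    {slope_with:.3f}")
```

Here the single added point pulls the slope from 1.9 down to nearly 0, which is the "big effect" described above.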
Degrees of freedom: n - 2