Degrees of freedom
n - 2 (used when testing the significance of the correlation coefficient)
Outliers
are observed data points that are far from the least squares line.
Influential points
observed data points that are far from the other observed data points in the horizontal direction. These points may have a big effect on the slope of the regression line.
p-value is less than the significance level
We reject the null hypothesis. There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
p-value is NOT less than the significance level
DO NOT REJECT the null hypothesis. There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is NOT significantly different from zero.
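The decision rule in the two cards above can be written as a small function. This is only a sketch: the function name is hypothetical, and the default significance level of 0.05 is an assumption rather than something fixed by the test.

```python
def correlation_test_conclusion(p_value: float, alpha: float = 0.05) -> str:
    """Apply the decision rule for H0: rho = 0 versus Ha: rho != 0.

    alpha is the significance level; 0.05 is an assumed default.
    """
    if p_value < alpha:
        return ("Reject H0: there is sufficient evidence of a significant "
                "linear relationship between x and y.")
    # A p-value equal to or larger than alpha is NOT less than alpha.
    return ("Do not reject H0: there is insufficient evidence of a significant "
            "linear relationship between x and y.")


print(correlation_test_conclusion(0.012))
```

Note that a p-value exactly equal to the significance level leads to not rejecting H0, because the rule requires the p-value to be strictly less than the significance level.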
Null Hypothesis
H0: ρ = 0
Alternate Hypothesis
Ha: ρ ≠ 0
Interpreting Null Hypothesis
The population correlation coefficient IS NOT significantly different from zero. There IS NOT a significant linear relationship (correlation) between x and y in the population.
Interpreting Alternate Hypothesis
The population correlation coefficient IS significantly DIFFERENT FROM zero. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the population.
ρ
population correlation coefficient
r
sample correlation coefficient
Conclusion for Significant
There is sufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is significantly different from zero.
Conclusion for Not Significant
"There is insufficient evidence to conclude that there is a significant linear relationship between x and y because the correlation coefficient is not significantly different from zero."
Significance of the correlation coefficient
a test used to decide whether the linear relationship in the sample data is strong enough to use to model the relationship in the population.
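A common way to carry out this test is the t-test for the correlation coefficient. It uses the statistic t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. The sketch below only computes t and the degrees of freedom for a hypothetical sample (r = 0.8, n = 10). The function name is illustrative, and converting t into a p-value would still need a t-table or statistical software.

```python
import math


def correlation_t_statistic(r: float, n: int) -> tuple[float, int]:
    """Return (t, df) for testing H0: rho = 0 against Ha: rho != 0."""
    df = n - 2  # degrees of freedom for this test
    # Undefined when r = +1 or -1 (division by zero).
    t = r * math.sqrt(df) / math.sqrt(1 - r ** 2)
    return t, df


t, df = correlation_t_statistic(0.8, 10)  # hypothetical sample: r = 0.8, n = 10
print(f"t = {t:.3f}, df = {df}")
```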
Coefficient of determination
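a number between 0 and 1 that measures how well a statistical model predicts an outcome; for a least-squares line with one independent variable it equals r^2, the square of the correlation coefficient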
r^2 interpretation
when expressed as a percent, represents the percent of variation in the dependent (predicted) variable y that can be explained by variation in the independent (explanatory) variable x using the regression (best-fit) line.
1 - r^2 Interpretation
when expressed as a percentage, represents the percent of the variation in y that is NOT explained by variation in x using the regression line.
Positive correlation
A positive value of r means that when x increases, y tends to increase and when x decreases, y tends to decrease.
Negative correlation
A negative value of r means that when x increases, y tends to decrease and when x decreases, y tends to increase.
Correlation coefficient (r)
a number between -1 and +1 that measures the strength and direction of the linear association between the independent variable x and the dependent variable y.
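The sketch below computes r from the deviations of x and y from their means, using r = Sxy / √(Sxx · Syy). Sxy, Sxx, and Syy are the sums of products and squares of those deviations; this notation is introduced here for the example. The function name and data are hypothetical. The two calls show the sign of r matching the direction described in the positive and negative correlation cards.

```python
import math
from statistics import mean


def pearson_r(xs, ys):
    """Sample correlation coefficient r = Sxy / sqrt(Sxx * Syy)."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


xs = [1, 2, 3, 4, 5]
r_up = pearson_r(xs, [2, 4, 5, 4, 5])    # y tends to rise as x rises
r_down = pearson_r(xs, [5, 4, 5, 4, 2])  # y tends to fall as x rises
print(f"{r_up:.4f}")
print(f"{r_down:.4f}")
```

These print 0.7746 and -0.7746: same strength, opposite direction.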
Slope equation
b = r (sy / sx)
sx
= the standard deviation of the x values.
sy
= the standard deviation of the y values.
Interpretation of the Slope
“The slope of the best-fit line tells us how the dependent variable (y) changes for every one unit increase in the independent (x) variable, on average.”
Least-Squares Line
used when a set of data has a scatter plot that appears to "fit" a straight line; it is the straight line that best fits those points
Least-squares regression line
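the line of best fit, found by making the sum of the squared residuals (SSE) as small as possible
ŷ (y hat)
the estimated (predicted) value of y given by the regression line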
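The least-squares slope and intercept have closed forms, b = Sxy / Sxx and a = ȳ − b x̄, so the line always passes through the point (x̄, ȳ). Sxy and Sxx are the sums of products and squares of deviations from the means (notation introduced here). The sketch below uses these formulas to fit a line to hypothetical data and compute ŷ. The helper names are illustrative.

```python
from statistics import mean


def fit_least_squares(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx          # slope that minimizes the sum of squared residuals
    a = y_bar - b * x_bar    # makes the line pass through (x-bar, y-bar)
    return a, b


def predict(a, b, x):
    """Estimated value y-hat for a given x."""
    return a + b * x


a, b = fit_least_squares([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])  # hypothetical data
print(f"y-hat = {a:.2f} + {b:.2f}x")
print(f"y-hat at x = 6: {predict(a, b, 6):.2f}")
```

For this data the line is ŷ = 2.20 + 0.60x, which gives ŷ = 5.80 at x = 6. Raising x by one unit raises ŷ by b = 0.60, matching the interpretation of the slope.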
y0 – ŷ0 = ε0
error or residual
Absolute value of a residual
measures the vertical distance between the actual value of y and the estimated value of y
ε
the Greek letter epsilon
Scatterplot Direction
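whether the points trend upward or downward. In a positive direction, high values of one variable occur with high values of the other and low values occur with low values; in a negative direction, high values of one variable occur with low values of the other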
Strength
how closely the points cluster around a line; the closer the points are to the line, the stronger the relationship
Linear regression
models the relationship between a dependent variable and one or more independent variables
Scatterplot
uses dots to represent values for two different numeric variables.
y = a + bx
linear regression for two variables is based on a linear equation with one independent variable.
Independent variable
x
Dependent variable
y
Slope
b
y-intercept
a
Graph form
a straight line (the equation is linear)
b > 0
slopes upward to the right
b = 0
horizontal line
b < 0
slopes downward to the right
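The three slope cards amount to a sign check on b. A tiny sketch, with a hypothetical function name:

```python
def line_direction(b: float) -> str:
    """Describe the graph of y = a + bx from the sign of the slope b."""
    if b > 0:
        return "slopes upward to the right"
    if b < 0:
        return "slopes downward to the right"
    return "horizontal line"


print(line_direction(0.6))
```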
Bivariate data
two-variable data
Multivariate data
more than two variables