This is mostly vocab and things to remember in wording. Practice problems are not here.
Correlation Coefficient
This is “r”
A quantitative assessment of the strength & direction of the linear relationship between bivariate, quantitative data
Pearson’s sample correlation is used most
parameter - ρ (rho)
statistic - r
Does the value of r depend on which of the two variables is labeled x?
The value of r does not depend on which of the two variables is labeled x
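A quick way to check the card above is to compute r both ways. This is a minimal sketch: the data values and the helper name pearson_r are made up for illustration and are not from the notes.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (assumption, not from the notes)
hours = [1, 2, 3, 4, 5]
score = [52, 60, 61, 70, 78]

r_xy = pearson_r(hours, score)   # hours labeled x
r_yx = pearson_r(score, hours)   # score labeled x
print(r_xy, r_yx)                # the two values match
```

Swapping the labels only swaps the roles of Sxx and Syy in the formula, so the result cannot change.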
Weak Correlation range
-.5 to .5
No Correlation range
No range. 0 is no correlation
Strong Correlation Range
-1 to -.8 and .8 to 1
Moderate Correlation Range
-.8 to -.5 and .5 to .8
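The four range cards above can be collected into one small lookup. This is only a sketch: the function name describe_strength is hypothetical, and assigning the exact cutoffs (0.5 counts as moderate, 0.8 counts as strong) is an assumption, since the cards do not say which side the endpoints fall on.

```python
def describe_strength(r):
    """Label the strength of a correlation using the ranges on the cards."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    size = abs(r)          # strength ignores the sign; the sign gives direction
    if size == 0:
        return "none"
    if size < 0.5:
        return "weak"
    if size < 0.8:         # assumption: exactly 0.5 is treated as moderate
        return "moderate"
    return "strong"        # assumption: exactly 0.8 is treated as strong

for r in (0, 0.2, -0.65, 0.93):
    print(r, describe_strength(r))
```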
Is value of r resistant or non-resistant?
The value of r is non-resistant
Outliers affect the correlation coefficient
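To see what non-resistant means in practice, here is a small sketch with made-up numbers (not from the notes): six points that lie exactly on a line, then the same points plus one outlier.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: y = 2x exactly, so r = 1
xs = [1, 2, 3, 4, 5, 6]
ys = [2, 4, 6, 8, 10, 12]
r_clean = pearson_r(xs, ys)

# Add a single outlier at (7, 0)
r_outlier = pearson_r(xs + [7], ys + [0])

print(r_clean)     # 1.0 (up to rounding)
print(r_outlier)   # 0.25 (up to rounding)
```

One unusual point is enough to move r from a perfect 1 down into the weak range.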
What is the value of r a measure of?
The value of r is a measure of the extent to which x & y are linearly related
A value of r close to zero does not rule out any strong relationship between x and y.
True or False?
True
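The card above is easy to confirm with a curved relationship. In this sketch (made-up data, hypothetical helper name) y is completely determined by x, yet r comes out as zero because the relationship is not linear.

```python
import math

def pearson_r(xs, ys):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up data: a perfect parabola, y = x squared
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(r)   # 0.0: no linear relationship, even though y depends entirely on x
```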
Does correlation imply causation?
Correlation does not imply causation
x – variable
the independent or explanatory variable
y- variable
the dependent or response variable
What does LSRL stand for?
Least Squares Regression Line
(LSRL)
What is the LSRL formula?
ŷ (y-hat) = a + bx
ŷ (y-hat) - means the predicted y
b – is the slope
it is the amount by which the predicted y changes when x increases by 1 unit (an increase if b is positive, a decrease if b is negative)
a – is the y-intercept
it is the height of the line when x = 0
in some situations, the y-intercept has no meaning
What exactly is the LSRL?
The line that gives the best fit to the data set
The line that minimizes the sum of the squares of the deviations from the line
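The "minimizes the sum of the squares" idea can be checked numerically. This sketch fits a line with the usual closed-form least-squares formulas and then compares its sum of squared deviations (SSE) with nearby lines. The data and the function names lsrl and sse are made up for illustration.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def sse(a, b, xs, ys):
    """Sum of squared vertical deviations from the line a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Made-up illustrative data (assumption, not from the notes)
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 78]

a, b = lsrl(xs, ys)
best = sse(a, b, xs, ys)
print("LSRL SSE:", best)

# Nudging the intercept or the slope in either direction makes the SSE larger
for da, db in [(1, 0), (-1, 0), (0, 0.5), (0, -0.5)]:
    print(da, db, sse(a + da, b + db, xs, ys))
```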
How do you interpret the slope?
For each unit increase in x, there is an approximate increase/decrease of b in y.
(Fill in x, y, increase or decrease, and the value of b from the context of the problem.)
How do you interpret the correlation coefficient?
There is a [direction], [strength], linear association between x and y (for example: positive, moderate).
How does the LSRL work with extrapolation?
The LSRL should not be used to predict y for values of x outside the data set.
It is unknown whether the pattern observed in the scatterplot continues outside this range.
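One way to keep the extrapolation rule in mind is a prediction helper that refuses x-values outside the observed range. This is a sketch only: the function name predict_within_range, the line y-hat = 45.6 + 6.2x, and the range 1 to 5 are all hypothetical.

```python
def predict_within_range(a, b, x, x_min, x_max):
    """Return y-hat = a + b*x, refusing to extrapolate outside [x_min, x_max]."""
    if not x_min <= x <= x_max:
        raise ValueError(f"x = {x} is outside the observed range [{x_min}, {x_max}]")
    return a + b * x

# Hypothetical line and observed x-range, for illustration only
a, b = 45.6, 6.2
x_min, x_max = 1, 5

print(predict_within_range(a, b, 3, x_min, x_max))   # inside the data: allowed
try:
    predict_within_range(a, b, 12, x_min, x_max)     # outside the data: refused
except ValueError as err:
    print("Not predicted:", err)
```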
The correlation coefficient and the LSRL are both non-resistant measures.
True or false?
True.
Formulas
y(-hat) = b0 + b1x
b0 = y-bar - b1(x-bar), where x-bar and y-bar are the sample means of x and y
b1 = r(Sy/Sx)
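The formulas can be worked through on a small data set. In this sketch the slope is b1 = r(Sy/Sx) and the intercept is taken as mean(y) - b1 * mean(x), which is the standard least-squares intercept. The data and the helper name pearson_r are made up for illustration.

```python
import math
from statistics import mean, stdev

def pearson_r(xs, ys):
    """Sample correlation r = Sxy / sqrt(Sxx * Syy)."""
    mean_x, mean_y = mean(xs), mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Made-up illustrative data (assumption, not from the notes)
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 78]

r = pearson_r(xs, ys)
b1 = r * stdev(ys) / stdev(xs)     # slope: r times (Sy / Sx)
b0 = mean(ys) - b1 * mean(xs)      # intercept: the line passes through (x-bar, y-bar)

print(f"y-hat = {b0:.2f} + {b1:.2f}x")   # y-hat = 45.60 + 6.20x
```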
Residual formula
Residual = y - y(-hat)
What are residuals?
Error
The vertical deviation between the observations & the LSRL
the sum of the residuals is always zero
error = observed - expected
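The residual formula and the "sum is always zero" fact can be checked together. This sketch uses made-up data and a hypothetical helper name lsrl. The zero-sum property holds for the least squares line specifically, not for an arbitrary line.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Made-up illustrative data (assumption, not from the notes)
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 78]

a, b = lsrl(xs, ys)

# residual = observed y minus predicted y (y-hat)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

for x, y, e in zip(xs, ys, residuals):
    print(x, y, round(e, 2))
print("sum of residuals:", round(sum(residuals), 10))
```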
What are Residual plots?
A scatterplot of the (x, residual) pairs.
Residuals can be graphed against other statistics besides x
Purpose is to tell if a linear association exists between the x & y variables
If no pattern exists between the points in the residual plot, then the association is linear.
If no pattern exists between the points in the residual plot, then the association is _____
If no pattern exists between the points in the residual plot, then the association is linear.
If a pattern exists between the points in the residual plot, then the association is ___.
non-linear
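Here is a sketch of what "a pattern in the residual plot" looks like in numbers. A line is fitted to made-up curved data (y = x squared), and the residuals are listed in order of x instead of being plotted. The helper name lsrl is hypothetical.

```python
def lsrl(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Made-up curved data: y = x squared
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

a, b = lsrl(xs, ys)                                   # fitted line: y-hat = -5 + 6x
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# The (x, residual) pairs that a residual plot would show
for x, e in zip(xs, residuals):
    print(x, e)
```

The residuals run positive, then negative, then positive again as x increases. That U-shape is a pattern, so the association is non-linear, even though r for this data is about 0.96.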
Are residual plots the same no matter if plotted against x or y-hat?
Yes
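Residual plots show the same pattern no matter if plotted against x or y-hat, because y-hat is a linear function of x (the plot is only rescaled, and mirrored left to right if the slope is negative).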
Coefficient of determination
r²
gives the proportion of variation in y that can be attributed to an approximate linear relationship between x & y
remains the same no matter which variable is labeled x
Interpretation of r²
Approximately r² (written as a percent) of the variation in y can be explained by the LSRL of x & y.
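The "proportion of variation explained" reading can be checked directly: 1 - SSE/SST from the fitted line should equal the square of r. The sketch below uses made-up data, and the variable names are illustrative.

```python
import math

# Made-up illustrative data (assumption, not from the notes)
xs = [1, 2, 3, 4, 5]
ys = [52, 60, 61, 70, 78]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
b = sxy / sxx                      # LSRL slope
a = mean_y - b * mean_x            # LSRL intercept

sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))   # variation left unexplained
sst = syy                                                   # total variation in y
r_squared = 1 - sse / sst

print(round(r ** 2, 4), round(r_squared, 4))   # the two values agree
print(f"About {r_squared:.1%} of the variation in y is explained by the LSRL")
```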
Outlier
In a regression setting, an outlier is a data point with a large residual
Influential point-
A point that influences where the LSRL is located
If removed, it will significantly change the slope of the LSRL
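A small sketch of an influential point, using made-up numbers: five points with no trend at all, then the same points plus one point far to the right. The helper name lsrl_slope is hypothetical.

```python
def lsrl_slope(xs, ys):
    """Slope of the least squares line: Sxy / Sxx."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Made-up data with no linear trend
xs = [1, 2, 3, 4, 5]
ys = [3, 2, 3, 2, 3]
slope_without = lsrl_slope(xs, ys)

# Same data plus one point far from the rest in the x-direction
slope_with = lsrl_slope(xs + [15], ys + [13])

print(round(slope_without, 4))   # essentially 0
print(round(slope_with, 4))      # 0.8
```

Removing the single point at (15, 13) takes the slope from 0.8 back to about 0, which is exactly what makes it influential.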
Which of these measures are resistant?
LSRL
Correlation coefficient
Coefficient of determination
NONE are resistant – all are affected by outliers
Computer-generated regression analysis of knee surgery data:
What is the equation of the LSRL?
Find the slope & y-intercept.
Predictor    Coef      Stdev     T       P
Constant     107.58    11.12     9.67    0.000
Age          0.8710    0.4146    2.10    0.062
s = 10.42    R-sq = 30.6%    R-sq(adj) = 23.7%
y(-hat) = 107.58 + 0.8710x, where x is Age
slope = 0.8710 (the Coef for Age)
y-intercept = 107.58 (the Coef for Constant)
Computer-generated regression analysis of knee surgery data:
What are the correlation coefficient and the coefficient of determination?
Predictor    Coef      Stdev     T       P
Constant     107.58    11.12     9.67    0.000
Age          0.8710    0.4146    2.10    0.062
s = 10.42    R-sq = 30.6%    R-sq(adj) = 23.7%
Use R-sq, never R-sq(adj), to find r
Be sure to convert r² to a decimal before taking the square root!
r² = 0.306 (30.6%)
r = √0.306 ≈ 0.5532 (positive because the slope, 0.8710, is positive)
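The arithmetic for this card, as a short sketch. The numbers come from the printout above. The variable names are illustrative, and taking the sign of r from the sign of the slope is the standard rule for simple linear regression.

```python
import math

r_sq_percent = 30.6   # R-sq from the printout (not R-sq(adj))
slope = 0.8710        # Coef for Age; its sign gives the sign of r

r_squared = r_sq_percent / 100                    # convert the percent to a decimal first
r = math.copysign(math.sqrt(r_squared), slope)    # r has the same sign as the slope

print(round(r, 4))   # 0.5532
```

Skipping the decimal conversion would give the square root of 30.6, about 5.53, which cannot be a correlation because r must lie between -1 and 1.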