Intercept (a)
Starting value in y-units: the predicted y-value when x is zero
Slope (b)
For every 1 (x unit) increase in (x variable), there is a (slope) (y unit) increase/decrease in mean (y variable). slope = r × (SD of y / SD of x)
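A minimal sketch of how the slope and intercept come out of the summary statistics, using b = r × (SD of y / SD of x) and a = (mean of y) - b × (mean of x). The data set (hours, score) and the function name are made up for illustration only.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b) for y-hat = a + b*x, using b = r * (s_y / s_x)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample SDs (divide by n - 1)
    # r = sum of products of z-scores, divided by n - 1
    r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y)
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * s_y / s_x          # slope
    a = y_bar - b * x_bar      # intercept: line passes through (x_bar, y_bar)
    return a, b

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
a, b = least_squares_line(hours, score)
print(f"y-hat = {a:.2f} + {b:.2f}x")
```

Read in the template above: for every 1 hour increase, there is about a 4.1 point increase in mean score.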
Correlation Coefficient
There appears to be a (weak/moderate/strong) (positive/negative) (linear/nonlinear) relationship between (x variable) and (y variable)
Coefficient of Determination
About R²% of the variability in (y variable) can be explained by variability in (x variable)
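As a rough illustration, r and R² can be computed by hand as below. The data and the helper name `correlation` are hypothetical; on the calculator the same numbers come from LinReg with Stat Diagnostics on.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Pearson correlation r for paired data."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    cross = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return cross / ((len(xs) - 1) * s_x * s_y)

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
r = correlation(hours, score)
r_squared = r ** 2
print(f"r = {r:.3f}")
print(f"About {100 * r_squared:.1f}% of the variability in score "
      f"can be explained by variability in hours")
```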
Standard Error of residuals
On average, the actual (y variable) values vary about (s, the standard deviation of the residuals, with units) from the predicted values (find s using LinRegTTest, selecting the ≠ / < / > option that matches the alternative hypothesis)
Residual = actual - predicted value (how far vertically the point is from the line of best fit)
pos: the model underestimated
neg: the model overestimated
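A small sketch of residuals and the standard deviation of the residuals, s = sqrt(sum of squared residuals / (n - 2)). The data and the helper `fit_line` are assumptions for illustration, not part of the notes.

```python
from math import sqrt
from statistics import mean

def fit_line(xs, ys):
    """Least-squares intercept a and slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    return y_bar - b * x_bar, b

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
a, b = fit_line(hours, score)

# residual = actual - predicted
residuals = [y - (a + b * x) for x, y in zip(hours, score)]
# s uses n - 2 because two values (a and b) were estimated from the data
s = sqrt(sum(e ** 2 for e in residuals) / (len(hours) - 2))

for x, e in zip(hours, residuals):
    if e > 0:
        verdict = "model underestimated"
    elif e < 0:
        verdict = "model overestimated"
    else:
        verdict = "exact"
    print(f"x = {x}: residual = {e:+.2f} ({verdict})")
print(f"s = {s:.3f}")
```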
Explanatory variable
cause, independent, x-axis
Response variable
effect, dependent, y-axis
Association (any form)
Direction: positive/negative, Form: straight/curved, Strength: weak/moderate/strong, or a combination
Correlation
r is always between -1 and 1. If given r², take the square root of r² and give r the same sign as the slope
Outliers
can have a large residual, high leverage, or both
Leverage
high leverage if x value is far from mean of x-values, works like a lever if it’s influential
Quantitative variables condition
both variables are quantitative
Straight enough condition
scatter plot looks reasonably straight
Outliers condition
either there are no obvious outliers, or the sample is large enough to proceed with caution
Correlation of 0
no linear association
Correlation
measures the strength of the linear association between two variables. Two variables can be strongly associated but still have a small correlation if the association isn’t linear
Linear model
ŷ = a + bx (ŷ is the predicted y-value)
Residual
observed - predicted
Turn scatter plot on
Stat Diagnostics on in mode; stat-edit, L1 = x, L2 = y; 2nd-y= (stat plot) on; zoom 9 (ZoomStat); graph
Get linear model on calc
stat-calc-8, store regEq, vars-y-vars-function-y1
residuals
set L3 = RESID (2nd-stat, select RESID) after running the regression
Outliers
horizontal outliers (leverage) more influential than vertical outliers (residuals)
A residual scatter plot with a cluster and one “stray point”:
The point has high/low leverage and a large/small residual. This point is/isn’t influential. If the point were removed, the correlation would become weaker/stronger, and removing it would strengthen/weaken the association. The slope would increase/decrease/remain the same, since the point is/isn’t influential.
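One way to check whether a stray point is influential is to fit the line with and without it and compare slopes. The sketch below does that on made-up data where the last point sits far from the mean of the x-values (high leverage); the data and helper name are hypothetical.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope b for y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    return s_xy / s_xx

# Hypothetical cluster plus one stray point at x = 15
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 1, 3, 2, 2, 12]

b_with = slope(xs, ys)
b_without = slope(xs[:-1], ys[:-1])  # drop the stray point
print(f"slope with stray point:    {b_with:.3f}")
print(f"slope without stray point: {b_without:.3f}")
```

Here the slope changes a lot when the point is removed, so the point is influential.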
Null hypothesis
H0: There is no linear relationship between ____ and ____ (β = 0).
Alternative hypothesis
Ha: There is a linear relationship between ____ and ____ (β ≠ 0).
Assumptions for inference. IN ORDER
Straight enough, Independence, Spread, Nearly Normal (SEISNN, Sally Eats Icees Stealthily Nearing Normandy)
Straight Enough
Scatter plot of data points is straight enough to try a linear model
Independence
residual plot is randomly scattered with no pattern
Spread
spread of residuals is consistent
Nearly Normal condition
Histogram of residuals is unimodal and symmetric. If there is one possible outlier: with the large sample size, it should still be okay to proceed
After conditions
Since the conditions for inference have been met, the sampling distribution of the regression slope can be modeled by a Student’s t-model with ____ (n - 2) degrees of freedom. We’ll use a regression slope t-test. The equation of the line of best fit for these data is ŷ = a + bx, where ____ is measured in ____ units.
P-value is less than alpha
The value of t = ____. A P-value less than alpha means that the association we see in the data is unlikely to occur by chance if there were really no linear relationship. Since our P-value is below our significance level of ____, we reject the null hypothesis and conclude there is strong evidence of a linear relationship between ____ and ____. As ____ increases, ____ (increases/decreases).
P value is greater than alpha
The value of t = ____. A P-value greater than alpha means that the association we see in the data could plausibly occur by chance. Since our P-value is above our significance level of ____, we fail to reject the null hypothesis and conclude there is not enough evidence of a linear relationship between ____ and ____.
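A sketch of where the t-value comes from: t = b / SE(b), with SE(b) = s / sqrt(sum of (x - mean of x)²) and n - 2 degrees of freedom. The data and function name are made up; the P-value itself is left to LinRegTTest (or tcdf) on the calculator.

```python
from math import sqrt
from statistics import mean

def slope_t_statistic(xs, ys):
    """Return (t, SE of slope, degrees of freedom) for testing slope = 0."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = sqrt(sse / (n - 2))     # SD of the residuals
    se_b = s / sqrt(s_xx)       # standard error of the slope
    return b / se_b, se_b, n - 2

# Hypothetical data, for illustration only
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
t, se_b, df = slope_t_statistic(hours, score)
print(f"t = {t:.2f} with {df} degrees of freedom, SE(b) = {se_b:.4f}")
```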
conf interval
A GIVEN PERCENT confidence regression slope t-interval: b ± t* × SE(b), where b is the coefficient of the independent variable, SE(b) is its standard error, and t* = invT((1 + conf level)/2, df) with df = n - 2 (remember it’s -2!). This equals about (____, ____).
interpret confidence interval
We are GIVEN PERCENT confident that, for each 1 (x unit) increase in (x variable), the mean increase/decrease in (y variable) is between about ____ and about ____ (y units).
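The interval arithmetic b ± t* × SE(b) can be sketched as below. The function name and the numbers are hypothetical; t* is passed in as an input because it is assumed to come from invT or a t-table rather than being computed here.

```python
def slope_interval(b, se_b, t_star):
    """Confidence interval for the slope: b ± t* × SE(b)."""
    margin = t_star * se_b
    return b - margin, b + margin

# Hypothetical values: b = 4.1, SE(b) = 0.2517, df = 3.
# t* = 3.182 is the 95% critical value for df = 3 (from a t-table,
# or invT(0.975, 3) on the calculator).
low, high = slope_interval(4.1, 0.2517, 3.182)
print(f"95% interval for the slope: ({low:.2f}, {high:.2f})")
```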