line of best fit
slope = (r)(sy)/(sx), where r is the correlation and sx, sy are the standard deviations of x and y
y-intercept
(mean of y) - (slope)(mean of x)
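The slope and y-intercept formulas can be sketched in code as follows. This is a minimal illustration using only the standard library; the function name fit_line and the data are made up for the example.

```python
from statistics import mean, stdev

def fit_line(xs, ys):
    """Return (slope, intercept) using slope = r*sy/sx and
    intercept = mean of y - slope * mean of x."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)  # sample standard deviations
    # sample correlation coefficient r
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    slope = r * sy / sx
    intercept = y_bar - slope * x_bar
    return slope, intercept

# made-up data lying exactly on y = 1 + 2x
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
```

With these points the function recovers a slope of about 2 and an intercept of about 1, up to floating-point rounding.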
residuals
observed y - predicted y
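A short sketch of the residual calculation, assuming a line that has already been fitted. The line y = 1 + 2x and the observations are hypothetical values chosen for illustration.

```python
def residuals(xs, ys, slope, intercept):
    """Residual for each point: observed y minus predicted y."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# hypothetical fitted line y = 1 + 2x and made-up observations
res = residuals([1, 2, 3], [3.5, 4.0, 7.0], slope=2.0, intercept=1.0)
```

A positive residual means the point sits above the line, a negative one means it sits below.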
conditions for linear models
1) random samples and independent observations
2) values of the y variable are normally distributed around each x
3) straight enough condition: the scatterplot confirms that there is an approximately linear relationship
4) no influential outliers
5) equal spread: the points remain fairly consistently spread around the line (does the plot thicken?)
residual plots
help visually determine whether the data meet the conditions; show the x values on the x-axis and the residuals on the y-axis
no violations
even scatter
funneling shape
violates the equal-variance condition (the plot thickens)
linearity
if there's a curve in the residual plot, the linearity condition is violated; there should be no visible pattern
Assessing fit
r squared: shows the proportion of the variation in the dependent variable that is explained by the independent variable
r squared values
greater than .8 is ideal
how to calculate R squared
1 - (SS residuals)/(SS total)
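The R squared formula can be worked through in a few lines. This is only a sketch: the observed and predicted values are made up to keep the arithmetic easy, and the function name r_squared is an assumption.

```python
from statistics import mean

def r_squared(ys, predicted):
    """R squared = 1 - SS residuals / SS total."""
    y_bar = mean(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predicted))  # unexplained variation
    ss_tot = sum((y - y_bar) ** 2 for y in ys)                 # total variation
    return 1 - ss_res / ss_tot

# made-up observed values and predictions: SS residuals = 1, SS total = 5
r2 = r_squared([1, 2, 3, 4], [1.5, 1.5, 3.5, 3.5])
```

Here SS residuals is 1 and SS total is 5, so R squared comes out to about 0.8.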
test of significance
is our slope a significant predictor in the population? We can use a t-test on our slope estimate.
null hypothesis
slope equals 0
alternate hypothesis
slope does not equal 0
t-stat
t = (slope - 0)/sb, where sb is the standard error of the slope
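A rough sketch of the t-statistic calculation, assuming sb means the usual standard error of the slope for simple regression: the residual standard deviation (with n - 2 degrees of freedom) divided by the square root of the sum of squared x deviations. The data and the function name slope_t_stat are made up for illustration.

```python
from math import sqrt
from statistics import mean

def slope_t_stat(xs, ys):
    """Return (t, degrees of freedom) for testing slope = 0."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    s_e = sqrt(ss_res / (n - 2))   # residual standard deviation
    sb = s_e / sqrt(sxx)           # assumed definition of sb: standard error of the slope
    return (slope - 0) / sb, n - 2

# made-up data: slope is 0.6, t is roughly 2.12 with 3 degrees of freedom
t, df = slope_t_stat([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

The t value would then be compared against a t distribution with n - 2 degrees of freedom to get a p-value.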
when rejecting the null
there is evidence of a linear relationship between the two variables
what to include in a regression conclusion
1) model
2) interpret y-int
3) state slope
4) test of significance
5) what are the assumptions
6) R squared
slope value of the null
is always zero
example of slope interpretation: height = 52.90 + 1.47(shoe size)
for each 1-unit increase in shoe size, the predicted height increases by 1.47 in.
example of y-int interpretation: height = 52.90 + 1.47(shoe size)
a person with a shoe size of 0 is predicted to be 52.9 in. tall
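Both interpretations can be checked by plugging numbers into the model. This is a small sketch; the function name predicted_height and the shoe sizes 9 and 10 are chosen only for illustration.

```python
def predicted_height(shoe_size):
    """Predicted height in inches from the model height = 52.90 + 1.47(shoe size)."""
    return 52.90 + 1.47 * shoe_size

at_zero = predicted_height(0)                       # the y-intercept
gain = predicted_height(10) - predicted_height(9)   # effect of one more shoe size
```

The value at shoe size 0 is the y-intercept (52.9 in.), and the difference between any two consecutive shoe sizes is the slope (about 1.47 in.).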