Correlated variables
“move” together. If one variable increases, the other either tends to increase or decrease.
X increases as Y increases
the variables have a positive correlation
Y decreases as X increases
the variables have a negative correlation
“r”
The strength and direction of the linear correlation.
linear correlation coefficient
r = 1
perfect positive correlation
r = -1
perfect negative correlation
r = 0
no correlation
|r| > 0.9
correlation is strong
0.6 < |r| < 0.9
correlation is moderate
|r| < 0.6
correlation is weak
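A minimal sketch of computing r in Python, using scipy.stats.pearsonr and made-up example data (neither the library choice nor the numbers come from these notes):

```python
import numpy as np
from scipy import stats

# Hypothetical data: as x increases, y tends to increase
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([52, 55, 61, 60, 68, 72, 75, 80], dtype=float)

r, p_value = stats.pearsonr(x, y)  # linear correlation coefficient
print(f"r = {r:.3f}")              # near +1 here, so a strong positive correlation
```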
Null hypothesis
States that the variables are not correlated (ρ = 0).
Inference for correlation
Reject ρ = 0
when the results are extreme enough that they would be unlikely to occur by chance, assuming the null hypothesis is true
Statistically significant
Often used when a null hypothesis is rejected
Does not indicate strength of correlation
ρ = 0 is true
there is no tendency for Y to change as X changes
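A sketch of inference for correlation: pearsonr also returns a p-value for testing the null hypothesis ρ = 0 (the data and the 0.05 cutoff below are illustrative assumptions):

```python
import numpy as np
from scipy import stats

x = np.array([1.2, 2.0, 2.9, 4.1, 5.0, 6.3, 7.1, 8.4, 9.0, 10.2])
y = np.array([3.1, 3.9, 5.2, 5.8, 7.1, 8.0, 8.9, 10.2, 10.8, 12.1])

r, p_value = stats.pearsonr(x, y)   # p-value tests H0: rho = 0
if p_value < 0.05:                  # assumed significance level
    print(f"Reject H0: statistically significant (r = {r:.3f}, p = {p_value:.4f})")
else:
    print(f"Fail to reject H0 (r = {r:.3f}, p = {p_value:.4f})")
```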
Theoretical regression model
yi = β0 + β1xi + εi
Describes where data comes from
Contains unknown parameters (β0, β1) and random variation (εi)
yi
The response or dependent variable
β0
Population intercept
β1
Population slope
xi
Predictor or independent variable
εi
random variation around the line
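One way to make the theoretical model concrete is to simulate data from it; the parameter values below are illustrative assumptions, not values from these notes:

```python
import numpy as np

rng = np.random.default_rng(0)

beta0, beta1 = 2.0, 0.5                        # population intercept and slope (assumed)
sigma = 1.0                                    # spread of the random variation (assumed)

x = np.linspace(0, 10, 30)                     # predictor values xi
epsilon = rng.normal(0, sigma, size=x.size)    # random variation around the line
y = beta0 + beta1 * x + epsilon                # yi = beta0 + beta1*xi + epsilon_i
```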
Estimated model
The equation for the line of best fit
ŷi = b0 + b1xi
Is the line itself and is drawn through the data
Contains statistical estimates with no random variation
Interpreting b0
predicted value of y when x=0
Interpreting b1
The slope; the predicted change in y for a one-unit increase in x
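A sketch of fitting the estimated model and reading b0 and b1; scipy.stats.linregress and the sample data are assumptions for illustration:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.3, 2.9, 3.6, 4.1, 4.8, 5.2, 6.0, 6.4])

fit = stats.linregress(x, y)
b0, b1 = fit.intercept, fit.slope
print(f"yhat = {b0:.2f} + {b1:.2f}x")
# b0: predicted y when x = 0; b1: predicted change in y per one-unit increase in x
```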
Residuals or errors
The difference between an observed value of y and the predicted value of y at an observed value of x
residual = yi - ŷi = observed value - predicted value
Sum of squared errors/ residuals (SSE)
The sum of all the squared deviations between each data point and line of best fit
Sums of squares for residuals (SSE)
tells us how spread out the data are around the line of best fit
The greater the SSE, the more spread out the data are around the line of best fit
The line of best fit minimizes the sum of the squared residuals
The best-fitting line therefore has the smallest SSE of any line through the data
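Residuals and SSE can be computed directly from the fitted line; the data below are made up for illustration:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 2.8, 3.9, 4.2, 4.7, 5.5, 5.9, 6.8])

fit = stats.linregress(x, y)
y_hat = fit.intercept + fit.slope * x   # predicted values on the line of best fit
residuals = y - y_hat                   # observed - predicted
sse = np.sum(residuals ** 2)            # sum of squared errors
print(f"SSE = {sse:.3f}")
```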
Small sample size
sample LOBFs are very “spread out” around the population LOBF
When sample sizes are large
each sample LOBF will tend to be closer to the population LOBF. Sampling distribution of LOBF will be less spread out.
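A quick simulation (with assumed population parameters) shows how sample LOBF slopes spread around the population slope for small vs. large samples:

```python
import numpy as np

rng = np.random.default_rng(1)
beta0, beta1, sigma = 2.0, 0.5, 1.0             # assumed population parameters

def sample_slope(n):
    """Fit a LOBF to one random sample of size n and return its slope."""
    x = rng.uniform(0, 10, n)
    y = beta0 + beta1 * x + rng.normal(0, sigma, n)
    return np.polyfit(x, y, 1)[0]

for n in (10, 100):
    slopes = [sample_slope(n) for _ in range(2000)]
    print(f"n = {n:3d}: std of sample slopes = {np.std(slopes):.3f}")
# Larger samples -> sample slopes cluster more tightly around the population slope
```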
Confidence interval
95% CI for β1 ≈ b1 ± 2(se of b1)
reject the null
we have strong enough evidence that there is a linear relationship between x and y at the population level
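A sketch of the interval for β1 using the slope's standard error from linregress; the ±2 multiplier follows the rule of thumb above, and the data are assumptions:

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=float)
y = np.array([2.4, 2.7, 3.8, 4.0, 4.9, 5.3, 6.1, 6.2, 7.0, 7.6])

fit = stats.linregress(x, y)
lower = fit.slope - 2 * fit.stderr   # rough 95% CI: b1 +/- 2(se of b1)
upper = fit.slope + 2 * fit.stderr
print(f"b1 = {fit.slope:.3f}, approximate 95% CI: ({lower:.3f}, {upper:.3f})")
# If 0 falls outside the interval, reject the null hypothesis that beta1 = 0
```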
Larger R2
there’s a stronger linear relationship; the points lie closer to the line of best fit.
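R2 can be taken from the same fit as the square of the correlation coefficient r (illustrative data, not from the notes):

```python
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([1.9, 3.2, 3.8, 5.1, 5.4, 6.8, 7.1, 8.3])

fit = stats.linregress(x, y)
r_squared = fit.rvalue ** 2          # R2 is the square of r
print(f"R^2 = {r_squared:.3f}")      # closer to 1 -> points lie closer to the LOBF
```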