Linear Regression:
Regression line→ the line that best fits the relationship between the two variables
Known as line of best fit
Only deals with linear relationships
The model is a straight line; from that model, given one variable we can predict the other
Formula→ y = bx + a
b is the slope (the change in y for a one-unit increase in x)
a is the intercept (the value of y when x = 0)
Linear Regression Coefficients:
Formula→ ŷ=bx+a
The goal is to determine values of a and b that best fit the data
We use those values to predict the values of y given x
Example:
ŷ = 20x + 1000
Every time alcohol consumption increases by 1 oz, predicted reaction time increases by 20 ms (b = 20)
The “baseline” reaction times without alcohol are around 1000ms (a=1000)
Then, given a dosage level, we can predict one's reaction time on the task
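A minimal sketch of this prediction in Python, assuming the example coefficients above (b = 20, a = 1000); the function name and inputs are only illustrative:

    # Predicted reaction time (ms) for a given alcohol dose (oz),
    # using the example coefficients b = 20 and a = 1000
    def predicted_reaction_time(dose_oz, b=20.0, a=1000.0):
        return b * dose_oz + a

    print(predicted_reaction_time(0))   # 1000.0 ms (baseline, no alcohol)
    print(predicted_reaction_time(2))   # 1040.0 ms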
Regression and Error:
Predictions always fall on the regression line→ most actual data points will lie off the line
The difference between a data point and the corresponding prediction is known as the prediction error/residual
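A minimal sketch of computing residuals, assuming the example model above and made-up observations:

    import numpy as np

    # Hypothetical observations: alcohol dose (oz) and reaction time (ms)
    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([995.0, 1030.0, 1035.0, 1070.0])

    y_hat = 20 * x + 1000     # predictions from the example model (on the line)
    residuals = y - y_hat     # prediction error for each data point
    print(residuals)          # [-5. 10. -5. 10.]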
Linear Regression Coefficients (Least Squares):
Formula:
ŷ = bx + a, where a and b are chosen to minimize Σ(y − ŷ)² (the sum of squared residuals)
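A minimal sketch of finding a and b by least squares; this uses the standard closed-form solution (slope = covariance of X and Y divided by variance of X, intercept = ȳ − b·x̄) on the same made-up data:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0])
    y = np.array([995.0, 1030.0, 1035.0, 1070.0])

    # Closed-form least-squares coefficients for simple linear regression
    b = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)   # slope (≈ 23.0)
    a = y.mean() - b * x.mean()                          # intercept (≈ 998.0)
    print(b, a)   # these values minimize Σ(y − ŷ)²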
Hypothesis Testing:
Is a regression significant?
Is x a good predictor of y?
Is b significantly different from 0?
How do we test?
In simple regression, testing the slope is equivalent to testing the correlation: the regression is significant if the corresponding correlation is significant
Degrees of Freedom:
We use the two sample means (X̄ and Ȳ) to calculate everything else, so two values are no longer free to vary
df = N − 2
Degrees of freedom are reported so that it is clear how many observations were used to draw conclusions
df will always be less than N
Used to determine p values and critical values
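A minimal sketch of the significance test, using the standard t statistic t = r·√(N − 2) / √(1 − r²) with df = N − 2; scipy's pearsonr reports the same p-value directly (data are made up):

    import numpy as np
    from scipy import stats

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([995.0, 1030.0, 1035.0, 1070.0, 1090.0, 1105.0])

    r, p = stats.pearsonr(x, y)                    # correlation and its p-value
    n = len(x)
    t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)     # t statistic with df = n − 2
    p_manual = 2 * stats.t.sf(abs(t), df=n - 2)    # two-tailed p-value
    print(t, p, p_manual)                          # p and p_manual agree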
Standard Error of the Estimate:
Formula→ s(Y−Ŷ) = s(Y) · √[ (1 − r²)(N − 1) / (N − 2) ]
The proportion of variance in Y we can account for with X equals r²
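A minimal sketch of computing the standard error of the estimate from s(Y), r, and N, using the same made-up data as above:

    import numpy as np

    x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([995.0, 1030.0, 1035.0, 1070.0, 1090.0, 1105.0])

    n = len(y)
    r = np.corrcoef(x, y)[0, 1]                            # Pearson r
    s_y = y.std(ddof=1)                                    # sample SD of Y
    s_est = s_y * np.sqrt((1 - r**2) * (n - 1) / (n - 2))  # standard error of estimate
    print(s_est)   # typical size of the residuals (prediction errors)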
Constraints:
Regression is sensitive to outliers
Only appropriate for linear relationships
Regression is sensitive to restriction of range
Beware of heterogeneous samples
Regression does not allow us to draw conclusions about causation
Correlation:
Correlation→ degree to which two variables are related
Positive→ as X increases, Y also increases
Negative→ as X increases, Y decreases
Linear Relationships→ represented by a straight line
Shows a consistent relationship between the two variables
Covariance→ degree to which two variables vary together
Formula→ cov(X, Y) = Σ(X − X̄)(Y − Ȳ) / (N − 1)
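A minimal sketch of this calculation, and of how it relates to the correlation coefficient (r = cov / (s(X)·s(Y))), using made-up data:

    import numpy as np

    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([2.0, 3.0, 5.0, 4.0])

    cov = np.sum((x - x.mean()) * (y - y.mean())) / (len(x) - 1)   # covariance
    r = cov / (x.std(ddof=1) * y.std(ddof=1))                      # Pearson r
    print(cov, r)   # same as np.cov(x, y, ddof=1)[0, 1] and np.corrcoef(x, y)[0, 1]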
Correlation Coefficient→ a statistic that measures the relationship between two variables
Ranges from −1 to +1
Shows type of relationship and its strength
Pearson's product moment correlation→ r
Spearman's rank order correlation→ rs
Point-biserial correlation→ rpb
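A minimal sketch of computing the three coefficients with scipy.stats (pearsonr, spearmanr, pointbiserialr) on made-up data; the point-biserial case assumes one dichotomous variable:

    import numpy as np
    from scipy import stats

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    y = np.array([2.0, 1.0, 4.0, 3.0, 5.0])
    group = np.array([0, 0, 1, 1, 1])              # dichotomous (e.g. control vs. treatment)

    r, p = stats.pearsonr(x, y)                    # Pearson's product-moment r
    rs, p_s = stats.spearmanr(x, y)                # Spearman's rank-order rs
    rpb, p_pb = stats.pointbiserialr(group, y)     # point-biserial rpb
    print(r, rs, rpb)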