What is Regression?
Regression is used to model a predictive linear relationship between a numeric independent variable (IV) and a numeric dependent variable (DV).
Purpose Refers to IV as a predictor of DV
Design Typically used in non-experimental research designs
> E.G. Longitudinal studies where something at one time point is used to predict something at a different time point
> E.G. Cross-sectional survey where many factors are measured
RQ "Does X predict Y?"
Why are predictive relationships not always causal?
A successful prediction simply identifies whether a statistical association exists, NOT whether there is a causal relationship/sequence.
> Simple linear regression does not control for confounds/third variables
1. “If you’re a psych student, I predict you are a woman" = NOT causal
2. "If you answer Qs in class, I predict you are high in extraversion" = NOT causal (could be reverse-causal)
3. "If you study lots, I predict you’ll do well in the final" = MAYBE causal BUT many confounds
Correlation vs. Regression
Correlation is a simplified version of regression!
1. Relationship Model
> Correlation = Linear; Regression = Predictive linear
2. Variables
> BOTH use numeric IV & DV (caveat: point-biserial correlations/dichotomous predictors)
3. Research Application
> BOTH typically used in non-exp design
Properties of the Regression Line
AKA (Straight) line of best fit
Equation Regression model produces equation of the line
Y-hat = a + bX
Parameters Two!
1. Intercept "a": Value of Y when X = 0
2. Slope “b”: How steep vs. flat the line is & what direction +/-.
What Does the Intercept "a" Represent?
The value of Y when X = 0
How much depression does one experience when they have no pain interference?
What Does the Slope "b" Represent?
b shows the gradient and direction of the line!
> The b coefficient is unstandardised because it is represented in original units
> In simple linear regression, the standardised coefficient (beta) is the SAME as the correlation coefficient!
When is it NOT Appropriate to Use Simple Linear Regression?
When we have a dichotomous independent variable!
> Regression can have dichotomous predictors BUT it is more appropriate to use an independent-samples T-test!
> Analogous to point biserial correlation: still works because you can model the relationship between two IV categories with a straight line
Using Slope & Intercept to Predict Scores
We can calculate the expected Y score when the slope and intercept are known and X is provided. Simply plug the X score into the regression equation.
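As a minimal sketch of plugging X into the regression equation, the snippet below uses a hypothetical intercept (a = 2.2) and slope (b = 0.6). These values and the function name `predict` are made up for illustration.

```python
def predict(x, intercept, slope):
    """Return the predicted score: Y-hat = a + b * X."""
    return intercept + slope * x


# Hypothetical parameters, chosen only for illustration
a = 2.2  # intercept: predicted Y when X = 0
b = 0.6  # slope: predicted change in Y per one-point increase in X

y_hat_at_0 = predict(0, a, b)  # equals the intercept
y_hat_at_3 = predict(3, a, b)  # 2.2 + 0.6 * 3
print(y_hat_at_0, y_hat_at_3)
```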
Framing Conclusions in Regression Analysis
Conclusion Para Structure
1. X was a statistically significant predictor of Y scores, F(1, df-residual) = [ ], p [ ].
> Can use F-ratio or T-statistic
2. A one-point increase in X predicted a [ ]-point increase in Y.
> Requires the slope b from the Y-hat equation
3. X explained []% of the variance in Y scores, a large effect.
> Requires R-squared or standardised beta
Navigating ANOVA Summary Table in Regression
SS(Total) = Sum of squared (Y minus Y-bar)
- That is, total variability
- Sum of each data point's squared deviation from the mean
SS(Residual) = Sum of squared (Y minus Y-hat)
- That is, unexplained variability
- Sum of each data point's squared deviation from the regression model AKA predicted line
- Visually, square and add up the vertical distance between each data point and the line
SS(Regression) = Sum of squared (Y-hat minus Y-bar)
- That is, variability captured by the regression line
- Sum of the squared deviation of each predicted/estimated score of Y from the average score of Y
- Visually, square and add up the vertical distances between the line of best fit and the flat line at the average
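The three sums of squares can be sketched in a few lines of Python. The scores below are made up for illustration, and the helper `fit_line` is a hypothetical name for an ordinary least-squares fit.

```python
def fit_line(xs, ys):
    """Least-squares intercept (a) and slope (b) for Y-hat = a + b * X."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Made-up scores for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = fit_line(xs, ys)
mean_y = sum(ys) / len(ys)
y_hats = [a + b * x for x in xs]  # predicted score for each X

# Total: actual scores around the mean of Y
ss_total = sum((y - mean_y) ** 2 for y in ys)
# Residual: actual scores around the regression line
ss_residual = sum((y - y_hat) ** 2 for y, y_hat in zip(ys, y_hats))
# Regression: predicted scores around the mean of Y
ss_regression = sum((y_hat - mean_y) ** 2 for y_hat in y_hats)

print(ss_total, ss_residual, ss_regression)
```

For this data the three values come out at about 6.0, 2.4 and 3.6, so SS(Regression) + SS(Residual) = SS(Total).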
Least Squares Method in Regression
Remember error refers to the difference between predicted Y & actual Y for any given value of X!
> Regression models minimise residual variance via the method of least squares
> Aim of least squares method = have less variability around the regression line than total variability in Y around the mean (TSS)
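One way to write the least squares idea in symbols is shown below. This is the standard result for a single predictor; the subscript i (indexing observations) and n (sample size) are notation added here.

```latex
\[
SS_{\text{residual}} = \sum_{i=1}^{n} \bigl(Y_i - \hat{Y}_i\bigr)^2
                     = \sum_{i=1}^{n} \bigl(Y_i - a - bX_i\bigr)^2
\]
\[
b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2},
\qquad
a = \bar{Y} - b\,\bar{X}
\]
```

The values of a and b given in the second line are the ones that make SS(Residual) as small as possible.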
Sources of Variance in Regression
Remember variance is calculated using sums of squares!
1. Total variance = Total sums of squares (mean Y)
> Sum of (Y minus Y-bar)²
2. Unexplained variance = SS(Residual) AKA Error
> Difference between predicted y vs. actual y
> Sum of (Y minus Y-hat)²
3. Explained variance = SS(Regression)
> Difference between predicted Y & mean Y
> Sum of (Y-hat minus Y-bar)²
How is Variability Represented in A Regression Line?
Remember regression analysis tries to explain or predict variability in outcome “Y”!
> Regression line always runs through the middle of a data cloud on a scatterplot because it represents the predicted score of Y for any given value of X.
> The closer actual scores are to the predicted scores, the better the model predicts Y.
> The further away the actual scores are from the line, the worse the model predicts Y.
Properties of Error in Regression
Each observation in the dataset has a residual (e)!
e = (Y actual – Y predicted)
> Quantifies vari
> Positive residual = above regression line
> Negative residual = below regression line
> Residuals sum to 0 because the regression line sits in the middle of the data points so that the positive deviations above it balance the negative deviations below it.
NOTE To clarify, SS(error) uses squares to quantify variability not explained by independent variable!
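A small sketch of residuals for a least-squares line, using made-up scores. It shows the residuals summing to zero (up to floating-point rounding) even though the count of points above and below the line differs.

```python
# Made-up scores for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Ordinary least-squares slope (b) and intercept (a)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

# e = Y actual - Y predicted, one residual per observation
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

n_above = sum(1 for e in residuals if e > 0)  # points above the line
n_below = sum(1 for e in residuals if e < 0)  # points below the line
ss_error = sum(e ** 2 for e in residuals)     # squaring stops the cancelling

print(sum(residuals), n_above, n_below, ss_error)
```

Because the raw residuals cancel out, SS(error) squares each one before summing.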
Significance Testing in Regression
Two significance tests for the two kinds of effects tested in regression:
1. Model-as-a-whole effects (F-ratio + p-value)
2. Individual variable or predictor effects (t statistics + p-value)
NOTE Significance test involves 2 t statistics because there are 2 parameters: one t for slope + one t for intercept.
Predictor Effect = Whole Model Effect in Simple Linear Regression
> Because there is only one IV!
> F = T-squared; T = square root of F (for the slope coefficient)
> P-value for F-Ratio & T-statistic of the slope are the same!
F-Ratio in Regression
Tests significance of model as a whole
F = mean square of the model / mean square of the residual
Alternatively,
F = [ SS(model) / df(model) ] divided by
[ SS(residual) / df(residual) ]
NOTE Mean Square = sums of squares / degrees of freedom
NOTE Use an ANOVA table to calculate components of F-ratio!
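The F-ratio arithmetic can be sketched as follows. The sums of squares and degrees of freedom are hypothetical ANOVA-table entries, and the function names are illustrative.

```python
def mean_square(ss, df):
    """Mean square = sum of squares / degrees of freedom."""
    return ss / df


def f_ratio(ss_model, df_model, ss_residual, df_residual):
    """F = MS(model) / MS(residual)."""
    return mean_square(ss_model, df_model) / mean_square(ss_residual, df_residual)


# Hypothetical ANOVA-table entries, for illustration only
f = f_ratio(ss_model=3.6, df_model=1, ss_residual=2.4, df_residual=3)
print(f)
```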
Degrees of Freedom in F-ratio Regression
F-Ratio in regression involves 3 df values!
1. df(model)
> Always 1 in simple linear regression, because there is only one IV!
2. df(residual)
> Total sample size - 2
3. df(total)
> Total sample size - 1
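As a quick sketch, the three df values for simple linear regression follow directly from the sample size. The function name and the sample size of 50 are hypothetical.

```python
def simple_regression_dfs(n):
    """Return (df_model, df_residual, df_total) for one predictor."""
    df_model = 1         # one IV
    df_residual = n - 2  # sample size minus the two parameters (a and b)
    df_total = n - 1
    return df_model, df_residual, df_total


# Hypothetical sample size, for illustration only
df_model, df_residual, df_total = simple_regression_dfs(50)
print(df_model, df_residual, df_total)
```

Note that df(model) + df(residual) = df(total), mirroring how the sums of squares add up.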
R-Squared in Regression
AKA Coefficient of Determination
R-squared = SS(Regression) / SS(Total)
*Standardised (%) measure of effect size
*Derived from ANOVA summary to calculate F-ratio when testing significance of model-as-a-whole effects
> 2-12% = small effect
> 13-25% = medium effect
> 26%+ = large effect
Visual/Conceptual Explanation
Imagine a Venn diagram where the overlap between X & Y represents the proportion of variance in Y explained by X (shared variance, not necessarily causal).
E.G. A correlation coefficient of r = .72 represents a coefficient of determination of r² = .518, so 51.8% of the variance in Y is explained by X.
NOTE R-squared is literally the square of r (the correlation coefficient)
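The sketch below, on made-up scores, computes R-squared from the sums of squares and compares it with the squared correlation coefficient. All variable names are illustrative.

```python
import math

# Made-up scores for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)  # this is SS(Total)

# Pearson correlation coefficient
r = sxy / math.sqrt(sxx * syy)

# Least-squares line and the variability it captures
b = sxy / sxx
a = mean_y - b * mean_x
ss_regression = sum((a + b * x - mean_y) ** 2 for x in xs)

r_squared = ss_regression / syy
print(r_squared, r ** 2)
```

Both printed values are about 0.6 for this data, i.e. roughly 60% of the variance in Y is explained.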
Why Does Regression Also Involve an ANOVA Summary Table?
Because significance in ANOVA & regression is tested by F-ratio!
> ANOVA F-ratio looks at whether there is more variation between groups than within (error)
> Regression F-ratio tests whether more variation is explained by model than error: MS(model) / MS(error)
Many (Equal) Effect Sizes in Regression
1. Model-as-a-whole = R-squared (coefficient of determination)
2. IV/Predictor Effect = Beta coefficient
a) Raw b coefficient = unstandardised
> Because it is in original scale of variable, there is no standardised cut off (need to understand scale of variable to appreciate big vs. small effect)
b) Beta coefficient = Standardised; b x (SDx/SDy)
> Standardised & unstandardised coefficients always have the same sign (positive or negative gradient) – only differ in value.
> The standardised beta coefficient = the correlation coefficient
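A minimal sketch on made-up scores, assuming the usual conversion beta = b × (SD of X / SD of Y). It checks that the standardised coefficient matches the correlation coefficient when there is a single predictor.

```python
import math
import statistics

# Made-up scores for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x = statistics.mean(xs)
mean_y = statistics.mean(ys)

sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

b = sxy / sxx  # unstandardised slope, in original units

# Standardised slope: rescale b by the ratio of sample SDs
beta = b * statistics.stdev(xs) / statistics.stdev(ys)

# Pearson correlation coefficient for comparison
r = sxy / math.sqrt(sxx * syy)

print(b, beta, r)
```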
Unstandardised vs. Standardised Coefficients
We need standardised coefficients to compare effect sizes between studies/variables!
> Unstandardised effect sizes are useful if we can appreciate the “natural” scale BUT psychometric measurement tools are not naturally meaningful because a one-point difference on a 1-7 scale is much bigger than on a 0-100 scale.
> Thus, comparing unstandardised effect sizes between different studies or variables is not meaningful!
T-Statistic in Regression
Tests significance of an individual IV AKA predictor
Symbols: T = b / SE(b)
Words: T = Slope / Standard Error of slope
Conceptual Overview
> Is the slope (b) significantly different from 0?
> If yes, then in unstandardised terms, we can see that for every 1-point increase in X, Y increases or decreases by a significant amount!
NOTE Use a coefficients table to calculate components of t-statistic!
NOTE The t-statistic for the intercept is usually ignored because it only tests whether the intercept differs from 0, which rarely answers the research question!
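The t-statistic for the slope can be sketched on made-up scores as below. The standard error formula used, SE(b) = sqrt(MS(residual) / sum of squared X deviations), is the standard one for a single predictor and is an addition here, not taken from the notes.

```python
import math

# Made-up scores for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

sxx = sum((x - mean_x) ** 2 for x in xs)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
a = mean_y - b * mean_x

ss_residual = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
ss_regression = sum((a + b * x - mean_y) ** 2 for x in xs)

ms_residual = ss_residual / (n - 2)  # df(residual) = n - 2
se_b = math.sqrt(ms_residual / sxx)  # standard error of the slope

t = b / se_b                         # T = slope / SE(slope)
f = (ss_regression / 1) / ms_residual  # df(model) = 1

print(t, t ** 2, f)
```

With one predictor, the squared t for the slope and the model F come out equal (about 4.5 for this data).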
Null & Alternative Hypotheses in Regression
F-Statistic Hypotheses
Null H0: R(y,y-hat) = 0 (flat line = no gradient!)
Alternate H1: R(y,y-hat) does not equal 0
T-Statistic Hypotheses
Null H0: Beta = 0 (flat line = no gradient)
Alternate H1: Beta does not equal 0
NOTE Beta is the same as R
How do you calculate the residual from an individual score?
Residual = Observed y - predicted y
How does a dichotomous predictor affect regression?
The t-test for the slope becomes functionally identical to an independent-samples t-test!
> The intercept "a" becomes the reference group mean (whichever is coded as level 0) & corresponding t-statistic determines whether it is significantly different from zero.
> Slope "b" would represent the mean difference between groups & corresponding t-value determines whether this is significant.
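A short sketch with a made-up two-group dataset (group coded 0 or 1) illustrates how the intercept and slope map onto the group means. The data and variable names are hypothetical.

```python
# Made-up data: group membership coded 0/1, with an outcome score each
group = [0, 0, 0, 1, 1, 1]
scores = [2, 3, 4, 5, 6, 7]

n = len(group)
mean_x = sum(group) / n
mean_y = sum(scores) / n

# Ordinary least-squares slope (b) and intercept (a)
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(group, scores)) / sum(
    (x - mean_x) ** 2 for x in group
)
a = mean_y - b * mean_x

# Group means computed directly, for comparison
scores_0 = [y for x, y in zip(group, scores) if x == 0]
scores_1 = [y for x, y in zip(group, scores) if x == 1]
mean_0 = sum(scores_0) / len(scores_0)
mean_1 = sum(scores_1) / len(scores_1)

print(a, mean_0)           # intercept vs. reference-group mean
print(b, mean_1 - mean_0)  # slope vs. difference between group means
```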