PSYC2012 Regression

New cards

What is Regression?

Regression is used to model a predictive linear relationship between a numeric independent variable (IV) and a numeric dependent variable (DV).

Purpose Refers to IV as a predictor of DV

Design Typically used in non-experimental research designs

> E.G. Longitudinal studies where something at one time point is used to predict something at a different time point

> E.G. Cross-sectional survey where many factors are measured

RQ "Does X predict Y?"

New cards

Why are predictive relationships not always causal?

A successful prediction simply identifies whether a statistical association exists NOT a causal relationship/sequence.

> Simple linear regression does not control for confounds/third variables

1. "If you're a psych student, I predict you are a woman" = NOT causal

2. "If you answer Qs in class, I predict you are high in extraversion" = NOT causal (could be reverse-causal)

3. "If you study lots, I predict you’ll do well in the final" = MAYBE causal BUT many confounds

New cards

Correlation vs. Regression

Correlation is a simplified version of regression!

1. Relationship Model

> Correlation = Linear; Regression = Predictive linear

2. Variables

> BOTH use numeric IV & DV (caveat point biserial correlations/dichotomous predictors)

3. Research Application

> BOTH typically used in non-exp design

New cards

Properties of the Regression Line

AKA (Straight) line of best fit

Equation Regression model produces equation of the line

(Y-hat = a + bX)

Parameters Two!

1. Intercept "a": Value of Y when X = 0

2. Slope “b”: How steep vs. flat the line is & what direction +/-.

New cards

What Does the Intercept "a" Represent?

The value of Y when X = 0

How much depression does one experience when they have no pain interference?

New cards

What Does the Slope "b" Represent?

b shows the gradient and direction: the predicted change in Y for each one-point increase in X!

> The b coefficient is unstandardised because it is represented in original units

> In simple linear regression, the standardised coefficient (beta) is the SAME as the correlation coefficient (r)!

New cards

When is it NOT Appropriate to Use Simple Linear Regression?

When we have a dichotomous independent variable!

> Regression can have dichotomous predictors BUT it is more appropriate to use an independent-samples T-test!

> Analogous to point biserial correlation: still works because you can model the relationship between two IV categories with a straight line

New cards

Using Slope & Intercept to Predict Scores

We can calculate the expected Y score when the slope and intercept are known and an X score is provided. Simply plug the X score into the regression equation (Y-hat = a + bX).
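The plug-in step can be sketched in a few lines of Python. The intercept and slope values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def predict(a: float, b: float, x: float) -> float:
    """Return the predicted score Y-hat = a + b * X."""
    return a + b * x


# Hypothetical parameter values (not from the notes)
intercept = 2.2  # a: predicted Y when X = 0
slope = 0.6      # b: predicted change in Y per one-point increase in X

y_hat = predict(intercept, slope, 4)
print(round(y_hat, 2))  # 4.6
```

With X = 0 the same function returns the intercept, which matches the definition of "a".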

New cards

Framing Conclusions in Regression Analysis

Conclusion Para Structure

1. X was a statistically significant predictor of Y scores, F(1, df-residual) = [ ], p = [ ].

> Can use F-ratio or T-statistic

2. A one-point increase in X predicted []-point increase in Y.

> Requires the slope b from the regression equation (Y-hat = a + bX)

3. X explained []% of the variance in Y scores, a large effect.

> Requires R-squared or standardised beta

New cards

Navigating ANOVA Summary Table in Regression

SS(Total) = Sum of squared (Y minus Y-bar)

- That is, total variability

- Sum of each data point's squared deviation from the mean of Y

SS(Residual) = Sum of squared (Y minus Y-hat)

- That is, unexplained variability

- Sum of each data point's squared deviation from the regression model AKA predicted line

- Visually, square and add up the height difference between each data point and the line

SS(Regression) = Sum of squared (Y-hat minus Y-bar)

- That is, variability captured by the regression line

- Sum of the squared deviation of each predicted/estimated score of Y from the average score of Y

- Visually, square and add up the height differences between the line of best fit and the flat line at the average of Y
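As a rough illustration, the three sums of squares can be computed by hand in Python. The five data points are made up for this sketch, and the slope and intercept use the standard least-squares formulas, which the notes do not spell out.

```python
# Hypothetical data for illustration (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Standard least-squares slope and intercept for Y-hat = a + bX
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar
y_hat = [a + b * xi for xi in x]

ss_total = sum((yi - y_bar) ** 2 for yi in y)                   # Y minus Y-bar
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # Y minus Y-hat
ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)          # Y-hat minus Y-bar

print(round(ss_total, 2), round(ss_residual, 2), round(ss_regression, 2))  # 6.0 2.4 3.6
```

In this example SS(Regression) + SS(Residual) adds back up to SS(Total), which is why the ANOVA table can split total variability into an explained and an unexplained part.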

New cards

Least Squares Method in Regression

Remember error refers to the difference between predicted Y & actual Y for any given value of X!

> Regression models minimise residual variance via the method of least squares

> Aim of least squares method = choose the intercept & slope that give the smallest possible SS(Residual), so there is less variability around the regression line than total variability in Y around the mean (TSS)
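One way to see the least squares idea is to compare the squared error of the fitted line against other candidate lines. This is a minimal sketch on made-up data; the helper name `ss_residual` and the alternative lines are assumptions for illustration.

```python
# Hypothetical data for illustration (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]


def ss_residual(a: float, b: float) -> float:
    """Sum of squared errors (Y minus Y-hat) for the line Y-hat = a + bX."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Standard least-squares estimates
b_fit = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a_fit = y_bar - b_fit * x_bar

best = ss_residual(a_fit, b_fit)           # fitted line
steeper = ss_residual(a_fit, b_fit + 0.1)  # same intercept, slightly steeper slope
flat = ss_residual(y_bar, 0.0)             # flat line at the mean of Y, equals SS(Total)

print(round(best, 2), round(steeper, 2), round(flat, 2))  # 2.4 2.95 6.0
```

The fitted line has the smallest squared error of the three, and the flat line at the mean reproduces the total variability in Y.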

New cards

Sources of Variance in Regression

Remember variance is calculated using sums of squares!

1. Total variance = Total sums of squares (around mean Y)

> Sum of (Y minus Y-bar)^2

2. Unexplained variance = SS(Residual) AKA Error

> Difference between predicted Y vs. actual Y

> Sum of (Y minus Y-hat)^2

3. Explained variance = SS(Regression)

> Difference between predicted Y & mean Y

> Sum of (Y-hat minus Y-bar)^2

New cards

How is Variability Represented in A Regression Line?

Remember regression analysis tries to explain or predict variability in outcome “Y”!

> Regression line always runs through the middle of a data cloud on a scatterplot because it represents the predicted score of Y for any given value of X.

> The closer actual scores are to the predicted scores, the better the model predicts Y.

> The further away the actual scores from line, the worse the model predicts Y.

New cards

Properties of Error in Regression

Each observation in the dataset has a residual (e)!

e = (Y actual – Y predicted)

> Quantifies vari

> Positive residual = above regression line

> Negative residual = below regression line

> Residuals sum to 0 because the regression line sits in the middle of the data points so that the positive residuals (above the line) and negative residuals (below the line) cancel out exactly.

NOTE To clarify, SS(error) uses squares to quantify variability not explained by independent variable!
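A short sketch of these properties, using made-up data and the standard least-squares formulas (an assumption, since the notes do not give them): each observation gets one residual, the raw residuals cancel out, and only the squared residuals measure unexplained variability.

```python
# Hypothetical data for illustration (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
a = y_bar - b * x_bar

# e = Y actual minus Y predicted, one residual per observation
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print([round(e, 1) for e in residuals])         # [-0.8, 0.6, 1.0, -0.6, -0.2]
print(abs(sum(residuals)) < 1e-9)               # True: residuals cancel out
print(round(sum(e ** 2 for e in residuals), 2)) # 2.4: SS(error)
```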

New cards

Significance Testing in Regression

Two significance tests for the two kinds of effects tested in regression:

1. Model-as-a-whole effects (F-ratio + p-value)

2. Individual variable or predictor effects (t statistics + p-value)

NOTE Significance test involves 2 t statistics because there are 2 parameters: one t for slope + one t for intercept.

Predictor Effect = Whole Model Effect in Simple Linear Regression

> Because there is only one IV!

> F = T-squared; T = square root of F (for the slope coefficient)

> P-value for F-Ratio & T-statistic of the slope are the same!

New cards

F-Ratio in Regression

Tests significance of model as a whole

F = mean square of the model / mean square of the residual

Alternatively,

F = [ SS(model) / df(model) ] divided by

[ SS(residual) / df(residual) ]

NOTE Mean Square = sums of squares / degrees of freedom

NOTE Use an ANOVA table to calculate components of F-ratio!

New cards

Degrees of Freedom in F-ratio Regression

F-Ratio in regression involves 3 df values!

1. df(model)

> Always 1 in simple linear regression because df(model) = number of IVs, and there is only one IV!

2. df(residual)

> Total sample size - 2

3. df(total)

> Total sample size - 1
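Putting the degrees of freedom together with the F-ratio formula, a minimal sketch looks like this. The sums of squares and sample size are hypothetical stand-ins for values read off an ANOVA summary table.

```python
# Hypothetical ANOVA table values (not from the notes)
ss_regression = 3.6
ss_residual = 2.4
n = 5  # total sample size

df_model = 1         # one IV in simple linear regression
df_residual = n - 2
df_total = n - 1

# Mean square = sums of squares / degrees of freedom
ms_model = ss_regression / df_model
ms_residual = ss_residual / df_residual
f_ratio = ms_model / ms_residual

print(df_model, df_residual, df_total)  # 1 3 4
print(round(f_ratio, 2))                # 4.5
```

Note that df(model) + df(residual) = df(total), mirroring how the sums of squares add up.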

New cards

R-Squared in Regression

AKA Coefficient of Determination

R-squared = SS(Regression) / SS(Total)

*Standardised (%) measure of effect size

*Derived from ANOVA summary to calculate F-ratio when testing significance of model-as-a-whole effects

> 2-12% = small effect

> 13-25% = medium effect

> 26%+ = large effect

Visual/Conceptual Explanation

Imagine a Venn diagram where the overlap between X & Y represents the proportion of variation in Y that is explained (not necessarily caused) by X.

E.G. A correlation coefficient of r = .72 represents a coefficient of determination of r2 = .518 so 51.8% of variance in Y is explained by X.

NOTE Comparison R-Squared is literally the square of R (the correlation coefficient)
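The two routes to R-squared can be checked against each other on a small made-up dataset: SS(Regression) / SS(Total) on one side, and the squared correlation coefficient on the other. The data and variable names are assumptions for illustration.

```python
import math

# Hypothetical data for illustration (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # same as SS(Total)

# Route 1: SS(Regression) / SS(Total)
b = s_xy / s_xx
a = y_bar - b * x_bar
ss_regression = sum(((a + b * xi) - y_bar) ** 2 for xi in x)
r_squared = ss_regression / s_yy

# Route 2: square the Pearson correlation coefficient
r = s_xy / math.sqrt(s_xx * s_yy)

print(round(r_squared, 3), round(r ** 2, 3))  # 0.6 0.6
```

Here X explains 60% of the variance in Y, which would count as a large effect under the cut-offs above.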

New cards

Why Does Regression Also Involve an ANOVA Summary Table?

Because significance in ANOVA & regression is tested by F-ratio!

> ANOVA F-ratio looks at whether there is more variation between groups than within (error)

> Regression F-ratio tests whether more variation is explained by model than error: MS(model) / MS(error)

New cards

Many (Equal) Effect Sizes in Regression

1. Model-as-a-whole = R-squared (coefficient of determination)

2. IV/Predictor Effect = Beta coefficient

a) Raw b coefficient = unstandardised

> Because it is in original scale of variable, there is no standardised cut off (need to understand scale of variable to appreciate big vs. small effect)

b) Beta coefficient = Standardised; b x (SDx/SDy)

> Standardised & unstandardised coefficients always have the same sign (positive or negative gradient) and only differ in value.

> In simple linear regression, the standardised beta coefficient = the correlation coefficient (r)
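A quick numerical check of the standardisation step, assuming made-up data and sample standard deviations from Python's `statistics.stdev`: multiplying the raw slope by SDx / SDy gives the same number as the correlation coefficient.

```python
import math
import statistics

# Hypothetical data for illustration (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar = statistics.mean(x)
y_bar = statistics.mean(y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)

b = s_xy / s_xx                                      # unstandardised slope (original units)
beta = b * statistics.stdev(x) / statistics.stdev(y)  # standardised slope
r = s_xy / math.sqrt(s_xx * s_yy)                    # Pearson correlation

print(round(b, 4), round(beta, 4), round(r, 4))  # 0.6 0.7746 0.7746
```

The raw slope and beta share a sign but differ in value, because beta is expressed in standard deviation units rather than the original scale.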

New cards

Unstandardised vs. Standardised Coefficients

We need standardised coefficients to compare effect sizes between studies/variables!

> Unstandardised effect sizes are useful if we can appreciate the "natural" scale BUT scores on psychometric measurement tools are not naturally meaningful: a one-point difference on a 1-7 scale is a much bigger change than a one-point difference on a 0-100 scale.

> Thus, comparing unstandardised effect sizes between different studies or variables is not meaningful!

New cards

T-Statistic in Regression

Tests significance of an individual IV AKA predictor

Symbols: T = b / SE(b)

Words: T = Slope / Standard Error of slope

Conceptual Overview

> Is the slope (b) significantly different from 0?

> If yes, then in unstandardised terms, we can see that for every 1-point increase in X, Y increases or decreases by a significant amount!

NOTE Use a coefficients table to calculate components of t-statistic!

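NOTE Usually ignore the t-statistic for the intercept because it only tests whether the intercept differs from 0, which is rarely the research question. The t-statistic for the slope is the one that tests the predictor!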
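The slope test can be sketched by hand on made-up data. The standard error formula used here, SE(b) = square root of MS(residual) / sum of squared (X minus X-bar), is the usual one for simple linear regression but is an assumption, since the notes read SE(b) straight from the coefficients table.

```python
import math

# Hypothetical data for illustration (not from the notes)
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar

ss_residual = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
ss_regression = sum(((a + b * xi) - y_bar) ** 2 for xi in x)
ms_residual = ss_residual / (n - 2)

se_b = math.sqrt(ms_residual / s_xx)     # standard error of the slope
t = b / se_b                             # T = slope / SE(slope)
f_ratio = ss_regression / ms_residual    # df(model) = 1, so MS(model) = SS(model)

print(round(t, 3), round(t ** 2, 2), round(f_ratio, 2))  # 2.121 4.5 4.5
```

Squaring the slope's t-statistic reproduces the F-ratio, which is why the two tests give the same p-value when there is only one predictor.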

New cards

Null & Alternative Hypotheses in Regression

F-Statistic Hypotheses

Null H0: R(y,y-hat) = 0 (flat line = no gradient!)

Alternate H1: R(y,y-hat) does not equal 0

T-Statistic Hypotheses

Null H0: Beta = 0 (flat line = no gradient)

Alternate H1: Beta does not equal 0

NOTE In simple linear regression, Beta is the same as R

New cards

How do you calculate the residual from an individual score?

Residual = Observed y - predicted y

New cards

How does a dichotomous predictor affect regression?

The t-test of the slope becomes functionally identical to an independent-samples t-test!

> The intercept "a" becomes the reference group mean (whichever is coded as level 0) & corresponding t-statistic determines whether it is significantly different from zero.

> Slope "b" would represent the mean difference between groups & corresponding t-value determines whether this is significant.
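To see how the intercept and slope map onto group means, here is a minimal sketch with a made-up 0/1 predictor. The group codes, scores, and variable names are all hypothetical.

```python
# Hypothetical data: group coded 0 (reference) or 1, with three scores each
group = [0, 0, 0, 1, 1, 1]
score = [3, 4, 5, 6, 7, 8]

n = len(group)
x_bar = sum(group) / n
y_bar = sum(score) / n

# Standard least-squares slope and intercept
b = sum((g - x_bar) * (s - y_bar) for g, s in zip(group, score)) / sum((g - x_bar) ** 2 for g in group)
a = y_bar - b * x_bar

# Group means computed directly for comparison
mean_0 = sum(s for g, s in zip(group, score) if g == 0) / group.count(0)
mean_1 = sum(s for g, s in zip(group, score) if g == 1) / group.count(1)

print(a, mean_0)           # 4.0 4.0 -> intercept = reference group mean
print(b, mean_1 - mean_0)  # 3.0 3.0 -> slope = mean difference between groups
```

Swapping which group is coded 0 would change the intercept to the other group's mean and flip the sign of the slope.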