Regression as a statistical test

What is R-squared?

The ratio of explained variation to total variation; it is equivalent to the proportional reduction in prediction error.

How do we interpret R squared?

We interpret R squared as the proportion of the variability in y that is explained by x.

If R squared is close to 1, it means that we have explained almost all of the variation in the DV

Vice versa, if R squared is close to 0, we have explained very little of the variation in the DV

R squared summarises how well x can predict y. It is a measure of the proportional reduction in prediction error: how much better x predicts y than using y bar (the mean of y) alone

R squared tells us that there is ___% less prediction error when x is used to predict y than when y bar is used to predict y
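
The proportional-reduction-of-error idea can be checked with a short calculation. The sketch below is only an illustration: the function name `r_squared` and the five data points are made up for this example, and it assumes a simple least squares line with one explanatory variable.

```python
def r_squared(x, y):
    """Proportional reduction in prediction error for a least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n

    # Least squares slope (b) and intercept (a)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar

    # Error when y bar is used to predict every y (total variation)
    tss = sum((yi - y_bar) ** 2 for yi in y)
    # Error when the regression line is used to predict y (unexplained variation)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

    return (tss - sse) / tss


# Hypothetical data, chosen only to keep the arithmetic easy to follow
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(r_squared(x, y), 3))  # 0.6
```

Here the total squared error drops from 6 (using y bar) to 2.4 (using the line), so there is 60% less prediction error when x is used.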

What is a conditional distribution?

The distribution of y values at a given value of x, i.e. the spread of y around the regression line at that x. Think of the predicted value as a predicted mean, with the y values normally distributed around it.

What is RMSE? (root mean square error)

Root mean square error is the estimate of the variability in y values at each value of x. It estimates the spread around the regression line and gives us the estimated standard deviation of the conditional distributions of y at each value of x.

The estimated amount of variability in y at each x is assumed to be identical

What do we expect when things are normally distributed?

When things are normally distributed, we can expect about 95% of our y values (for a given x value) to be within 2 standard deviations of the mean
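
The RMSE and the "within 2 standard deviations" rule can be combined into a rough range for y at a chosen x. This is a sketch under stated assumptions: the helper names (`fit_line`, `rmse`, `approx_95_range`) and the data are hypothetical, and the RMSE here divides by n - 2 (some texts divide by n instead).

```python
import math


def fit_line(x, y):
    """Return intercept a and slope b of the least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def rmse(x, y):
    """Estimated standard deviation of y around the line (assumes n - 2 divisor)."""
    a, b = fit_line(x, y)
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return math.sqrt(sse / (len(x) - 2))


def approx_95_range(x, y, x_new):
    """Predicted y at x_new, plus or minus 2 RMSE (a rough 95% range)."""
    a, b = fit_line(x, y)
    s = rmse(x, y)
    y_hat = a + b * x_new
    return y_hat - 2 * s, y_hat + 2 * s


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(rmse(x, y), 3))       # 0.894
print(approx_95_range(x, y, 3))   # roughly (2.21, 5.79)
```

The range treats the predicted value as the mean of the conditional distribution and relies on the constant-variance assumption, so the same RMSE is used at every x.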

What is the marginal distribution?

It is the distribution of a single variable on its own, ignoring the other variables. In regression, it is the overall distribution of y without regard to x, so it lets us describe individual variables without considering the relationships between them

How does the strength of the association between x and y affect conditional variability compared to overall variability?

The stronger the association between x and y, the smaller the conditional variability will be relative to the overall variability (there is less prediction error)

How do we interpret the RMSE?

If the RMSE is small relative to the scale of y, this tells us that our residuals are typically small and that the regression line provides a good fit to the actual data (so you want a small RMSE but a large R squared value)

Why do we use a two tailed test in regression?

Because we want to test whether there is an association in either direction, not predict the direction of the association

When we test for significance in regression, what are we trying to test?

We're testing whether y is independent of x, i.e. whether the distribution of y is identical at each x value. For a linear regression function this occurs when the population slope is 0.

The null hypothesis is therefore that the population slope = 0

And the alternative hypothesis is that the population slope is not equal to 0

How do we calculate our test statistic?

t = b / SE(b), where b is the sample slope and SE(b) is the standard error of the slope

How do we calculate standard error of the slope/SE(b)?

SE(b) = b × sqrt(1 − r²) / (r × sqrt(n − 2)), where r is the correlation coefficient and n is the sample size
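
As a worked check of the test statistic, the sketch below computes b, r, SE(b) and t = b / SE(b) from raw data. The function name `slope_t_test` and the data are assumptions made for this illustration; the SE(b) line uses the r-based formula from the card, which breaks down if r is exactly 0.

```python
import math


def slope_t_test(x, y):
    """Return (b, se_b, t) for the slope of a simple least squares line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b = sxy / sxx                   # sample slope
    r = sxy / math.sqrt(sxx * syy)  # correlation coefficient

    # SE(b) = b * sqrt(1 - r^2) / (r * sqrt(n - 2)); undefined when r == 0
    se_b = b * math.sqrt(1 - r ** 2) / (r * math.sqrt(n - 2))
    t = b / se_b                    # compare with a t distribution, df = n - 2
    return b, se_b, t


# Hypothetical data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b, se_b, t = slope_t_test(x, y)
print(round(b, 3), round(se_b, 3), round(t, 3))  # 0.6 0.283 2.121
```

The t value is then compared with the two-tailed critical value from a t table with n - 2 degrees of freedom to decide whether to reject the null hypothesis of a zero slope.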

Assumptions of regression?

assumes linearity

assumes constant variance/ homoscedasticity

randomisation: the data come from a random sample or a randomised experiment

normality: residuals are normally distributed

no multicollinearity: the explanatory variables are not highly correlated with one another (relevant when there is more than one explanatory variable)

What will a full interpretation of a regression model include?

effect size (b)

statistical significance

model fit

assumptions assessed

Why may regression not be reliable?

Outliers: they can pull the regression slope up/down, changing the fit of the line for the overall data, making it less suitable

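A non-linear relationship: a straight line misrepresents a curved pattern, so the linear model fits poorly. A transformation or a non-linear model is needed instead (logistic regression is for a binary DV, not a general fix for curvature)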

Working outside your range of x-values (extrapolation): the relationship between x and y may be different outside the range of the observed data, so predictions there are unreliable

What does standard error of the slope estimate?

It estimates the variability of our slope if we took repeated samples from the population. We can also use SE(b) to build a confidence interval for the slope: b ± t × SE(b), where t is the critical value from the t distribution with n − 2 degrees of freedom
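
A confidence interval for the slope is commonly built as b plus or minus a critical t value times SE(b). The sketch below assumes the critical value is looked up in a t table rather than computed; the function name `slope_confidence_interval` and the input numbers are hypothetical.

```python
def slope_confidence_interval(b, se_b, t_crit):
    """Return (lower, upper) bounds of b +/- t_crit * SE(b)."""
    margin = t_crit * se_b
    return b - margin, b + margin


# Hypothetical values: slope 0.6, SE(b) 0.283, n = 5 so df = n - 2 = 3.
# 3.182 is the two-tailed 95% critical value for df = 3 from a t table.
lower, upper = slope_confidence_interval(0.6, 0.283, 3.182)
print(round(lower, 2), round(upper, 2))  # -0.3 1.5
```

Because this interval contains 0, it agrees with a two-tailed test at the 5% level that fails to reject a slope of 0 for these numbers.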

How is R squared similar to r?

Like r, R squared gives the strength of linear association and does not depend on the units of measurement

However, while r ranges from -1 to 1 and shows the direction of the association, R squared only ranges from 0 to 1

R squared is interpreted as the proportion of variation in y explained by x, and the larger it is, the more effective the least squares (regression) line is in predicting y