Regression as a statistical test


40 Terms


What is R-squared?

The ratio of explained variation to total variation in y. It is equivalent to the proportional reduction in prediction error.


How do we interpret R squared?

We interpret R squared as the proportion of the variability in y that is explained by x.

  • If R squared is close to 1, x explains almost all of the variation in the DV
  • Conversely, if R squared is close to 0, x explains very little of the variation in the DV

R squared summarises how well x can predict y. It is a measure of the proportional reduction in prediction error: how much better x predicts y compared to using y bar to predict y.


R squared tells us that there is ___% less prediction error when x is used to predict y compared to using y bar to predict y
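To make the fill-in-the-blank above concrete, here is a minimal sketch in plain Python. The five data points are hypothetical and chosen only so the arithmetic is easy to follow. It compares the squared error from predicting with y bar against the squared error from predicting with the least squares line.

```python
# Hypothetical illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope (b) and intercept (a).
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Total variation: squared error when y bar is the prediction for every point.
total_ss = sum((yi - y_bar) ** 2 for yi in y)
# Residual variation: squared error when the regression line is the prediction.
residual_ss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Proportional reduction in prediction error.
r_squared = (total_ss - residual_ss) / total_ss
print(f"R squared = {r_squared:.2f}")
print(f"{r_squared:.0%} less prediction error than using y bar alone")
```

With this data the total variation is 6 and the residual variation is 2.4, so R squared is 0.6: using x gives 60% less prediction error than using y bar.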


What is a conditional distribution?

The distribution of y values at a given value of x, i.e. the spread of y around the regression line at that x. Think of the predicted value as a predicted mean, with the y values spread around it in a normal distribution.


What is RMSE (root mean square error)?

Root mean square error is the estimate of the variability in y values at each value of x. It estimates the spread around the regression line and gives us the estimated standard deviation of the conditional distributions of y at each value of x.
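A minimal sketch of the RMSE calculation, using hypothetical data. It assumes the version of RMSE that divides the sum of squared residuals by n - 2 (because the intercept and slope are both estimated); some texts divide by n instead, so check which one your course uses.

```python
import math

# Hypothetical illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least squares slope (b) and intercept (a).
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# Residuals: observed y minus the value predicted by the line.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Assumption: divide by n - 2 because two parameters (a and b) were estimated.
rmse = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
print(f"RMSE = {rmse:.3f}")
```

Here the squared residuals sum to 2.4, so RMSE is the square root of 2.4 / 3, roughly 0.894. That is the estimated standard deviation of y around the line at any given x.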


The estimated amount of variability in y at each x is assumed to be identical


What do we expect when things are normally distributed?

When things are normally distributed, we can expect about 95% of our y values (for a given x value) to be within 2 standard deviations of the mean
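The "about 95% within 2 standard deviations" rule can be checked with the standard library's `statistics.NormalDist`. The predicted mean and RMSE below are hypothetical values picked for illustration, and the sketch assumes the conditional distribution really is normal.

```python
from statistics import NormalDist

# Hypothetical values: a predicted y of 4.0 at some x, with an RMSE of 0.9.
predicted_mean = 4.0
rmse = 0.9

# Conditional distribution of y at this x, assumed normal.
conditional = NormalDist(mu=predicted_mean, sigma=rmse)

lower = predicted_mean - 2 * rmse
upper = predicted_mean + 2 * rmse

# Share of the distribution that lies within 2 standard deviations of the mean.
share_within = conditional.cdf(upper) - conditional.cdf(lower)
print(f"About {share_within:.1%} of y values fall between {lower:.1f} and {upper:.1f}")
```

The exact share is a little over 95%, which is why the rule is usually stated as "about 95%".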


What is the marginal distribution?

It describes the distribution of a single variable on its own, without conditioning on the other variables. It helps us understand the distribution of individual variables without needing to consider the relationships between them.


How does the strength of the association between x and y affect conditional variability compared to overall variability?

The stronger the association between x and y, the smaller the conditional variability will be compared to the overall variability (there is less prediction error)
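As a rough illustration, the sketch below compares the overall standard deviation of y with the conditional standard deviation (RMSE) for two hypothetical datasets, one loosely and one tightly related to x. The helper `spread_summary` is a name made up for this example, and RMSE is assumed to use n - 2 in the denominator.

```python
import math
import statistics

def spread_summary(x, y):
    """Return (overall SD of y, RMSE around the least squares line)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    # Assumption: RMSE uses n - 2 in the denominator.
    return statistics.stdev(y), math.sqrt(sse / (n - 2))

# Hypothetical illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y_weak = [2, 4, 5, 4, 5]               # looser association with x
y_strong = [2.1, 3.9, 6.2, 7.8, 10.0]  # tight association with x

for label, y in [("weaker", y_weak), ("stronger", y_strong)]:
    sd_y, rmse = spread_summary(x, y)
    print(f"{label}: overall SD = {sd_y:.2f}, RMSE = {rmse:.2f}, ratio = {rmse / sd_y:.2f}")
```

The ratio of RMSE to overall SD is much smaller for the tightly related data, which is the pattern the card describes.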


How do we interpret the RMSE?

If the RMSE is small relative to the overall spread of y, our residuals are typically small and the regression line provides a good fit to the actual data (so you want a small RMSE but a large R squared value)


Why do we use a two tailed test in regression?

because we want to test whether there is an association, not predict the direction of the association


When we test for significance in regression, what are we trying to test?

We are testing whether the distribution of y is identical at each x value under our linear regression function, which occurs when the slope, b, equals 0 (that is, x and y are not associated).

Under the null hypothesis, therefore, the slope is 0 (b = 0)

And our alternative hypothesis is that the slope is not equal to 0


How do we calculate our test statistic?

t = b/SE(b), where b is the slope and SE(b) is the standard error of the slope. With one explanatory variable, t has n - 2 degrees of freedom.
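A worked sketch of the test statistic on hypothetical data. Here SE(b) is computed as RMSE divided by the square root of the sum of squared deviations of x, which is one standard way to get it. The critical value 3.182 is the two-tailed 5% value for 3 degrees of freedom read from a t table, and is hard-coded as an assumption because the standard library has no t distribution.

```python
import math

# Hypothetical illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar

# RMSE with n - 2 in the denominator (two estimated parameters).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
rmse = math.sqrt(sse / (n - 2))

# Standard error of the slope, then the test statistic t = b / SE(b).
se_b = rmse / math.sqrt(s_xx)
t = b / se_b
df = n - 2

# Assumption: two-tailed 5% critical value for df = 3, taken from a t table.
t_critical = 3.182

print(f"b = {b:.2f}, SE(b) = {se_b:.3f}, t = {t:.2f}, df = {df}")
if abs(t) > t_critical:
    print("Reject the null hypothesis of zero slope")
else:
    print("Fail to reject the null hypothesis of zero slope")
```

With only five points the slope of 0.6 gives t of about 2.12, which is below the critical value, so this small sample does not provide enough evidence of an association.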


How do we calculate standard error of the slope/SE(b)?

SE(b) = b × √(1 - r²) / (r × √(n - 2)), where r is the correlation coefficient and n is the sample size
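As a sanity check, the sketch below computes SE(b) from b, r and n on hypothetical data and compares it with the residual-based calculation RMSE / √(sum of squared x deviations). The two should agree for simple linear regression; note the r-based formula cannot be used when r is exactly 0.

```python
import math

# Hypothetical illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = s_xy / s_xx                   # least squares slope
a = y_bar - b * x_bar             # intercept
r = s_xy / math.sqrt(s_xx * s_yy) # correlation coefficient

# SE(b) from the slope, the correlation and the sample size (requires r != 0).
se_from_r = b * math.sqrt(1 - r ** 2) / (r * math.sqrt(n - 2))

# SE(b) from the residuals, for comparison.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_from_rmse = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

print(f"SE(b) from r:    {se_from_r:.4f}")
print(f"SE(b) from RMSE: {se_from_rmse:.4f}")
```

Both routes give roughly 0.2828 for this data.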


Assumptions of regression?

  • linearity: the relationship between x and y is linear
  • constant variance (homoscedasticity): the spread of y around the line is the same at every value of x
  • randomisation: the data come from a random sample or a randomised experiment
  • normality: residuals are normally distributed
  • no multicollinearity: there is not much overlap between explanatory variables

What will a full interpretation of a regression model include?

  • effect size (b)
  • statistical significance
  • model fit
  • assumptions assessed


Why may regression not be reliable?

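  • Outliers: they can pull the regression slope up or down, changing the fit of the line so that it suits the rest of the data less well
  • A non-linear relationship: a straight line will describe the data poorly; consider transforming a variable or using a model suited to the shape of the relationship (logistic regression is designed for binary outcomes, not as a general fix for non-linearity)
  • Working outside your range of x-values (extrapolation): the relationship between x and y may differ in regions we have not observed, so predictions there are unreliable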
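The outlier point can be illustrated with a small sketch: fit the slope on hypothetical data that lie close to a straight line, then add a single unusual point and fit again. The data and the helper name `slope` are made up for this example.

```python
def slope(x, y):
    """Least squares slope of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return s_xy / s_xx

# Hypothetical data lying close to a line with slope near 1.
x = [1, 2, 3, 4, 5]
y = [1.1, 1.9, 3.2, 3.8, 5.0]

# The same data plus one outlier at a high x value with an unusually low y.
x_outlier = x + [6]
y_outlier = y + [0.0]

slope_without = slope(x, y)
slope_with = slope(x_outlier, y_outlier)

print(f"Slope without the outlier: {slope_without:.2f}")
print(f"Slope with the outlier:    {slope_with:.2f}")
```

One extra point drags the slope from close to 1 down to close to 0, so the fitted line no longer describes the other five points well.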

What does standard error of the slope estimate?

It estimates the variability of our slope if we took repeated samples from the population. We can also use SE(b) to build a confidence interval for the slope.
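One common way to turn SE(b) into a confidence interval is b ± t* × SE(b), where t* is the critical value from the t distribution with n - 2 degrees of freedom. The sketch below uses hypothetical data, and t* = 3.182 (95% confidence, 3 degrees of freedom) is hard-coded from a t table as an assumption.

```python
import math

# Hypothetical illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx
a = y_bar - b * x_bar

# Standard error of the slope from the residuals (RMSE uses n - 2).
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
se_b = math.sqrt(sse / (n - 2)) / math.sqrt(s_xx)

# Assumption: t* for 95% confidence with df = n - 2 = 3, taken from a t table.
t_star = 3.182

margin = t_star * se_b
ci_lower = b - margin
ci_upper = b + margin
print(f"95% CI for the slope: ({ci_lower:.2f}, {ci_upper:.2f})")
```

For this data the interval runs from about -0.30 to 1.50. It contains 0, which matches a two-tailed test that fails to reject a slope of 0 at the 5% level.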


How is R squared similar to r?

  • Like r, R squared measures the strength of the linear association and does not depend on the units of measurement
  • However, while r ranges from -1 to 1, R squared only ranges from 0 to 1
  • R squared is interpreted as the proportion of variation in y explained by x; the larger it is, the more effective the least squares (regression) line is at predicting y
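These properties can be checked numerically. The sketch below uses hypothetical data and a small helper, `pearson_r`, written for this example. It rescales x (as if changing units) and flips the sign of y to show what changes and what does not.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    return s_xy / math.sqrt(s_xx * s_yy)

# Hypothetical illustrative data, not taken from the notes.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = pearson_r(x, y)
# Changing the units of x (for example metres to centimetres) leaves r unchanged.
r_rescaled = pearson_r([xi * 100 for xi in x], y)
# Reversing the direction of the association flips the sign of r but not R squared.
r_negative = pearson_r(x, [-yi for yi in y])

print(f"r = {r:.3f}, R squared = {r ** 2:.2f}")
print(f"r after rescaling x = {r_rescaled:.3f}")
print(f"r for the reversed association = {r_negative:.3f}, R squared = {r_negative ** 2:.2f}")
```

In simple linear regression R squared is just r squared, so it keeps the strength of the association but loses the direction.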