Linear Regression

36 Terms

1

Intercept (a)

Starting value in y-units: the predicted y-value when x is zero.

2

Slope (b)

For every 1 (x unit) increase in (x variable), there is a (slope) (y unit) increase in the mean (y variable). slope = r × (SD of y / SD of x)
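
As a rough check on the slope and intercept cards, the sketch below computes r, then the slope as r × (SD of y / SD of x), then the intercept from the fact that the least-squares line passes through (mean of x, mean of y). The data values and the helper name `correlation` are made up for illustration and are not part of the flashcard set.

```python
from statistics import mean, stdev

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def correlation(xs, ys):
    """Pearson r: sum of z-score products divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    n = len(xs)
    return sum((xi - mx) / sx * (yi - my) / sy for xi, yi in zip(xs, ys)) / (n - 1)

r = correlation(x, y)
b = r * stdev(y) / stdev(x)   # slope = r * (SD of y / SD of x)
a = mean(y) - b * mean(x)     # line passes through (x-bar, y-bar)

print(f"r = {r:.4f}, slope b = {b:.4f}, intercept a = {a:.4f}")
```

For this made-up data the fitted line works out to about ŷ = 2.2 + 0.6x.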

3

Correlation Coefficient

There appears to be a (weak/moderate/strong) (positive/negative) (linear/nonlinear) relationship between (x variable) and (y variable)

4

Coefficient of Determination

About R²% of the variability in (y variable) can be explained by variability in (x variable)

5

Standard Error of residuals

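On average, the actual (y variable) values vary by about (s, the standard deviation of the residuals, with y-units) from the predicted values. Find s in the LinRegTTest output, choosing the ≠, <, or > option that matches the alternative hypothesis. Note that s is not the standard error of the slope, SE(b).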
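
To make the "typical distance from the line" idea concrete, here is a minimal sketch that fits a least-squares line and computes the standard deviation of the residuals, s = sqrt(sum of squared residuals / (n − 2)). The data are hypothetical, and the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Least-squares slope and intercept.
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = actual - predicted, one per data point.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# Divide by n - 2 because two parameters (a and b) were estimated.
n = len(x)
s = sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(f"s = {s:.4f}")
```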

6

Residual = Actual - predicted value (how far vertically from line of best fit)

pos: underestimated

neg: overestimated
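
The sign rule can be sketched in a few lines. The line ŷ = 2.2 + 0.6x and the function names below are hypothetical, chosen only to illustrate the card.

```python
def predict(x, a=2.2, b=0.6):
    """Predicted y from a hypothetical line y-hat = a + b*x."""
    return a + b * x

def residual(actual_y, x):
    """Residual = actual - predicted."""
    return actual_y - predict(x)

def describe(res):
    """Positive residual: the model underestimated; negative: overestimated."""
    if res > 0:
        return "underestimated"
    if res < 0:
        return "overestimated"
    return "exact"

# At x = 2 the line predicts about 3.4.
print(describe(residual(4, 2)))  # actual 4 is above the line
print(describe(residual(3, 2)))  # actual 3 is below the line
```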

7

Explanatory variable

cause, independent, x-axis

8

Response variable

effect, dependent, y-axis

9

Association (any form)

Direction: positive/negative, Form: straight/curved, Strength: weak/moderate/strong or combo

10

Correlation

r cannot be greater than 1 or less than -1. If given r², take the square root of r² and give it the same sign as the slope.

11

Outliers

can either have large residual or high leverage

12

Leverage

high leverage if x value is far from mean of x-values, works like a lever if it’s influential
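
A small experiment shows the lever effect. The sketch below fits a slope to a hypothetical cluster, then adds one made-up point whose x-value is far from the mean of x and refits. The data and the helper name `slope` are assumptions for illustration.

```python
from statistics import mean

def slope(xs, ys):
    """Least-squares slope: Sxy / Sxx."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    return sxy / sxx

# Hypothetical cluster of points.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# One hypothetical high-leverage point: x = 20 is far from the mean of x.
x_with = x + [20]
y_with = y + [2]

slope_without = slope(x, y)
slope_with = slope(x_with, y_with)

print(f"slope without the point: {slope_without:.3f}")
print(f"slope with the point:    {slope_with:.3f}")
```

In this made-up case the single far-away point drags the slope from positive to slightly negative, which is what "influential" means on the card.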

13

Quantitative variables condition

both variables are quantitative

14

Straight enough condition

scatter plot looks reasonably straight

15

Outliers condition

Outliers either aren't obvious, or the sample is large enough to proceed with caution.

16

Correlation of 0

no linear association

17

Correlation

Measures the strength of the linear association between two variables. Two variables can be strongly associated but still have a small correlation if the association isn't linear.

18

Linear model

ŷ = a + bx (ŷ is the predicted y-value)

19

Residual

observed-predicted

20

Turn scatter plot on

Turn Stat Diagnostics on in MODE; STAT > Edit and enter L1 = x, L2 = y; 2nd Y= (STAT PLOT) and turn the plot on; ZOOM 9 (ZoomStat); GRAPH

21

Get linear model on calc

STAT > CALC > 8; for Store RegEQ, choose VARS > Y-VARS > Function > Y1

22

residuals

Set L3 equal to RESID (2nd STAT, then choose RESID)

23

Outliers

horizontal outliers (leverage) more influential than vertical outliers (residuals)

24

A residual scatter plot with a cluster and one “stray point”:

The point has high/low leverage and a large/small residual. This point is/isn’t influential. If the point were removed, the correlation would become weaker/stronger, and removing it would strengthen/weaken the association. The slope would increase/decrease/remain the same, since the point is/isn’t influential.

25

Null hypothesis

H₀: There is no linear relationship between ____ and ____ (β = 0).

26

Alternative hypothesis

Hₐ: There is a linear relationship between ____ and ____ (β ≠ 0).

27

Assumptions for inference. IN ORDER

Straight enough, Independence, Spread, Nearly Normal (SEISNN, Sally Eats Icees Stealthily Nearing Normandy)

28

Straight Enough

Scatter plot of data points is straight enough to try a linear model

29

Independence

Residual plot shows random scatter with no pattern

30

Spread

spread of residuals is consistent

31

Nearly Normal condition

Histogram of residuals is unimodal and symmetric. If there is a possible outlier, write: "There is one possible outlier, but with the large sample size it should be okay to proceed."

32

After conditions

Since the conditions for inference have been met, the sampling distribution of the regression slope can be modeled by a Student’s t-model with ____ degrees of freedom (n - 2). We’ll use a regression slope t-test. The equation of the line of best fit for these data points is ŷ = a + bx, where ____ are measured in ____ units.
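
The test statistic for the regression slope t-test can be sketched as t = b / SE(b) with df = n − 2, where SE(b) = s / sqrt(Sxx) and s is the standard deviation of the residuals. The data below are hypothetical. The sketch stops at t and df; the P-value would come from a t-table or from LinRegTTest.

```python
from math import sqrt
from statistics import mean

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
x_bar, y_bar = mean(x), mean(y)
sxx = sum((xi - x_bar) ** 2 for xi in x)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b = sxy / sxx              # slope
a = y_bar - b * x_bar      # intercept

# Standard deviation of the residuals.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
s = sqrt(sse / (n - 2))

se_b = s / sqrt(sxx)       # standard error of the slope
t = (b - 0) / se_b         # null hypothesis: true slope is 0
df = n - 2

print(f"b = {b:.3f}, SE(b) = {se_b:.4f}, t = {t:.4f}, df = {df}")
```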

33

P-value is less than alpha

The value of t = ____. A P-value less than alpha means that the association we see in the data is unlikely to occur by chance. Since our P-value is below our significance level of ____, we reject the null hypothesis and conclude there is strong evidence of a linear relationship between ____ and ____. As ____ increases, ____ (increases/decreases).

34

P value is greater than alpha

The value of t = ____. A P-value greater than alpha means that the association we see in the data could plausibly occur by chance. Since our P-value is above our significance level of ____, we fail to reject the null hypothesis and conclude there is not enough evidence of a linear relationship between ____ and ____.

35

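Confidence interval

A (given percent) confidence regression slope t-interval: b ± t* × SE(b), where b is the coefficient of the independent variable, SE(b) is its standard error, and t* = invT((1 + confidence level)/2, df) with df = n - 2 (remember it’s -2!). The interval equals about (____, ____).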
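
A worked instance of the interval may help. The numbers here are hypothetical: b = 2.5, SE(b) = 0.4, and n = 12, so df = 10 and the 95% critical value is t* ≈ 2.228 (the value invT(0.975, 10) returns, since invT takes the area to the left).

```latex
b \pm t^{*}_{n-2} \cdot SE(b)
\qquad\text{e.g.}\qquad
2.5 \pm 2.228 \times 0.4 \;=\; 2.5 \pm 0.8912
\;\Rightarrow\; (1.6088,\ 3.3912)
```

Because the whole interval is above 0 in this made-up case, it would agree with rejecting β = 0 at the 5% level.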

36

interpret confidence interval

We are (given percent) confident that the mean increase/decrease in (y variable) for each 1 (x unit) increase in (x variable) is between about ____ and about ____.