Linear Regression

Description and Tags

Statistics

Linear Regression and Correlation

36 Terms

Intercept (a)

The starting value, in y-units: the predicted y-value when x is zero.

Slope (b)

For every 1 (x unit) increase in (x variable), there is a (slope) (y unit) increase in mean (y variable). slope = r × (SD of y / SD of x)
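
A minimal sketch of how the slope and intercept fall out of the summary statistics, assuming the usual least-squares relationships b = r × (SD of y / SD of x) and a = (mean of y) - b × (mean of x). The function name `fit_line` and the `hours`/`score` data are made up for illustration and are not part of the deck.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sample standard deviations (divide by n - 1).
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / (n - 1))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / (n - 1))
    # Correlation: sum of products of z-scores, divided by n - 1.
    r = sum((x - mean_x) / sd_x * (y - mean_y) / sd_y
            for x, y in zip(xs, ys)) / (n - 1)
    b = r * sd_y / sd_x       # slope
    a = mean_y - b * mean_x   # intercept: the line passes through (mean_x, mean_y)
    return a, b, r

# Hypothetical illustration data.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
a, b, r = fit_line(hours, score)
print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}")  # y-hat = 2.20 + 0.60x, r = 0.775
```

Read with the card's template: for every 1 hour increase, there is a 0.60 point increase in mean score.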

Correlation Coefficient

There appears to be a (weak/moderate/strong) (positive/negative) (linear/nonlinear) relationship between (x variable) and (y variable)

Coefficient of Determination

About R²% of the variability in (y variable) can be explained by variability in (x variable)
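
As a rough illustration of where that percentage comes from, R² can be computed as 1 - (unexplained variation / total variation). The helper `r_squared` and the data are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def r_squared(xs, ys):
    """Fraction of the variability in y explained by the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    # Unexplained variation: squared residuals around the line.
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Total variation: squared deviations around the mean of y.
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst

# Hypothetical illustration data.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
r2 = r_squared(hours, score)
print(f"About {r2:.0%} of the variability in score can be explained by hours.")
```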

Standard Error of residuals

On average, the actual (y variable) values vary by about (s, the standard deviation of the residuals, in y-units) from the predicted values. (Find s using LinRegTTest, selecting the sign that matches the alternative hypothesis.)
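
A small sketch of the quantity being interpreted here, assuming the standard definition s = sqrt(SSE / (n - 2)) for a one-predictor regression. The name `residual_sd` and the data are made up for illustration.

```python
import math

def residual_sd(xs, ys):
    """Standard deviation of the residuals, s = sqrt(SSE / (n - 2))."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    # Two parameters (a and b) were estimated, so divide by n - 2.
    return math.sqrt(sse / (n - 2))

# Hypothetical illustration data.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
s = residual_sd(hours, score)
print(f"Actual scores vary by about {s:.2f} points from the predicted values.")
```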

Residual = actual value - predicted value (how far a point sits vertically from the line of best fit)

Positive: the model underestimated

Negative: the model overestimated
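
The sign rule can be checked directly. This sketch assumes a line y-hat = 2.2 + 0.6x that was already fitted to the made-up data below; the function name `residuals` is illustrative.

```python
def residuals(xs, ys, a, b):
    """Residual = actual - predicted, for the line y-hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical data and its least-squares line (assumed already fitted).
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
a, b = 2.2, 0.6

res = residuals(hours, score, a, b)
for x, y, e in zip(hours, score, res):
    if e > 0:
        verdict = "model underestimated"
    elif e < 0:
        verdict = "model overestimated"
    else:
        verdict = "exact"
    print(f"x={x}, actual={y}, residual={e:+.1f}: {verdict}")
```

For a least-squares line the residuals sum to zero (up to rounding), which is a quick sanity check on a fitted model.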

Explanatory variable

cause, independent, x-axis

Response variable

effect, dependent, y-axis

Association (any form)

Direction: positive/negative. Form: straight/curved. Strength: weak/moderate/strong, or a combination.

Correlation

r is always between -1 and 1. If given r², take the square root and give it the same sign as the slope.
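
A tiny sketch of recovering r from r². The square root alone loses the direction, so this assumes the slope of the fitted line is known; `r_from_r2` is a hypothetical helper name.

```python
import math

def r_from_r2(r2, slope):
    """Recover r from r-squared, taking its sign from the slope."""
    return math.copysign(math.sqrt(r2), slope)

r_neg = r_from_r2(0.64, -1.5)  # negative slope, so r is negative
r_pos = r_from_r2(0.64, 2.0)   # positive slope, so r is positive
print(r_neg, r_pos)
```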

Outliers

can either have large residual or high leverage

Leverage

high leverage if x value is far from mean of x-values, works like a lever if it’s influential
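
One way to put a number on "far from the mean of the x-values" is the textbook leverage formula for one-predictor regression, h = 1/n + (x - mean of x)² / Sxx. The deck does not give this formula, so treat the sketch as a supplement; the function name and data are made up.

```python
def leverages(xs):
    """Leverage of each point: 1/n + (x - mean_x)^2 / Sxx."""
    n = len(xs)
    mean_x = sum(xs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return [1 / n + (x - mean_x) ** 2 / sxx for x in xs]

# Hypothetical x-values: four clustered points and one far from the mean.
xs = [1, 2, 3, 4, 20]
h = leverages(xs)
for x, lev in zip(xs, h):
    print(f"x={x:>2}: leverage={lev:.3f}")
```

The point at x = 20 gets a leverage close to 1, while the clustered points stay well below that.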

Quantitative variables condition

both variables are quantitative

Straight enough condition

scatter plot looks reasonably straight

Outliers condition

Outliers either aren’t obvious, or the sample is large enough to proceed with caution.

Correlation of 0

no linear association

Correlation

Measures the strength of the linear association between two variables. Two variables can be strongly associated but still have a small correlation if the association isn’t linear.
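
A quick illustration of that last point: below, y is completely determined by x (y = x²), yet the correlation is zero because the pattern is curved and symmetric. The helper name and data are made up for the example.

```python
import math

def correlation(xs, ys):
    """Pearson correlation r = Sxy / sqrt(Sxx * Syy)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / math.sqrt(sxx * syy)

# A perfect but nonlinear relationship: y = x squared.
xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]
r_curved = correlation(xs, ys)
print(f"r = {r_curved:.3f}")
```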

Linear model

y = a+b(x)

Residual

observed-predicted

Turn scatter plot on

Mode: Stat Diagnostics On. Stat, Edit: enter X in L1 and Y in L2. 2nd-y= (Stat Plot): turn the plot On. Zoom 9 (ZoomStat) to graph.

Get linear model on calc

stat-calc-8, store regEq, vars-y-vars-function-y1

residuals

After running the regression, highlight the L3 header, then 2nd-stat (List), RESID to fill L3 with the residuals.

Outliers

horizontal outliers (leverage) more influential than vertical outliers (residuals)

A residual scatter plot with a cluster and one “stray point”:

The point has high/low leverage and a large/small residual. This point is/isn’t influential. If the point were removed, the correlation would become weaker/stronger, so removing it would weaken/strengthen the association. The slope would increase/decrease/remain the same, since the point is/isn’t influential.
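
To see what "influential" means in practice, fit the line with and without the stray point and compare slopes. This is a sketch on made-up data: five clustered points plus one hypothetical high-leverage point at x = 15.

```python
def slope(xs, ys):
    """Least-squares slope b = Sxy / Sxx."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical data: a cluster plus one stray point far to the right.
xs = [1, 2, 3, 4, 5, 15]
ys = [2, 4, 5, 4, 5, 2]

slope_with = slope(xs, ys)
slope_without = slope(xs[:-1], ys[:-1])  # drop the stray point
print(f"slope with stray point:    {slope_with:+.3f}")
print(f"slope without stray point: {slope_without:+.3f}")
```

Here the single point flips the slope from positive to negative, so it is influential: removing it changes the model substantially.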

Null hypothesis

H₀: There is no linear relationship between ____ and ____ (β = 0).

Alternative hypothesis

Hₐ: There is a linear relationship between ____ and ____ (β ≠ 0).

Assumptions for inference. IN ORDER

Straight enough, Independence, Spread, Nearly Normal (SEISNN, Sally Eats Icees Stealthily Nearing Normandy)

Straight Enough

Scatter plot of data points is straight enough to try a linear model

Independence

The residual plot shows random scatter, with no pattern.

Spread

spread of residuals is consistent

Nearly Normal condition

Histogram of residuals is unimodal and symmetric. If there is one possible outlier, note it: with the large sample size, however, it should be okay to proceed.

After conditions

Since the conditions for inference have been met, the sampling distribution of the regression slope can be modeled by a Student’s t-model with ____ (n - 2) degrees of freedom. We’ll use a regression slope t-test. The equation of the line of best fit for these data is ŷ = a + bx, where ____ are measured in ____ units.
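
A sketch of the test statistic behind the regression slope t-test, assuming the standard formulas SE(b) = s / sqrt(Sxx) and t = (b - 0) / SE(b) with n - 2 degrees of freedom. The function name and data are made up; the P-value itself would still come from a t-model (for example, the calculator's LinRegTTest).

```python
import math

def slope_t_test(xs, ys):
    """Return (b, se_b, t, df) for testing H0: beta = 0."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    s = math.sqrt(sse / (n - 2))   # standard deviation of the residuals
    se_b = s / math.sqrt(sxx)      # standard error of the slope
    t = (b - 0) / se_b             # hypothesized slope is 0 under H0
    return b, se_b, t, n - 2

# Hypothetical illustration data.
hours = [1, 2, 3, 4, 5]
score = [2, 4, 5, 4, 5]
b, se_b, t, df = slope_t_test(hours, score)
print(f"b = {b:.2f}, SE(b) = {se_b:.4f}, t = {t:.3f}, df = {df}")
```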

P-value is less than alpha

The value of t = ____. A P-value less than alpha means that an association as strong as the one in our data would be unlikely to occur by chance alone if there were no linear relationship. Since our P-value is below our significance level of ____, we reject the null hypothesis and conclude there is strong evidence of a linear relationship between ____ and ____. As ____ increases, ____ (increases/decreases).

P value is greater than alpha

The value of t = ____. A P-value greater than alpha means that the association we see in the data could plausibly occur by chance alone. Since our P-value is above our significance level of ____, we fail to reject the null hypothesis and conclude there is not enough evidence of a linear relationship between ____ and ____.

Confidence interval

A (given percent) confidence regression slope t-interval: coefficient of the independent variable ± invT((1 + confidence level)/2, degrees of freedom (remember it’s n - 2!)) × (SE of the coefficient of the independent variable), which equals about (____, ____).
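
The interval arithmetic can be sketched as below. The critical value t* is passed in rather than computed, on the assumption that it comes from the calculator (for 95% confidence and 3 degrees of freedom, invT(0.975, 3) is about 3.182). The slope and standard error are from a made-up five-point data set, and `slope_interval` is a hypothetical helper name.

```python
def slope_interval(b, se_b, t_star):
    """Confidence interval for the slope: b +/- t* x SE(b)."""
    margin = t_star * se_b
    return b - margin, b + margin

# Hypothetical regression output (n = 5, so df = n - 2 = 3).
b = 0.6          # coefficient of the independent variable
se_b = 0.2828    # standard error of that coefficient
t_star = 3.182   # assumed critical value: invT(0.975, 3)

low, high = slope_interval(b, se_b, t_star)
print(f"95% interval for the slope: ({low:.2f}, {high:.2f})")
```

Because this interval contains 0, it agrees with failing to reject the null hypothesis at the 5% level for these made-up numbers.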

interpret confidence interval

We are (given percent) confident that the mean increase/decrease in (y variable) for each 1 (x unit) increase in (x variable) is between about ____ and about ____.