Definitions, not conclusions.

41 Terms

New cards

Sample correlation coefficient:

rxy

Measures tightness of fit/how tight is that walk from the beach.

Measures the strength of the linear relationship between X and Y.

Gives the average change, in standard deviations of Y, for every 1 standard deviation increase in X.
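
As a rough sketch of how rxy could be computed by hand, the snippet below uses the sums of squares SSxy, SSxx, and SSyy. The temperature and sales numbers are made up for illustration (they are not from the lecture), and the function name `sample_correlation` is just a label chosen here.

```python
import math

# Hypothetical data (made up for illustration): high temperature (X) and sales (Y).
temps = [60, 65, 70, 75, 80, 85, 90]
sales = [110, 135, 140, 170, 175, 210, 215]

def sample_correlation(x, y):
    """Sample correlation coefficient: SSxy / sqrt(SSxx * SSyy)."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    ss_yy = sum((yi - y_bar) ** 2 for yi in y)
    return ss_xy / math.sqrt(ss_xx * ss_yy)

r_xy = sample_correlation(temps, sales)
r_yx = sample_correlation(sales, temps)  # swapping X and Y gives the same value
print(round(r_xy, 3))
```

A value near +1 or -1 means the dots sit tightly around a line; a value near 0 means little linear relationship.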

New cards

Dependent.

Response.

Should depend, at least somewhat, on X in the model.

What you’re trying to explain, predict, or measure.

New cards

Independent.

Explanatory.

Called independent not because it is independent of Y, but because X is determined independently (this is called exogeneity).

The factor you think influences or explains Y.

New cards

Least squares Regression Line:

The line that fits the data the best.

A line that minimizes the sum of the squared vertical distances between the dots (the actual values of Y) and the line.

The best estimator for the expected value of Y given X.

Minimizes the SSE.

ŷi = bo + b1xi

New cards

ŷi:

The predicted value of Y given X.

It is the estimate of E(Y|X), the CEF.

The estimator of E(Y|X).

New cards

bo:

The estimated intercept of the line.

Estimator of Bo.

Gives the predicted value of Y when X=0.

If the X-variable cannot possibly be equal to zero, this is said to be “non-meaningful”.

New cards

b1:

The estimated slope.

Estimator of B1.

The average change in units of Y for every 1 unit increase in X.

It estimates the relationship between X and Y.

It has a variance and a std deviation, because its value changes from sample to sample.

b1 = SSxy/SSxx
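
A minimal sketch of estimating the line from a sample, assuming made-up temperature and sales numbers (not the lecture's data). The slope uses SSxy/SSxx as on the card; the intercept uses the standard formula bo = mean y - b1(mean x), which is an addition here and does not appear on the cards. The name `fit_line` is hypothetical.

```python
# Hypothetical data (made up for illustration): high temperature (X) and sales (Y).
temps = [60, 65, 70, 75, 80, 85, 90]
sales = [110, 135, 140, 170, 175, 210, 215]

def fit_line(x, y):
    """Return (bo, b1) for the least squares line y_hat = bo + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = ss_xy / ss_xx        # estimated slope: SSxy/SSxx
    b0 = y_bar - b1 * x_bar   # estimated intercept (standard formula, assumed here)
    return b0, b1

b0, b1 = fit_line(temps, sales)
y_hat_at_mean = b0 + b1 * 75  # predicted sales at the mean temperature (75)
print(f"bo = {b0:.2f}, b1 = {b1:.2f}")
```

In this made-up sample the temperature can never realistically be 0, so bo is a "non-meaningful" intercept: it only positions the line.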

New cards

Residual:

How far off the predicted value of Y is.

The size of the error of the estimation.

The distance between the dots and the line.

The difference between the actual values of Y (the dots) and the values predicted by the line.

We want these to be as small as possible, so that the dots are as close to the line as possible.

ei = yi - ŷi
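
A small sketch of residuals, assuming the same kind of made-up temperature and sales numbers (not from the lecture). It fits the least squares line, then computes each residual as actual minus predicted. For a least squares line with an intercept, the residuals sum to zero, which the last line checks up to rounding.

```python
# Hypothetical data (made up for illustration): high temperature (X) and sales (Y).
temps = [60, 65, 70, 75, 80, 85, 90]
sales = [110, 135, 140, 170, 175, 210, 215]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(sales) / n
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, sales))
ss_xx = sum((x - x_bar) ** 2 for x in temps)
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar  # standard intercept formula (assumed, not on the cards)

predicted = [b0 + b1 * x for x in temps]                    # the line
residuals = [y - y_hat for y, y_hat in zip(sales, predicted)]  # dots minus line

for x, e in zip(temps, residuals):
    print(f"X = {x}: residual = {e:+.2f}")
print("residuals sum to ~0:", abs(sum(residuals)) < 1e-9)
```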

New cards

How well did the model work?

What was the purpose of the line?

The purpose of the line was to explain the variation in Y: to use the variation in X (whether it is hotter or colder than average, mean x) to explain the variation in Y.

New cards

SST=

Total variation in Y

Sum of the squares total.

Σ(yi - mean y)²

New cards

SSR=

Explained variation in Y.

How much the variation in Y was explained.

When our line leaves the mean of Y, rising above or falling below it, that movement explains variation in Y.

Sum of the squares from regression.

Σ(ŷi - mean y)²

New cards

SSE:

unexplained variation in Y.

Sum, from the first observation to the last, of the squared differences between the actual and predicted values of Y.

Sum of the squared errors.

The sum of the squared distances in the sample between the dots and the line.

Also the sum of the squared residuals: Σei²

Σ(yi - ŷi)²

New cards

The SSE and SSR are:

Mutually exclusive (variation can't be both explained and unexplained) and collectively exhaustive (it has to be one or the other, explained or unexplained), so SST = SSR + SSE.

New cards

R²:

The coefficient of determination.

The % of the variation in Y explained by the model.

R² = SSR/SST = Explained/Total
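
A sketch of the three sums of squares and R², assuming made-up temperature and sales numbers (not from the lecture). It checks that total variation splits into explained plus unexplained, and then takes explained over total.

```python
# Hypothetical data (made up for illustration): high temperature (X) and sales (Y).
temps = [60, 65, 70, 75, 80, 85, 90]
sales = [110, 135, 140, 170, 175, 210, 215]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(sales) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, sales))
      / sum((x - x_bar) ** 2 for x in temps))
b0 = y_bar - b1 * x_bar  # standard intercept formula (assumed, not on the cards)
predicted = [b0 + b1 * x for x in temps]

sst = sum((y - y_bar) ** 2 for y in sales)                       # total variation
ssr = sum((y_hat - y_bar) ** 2 for y_hat in predicted)           # explained
sse = sum((y - y_hat) ** 2 for y, y_hat in zip(sales, predicted))  # unexplained

r_squared = ssr / sst
print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"R^2 = {r_squared:.3f}")
```

With one X variable, R² also equals the square of the sample correlation coefficient rxy.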

New cards

E(Y|X):

Conditional Expectation function (CEF)

The expected value of Y given X.

New cards

Regression Equation:

Before using this, you have to specify a functional form. For example: which shape are we estimating?

In the lecture example, used to predict sales (Y) given the high temperature (X).

Used to learn more about the relationship between the variables, for example: how much evidence is there that they’re related at all?

Used when specifying the straight line.

E(Y|X) = Bo + B1xi

It’s a positive trend if it’s going up.

You can only draw the actual values of Y in the graph.

New cards

Bo:

A true population parameter; a constant.

The true population intercept of the line.

Bo is E(Y|X=0).

(If X cannot be equal to zero, Bo is said to be “non-meaningful.”)

New cards

B1:

A true population parameter; a constant.

The true population slope.

The true relationship between X and Y.

How to interpret it: the change in Y for every 1 unit increase in X.

The change in E(Y|X) from a 1 unit increase in X.

If there is a small, positive, weak relationship, B1 would be small and positive.

If there is a strong, positive relationship, B1 would be larger and positive.

New cards

Goals of the Regression Equation:

Main motivation: How is x related to Y.

B1 is our target parameter.

Used to demonstrate whether there is a true relationship between high temperature, X, and sales, Y.

New cards

yi:

The actual values of Y (the dots around the line)

New cards

Error Term:

The distance between the actual values of Y and the true CEF.

The spread of the dots around the line.

This represents the influence on Y of unobserved factors (unobserved factors = what’s causing the dots to be spread out).

The distance between the actual values of Y and the mean of Y given X.

If you had all of these, they would average out to 0: E(Ei) = 0

Some of these will be positive, some of these will be negative.

The deviation of the actual values of Y from their expected values.

Ei = yi - E(Y|X)

New cards

SD(Ei):

The average spread of the error terms.

The average spread of the dots around the line.

This determines the level of precision of the dots in the model.

New cards

Population Model:

yi = Bo + B1xi + Ei

Bo + B1xi is the same as E(Y|X)
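
A sketch of the population model as a simulation. In real data Bo, B1, and the error terms are never observed; the values below (Bo = -100, B1 = 3.5, SD(Ei) = 10) are invented purely so the pieces can be seen. Normal errors are also an assumption made here.

```python
import random

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical "true" parameters (invented for illustration).
B0, B1, SD_E = -100.0, 3.5, 10.0

def cef(x):
    """The true line: E(Y|X) = Bo + B1*x."""
    return B0 + B1 * x

xs = [rng.uniform(60, 90) for _ in range(10_000)]  # high temperatures
errors = [rng.gauss(0, SD_E) for _ in xs]          # Ei: unobserved factors
ys = [cef(x) + e for x, e in zip(xs, errors)]      # yi = Bo + B1*xi + Ei

mean_error = sum(errors) / len(errors)
print(f"average error term: {mean_error:.3f}")     # close to 0, since E(Ei) = 0
```

Each dot is the line plus its own error term, so subtracting the line from yi gives back Ei.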

New cards

Ei:

What’s causing the dots to be spread out.

These are the unobserved factors.

What makes estimating the line difficult.

The deviation from the line that is caused by other factors.

New cards

Sampling Distribution of b1:

If the 4 regression assumptions are met, then b1 is normally distributed with a mean of B1 and a std deviation of SD(b1) = SD(Ei)/√SSxx

b1 ~ N(B1, var(b1)): this can be read as "b1 is normally distributed with a mean of B1 and a variance of var(b1)."
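
A simulation sketch of the sampling distribution of b1. It draws many samples from a population with invented parameters (Bo = -100, B1 = 3.5, SD(Ei) = 10, normal errors, fixed X values), estimates b1 in each, and compares the spread of those estimates with SD(Ei)/√SSxx. All numbers are assumptions for illustration.

```python
import math
import random
import statistics

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical "true" parameters and fixed X values (invented for illustration).
B0, B1, SD_E = -100.0, 3.5, 10.0
temps = [60, 65, 70, 75, 80, 85, 90]
x_bar = sum(temps) / len(temps)
ss_xx = sum((x - x_bar) ** 2 for x in temps)

def draw_b1():
    """Draw one sample from the population model and return its slope b1."""
    ys = [B0 + B1 * x + rng.gauss(0, SD_E) for x in temps]
    y_bar = sum(ys) / len(ys)
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, ys))
    return ss_xy / ss_xx

slopes = [draw_b1() for _ in range(5000)]

mean_b1 = statistics.mean(slopes)           # should be close to B1
sd_b1_simulated = statistics.stdev(slopes)  # spread of b1 across samples
sd_b1_formula = SD_E / math.sqrt(ss_xx)     # SD(b1) = SD(Ei)/sqrt(SSxx)

print(f"mean of b1: {mean_b1:.3f} (B1 = {B1})")
print(f"SD of b1: simulated {sd_b1_simulated:.3f}, formula {sd_b1_formula:.3f}")
```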

New cards

When we drew the regression equation for the population data, E(Y|X) (the CEF), we gave B1 a positive slope, so:

It takes the shape of a normal distribution.

New cards

True std deviation of b1:

We won’t ever have the true std deviation of b1 because it’s a function of a parameter we don’t have.

SD(b1) = SD(Ei)/√SSxx, where SD(Ei) is the standard deviation of the error terms (the parameter we don’t have).

New cards

The number of slopes we have to estimate.

New cards

Individual significance test (t test):

This tests if there is a relationship between X & Y.

Is there actually a relationship between temperature and sales?

New cards

Test statistic:

This tests whether two things are related to each other.
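
The cards do not give the formula for the test statistic. A common textbook version, assumed here, is t = b1 / SE(b1), testing the null hypothesis B1 = 0, where SE(b1) = s/√SSxx and s = √(SSE/(n-2)) estimates SD(Ei). The sketch below applies it to made-up temperature and sales numbers (not from the lecture).

```python
import math

# Hypothetical data (made up for illustration): high temperature (X) and sales (Y).
temps = [60, 65, 70, 75, 80, 85, 90]
sales = [110, 135, 140, 170, 175, 210, 215]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(sales) / n
ss_xx = sum((x - x_bar) ** 2 for x in temps)
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, sales))
b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(temps, sales))
s = math.sqrt(sse / (n - 2))   # estimate of SD(Ei), with n - 2 degrees of freedom
se_b1 = s / math.sqrt(ss_xx)   # estimated std deviation (standard error) of b1

t_stat = (b1 - 0) / se_b1      # how many standard errors b1 is from 0
print(f"b1 = {b1:.3f}, SE(b1) = {se_b1:.3f}, t = {t_stat:.2f}")
```

A t statistic far from 0, compared with a critical value from a t table with n - 2 degrees of freedom, counts as evidence that X and Y are related.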

New cards

Linearity:

The relationship between X and Y is linear.

Violated when: the relationship is a parabola or otherwise curvilinear (the model is misspecified).

New cards

Independent:

Each observation is independent of (not related to) the others.

Violated when: time series data (measuring variables over different time periods), autocorrelation (the observations are either positively or negatively related to each other).

New cards

Constant variance:

The variance of the error terms is constant/ The spread of the Error terms is constant.

If met: homoskedasticity (the variance of the errors is constant).

Violated when: heteroskedasticity (the variance of the errors is not constant).
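
A rough sketch of what non-constant variance looks like, using simulated error terms whose spread is deliberately made to grow with X. The setup is invented for illustration. Comparing the spread in the low-X half with the high-X half is only an informal check, not a formal test, and with real data it would be done on residuals, since the true error terms are unobserved.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the run is repeatable

# Simulated high temperatures (hypothetical range).
xs = [rng.uniform(60, 90) for _ in range(4000)]

# Error terms whose std deviation grows with X: heteroskedastic on purpose.
errors = [rng.gauss(0, 0.5 * (x - 55)) for x in xs]

low_x_errors = [e for x, e in zip(xs, errors) if x < 75]
high_x_errors = [e for x, e in zip(xs, errors) if x >= 75]

sd_low = statistics.stdev(low_x_errors)
sd_high = statistics.stdev(high_x_errors)

print(f"spread when X < 75:  {sd_low:.2f}")
print(f"spread when X >= 75: {sd_high:.2f}")  # noticeably larger: variance is not constant
```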

New cards

Normality:

The error term is normally distributed.

If met: never completely met, but becomes approximately met with large sample sizes.

Violated when: small sample sizes.

New cards

If the four regression assumptions are met (or approximately met):