271 Bus Stats - Ch2-3 - Examining relationships I-II & Producing data (Lec 4-5)


34 Terms

1
New cards
2
New cards
3
New cards

What to look for in a scatter plot ?

  • direction of association

  • strength of relationship

  • outliers

4
New cards

What is the coefficient of correlation ?

°Measures the strength and direction of the linear association between two variables x and y.

Denoted r.

5
New cards

How to interpret the coefficient of correlation ?

r is always between -1 and +1

  • r > 0 -> positive correlation

  • r ≈ 0 -> no linear correlation

  • r < 0 -> negative correlation

  • r does not depend on the choice of explanatory / response variable

  • r is only useful for linear relationships

6
New cards

Properties of coefficient of correlation ?

The closer r is to -1 or +1, the stronger the linear relationship

r does not depend on the units of measurement
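
The properties above can be checked numerically. The sketch below assumes the standard textbook definition of r (the average product of standardized values, with an n - 1 divisor), which the cards do not spell out; the data and the helper name `correlation` are made up for illustration.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation coefficient r of paired data (textbook definition)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    # Average product of standardized x and y values, using the n - 1 divisor.
    return sum((x - x_bar) / sx * (y - y_bar) / sy for x, y in zip(xs, ys)) / (n - 1)

# Made-up illustrative data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 70]

r = correlation(hours, score)
r_swapped = correlation(score, hours)                     # swap explanatory / response
r_minutes = correlation([60 * h for h in hours], score)   # change the units of x

print(round(r, 4), round(r_swapped, 4), round(r_minutes, 4))
```

All three printed values should agree: swapping the roles of x and y, or measuring x in minutes instead of hours, leaves r unchanged.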

7
New cards

Warning about coef of correlation ?

Outliers can influence correlation tremendously
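
A small sketch of this warning, using made-up data that is almost perfectly linear and then adding one point far below the trend. The helper `correlation` is a hypothetical name and uses the usual textbook formula for r, which is not given on the card.

```python
from statistics import mean, stdev

def correlation(xs, ys):
    """Sample correlation coefficient r of paired data (textbook definition)."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)

# Made-up data lying close to a straight line.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]

r_clean = correlation(xs, ys)
# Add a single outlier at (7, 0), far below the trend of the other points.
r_outlier = correlation(xs + [7], ys + [0.0])

print(round(r_clean, 3), round(r_outlier, 3))
```

One extra observation is enough to pull r from nearly 1 down to a weak positive value, which is why the scatter plot should be checked before trusting r.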

8
New cards

What is the goal of linear regression ?

Make predictions about how x (the explanatory or independent variable) affects y (the response or dependent variable)

9
New cards

What does correlation tell us ?

Correlation tells us about strength and direction of the linear relationship

10
New cards

A line is defined by ?

  • the slope

  • the intercept

11
New cards

Define the line of best fit ?

°Also called the least-squares line

The best-fit line $\hat{y} = b_0 + b_1 x$ has :

  • slope $b_1 = r \, s_y / s_x$

  • intercept $b_0 = \bar{y} - b_1 \bar{x}$

r = coefficient of correlation

$s_x$ = sample standard deviation of $x_1, \dots, x_n$

$s_y$ = sample standard deviation of $y_1, \dots, y_n$

$\bar{x}$ = sample mean of $x_1, \dots, x_n$

$\bar{y}$ = sample mean of $y_1, \dots, y_n$
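
The slope and intercept formulas on this card can be turned into a few lines of code. This is a minimal sketch, not the lecture's own method: the function name `least_squares_line` and the data are assumptions, and r is computed with the usual textbook formula.

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (b0, b1) using b1 = r * sy / sx and b0 = y_bar - b1 * x_bar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = stdev(xs), stdev(ys)   # sample standard deviations
    # Correlation coefficient r (textbook formula, assumed here).
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * sx * sy)
    b1 = r * sy / sx                # slope
    b0 = y_bar - b1 * x_bar         # intercept
    return b0, b1

# Made-up data lying exactly on the line y = 1 + 2x.
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]

b0, b1 = least_squares_line(xs, ys)
print(round(b0, 6), round(b1, 6))
```

Because the points lie exactly on y = 1 + 2x, the formulas should recover an intercept of about 1 and a slope of about 2 (up to floating-point rounding).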

12
New cards

Do the example on slide 33, Lec 4

13
New cards

— — LECTURE 5 — —

14
New cards

What are the 5 steps of linear regression ?

  1. Choose a method for comparing lines (MSE is widely used)

  2. Find the best line

  3. Interpret the result

  4. Evaluate the best line (how good is the line?)

  5. Evaluate if linear regression is appropriate at all (residual analysis)

15
New cards

Step 1 : Compare lines

What criterion should we use to evaluate how ‘good’ a line is ?

The residual error (given a line $\hat{y} = b_0 + b_1 x$) :

  • error = observed value $y$ - predicted value $\hat{y}$

  • $\text{error}_i = y_i - (b_0 + b_1 x_i)$

The best (or “least-squares”) line minimizes the sum of squares of the residual errors :

  • $\sum_{i=1}^{n} \bigl(y_i - (b_0 + b_1 x_i)\bigr)^2$
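
As an illustration of this criterion, the sketch below scores two candidate lines by their sum of squared residual errors. The data, the two candidate lines, and the function name `sum_squared_errors` are made up for the example.

```python
def sum_squared_errors(b0, b1, xs, ys):
    """Sum over all points of (observed y - predicted y)^2 for the line b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# Made-up data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 5.9, 8.3, 9.8]

sse_a = sum_squared_errors(0.0, 2.0, xs, ys)   # candidate line: y_hat = 2x
sse_b = sum_squared_errors(1.0, 1.5, xs, ys)   # candidate line: y_hat = 1 + 1.5x

# The line with the smaller sum of squared errors is the better of the two.
print(round(sse_a, 2), round(sse_b, 2))
```

Under this criterion the first candidate is the better line, since its sum of squared errors is much smaller. The least-squares line is the one that makes this sum as small as possible over all choices of b0 and b1.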

16
New cards

Step 2 : Find the best line

Compute ŷ

Observe that : … ?

17
New cards

What can we observe about ŷ ?

Observe that :

  • the slope has the same sign as the correlation coefficient r

  • if the variables x and y are swapped, r stays the same, but the slope and intercept change

  • The best-fit line always … ?

18
New cards

The best-fit line always … ?

The best-fit line always passes through the point $(\bar{x}, \bar{y})$
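
A short check of why this holds, assuming the intercept formula $b_0 = \bar{y} - b_1 \bar{x}$ from the line-of-best-fit card: plugging $x = \bar{x}$ into the fitted line gives

```latex
\hat{y}(\bar{x}) = b_0 + b_1 \bar{x}
                 = (\bar{y} - b_1 \bar{x}) + b_1 \bar{x}
                 = \bar{y}
```

so the predicted value at the mean of x is exactly the mean of y.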

19
New cards

Step 3 : Interpret the results

Do the example on slides 18-21, Lec 5

20
New cards

Step 4 : How good is the best line ?

Measure the “Goodness of Fit” :

  • the coefficient of determination : $r^2$

  • represents the fraction of the variation in y that is explained by changes in x

    • So $r^2$, read as a percentage, is how much of the variation in y can be explained by changes in x

  • Since $-1 \le r \le 1$, we always have $0 \le r^2 \le 1$
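
One way to see the "fraction of variation explained" reading is to compare $r^2$ with 1 minus the ratio of leftover (residual) variation to total variation in y. The sketch below does this on made-up data. The slope is computed as Sxy / Sxx, which is algebraically the same as r·sy/sx.

```python
from statistics import mean

# Made-up data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 5.9, 8.3, 9.8]

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)                      # total variation in y
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

r = sxy / (sxx * syy) ** 0.5      # coefficient of correlation
b1 = sxy / sxx                    # least-squares slope (equals r * sy / sx)
b0 = y_bar - b1 * x_bar           # least-squares intercept

# Variation left unexplained by the line: sum of squared residual errors.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
explained_fraction = 1 - sse / syy

print(round(r ** 2, 6), round(explained_fraction, 6))
```

The two printed numbers should match: for the least-squares line, $r^2$ equals the share of the total variation in y that the line accounts for.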

21
New cards

Should we always use linear regression ?

To see if it makes sense to use linear regression, we make a residual plot

22
New cards

Step 5 : Residual Analysis

Draw residual plot

23
New cards

Residual plot ?

°a plot where the x-values are the same as in the scatter plot, but the y-values are the residual errors

24
New cards

What observation can we make about residual plots ?

  1. No pattern : Residuals are randomly scattered

    • Conclusion : Linear regression is good!

      = Homoskedasticity

  2. Pattern : Residuals show a curved pattern

    • Conclusion : Nonlinear relationship: Linear regression is not good

  3. No pattern, but a change in variability

    • Conclusion : we can use linear regression, but predictions will not be as good where variability is larger

      = Heteroskedasticity (the level of variation is itself variable: volatility is volatile)
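
The curved-pattern case can be reproduced without plotting. The sketch below fits a least-squares line to made-up data that actually follows y = x² and lists the residual errors in order of x; the variable names are illustrative only.

```python
from statistics import mean

# Made-up data that follows a curve (y = x^2), not a straight line.
xs = [1, 2, 3, 4, 5, 6, 7]
ys = [x ** 2 for x in xs]

x_bar, y_bar = mean(xs), mean(ys)
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = sxy / sxx            # least-squares slope
b0 = y_bar - b1 * x_bar   # least-squares intercept

# Residual error at each x: observed y minus predicted y.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

print([round(e, 2) for e in residuals])
```

The residuals come out positive at both ends and negative in the middle, a U-shaped pattern rather than random scatter. That is the signal that the relationship is nonlinear and a straight line is not a good model here.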

25
New cards

Cautions about correlation and regression ?

  • Always look at the scatter plot and do not blindly trust the value of r! A single outlier, or an observation that is far from the other data points, can have a major effect on the value of r and on the regression line.

    • r becomes significantly weaker (closer to 0) when the added point is an outlier with a large residual error

    • r becomes significantly stronger (closer to -1 or +1) when the added point is far from the others but does not have an especially large residual error

  • Beware of extrapolation! It is difficult to predict the response for values of the explanatory variable far from the observed values

  • Association ≠ Causation (two variables moving in the same direction are not necessarily causally linked; it can be coincidence)

26
New cards

— — Producing Data — — (Ch3)

27
New cards

Census vs sample ?

°Census : measures every individual in the population

°Sample survey : measures only a subset of the population

28
New cards

Observational vs experimental data ?

°Observational study : Record data on individuals without attempting to influence the responses

°Experiment : Impose a treatment on individuals and record the responses.

29
New cards

Common sample designs ?

  • Convenience sampling

  • Voluntary response sampling

  • Simple random sampling

  • Stratified random sampling
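
The two random designs in this list can be sketched with Python's standard `random` module. The cards do not define them, so the sketch assumes the usual definitions: a simple random sample draws n individuals so that every group of size n is equally likely, and a stratified random sample splits the population into groups (strata) and draws a simple random sample within each. The population, the `region` field, and the function name `stratified_sample` are hypothetical.

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

# Hypothetical population of 100 individuals: 80 in "north", 20 in "south".
population = [{"id": i, "region": "north" if i < 80 else "south"} for i in range(100)]

# Simple random sampling: draw 10 individuals from the whole population.
srs = random.sample(population, 10)

def stratified_sample(pop, key, fraction):
    """Draw the same fraction at random from each stratum defined by `key`."""
    strata = {}
    for individual in pop:
        strata.setdefault(individual[key], []).append(individual)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)
        sample.extend(random.sample(members, k))
    return sample

# Stratified random sampling: 10% of each region.
strat = stratified_sample(population, "region", 0.10)

north_in_strat = sum(1 for p in strat if p["region"] == "north")
print(len(srs), len(strat), north_in_strat)
```

The stratified sample always contains 8 individuals from the north and 2 from the south, whereas the regional split of the simple random sample varies from draw to draw.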

30
New cards
  • Convenience sampling?

  • Voluntary response sampling?

  • Simple random sampling?

  • Stratified random sampling?

  • Systematic sampling?

  • Cluster sampling?

31
New cards

When is a sample design biased ?

When it systematically favors certain outcomes

32
New cards

Common sources of bias ?

  • Undercoverage

  • Non-response

  • Untruthful/inaccurate responses

  • Wording of questions influences answers

33
New cards

Excel :

  • Coef of correlation ?

  • Step 1

  • Step 4

  • CORREL(var1, var2)

34
New cards

Do ex 1 & 2 Slides 47-48 Lec 5

  • Suppose the standard deviation of $y_1, \dots, y_n$ is 20, and the standard deviation of $x_1, \dots, x_n$ is 10.

    ==> What is the biggest possible value of the slope of the regression line?

  • Suppose $\bar{x} = 1.5$, $s_x = 3$, $r = 0.5$, $b_1 = 2$, and $b_0 = 1$.

    1. What is the standard deviation of y?

    2. What is the mean of y?
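
A possible line of attack for the first exercise and for question 1 of the second, assuming only the slope formula $b_1 = r \, s_y / s_x$ from the line-of-best-fit card (this is a sketch, not the lecture's official solution):

```latex
% Exercise 1: s_y = 20, s_x = 10, and r is at most 1
b_1 = r \, \frac{s_y}{s_x} = r \cdot \frac{20}{10} = 2r \le 2

% Exercise 2, question 1: solve the slope formula for s_y
s_y = \frac{b_1 \, s_x}{r} = \frac{2 \times 3}{0.5} = 12
```

The largest slope is reached when r = 1, that is, when the points lie exactly on an increasing line.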