Statistics Lecture 3


12 Terms

1

Hypotheses, study characteristics and variables

  • Cross-sectional study across randomly selected schools in the Netherlands

  • Three quantitative variables:

    • y: Response variable, also called the outcome (in this case academic performance)

    • x1 and x2: Explanatory variables, also called predictors (in this case class size and percentage of free meals as an indicator of socio-economic background)

2

Describing the bivariate associations

  • Scatter plots show the bivariate association between all possible pairs of variables:

  • Academic performance and class size:

    • → Positive association

    • → CS explained 11% of the variation in AP

  • Academic performance and percentage free meals:

    • → Negative association

    • → PFM explained 85% of the variation in AP

  • Class size and percentage free meals:

    • → Negative association

    • → PFM explained 12% of the variation in CS

  • Q: How much variation in AP will PFM and CS together explain?

  • Not simply the sum of 85% and 11%: they are confounded (i.e., CS shares 12% of its variation with PFM)

  • → We need a multiple regression model
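
Why the two percentages cannot simply be added can be sketched with the standard formula for the R2 of a two-predictor regression, computed from the three pairwise correlations. The inputs below are the rounded percentages from this card, with signs taken from the stated directions of association; the variable names are illustrative, and because the inputs are rounded the result is only an illustration and need not reproduce the R2 reported for the fitted model.

```python
import math

# Shares of explained variation (squared correlations) as reported on this card.
r2_ap_cs = 0.11    # AP and CS, positive association
r2_ap_pfm = 0.85   # AP and PFM, negative association
r2_cs_pfm = 0.12   # CS and PFM, negative association

# Signed correlations, using the directions stated on the card.
r_ap_cs = math.sqrt(r2_ap_cs)
r_ap_pfm = -math.sqrt(r2_ap_pfm)
r_cs_pfm = -math.sqrt(r2_cs_pfm)

# Naive answer: just add the two shares.
naive_sum = r2_ap_cs + r2_ap_pfm

# Two-predictor R2 from correlations: the overlap between predictors is removed.
r2_joint = (r2_ap_cs + r2_ap_pfm - 2 * r_ap_cs * r_ap_pfm * r_cs_pfm) / (1 - r2_cs_pfm)

print(f"naive sum = {naive_sum:.2f}, joint R2 = {r2_joint:.3f}")
```

With these inputs the joint R2 stays well below the naive 96%, because part of what CS explains is variation it shares with PFM.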

3

The multiple regression model

  • We include multiple predictors of our outcome variable y.

  • y = a + b1*x1 + b2*x2 + ⋯ + bk*xk + e

  • a is still the intercept: Expected y when all x are 0

  • b's are the regression slopes for each predictor variable: b1 for x1; b2 for x2; ... ; bk for xk

  • The meaning of the b's differs from the meaning of b in simple regression!

  • → Statistical control: effects of all other x on both xi and y are eliminated!

  • For example:

  • AP = a + b1 CS + b2 PFM + e

  • a = the expected AP for schools with CS and PFM = 0

  • b1 models the association between CS and AP: → After controlling for the effect of PFM on both CS and AP

  • b2 models the association between PFM and AP, but: → After controlling for the effect of CS on both PFM and AP.

4

Controlling for other predictors

  • b's in the multiple regression model are about the association between x and y after we eliminate the effect of all other predictors in the model on both x and y

  • What differences across individuals do we observe in x and y after eliminating the effect of all other x?

  • Can these residual differences in x explain the residual differences in y?

5

Coefficients b in a multiple regression model

  • AP = a + b1*CS + b2*PFM + e = 9.981 + 0.003*CS – 0.067*PFM + e

  • b1 models the association between CS and AP:

    • → After controlling for the effect of PFM on both CS and AP.

    • → That is, the association between the residuals of AP = a + b*PFM + e and the residuals of CS = a + b*PFM + e, which is 0.003

  • Same holds for b2: b2 models the association between PFM and AP, but:

    • → After controlling for the effect of CS on both PFM and AP.

    • → That is, the association between the residuals of AP = a + b*CS + e and the residuals of PFM = a + b*CS + e, which is -0.067
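
The residual reading of b1 can be checked numerically. The sketch below uses simulated data, not the lecture's data set: the sample size, coefficients, and noise levels are made-up assumptions, and `ols` and `residuals` are hypothetical helper names. It fits the full model, then regresses the PFM-residuals of AP on the PFM-residuals of CS, and compares the two slopes.

```python
import numpy as np

# Simulated schools (all numbers here are assumptions for illustration only).
rng = np.random.default_rng(0)
n = 400
pfm = rng.uniform(0, 100, n)                       # percentage free meals
cs = 25 - 0.05 * pfm + rng.normal(0, 2, n)         # class size, related to PFM
ap = 9 + 0.003 * cs - 0.04 * pfm + rng.normal(0, 0.6, n)  # academic performance

def ols(y, *xs):
    """Least-squares coefficients [a, b1, b2, ...] for y on the given predictors."""
    design = np.column_stack([np.ones(len(y)), *xs])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def residuals(y, x):
    """Residuals of the simple regression y = a + b*x + e."""
    a, b = ols(y, x)
    return y - (a + b * x)

# b1 from the multiple regression AP = a + b1*CS + b2*PFM + e.
b1_multiple = ols(ap, cs, pfm)[1]

# b1 from regressing the residuals of AP (given PFM) on the residuals of CS (given PFM).
b1_partial = ols(residuals(ap, pfm), residuals(cs, pfm))[1]

print(b1_multiple, b1_partial)
```

The two slopes agree up to floating-point error, which is what "controlling for PFM on both CS and AP" means in practice.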

6

Interpretations of coefficients b in multiple regression

  • The expected change in y for a one-unit increase in predictor x while statistically controlling for all other predictors in the model.

  • The expected change in y for a one-unit increase in predictor x when all other predictors are kept constant.

  • The effect of predictor x on outcome y among subjects with the same score on the other predictors.

  • For example, b1 = 0.003:

    • Academic performance is expected to increase by 0.003 when class size increases by 1 student and the percentage of students with free meals is kept constant.

    • Among schools with the same percentage of students with free meals, we expect the performance to be 0.003 higher for each additional student in a class.

    • → Given this model, intercepts (a) differ for varying levels of PFM, but the 'partial effect' of CS remains the same (0.003).

    • → That is because we statistically controlled for the effect of PFM. Keeping PFM stable at 'some' level, the effect of CS on AP is 0.003!
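
The "kept constant" interpretation can be made concrete with the fitted equation from the coefficients card (a = 9.981, b1 = 0.003, b2 = -0.067). In this small sketch, `predict_ap` is a hypothetical helper name and the class sizes and PFM levels are arbitrary example values.

```python
def predict_ap(cs, pfm, a=9.981, b1=0.003, b2=-0.067):
    """Predicted academic performance from class size and percentage free meals."""
    return a + b1 * cs + b2 * pfm

# Hold PFM fixed at several levels and add one student to the class.
effects = {}
for pfm in (10, 50, 90):
    effects[pfm] = predict_ap(21, pfm) - predict_ap(20, pfm)
    # The implied intercept at this PFM level is a + b2*pfm, so it shifts with PFM.
    print(pfm, round(predict_ap(0, pfm), 3), round(effects[pfm], 3))
```

The predicted level changes with PFM, but the difference made by one extra student is the same at every PFM level.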

7

Summary: Regression coefficients a, b1 and b2

  • Linear multiple regression model: y = a + b1*x1 + b2*x2 + e

  • Predictor variables: x1 and x2

  • Coefficients:

  • a: y-intercept: Expected y value when all x are 0

  • b1 and b2: slopes [partial effect] for x1 and x2

    • b1: Expected change in y for a one-unit increase in x1 when all other x's are kept constant.

    • b2: Expected change in y for a one-unit increase in x2 when all other x's are kept constant.

  • Can be extended with any number of predictor variables (k)
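
As a sketch of the extension to k predictors, the same least-squares fit works for any number of columns once a column of ones is added for the intercept. The data below are simulated and noise-free, with made-up coefficients, so that the fit should recover them; none of these numbers come from the lecture.

```python
import numpy as np

# Simulated, noise-free data with k = 3 predictors (all values are assumptions).
rng = np.random.default_rng(1)
n, k = 50, 3
x = rng.normal(size=(n, k))
true_a = 2.0
true_b = np.array([0.5, -1.0, 0.25])
y = true_a + x @ true_b

# Design matrix: a column of ones for the intercept a, then one column per predictor.
design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
a_hat, b_hat = coef[0], coef[1:]

print(a_hat, b_hat)
```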

8

Hypothesis testing in multiple regression: The F-test for R2

  • Multiple null-hypothesis significance tests exist for models that include multiple predictors

  • Global F-test: Do the predictor variables collectively explain variation in the outcome variable?

  • H0: β1 = β2 = ⋯ = βk = 0
    → None of the predictors x is associated with y → Same as saying, the population R2 = 0

  • Ha: at least one βi ≠ 0
    → At least one of the predictors x is associated with y → Same as saying, the population R2 > 0

  • MSR: How much variation is explained per predictor in the model?

  • MSE: How much variation can on average be explained by each additional predictor that we, given the sample size, could potentially add to the model.

  • F = ratio of the two. When F > 1: the predictor(s) explain more variation in y than is expected from any randomly selected additional predictor.

9

Results of the Linear Regression Model: Model Summary

  • R2 is .811

  • Calculated as SSR/TSS = 653.26/805.844
    o Thus: About 81% of the variation in academic performance across schools is explained by class size and the percentage of free meals in a school.

  • → A large percentage!

10

Results of the Linear Regression Model: ANOVA table

  • SSR (Regression sum of squares) = 653.26

    • Variation in y that is explained by the model.

    • df1 = k = 2 → (b1 and b2)

    • Thus: MSR = 653.26 /2 = 326.63

  • SSE (Sums of squared errors [residuals]) = 152.58

    • Variation in y that is not explained by the model.

    • df2 = N – k – 1 = 400 – 2 – 1 = 397

    • Thus: MSE = 152.58 / 397 = 0.38

  • TSS (Total sum of squares) = 805.84 โ†’ (same as for the simple regression: The regression model does not change the observed variation in y)

  • The F-statistic for the model is 849.87

    • Calculated as MSR / MSE = 326.63 / 0.38 (using the unrounded MSE of about 0.3843; dividing by the rounded 0.38 would give about 859.6)

    • With df1 = 2 and df2 = 397, this yields p < .001

    • The model, including linear effects for class size and percentage of free meals, thus explains a significant portion of variation in academic performance.
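
The arithmetic in this ANOVA table can be retraced in a few lines. The sketch below uses only the sums of squares, N, and k given on this card; the variable names are illustrative. It also shows the equivalent way of writing F in terms of R2, which follows from R2 = SSR/TSS.

```python
# Values taken from the ANOVA table on this card.
ssr = 653.26   # regression sum of squares (explained variation)
sse = 152.58   # sum of squared errors (unexplained variation)
n = 400        # number of schools
k = 2          # number of predictors (CS and PFM)

tss = ssr + sse      # total sum of squares
df1 = k              # degrees of freedom for the model
df2 = n - k - 1      # degrees of freedom for the residuals

msr = ssr / df1      # mean square regression
mse = sse / df2      # mean square error (kept unrounded here)
f_stat = msr / mse

r2 = ssr / tss
# Equivalent form of the same statistic, written in terms of R2.
f_from_r2 = (r2 / df1) / ((1 - r2) / df2)

print(f"TSS = {tss:.2f}, R2 = {r2:.3f}, MSR = {msr:.2f}, MSE = {mse:.4f}, F = {f_stat:.2f}")
```

Keeping MSE unrounded is what makes the result land near the reported 849.87; the tiny remaining difference comes from the rounded sums of squares.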

11

Results of the Linear Regression Model: Coefficient table

  • The constant (intercept, a) is 8.820

    • This is the predicted academic performance in schools with a class size (x1) and percentage of students with free meals (x2) of 0.

  • The coefficient for ClassSize (b1) is 0.004 with p = .870 and b1* = 0.004

    • The coefficient is positive but non-significant and negligible.

    • Thus, after statistically controlling for the percentage of free meals in a school, we find no evidence for an association between class size and academic performance.

    • → Note the difference compared to last week's simple regression, when we found b = 0.18, SE = 0.05, t(398) = 3.54, p < .001!

    • → Thus, CS and AP are positively related, but not after controlling for PFM.

  • The coefficient for Perc Free Meals (b2) is -0.040 with p < .001 and b2* = -0.900.

    • The coefficient is negative, significant, and considered large.

    • Thus, after statistically controlling for class size, we find evidence for a strong association between the percentage free meals in a school and academic performance.

12

Conclusion

  • The model, including linear effects for class size and percentage of free meals, is significant. Together, these predictors explain 81% of the variation in academic performance, which is considered large explanatory power.

  • Class size and academic performance are positively related, but when we statistically control for differences in the percentage of free meals in a school, we find no evidence for an association between class size and academic performance.

  • Also: The percentage of free meals in a school is strongly and negatively associated with academic performance, even after controlling for differences in class size.
