ST411 Weeks 1-2: Introduction to Linear Regression


Card 1

Outcome Y's distribution

- $Y \sim f(y; \theta)$: $Y$ is a random variable that follows a probability distribution with pdf $f(\cdot)$ and parameter vector $\theta$ (scalar or vector).

- We often parametrize $\theta = (\mu, \psi)$, where:

  - $\mu = \mathbb{E}(Y)$ is the expected value (mean),

  - $\psi$ are zero or more additional parameters.

- The focus is typically on modeling the mean of the distribution ($\mu$).

Card 2

How to think of regression modelling as specifications for conditional distributions ($Y$ given $X$)?

$Y \sim f(y \mid x; \theta)$

Regression modeling provides a framework for defining conditional distributions of the response variable $Y$ based on predictor variables $X$.

It allows for estimating how the expected value of $Y$ changes with respect to variations in $X$ and the parameters $\theta$ of the conditional distribution.

Card 3

Models for the mean parameter $\mu$

Focus of this course, where the mean parameter $\mu$ of $Y$ depends on $X$ through a linear combination of the $X$'s.

  • $g(\cdot)$ is the link function; in linear regression, $g(\mu) = \mu$ (the identity link)

  • The $\beta$'s are a vector of parameters, the regression coefficients
Card 4

How do we model the mean $\mu$ of $Y$ based on $x$?

The mean $\mu$ depends on $x$ through a linear combination:

$$g(\mu) = \mathbf{x}'\boldsymbol{\beta} = x_1 \beta_1 + x_2 \beta_2 + \cdots + x_p \beta_p$$

Card 5

What is $g(\mu)$?

- $g(\mu)$ is a link function, a specified function applied to the mean $\mu$ of $Y$.

- Example: In linear regression, $g(\mu) = \mu$ (identity link).

Card 6

What is $\beta_0$?

- When the first element of $\mathbf{x}$ is 1, $\beta_0$ is the constant term or intercept in the model.

- $\mathbf{x} = (1, x_1, \ldots, x_{p-1})'$, $\quad \boldsymbol{\beta} = (\beta_0, \beta_1, \ldots, \beta_{p-1})'$

Card 8

What is the distribution of $Y$ in a standard linear regression model?

- $Y \sim N(\mathbf{x}'\boldsymbol{\beta}, \sigma^2)$

- The mean is a linear combination of explanatory variables.

- The variance $\sigma^2$ is constant (homoskedastic).

Card 9

How does the expected value $\mu = \mathbb{E}(Y)$ depend on $\mathbf{x}$?

The mean is a linear combination, where the regression coefficients describe how the mean depends on each of the explanatory variables:

$$\mu = \mathbf{x}'\boldsymbol{\beta} = x_1 \beta_1 + x_2 \beta_2 + \cdots + x_p \beta_p$$

Card 10

What is assumed about the variance $\sigma^2$ in linear regression?

- The conditional variance $\sigma^2$ does not depend on $\mathbf{x}$.

- Variance is constant across all values of $\mathbf{x}$ (homoskedasticity).

Card 11

What model do you use for continuous (normal) Y?

Linear regression

Card 12

What model do you use for binary (binomial) Y?

Binary logistic regression

Card 13

What model do you use for multinomial Y? (Distinguish between the nominal and ordinal models)

Nominal: multinomial logit model

Ordinal: cumulative (ordinal) logit model

Card 14

What model do you use when Y is count data?

Poisson, negative binomial, and log-linear models

Card 15

What model do you use for duration Y?

Survival analysis models

Card 16

What do we need to understand for each model?

knowt flashcard image
Card 17

Linear regression overview

  • Is a GLM for a normally distributed Y with the identity link

  • Estimation is done with maximum likelihood

  • Inference uses asymptotic results for ML estimation

Card 18

Unique properties of linear regression

  • MLE coincides with ordinary least squares estimation

  • Normality → exact distributional results for test statistics and inference (rather than only asymptotic ones)

  • The model can also be motivated without distributional assumptions (using only the mean and variance of Y)

Card 19

What are the random variables and explanatory variables in the linear regression model?

  • $Y_1, \dots, Y_n$ are independent random variables (responses).

  • Each $Y_i$ has associated observed explanatory variables $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})'$.

  • $p \leq n$, and the $\mathbf{x}_i$ are treated as fixed.

Card 20

Linear Regression Model: Distribution of $Y_i$

- What is the distribution of each $Y_i$?

  • $Y_i \sim N(\mu_i, \sigma^2)$

  • where $\mu_i = \mathbb{E}(Y_i) = \mathbf{x}_i' \boldsymbol{\beta} = x_{i1} \beta_1 + \cdots + x_{ip} \beta_p$.

  • $\sigma^2$ is constant across all $i$ (homoskedastic).

Card 21

Expected Value $\mathbb{E}(Y_i)$

- How is $\mathbb{E}(Y_i)$ expressed in terms of $\mathbf{x}_i$ and $\boldsymbol{\beta}$?

- $\mathbb{E}(Y_i) = \mu_i = \mathbf{x}_i' \boldsymbol{\beta} = x_{i1} \beta_1 + \cdots + x_{ip} \beta_p$

- A linear combination of explanatory variables.

Card 22

Unknown Parameters in Linear Regression

- What are the unknown parameters in the linear regression model?

- The regression coefficients $\boldsymbol{\beta} = (\beta_1, \dots, \beta_p)'$.

- The residual variance $\sigma^2$.

Card 23

Linear Regression Model: Matrix Form Setup

- How do we define $\mathbf{Y}$, $\boldsymbol{\mu}$, and $\mathbf{X}$ in matrix form?

* $\mathbf{Y} = (Y_1, \dots, Y_n)'$

* $\boldsymbol{\mu} = (\mu_1, \dots, \mu_n)'$

* $\mathbf{X} = [\mathbf{x}_1 \dots \mathbf{x}_n]'$ (an $n \times p$ matrix of explanatory variables)

Card 24

Linear model in matrix notation? Expression for $\boldsymbol{\mu}$? And distribution?

$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$$

$$\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$$

* $\mathbf{Y} \sim N(\mathbf{X}\boldsymbol{\beta}, \sigma^2 \mathbf{I}_n)$

* $\mathbf{I}_n$ is the $n \times n$ identity matrix.

Card 25

Definition as Mean + Residuals

- How can we rewrite the linear regression model using residuals?

* $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$

* where $\boldsymbol{\epsilon} = (\epsilon_1, \dots, \epsilon_n)' \sim N(0, \sigma^2 \mathbf{I}_n)$

Card 26

- What are the properties of the residuals $\epsilon_i$?

* Each $\epsilon_i \sim N(0, \sigma^2)$

* Residuals are **independent** of each other.

* All randomness in $\mathbf{Y}$ is attributed to $\boldsymbol{\epsilon}$.

  • Residuals are errors: they capture the deviation of the observed $Y_i$ from its mean $\mu_i = \mathbf{x}_i' \boldsymbol{\beta}$.

Card 27

What is the assumption about the residuals in linear regression?

$$\boldsymbol{\epsilon} \sim N(0, \sigma^2 \mathbf{I})$$

Card 28

What does $\boldsymbol{\epsilon} \sim N(0, \sigma^2 \mathbf{I})$ imply? (4 items)

1. $\mathbb{E}(\epsilon_i) = 0$ for all $i$

2. $\text{var}(\epsilon_i) = \sigma^2$ for all $i$

3. $\epsilon_i$ and $\epsilon_j$ are independent for all $i \neq j$

4. Each $\epsilon_i$ is normally distributed

Card 29

Estimation of $\boldsymbol{\beta}$: Method

- How is $\boldsymbol{\beta}$ estimated in linear regression?

  • By Ordinary Least Squares (OLS)

  • No distributional assumptions are needed (unlike Maximum Likelihood).

Card 30

Ordinary Least Squares (OLS): Objective

- What is the goal of the OLS method?

Find $\boldsymbol{\beta}$ that minimizes the **sum of squared differences** between the observed $y_i$ and the expected $\mathbb{E}(Y_i \mid \mathbf{x}_i; \boldsymbol{\beta}) = \mathbf{x}_i' \boldsymbol{\beta}$.

Card 31

Sum of Squared Errors: Formula

- What is the formula for the sum of squared errors in OLS? (summation and matrix notation)

$$S(\boldsymbol{\beta}) = \sum_{i=1}^n (y_i - \mathbf{x}_i' \boldsymbol{\beta})^2$$

Equivalently,

$$S(\boldsymbol{\beta}) = (\mathbf{y} - \mathbf{X} \boldsymbol{\beta})'(\mathbf{y} - \mathbf{X} \boldsymbol{\beta})$$

Card 32

Definition of $\hat{\boldsymbol{\beta}}$

- What does $\hat{\boldsymbol{\beta}}$ represent?

* $\hat{\boldsymbol{\beta}} = (\hat{\beta}_1, \dots, \hat{\beta}_p)'$

* It is the value of $\boldsymbol{\beta}$ that minimizes $S(\boldsymbol{\beta})$ (the sum of squared errors).

Card 33

OLS Estimator $\hat{\boldsymbol{\beta}}$: Derivation

- How is $\hat{\boldsymbol{\beta}}$ obtained?

* By solving:

$$\frac{\partial}{\partial \boldsymbol{\beta}} \left[ (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right] = 0$$

* Involves taking a **partial derivative** and setting it to zero.

Card 34

Full derivation of $\hat{\boldsymbol{\beta}}$

knowt flashcard image
Card 35

What is the final formula for $\hat{\boldsymbol{\beta}}$?

* $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1} \mathbf{X}'\mathbf{y}$

* Requires that $(\mathbf{X}'\mathbf{X})^{-1}$ exists (i.e., $\mathbf{X}'\mathbf{X}$ must be invertible).
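
A minimal R sketch (not from the course materials) checking this formula against `lm()`; the data are simulated and all names here are for illustration only:

```r
# OLS coefficients computed directly from beta_hat = (X'X)^{-1} X'y,
# compared against lm() on simulated data.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                              # design matrix with intercept column
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))  # (X'X)^{-1} X'y

fit <- lm(y ~ x1 + x2)
cbind(by_hand = beta_hat, via_lm = coef(fit))      # the two columns should match
```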

Card 36

Connection Between OLS and MLE

- When is the OLS estimator also the Maximum Likelihood Estimator (MLE)?

When the $Y_i$ are normally distributed.

Under normality, OLS and MLE give the same estimate for $\boldsymbol{\beta}$.

Card 37

What is the formula for $\hat{\beta}_0$ in simple linear regression?

$$\hat{\beta}_0 = \bar{y} - \bar{x} \hat{\beta}_1$$

Card 38

Estimator for Slope $\hat{\beta}_1$

- What is the formula for $\hat{\beta}_1$ in simple linear regression?

$$\hat{\beta}_1 = \frac{\sum_i (y_i - \bar{y})(x_i - \bar{x})}{\sum_i (x_i - \bar{x})^2}$$

This is the sample covariance between $x$ and $y$ scaled by the sample variance of $x$.

Card 39

Alternative Expression for $\hat{\beta}_1$

- How else can $\hat{\beta}_1$ be written, as a function of the sample standard deviations and correlation?

* $\hat{\beta}_1 = \left( \frac{s_y}{s_x} \right) r_{xy}$

* where:

  * $s_x$ = sample standard deviation of $x_i$

  * $s_y$ = sample standard deviation of $y_i$

  * $r_{xy}$ = sample correlation between $x_i$ and $y_i$
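
A quick self-contained R sketch (illustrative only) verifying that the two slope formulas agree:

```r
# Both slope formulas, plus lm(), on simulated data: they should all agree.
set.seed(2)
x <- rnorm(50)
y <- 3 + 1.5 * x + rnorm(50)

b1_cov  <- sum((y - mean(y)) * (x - mean(x))) / sum((x - mean(x))^2)
b1_corr <- (sd(y) / sd(x)) * cor(x, y)

c(b1_cov, b1_corr, coef(lm(y ~ x))[2])
```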

Card 40

Unbiased Estimator of Variance $\sigma^2$

- What is the unbiased estimator for $\sigma^2$ in linear regression?

* $\hat{\sigma}^2 = \frac{1}{n - p} \sum_{i=1}^n (y_i - \mathbf{x}_i' \hat{\boldsymbol{\beta}})^2$

* Alternatively: $\hat{\sigma}^2 = \frac{1}{n - p} \mathbf{e}'\mathbf{e}$

* Divides by $n - p$ for unbiasedness.
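
A sketch of this estimator in R, continuing the simulated `fit` from the earlier sketch; in R's `lm` output it corresponds to `summary(fit)$sigma^2`:

```r
# Unbiased variance estimate e'e / (n - p) from the fitted model.
e <- residuals(fit)
n <- length(e)
p <- length(coef(fit))
sigma2_hat <- sum(e^2) / (n - p)

c(by_hand = sigma2_hat, via_summary = summary(fit)$sigma^2)  # should match
```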

Card 41

Definition of Residuals

- How are the residuals $\mathbf{e}$ defined?

$$\mathbf{e} = \mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{y} - \hat{\mathbf{y}}$$

Each residual: $e_i = y_i - \mathbf{x}_i' \hat{\boldsymbol{\beta}} = y_i - \hat{y}_i$

Notation:

* $\mathbf{e}$ = vector of residuals for all observations

* $\mathbf{y}$ = vector of observed outcomes

* $\mathbf{X}$ = matrix of explanatory variables

* $\hat{\boldsymbol{\beta}}$ = vector of estimated regression coefficients

* $\hat{\mathbf{y}}$ = vector of predicted (fitted) values

* $e_i$ = residual for the $i$-th observation

* $y_i$ = observed outcome for the $i$-th observation

* $\mathbf{x}_i'\hat{\boldsymbol{\beta}} = \hat{y}_i$ = fitted (predicted) value for the $i$-th observation
Card 42

MLE Estimator of Variance $\sigma^2$

- What is the maximum likelihood estimator (MLE) of $\sigma^2$?

* $\hat{\sigma}^2_{ML} = \frac{\mathbf{e}'\mathbf{e}}{n}$

* Divides by $n$ (not $n - p$), so it is biased in finite samples.

Card 43

Formula for Residual Vector $\mathbf{e}$

- What is the formula for the residual vector $\mathbf{e}$?

* $\mathbf{e} = \mathbf{y} - \mathbf{X} \hat{\boldsymbol{\beta}}$

* Also written as $\mathbf{e} = \mathbf{y} - \hat{\mathbf{y}}$, where $\hat{\mathbf{y}}$ is the vector of fitted values.

Card 44

Formula for Individual Residual $e_i$

- What is the formula for the residual $e_i$?

* $e_i = y_i - \mathbf{x}_i' \hat{\boldsymbol{\beta}}$

* Also written as: $e_i = y_i - \hat{y}_i$

Card 45

What is the expectation of $\hat{\boldsymbol{\beta}}$?

$$\mathbb{E}(\hat{\boldsymbol{\beta}}) = \boldsymbol{\beta}$$

Thus, $\hat{\boldsymbol{\beta}}$ is an unbiased estimator of $\boldsymbol{\beta}$.

Card 46

Derivation of $\mathbb{E}(\hat{\boldsymbol{\beta}})$

knowt flashcard image
Card 47

Variance of $\hat{\boldsymbol{\beta}}$

- What is the variance of $\hat{\boldsymbol{\beta}}$? (the true variance of beta hat)

$$\text{Var}(\hat{\boldsymbol{\beta}}) = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}$$

Card 48

Unbiased Estimator of $\text{Var}(\hat{\boldsymbol{\beta}})$

- What is the unbiased estimator for $\text{Var}(\hat{\boldsymbol{\beta}})$? This is the estimated variance based on our data; we use $\hat{\sigma}$ instead of $\sigma$.

* $\widehat{\text{Var}}(\hat{\boldsymbol{\beta}}) = \hat{\sigma}^2 (\mathbf{X}'\mathbf{X})^{-1}$

* where $\hat{\sigma}^2$ is the unbiased estimator of $\sigma^2$:

* $\hat{\sigma}^2 = \frac{1}{n-p} \sum_{i=1}^n (y_i - \hat{y}_i)^2 = \frac{1}{n-p} \mathbf{e}'\mathbf{e}$

* where $e_i = y_i - \hat{y}_i$ and $\mathbf{e} = (e_1, \dots, e_n)'$
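
A sketch in R, reusing `X`, `fit`, and `sigma2_hat` from the earlier simulated-data sketches; R's `vcov()` returns exactly this matrix for an `lm` fit:

```r
# Estimated covariance matrix of beta_hat: sigma2_hat * (X'X)^{-1}.
V_hat <- sigma2_hat * solve(crossprod(X))

all.equal(unname(V_hat), unname(vcov(fit)))  # TRUE
sqrt(diag(V_hat))                            # standard errors of the betas
```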

Card 49

Derivation of $\text{Var}(\hat{\boldsymbol{\beta}})$

knowt flashcard image
Card 50

Condition on $\mathbf{X}$ for OLS

- What condition must $\mathbf{X}$ satisfy for $\hat{\boldsymbol{\beta}}$ to be uniquely defined?

* $\mathbf{X}$ must be full rank: $\text{rank}(\mathbf{X}) = p$

* The inverse $(\mathbf{X}'\mathbf{X})^{-1}$ must exist.

* The condition fails if:

  * the columns are linearly dependent (perfect multicollinearity),

  * or if $p > n$ (more variables than observations).

Card 51

Distribution of $\hat{\boldsymbol{\beta}}$

$$\hat{\boldsymbol{\beta}} \sim N\left(\boldsymbol{\beta}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\right)$$

$\hat{\boldsymbol{\beta}}$ is normally distributed because it is a linear combination of the normally distributed $\mathbf{Y}$.

Card 52

In R, how do you run a linear regression using faculty data, with response variable salary and explanatory vars market and yearsdg?

lm(salary ~ market + yearsdg, data = faculty)
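
A sketch of how the fitted object might then be inspected, assuming the `faculty` data frame from the card is loaded; the accessors below are standard R:

```r
# Fit and inspect the model from this card.
fit <- lm(salary ~ market + yearsdg, data = faculty)

summary(fit)   # coefficients, standard errors, t-values, R^2,
               # and the residual standard error with its df
coef(fit)      # the beta hats
confint(fit)   # 95% confidence intervals for the betas
```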

Card 53

Where can you find the distribution of residuals?

knowt flashcard image
Card 54

Label $\beta_0$, $\beta_1$, and $\beta_2$

knowt flashcard image
Card 55

What is the “residual standard error” part telling you? How did it get 511 df?

It is the square root of the unbiased estimate of $\sigma^2$, which we denote $\hat{\sigma}^2$:

$$\text{Residual standard error} = \sqrt{\hat{\sigma}^2}, \quad \text{where } \hat{\sigma}^2 = \frac{1}{n-p} \mathbf{e}'\mathbf{e}$$

We get 511 df because there are $n = 514$ observations and $p = 3$ parameters ($\beta_0, \beta_1, \beta_2$), so $n - p = 511$.

Card 56

How do we get the standard errors for the betas?

knowt flashcard image
Card 57

General interpretation of the regression coefficient on $x_j$ in a linear regression model

If $x_j$ increases by $a$ units, while controlling for the other explanatory variables, the expected value of $Y$ changes by $a \cdot \beta_j$ units.

Card 58

Interpret the coefficient on market, which is the marketability rating of one's discipline. salary is in $1,000 units.

Holding the other explanatory variables constant, a one-unit increase in the marketability of one's discipline increases the expected salary by $396 (0.396 × 1000).

Card 59

What does TSS represent in regression?

* TSS = Total Sum of Squares

* $\text{TSS} = \sum_{i=1}^n (y_i - \bar{y})^2$

* Measures the total variability in $y$ around its mean

* It's the baseline variability before fitting any model

Card 60

What is the decomposition of TSS in linear regression?

* $\text{TSS} = \text{XSS} + \text{RSS}$

* XSS = Explained (Regression) Sum of Squares

* RSS = Residual Sum of Squares

* This breaks the total variation into model + leftover:

$$\sum_i (y_i - \bar{y})^2 = \sum_i (\hat{y}_i - \bar{y})^2 + \sum_i (y_i - \hat{y}_i)^2$$

Card 61

What does XSS represent?

* XSS = $\sum_i (\hat{y}_i - \bar{y})^2$

  * Measures the improvement from using predictors

  * Variation explained by the model (fitted values)

  * The amount of variation explained when the fitted values are allowed to depend on $\mathbf{x}_i$ and thus vary by $i$.

Card 62

What does RSS represent?

* RSS = $\sum_i (y_i - \hat{y}_i)^2$

  * Measures how far the actual $y_i$ are from the model's predictions

  * Unexplained variation = residuals

Card 63

What does TSS represent?

The total variation in the values of $y_i$ in the sample.

Card 64

What is the formula for $R^2$ (coefficient of determination)?

* $R^2 = \frac{\text{XSS}}{\text{TSS}} = 1 - \frac{\text{RSS}}{\text{TSS}}$

* Measures goodness of fit

* Tells how much of the total variation in $y$ is explained by the model

$R^2$ is the proportion of total variation in $y$ that is explained by variation in the explanatory variables.

* $R^2 \in [0, 1]$

* Closer to 1 → better model fit

Card 65

How else can $R^2$ be calculated?

* $R^2 = (\text{cor}(y, \hat{y}))^2$

* It's the square of the correlation between the observed and predicted $y$ values
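
A sketch showing three equivalent ways to get $R^2$ in R, for a fitted `lm` object `fit` with response vector `y` (e.g. the simulated fit from the earlier sketch):

```r
# R^2 three ways: the RSS/TSS ratio, the squared correlation, and the
# value lm() itself reports.
y_hat <- fitted(fit)
rss <- sum((y - y_hat)^2)
tss <- sum((y - mean(y))^2)

c(one_minus_ratio = 1 - rss / tss,
  squared_cor     = cor(y, y_hat)^2,
  from_summary    = summary(fit)$r.squared)   # all three agree
```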

Card 66

Interpret the R²

Interpretation example: $R^2 = 0.6795$

About 68% of the variation in salaries in this sample is accounted for by variation in years since PhD and marketability.

Card 67

What is the formula for the adjusted $R^2$ statistic?

$$R^2_{\text{adj}} = \frac{(n - 1)R^2 - (p - 1)}{n - p}$$

Card 68

How is adjusted $R^2$ different from regular $R^2$?

* Adjusted $R^2$ does not necessarily increase when we add new explanatory variables

* It accounts for model complexity using a penalty factor

* It is more similar to penalised model-assessment criteria such as AIC

Card 69

What general form do most hypothesis tests in regression take?

$$\mathbf{R}\boldsymbol{\beta} = \mathbf{r}$$

$\mathbf{R}$ is a known matrix and $\mathbf{r}$ is a known vector; together they define constraints on $\boldsymbol{\beta}$.

Card 70

What is a null hypothesis for testing a single coefficient?

$$H_0: \beta_j = 0$$

Implies the coefficient of $x_j$ is 0, so $x_j$ can be omitted from the model without loss.

Card 71

What is a hypothesis involving a linear combination of coefficients?

$$H_0: \beta_1 = \beta_2$$

Matrix form: $\mathbf{R} = [0\ 1\ {-1}]$ (entries corresponding to $\beta_0, \beta_1, \beta_2$) and $\mathbf{r} = 0$.

More generally, that some coefficients are equal to each other.

Card 72

What is an example of multiple simultaneous coefficient tests?

$H_0: \beta_1 = 0$ and $\beta_2 = 0$

$$\mathbf{R} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \quad \mathbf{r} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$

Card 73

What is the sampling distribution of $\hat{\boldsymbol{\beta}}$ in normal linear regression?

$$\hat{\boldsymbol{\beta}} \sim \mathcal{N}(\boldsymbol{\beta}, \sigma^2 (\mathbf{X}'\mathbf{X})^{-1})$$

The estimated coefficients follow a normal distribution with mean equal to the true coefficients and a variance that depends on the error variance and the design matrix.

Card 74

How do you compute $\text{se}(\hat{\beta}_j)$?

$$\text{se}(\hat{\beta}_j) = \sqrt{\sigma^2 \left[(\mathbf{X}'\mathbf{X})^{-1}\right]_{jj}}$$

Card 75

Formula for the t-statistic (general)?

$$t = \frac{\hat{\beta}_j - \beta_j}{\text{se}(\hat{\beta}_j)}$$

Card 76

What is the formula for the t-statistic used to test $H_0: \beta_j = r$ (or $r = 0$)?

$$t = \frac{\hat{\beta}_j - r}{\text{se}(\hat{\beta}_j)} \sim t_{n - p} \quad \text{if } H_0 \text{ is true}$$

Card 77

What distribution does the test statistic follow under the null?

It follows a $t_{n - p}$ distribution.

Card 78

What happens to the t-distribution if $n$ is moderately large?

It can be approximated by a standard normal distribution $\mathcal{N}(0, 1)$.

Card 79

What is the formula for a $(1 - \alpha) \times 100\%$ confidence interval for $\beta_j$?

$$\hat{\beta}_j \pm t_{n - p}^{(1 - \alpha/2)} \cdot \widehat{\text{se}}(\hat{\beta}_j)$$
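
A sketch computing this interval by hand for one coefficient and checking it against `confint()`; `fit` is any `lm` object (e.g. from the sketches above):

```r
# 95% CI for the second coefficient from the formula, vs. confint().
se_all <- sqrt(diag(vcov(fit)))              # estimated se of each beta hat
tcrit  <- qt(0.975, df = df.residual(fit))   # t_{n-p}^{(0.975)}

coef(fit)[2] + c(-1, 1) * tcrit * se_all[2]  # CI by hand
confint(fit)[2, ]                            # matches
```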

Card 80

What does $t_{n - p}^{(1 - \alpha/2)}$ represent?

It is the $1 - \alpha/2$ quantile of the $t_{n - p}$ distribution, used as the critical value for constructing confidence intervals.

Card 81

What is the value of $t_{n - p}^{(0.975)}$ for a 95% confidence interval?

For large $n$, it is approximately 1.96, the corresponding quantile of $\mathcal{N}(0, 1)$.

In particular, for a 95% CI, $\alpha = 0.05$ (so $1 - \alpha/2 = 0.975$).

Card 82

What is the null hypothesis tested using $\mathbf{R}\boldsymbol{\beta} = \mathbf{r}$ when $\mathbf{R}$ is a $1 \times p$ row vector?

It tests a single linear constraint on $\boldsymbol{\beta}$, such as $\beta_j = \beta_k$.

Card 83

What is the distribution of $\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r}$ under $H_0$?

$$\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r} \sim \mathcal{N}(0, \sigma^2 \mathbf{R}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}')$$

Card 84

What is the formula for the t-statistic for testing $\mathbf{R}\boldsymbol{\beta} = \mathbf{r}$?

$$t = \frac{\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r}}{\sqrt{\hat{\sigma}^2\, \mathbf{R}(\mathbf{X}'\mathbf{X})^{-1}\mathbf{R}'}} \sim t_{n - p}$$

Card 85

What is the key idea behind using matrix notation $\mathbf{R}\boldsymbol{\beta} = \mathbf{r}$ for hypothesis testing?

It allows us to test any single linear constraint on $\boldsymbol{\beta}$ — regardless of how many coefficients are involved — using a t-test.

Card 86

What kind of hypothesis does the F-test handle that the t-test cannot?

The F-test is used to jointly test multiple constraints — i.e., $q > 1$ constraints on $\boldsymbol{\beta}$.

Card 87

What does the null hypothesis $H_0: \mathbf{R}\boldsymbol{\beta} = \mathbf{r}$ look like in an F-test?

$\mathbf{R}$ is a $q \times p$ matrix and $\mathbf{r}$ is a $q \times 1$ vector, with $q > 1$ indicating multiple linear constraints on the coefficients.

Card 88

What is the distribution of $\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r}$ under $H_0$?

$$\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r} \sim \mathcal{N}(0, \sigma^2 \mathbf{R}(\mathbf{X}'\mathbf{X})^{-1} \mathbf{R}')$$

Card 89

What is the formula for the F-statistic used in this test?

$$F = \frac{(\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})' (\hat{\sigma}^2 \mathbf{R}(\mathbf{X}'\mathbf{X})^{-1} \mathbf{R}')^{-1} (\mathbf{R}\hat{\boldsymbol{\beta}} - \mathbf{r})}{q} \sim F_{q, n - p}$$
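
A sketch computing this Wald F-statistic by hand for the example null $H_0: \beta_1 = \beta_2 = 0$ in a model with an intercept and two slopes, reusing `fit` from the simulated-data sketch above (the constraint matrix here is just an illustration):

```r
# Wald F-statistic for R beta = r with q = 2 constraints.
R <- rbind(c(0, 1, 0),
           c(0, 0, 1))
r <- c(0, 0)
q <- nrow(R)

b     <- coef(fit)
V_hat <- vcov(fit)                 # sigma2_hat * (X'X)^{-1}
d     <- R %*% b - r

F_stat <- drop(t(d) %*% solve(R %*% V_hat %*% t(R)) %*% d) / q
pf(F_stat, q, df.residual(fit), lower.tail = FALSE)   # p-value
```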

Card 90

How are the t-test and F-test related when $q = 1$?

$F = t^2$, and the tests give the same p-value, since $F_{1, n - p} = t_{n - p}^2$.

Card 91

What is the big-picture idea of the F-test as a comparison of nested models?

The F-test can compare two nested models by testing whether a subset of regression coefficients are all zero — i.e., whether adding predictors significantly improves model fit

Card 92

What is the null hypothesis in the nested model version of the F-test?

$H_0: \beta_j = 0$ for all $j \in S \subset \{1, \dots, p\}$ — i.e., a subset of the coefficients are all zero.

Card 93

How are Model 0 and Model 1 defined in a nested model comparison?

Model 0 is the restricted model under $H_0$, e.g. with predictors $(x_1, x_2)$: the model under the null.

Model 1 is the full model under $H_1$, e.g. with predictors $(x_1, x_2, x_3, x_4)$: the model under the alternative.

In this example, Model 0 is obtained from Model 1 by setting $\beta_3 = \beta_4 = 0$, so Model 0 is nested in Model 1.

Card 94

What is the general null hypothesis when comparing nested models by F-test?

Suppose Model 0 has $p_0$ parameters and Model 1 has $p_1$ parameters, where $p_1 > p_0$.

Let $\boldsymbol{\beta}_*$ be the coefficients that are in Model 1 but not in Model 0.

$$H_0: \boldsymbol{\beta}_* = 0 \text{ (Model 0)}$$

$$H_1: \text{at least one of the coefficients in } \boldsymbol{\beta}_* \text{ is non-zero}$$

Card 95

What is the F-statistic formula using residual sums of squares (RSS)?

$$F = \frac{(RSS_0 - RSS_1)/(p_1 - p_0)}{RSS_1 / (n - p_1)} \sim F_{p_1 - p_0,\, n - p_1}$$

Equivalently,

$$F = \frac{(R_1^2 - R_0^2)/(p_1 - p_0)}{(1 - R_1^2)/(n - p_1)} = \frac{n - p_1}{p_1 - p_0} \cdot \frac{RSS_0 - RSS_1}{RSS_1}$$
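
In R, this nested-model F-test is what `anova()` on two nested `lm` fits reports. A sketch reusing the simulated data (`n`, `y`, `x1`, `x2`) from the earlier sketch, with two hypothetical extra noise predictors:

```r
# Nested-model F-test via anova(): H0 is that the extra coefficients
# (on x3, x4) are all zero.
x3 <- rnorm(n)
x4 <- rnorm(n)

fit0 <- lm(y ~ x1 + x2)              # Model 0 (restricted)
fit1 <- lm(y ~ x1 + x2 + x3 + x4)    # Model 1 (full)

anova(fit0, fit1)   # F-statistic and p-value for H0: beta_3 = beta_4 = 0
```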

Card 96

What are the two formulations of the F-test, and how do they differ in logic?

1. The Wald (t-test) form tests whether $\hat{\boldsymbol{\beta}}_*$ is close to 0 relative to its variance.

2. The nested model form tests whether Model 0 explains the data nearly as well as Model 1 (which includes $\boldsymbol{\beta}_*$).

Wald form ≈ Wald test; nested model form ≈ likelihood ratio test.

**In linear regression, the Wald and LR versions of the F-test are equivalent.** The Wald test evaluates the significance of individual coefficients, while the nested model form assesses overall model fit by comparing the explanatory power of the two models.

Card 97

What happens when the F-test has only one constraint ($q = 1$)?

It becomes a test of $H_0: \beta_j = 0$, and the F-statistic reduces to $F = t^2$ — the test is equivalent to the t-test.

Card 98

What is the null hypothesis in the F-test when $\boldsymbol{\beta}_*$ includes all coefficients except the intercept?

That all explanatory-variable coefficients are zero — i.e., $H_0: \beta_1 = \dots = \beta_{p-1} = 0$.

Card 99

What is the F-statistic formula when testing if all explanatory-variable coefficients are zero?

$$F = \frac{n - p}{p - 1} \cdot \frac{TSS - RSS}{RSS} = \frac{R^2/(p - 1)}{(1 - R^2)/(n - p)} \sim F_{p - 1,\, n - p}$$

where $n$ is the sample size and $p$ is the number of parameters. In this case, $RSS_0 = TSS$ and $R_0^2 = 0$.

This test is usually reported in standard regression output, though it is rarely an interesting hypothesis.

Card 100

What are residuals, and why are they useful in regression diagnostics?

Residuals are $e_i = y_i - \mathbf{x}_i'\hat{\boldsymbol{\beta}}$.

They are used to check whether the model assumptions are satisfied: normality, constant variance, correct specification, and the influence of individual observations.