Outcome Y’s distribution
- Y∼f(y;θ): Y is a random variable that follows a probability distribution with pdf f(⋅) and parameter vector θ (vector or scalar).
- We often parametrize θ=(μ,ψ), where:
- μ=E(Y) is the expected value (mean),
- ψ are zero or more additional parameters.
- The focus is typically on modeling the mean of the distribution (μ).
How to think of regression modelling as specifications for conditional distributions (Y given X)?
Y∼f(y∣x; θ)
Regression modeling provides a framework for defining conditional distributions of the response variable Y based on predictor variables X.
It allows for estimating how the expected value of Y changes with respect to variations in X and the parameters θ of the conditional distribution.
Models for the mean parameter μ
Focus of this course: the mean parameter μ of Y depends on X through a linear combination of the X's.
g(·) is the link function; for linear regression it is the identity, g(μ) = μ.
The β's are a vector of parameters (the regression coefficients).
How do we model the mean μ of Y based on x?
The mean μ depends on x through a linear combination:
g(μ)=x′β=x1β1+x2β2+⋯+xpβp
What is g(μ)?
- g(μ) is a link function, a specified function applied to the mean μ of Y.
- Example: In linear regression, g(μ)=μ (identity link).
What is β0?
- When the first element of x is 1, β0 is the constant term or intercept in the model.
- x=(1,x1,…,xp−1)′,β=(β0,β1,…,βp−1)′
What is the distribution of Y in a standard linear regression model?
- Y∼N(x′β,σ2)
- The mean is a linear combination of explanatory variables.
- The variance σ2 is constant (homoskedastic).
How does the expected value μ=E(Y) depend on x?
The mean is a linear combination, where the regression coeffs describe how the mean depends on each of the explanatory variables:
μ=x′β=x1β1+x2β2+⋯+xpβp
What is assumed about the variance σ2 in linear regression?
- The conditional variance σ2 does not depend on x.
- Variance is constant across all values of x (homoskedasticity).
What model do you use for continuous (normal) Y?
Linear regression
What model do you use for binary (binomial) Y?
Binary logistic regression
What model do you use for multinomial Y? (Provide distinction between nominal and ordinal model)
Nominal - multinomial logit model
Ordinal - cumulative (ordinal) logit model
What model do you use for Y where it is count data?
Poisson, negative binomial models, log linear models
What model do you use for duration Y?
Survival analysis models
What do we need to understand for each model?
Linear reg. overview
Linear regression is a GLM for a normally distributed Y with the identity link (see the sketch after this list)
Estimation done with maximum likelihood
Inference using asymptotic results for ML estimation
Unique properties for linear regression
MLE is also ordinary least squares estimation
Normality → exact distributional results for test statistics and inference (as opposed to asymptotically)
Model can also be motivated without distributional assumptions (using only mean and var. of Y)
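As a quick illustration of the GLM view, this minimal sketch (simulated data, illustrative names only) shows that lm() and glm() with a Gaussian family and identity link give the same fit:

```r
## Minimal sketch with simulated data: linear regression is the Gaussian GLM
## with identity link, so lm() and glm(..., family = gaussian) agree.
set.seed(3)
x <- rnorm(80)
y <- 2 + 1.2 * x + rnorm(80)

coef(lm(y ~ x))
coef(glm(y ~ x, family = gaussian(link = "identity")))  # same coefficient estimates (MLE = OLS)
```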
What are the random variables and explanatory variables in the linear regression model?
Y1,…,Yn are independent random variables (responses).
Each Yi has associated observed explanatory variables xi=(xi1,…,xip)′.
p≤n, and the xi are treated as fixed.
Linear Regression Model: Distribution of Yi
- What is the distribution of each Yi?
Yi∼N(μi,σ2)
where μi=E(Yi)=xi′β=xi1β1+⋯+xipβp.
σ2 is constant across all i (homoskedastic).
Expected Value E(Yi)
- How is E(Yi) expressed in terms of xi and β?
- E(Yi)=μi=xi′β=xi1β1+⋯+xipβp
- A linear combination of explanatory variables.
Unknown Parameters in Linear Regression
- What are the unknown parameters in the linear regression model?
- The regression coefficients β=(β1,…,βp)′.
- The residual variance σ2.
Linear Regression Model: Matrix Form Setup
- How do we define Y, μ, and X in matrix form?
* Y=(Y1,…,Yn)′
* μ=(μ1,…,μn)′
* X=[x1…xn]′ (an n×p matrix of explanatory variables)
What is the linear model in matrix notation? What is the expression for μ, and what is the distribution of Y?
Y=Xβ+ϵ
μ=Xβ
* Y∼N(Xβ,σ2In)
* In is the n×n identity matrix.
Definition as Mean + Residuals
- How can we rewrite the linear regression model using residuals?
* Y=Xβ+ϵ
* where ϵ=(ϵ1,…,ϵn)′∼N(0,σ2In)
- What are the properties of the residuals ϵi?
* Each ϵi∼N(0,σ2)
* Residuals are **independent** of each other.
* All randomness in Y is attributed to ϵ.
* The residuals are the errors: they capture the deviation of the observed Yi from its mean μi = xi′β.
What assumption regarding residuals for linear regression?
ϵ∼N(0,σ2I)
What does ϵ∼N(0,σ2I) imply? (4 items)
1. E(ϵi)=0 for all i
2. var(ϵi)=σ2 for all i
3. ϵi and ϵj are independent for all i ≠ j
4. Each ϵi is normally distributed
Estimation of β: Method
- How is β estimated in linear regression?
By Ordinary Least Squares (OLS)
No distributional assumptions are needed (unlike Maximum Likelihood).
Ordinary Least Squares (OLS): Objective
- What is the goal of the OLS method?
Find β that minimizes the **sum of squared differences** between observed yi and expected E(Yi∣xi;β)=xi′β.
Sum of Squared Errors: Formula
- What is the formula for the sum of squared errors in OLS? (equation and matrix notation)
S(β) = ∑_{i=1}^{n} (yi − xi′β)²
Equivalently,
S(β)=(y−Xβ)′(y−Xβ)
Definition of β^
- What does β^ represent?
* β^=(β^1,…,β^p)′
* It is the value of β that minimizes S(β) (the sum of squared errors).
OLS Estimator β^: Derivation
- How is β^ obtained?
* By solving:
∂/∂β [(y − Xβ)′(y − Xβ)] = 0
Involves taking a **partial derivative** with respect to β and setting it to zero.
Full derivation of β^
What is the final formula for β^?
* β^=(X′X)−1X′y
* Requires that (X′X)−1 exists (i.e., X′X must be invertible).
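A minimal sketch with simulated data (illustrative names and values, not the course data) showing the closed-form OLS solution next to lm():

```r
## beta-hat = (X'X)^{-1} X'y, computed directly from the design matrix.
set.seed(1)
n  <- 100
x1 <- rnorm(n); x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n)

X <- cbind(1, x1, x2)                      # first column of 1s gives the intercept
beta_hat <- solve(t(X) %*% X, t(X) %*% y)  # solve() avoids forming the inverse explicitly

drop(beta_hat)
coef(lm(y ~ x1 + x2))                      # same estimates
```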
Connection Between OLS and MLE
- When is the OLS estimator also the Maximum Likelihood Estimator (MLE)?
When the Yi are normally distributed
Under normality, OLS and MLE give the same estimate for β
What is the formula for β^0 in simple linear regression?
β^0=yˉ−xˉβ^1
Estimator for Slope β^1
- What is the formula for β^1 in simple linear regression?
β^1 = ∑i (yi − yˉ)(xi − xˉ) / ∑i (xi − xˉ)²
This is the sample covariance of x and y divided by the sample variance of x.
Alternative Expression for β^1
- How else can β^1 be written as a function of the sample standard deviation and correlations?
* β^1 = (sy / sx) · rxy
* where:
* sx = sample standard deviation of xi
* sy = sample standard deviation of yi
* rxy = sample correlation between xi and yi
Unbiased Estimator of Variance σ2
- What is the unbiased estimator for σ2 in linear regression?
* σ^2 = (1/(n−p)) ∑_{i=1}^{n} (yi − xi′β^)²
* Alternatively: σ^2 = e′e / (n−p)
* Divides by n−p for unbiasedness.
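A minimal sketch (simulated data, illustrative only) of the unbiased variance estimate, dividing the residual sum of squares by n − p:

```r
## sigma2-hat = e'e / (n - p), where p counts all estimated coefficients.
set.seed(2)
n <- 120
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 2)

fit <- lm(y ~ x)
e   <- residuals(fit)
p   <- length(coef(fit))

sum(e^2) / (n - p)
summary(fit)$sigma^2   # lm reports the square root of this as "Residual standard error"
```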
Definition of Residuals
- How are the residuals e defined?
e=y−Xβ^=y−y^
Each residual: ei=yi−xi′β^=yi−y^i
* e = vector of residuals (errors) for all observations
* y = vector of observed outcomes
* X = matrix of explanatory variables
* β̂ = vector of estimated regression coefficients
* ŷ = vector of fitted (predicted) values
* eᵢ = residual for the i-th observation
* yᵢ = observed outcome for the i-th observation
* xᵢ'β̂ = ŷᵢ = fitted value for the i-th observation
MLE Estimator of Variance σ2
- What is the maximum likelihood estimator (MLE) of σ2?
* σ^2_ML = e′e / n
* Divides by n (not n−p).
Formula for Residual Vector e
- What is the formula for the residual vector e?
* e=y−Xβ^
* Also written as: e = y − y^, where y^ is the vector of fitted values
Formula for Individual Residual ei
- What is the formula for the residual ei?
* ei=yi−xi′β^
* Also written as: ei=yi−y^i
What is the expectation of β^?
E(β^)=β
Thus, β^ is an unbiased estimator of β.
Derivation of E(β^):
E(β^) = E[(X′X)−1X′Y] = (X′X)−1X′E(Y) = (X′X)−1X′Xβ = β
Variance of β^
- What is the variance of β^? (the true variance of β^, not its estimate)
Var(β^)=σ2(X′X)−1
Unbiased Estimator of Var(β^)
- What is the unbiased estimator for Var(β^)? This is the estimated variance based on our data. We use sigma hat instead of sigma.
* Var(β^)=σ^2(X′X)−1
* where σ^2 is the unbiased estimator of σ2.
* σ^2 = (1/(n−p)) ∑_{i=1}^{n} (yi − y^i)²
* σ^2 = e′e / (n−p)
* where ei=yi−y^i and e=(e1,…,en)′
Derivation of Var(β^):
Var(β^) = Var[(X′X)−1X′Y] = (X′X)−1X′ Var(Y) X(X′X)−1 = σ2(X′X)−1X′X(X′X)−1 = σ2(X′X)−1
Condition on X for OLS
- What condition must X satisfy for β^ to be uniquely defined?
* X must be full rank: rank(X)=p
* The inverse (X′X)−1 must exist.
* Condition fails if:
* Columns of X are linearly dependent (perfect multicollinearity)
* or if p>n (more variables than observations)
Distribution of β^
β^∼N(β,σ2(X′X)−1)
β^ is normally distributed because it is a linear combination of normally distributed Y.
In R, how do you run a linear regression using the faculty data, with response variable salary and explanatory variables market and yearsdg?
lm(salary ~ market + yearsdg, data = faculty)
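A minimal sketch of fitting and inspecting this model, assuming the faculty data frame used in the course is already loaded:

```r
## Assumes the course's faculty data frame is available in the session.
fit <- lm(salary ~ market + yearsdg, data = faculty)

summary(fit)   # coefficient estimates, standard errors, t-values and p-values,
               # residual standard error with its df, R-squared, overall F-test
coef(fit)      # just the estimated regression coefficients
```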
Where can you find the distribution of residuals?
In the "Residuals:" block at the top of the summary() output, which reports the minimum, quartiles, median, and maximum of the residuals.
Label Beta_0, Beta_1, and Beta_2
In the "Coefficients:" table: β0 is the (Intercept) estimate, β1 is the coefficient on market, and β2 is the coefficient on yearsdg.
What is the “residual standard error” part telling you? How did it get 511 df?
It is the square root of the unbiased estimate σ^2 of σ2.
Residual standard error = σ^ = √(σ^2), where σ^2 = e′e / (n−p).
We get 511 df because n − p = 514 observations − 3 parameters (β0, β1, β2) = 511.
Explain how we get the standard errors for the betas?
They are the square roots of the diagonal elements of the estimated variance matrix Var^(β^) = σ^2(X′X)−1 (see the sketch below).
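A minimal sketch of recovering those standard errors from the estimated variance matrix, assuming the same faculty model fit as above:

```r
## Assumes fit <- lm(salary ~ market + yearsdg, data = faculty) from above.
V  <- vcov(fit)               # estimated Var(beta-hat) = sigma2-hat * (X'X)^{-1}
se <- sqrt(diag(V))           # standard errors = square roots of the diagonal

se
summary(fit)$coefficients[, "Std. Error"]   # same values as reported by summary()
```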
General interpretation of regression coefficient on xj in a linear regression model
If xj increases by a units, while controlling for other explanatory variables, the expected value of Y changes by a⋅βj units.
Interpret the coefficient on market, which is the marketability rating of one's discipline. salary is in $1,000 units.
Holding the other explanatory variables constant, a one-unit increase in the marketability of one's discipline increases the expected salary by $396 (0.396 × 1,000).
What does TSS represent in regression?
* TSS = Total Sum of Squares
* TSS=∑i=1n(yi−yˉ)2
* Measures total variability in y around its mean
* It’s the baseline variability before fitting any model
What is the decomposition of TSS in linear regression?
* TSS=XSS+RSS
* XSS = Explained (Regression) Sum of Squares
* RSS = Residual Sum of Squares
* This breaks total variation into model + leftover:
* ∑(yi−yˉ)2=∑(y^i−yˉ)2+∑(yi−y^i)2
What does XSS represent?
* XSS = ∑(y^i−yˉ)2
* Measures improvement from using predictors
* Variation explained by the model (fitted values)
Amount of the variation explained when the fitted values are allowed to depend on xi and thus vary by i.
What does RSS represent?
* RSS = ∑(yi−y^i)2
* Measures how far the actual yi are from the model’s predictions
* Unexplained variation = residuals
What does TSS represent?
Describes the total variation in the values of yi in the sample.
What is the formula for R2 (coefficient of determination)?
* R2 = XSS / TSS = 1 − RSS / TSS
* Measures goodness of fit
* Tells how much of the total variation in y is explained by the model
R2 is the proportion of total variation in y that is explained by variation in the explanatory variables.
* R2∈[0,1]
* Closer to 1 → better model fit
How else can R2 be calculated?
* R2=(cor(y,y^))2
* It’s the square of the correlation between observed and predicted y values
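A minimal sketch with simulated data (not the course's faculty data) verifying both ways of computing R²:

```r
## R-squared from the TSS/RSS decomposition and as a squared correlation.
set.seed(42)
n <- 200
x <- rnorm(n)
y <- 1 + 0.8 * x + rnorm(n)

fit  <- lm(y ~ x)
yhat <- fitted(fit)

TSS <- sum((y - mean(y))^2)
RSS <- sum((y - yhat)^2)

1 - RSS / TSS            # R-squared from the decomposition
cor(y, yhat)^2           # same value: squared correlation of observed and fitted values
summary(fit)$r.squared   # matches lm's reported R-squared
```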
Interpret the R²
Interpretation example: R2=0.6795
About 68% of the variation in salaries in this sample is accounted for by variation in years since PhD and marketability.
What is the formula for the adjusted R2 statistic?
Radj2 = ((n−1)R2 − (p−1)) / (n−p)
How is adjusted R2 different from regular R2?
Adjusted R2 does not necessarily increase when we add new explanatory variables.
It accounts for model complexity using a penalty factor.
It is more similar to penalised model assessment criteria such as AIC.
What general form do most hypothesis tests in regression take?
Rβ=r
R is a known matrix and r is a known vector — together they define constraints on β
What is a null hypothesis for testing a single coefficient?
H0:βj=0
Implies the coefficient of xj is 0, so xj can be omitted without loss
What is a hypothesis involving a linear combination of coefficients?
H0:β1=β2
Matrix form: R = [0 1 −1] (columns corresponding to β0, β1, β2) and r = 0.
More generally, that some coeffs are equal to each other.
What is an example of multiple simultaneous coefficient tests?
H0:β1=0 and β2=0
R = [0 1 0; 0 0 1] (a 2×3 matrix) and r = (0, 0)′
What is the sampling distribution of β^ in normal linear regression?
β^∼N(β,σ2(X′X)−1)
The estimated coefficients follow a normal distribution with mean equal to the true coefficients and a variance that depends on the error variance and the design matrix.
How do you compute se(β^j)?
se(β^j) = √( σ^2 [(X′X)−1]jj ), the square root of the j-th diagonal element of the estimated variance matrix σ^2(X′X)−1
Formula for t-statistic (general)?
t = (β^j − βj) / se(β^j)
What is the formula for the t-statistic used to test H0:βj=r or 0?
t = (β^j − r) / se(β^j) ∼ t_{n−p} if H0 is true
What distribution does the test statistic follow under the null?
It follows a tn−p distribution
What happens to the t-distribution if n is moderately large?
It can be approximated by a standard normal distribution N(0,1)
What is the formula for a (1−α)×100% confidence interval for βj?
β^j±tn−p(1−α/2)⋅se^(β^j)
What does tn−p(1−α/2) represent?
It is the 1−α/2 quantile of the tn−p distribution used to determine the critical value for constructing confidence intervals.
What is the value of tn−p(0.975) for a 95% confidence interval?
For large n, it approximates 1.96 from N(0,1) .
In particular, for a 95% CI, α = 0.05, so we use the 1 − α/2 = 0.975 quantile.
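A minimal sketch, again assuming the faculty model fit from the course example; confint() constructs exactly these t-based intervals:

```r
## Assumes fit <- lm(salary ~ market + yearsdg, data = faculty).
confint(fit, level = 0.95)    # beta-hat_j +/- t_{n-p}(0.975) * se(beta-hat_j)

## The same interval by hand for one coefficient (e.g. "market"):
b  <- coef(fit)["market"]
se <- sqrt(diag(vcov(fit)))["market"]
df <- df.residual(fit)        # n - p
b + c(-1, 1) * qt(0.975, df) * se
```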
What is the null hypothesis tested using Rβ=r when R is a 1×p row vector?
It tests a single linear constraint on β, like βj=βk
What is the distribution of Rβ^−r under H0?
Rβ^−r∼N(0,σ2R(X′X)−1R′)
What is the formula for the t-statistic for testing Rβ=r?
t = (Rβ^ − r) / √( σ^2 R(X′X)−1R′ ) ∼ t_{n−p}
What is the key idea behind using matrix notation Rβ=r for hypothesis testing?
It allows us to test any single linear constraint on β — regardless of how many coefficients are involved — using a t-test
What kind of hypothesis does the F-test handle that the t-test cannot?
The F-test is used to jointly test multiple constraints — i.e., q>1 constraints on β
What does the null hypothesis H0:Rβ=r look like in an F-test?
R is a q×p matrix and r is a q×1 vector, with q>1 indicating multiple linear constraints on the coefficients.
What is the distribution of Rβ^−r under H0?
Rβ^−r∼N(0,σ2R(X′X)−1R′)
What is the formula for the F-statistic used in this test?
F = (1/q) (Rβ^ − r)′ (σ^2 R(X′X)−1R′)−1 (Rβ^ − r) ∼ F_{q, n−p}
How are the t-test and F-test related when q=1?
F=t2 and the tests give the same p-value since F1,n−p=tn−p2
What is the big-picture idea of the F-test as a comparison of nested models?
The F-test can compare two nested models by testing whether a subset of regression coefficients are all zero — i.e., whether adding predictors significantly improves model fit
What is the null hypothesis in the nested model version of the F-test?
H0: βj = 0 for all j ∈ S, where S ⊂ {1, …, p}; i.e., a subset of the coefficients are all zero.
How are Model 0 and Model 1 defined in a nested model comparison?
Model 0 is the restricted model under H0 (e.g. with predictors x1, x2), the model under the null.
Model 1 is the full model under H1 (e.g. with predictors x1, x2, x3, x4), the model under the alternative.
Model 0 is obtained from Model 1 by setting β3 = β4 = 0, so Model 0 is nested in Model 1.
What is the general null hypothesis when comparing nested models by F-test?
Suppose Model 0 has p0 parameters and Model 1 has p1 parameters, where p1 > p0.
Let β∗ denote the coefficients that appear in Model 1 but not in Model 0.
H0: β∗ = 0 (Model 0)
H1: at least one of the coefficients in β∗ is non-zero (Model 1)
What is the F-statistic formula using residual sums of squares (RSS)?
F = [ (RSS0 − RSS1) / (p1 − p0) ] / [ RSS1 / (n − p1) ] ∼ F_{p1−p0, n−p1}
F = [ (R²_1 − R²_0) / (p1 − p0) ] / [ (1 − R²_1) / (n − p1) ]
F = [ (n − p1) / (p1 − p0) ] · [ (RSS0 − RSS1) / RSS1 ]
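A minimal sketch of the nested-model F-test with simulated data (illustrative variables x1 to x4, not the course data); anova() on two nested lm fits carries out exactly this comparison:

```r
## F-test comparing a restricted model (Model 0) with the full model (Model 1).
set.seed(7)
n  <- 150
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n); x4 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n)   # x3 and x4 truly have zero coefficients

model0 <- lm(y ~ x1 + x2)                  # restricted model under H0: beta3 = beta4 = 0
model1 <- lm(y ~ x1 + x2 + x3 + x4)        # full model under H1

anova(model0, model1)                      # F-statistic and p-value for the nested comparison

## The same F-statistic by hand from the residual sums of squares:
RSS0 <- sum(residuals(model0)^2)
RSS1 <- sum(residuals(model1)^2)
p0 <- length(coef(model0)); p1 <- length(coef(model1))
((RSS0 - RSS1) / (p1 - p0)) / (RSS1 / (n - p1))
```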
What are the two formulations of the F-test, and how do they differ in logic?
1. The Wald (t-test) form tests whether β^∗ are close to 0 relative to their variances
2. The nested model form tests whether Model 0 explains the data nearly as well as Model 1 (which includes β∗).
Wald form ≈ Wald test
Nested model form ≈ Likelihood ratio test
IN LINEAR REGRESSION, THE WALD AND LR TEST VERSIONS OF THE F-TEST ARE EQUIVALENT. The Wald test evaluates the significance of individual coefficients, while the nested model form assesses overall model fit by comparing explanatory power between the two models.
What happens when the F-test has only one constraint (q=1)?
It becomes a test of H0:βj=0 and the F-statistic reduces to F=t2 — the test is equivalent to the t-test
What is the null hypothesis in the F-test when β∗ includes all coefficients except the intercept?
That all explanatory variable coefficients are zero — i.e., H0:β1=⋯=βp−1=0
What is the F-statistic formula when testing if all explanatory variable coefficients are zero?
F = [ (n − p) / (p − 1) ] · [ (TSS − RSS) / RSS ] = [ R² / (p − 1) ] / [ (1 − R²) / (n − p) ] ∼ F_{p−1, n−p}
where n is the sample size, and p is the number of parameters. In this case, RSS0 = TSS and R²_0 = 0.
This test is usually reported in standard regression output, but it is rarely an interesting hypothesis in itself.
What are residuals, and why are they useful in regression diagnostics?
Residuals are ei=yi−xi′β^.
They are used to check model assumptions such as normality, constant variance, and correct model specification, and to assess the influence of individual observations.
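A minimal sketch of the standard residual checks in R, assuming a fitted lm object called fit (for example the faculty model from earlier):

```r
## Assumes an existing lm fit, e.g. fit <- lm(salary ~ market + yearsdg, data = faculty).
e <- residuals(fit)

par(mfrow = c(2, 2))
plot(fit)                      # residuals vs fitted, normal Q-Q, scale-location, leverage

par(mfrow = c(1, 2))
hist(e, breaks = 20)           # rough check of normality
plot(fitted(fit), e,
     xlab = "Fitted values", ylab = "Residuals")  # look for non-constant variance
abline(h = 0, lty = 2)
```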