STATS 762 Notes on Linear Regression Models

Essentials of Linear Regression Models

Course Overview

  • STATS 762: Graduate course equivalent to STATS 330.
  • Focuses on applied regression models with rigorous theoretical underpinnings.
  • Main software: R and R Markdown for assignments and in-class tutorials.

Administrative Details

  • Lecturers: Alain Vandal (first half) and Kate Lee (second half).
  • Schedule:
    • Lectures: Monday 9:00–10:00, Wednesday 14:00–16:00.
    • Tutorials: Thursday 9:00–10:00 or 10:00–11:00.
    • Office Hours: Thursdays 13:00–14:00 (Alain), others TBA.

Models Covered in Previous Courses

  • Ordinary (Normal) Regression: Continuous response.
  • Logistic Regression: Binary response.
  • Poisson Regression: Count data.
  • Aim of STATS 762: Deepen understanding of these models and others.

Course Outline

  1. Review linear models and generalized linear models (GLMs).
    • Topics: estimation, diagnostics, inference.
  2. Purposes of fitting regression models:
    • Prediction, understanding, control.
  3. Model choice for prediction and causal inference.
  4. Introduction to modern regression methods (e.g., lasso, quantile regression).

The Linear Model

  • Most commonly used statistical model; encompasses various types (ANOVA, ANCOVA, etc.).
  • Generalized Linear Model (GLM) is an extension of the linear model.

Linear Regression Basics

  • General equation:

    Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i

    • Where:
      • Y_i : response variable.
      • X_{1i}, X_{2i} : explanatory variables.
      • \epsilon_i : error term (normally distributed).
  • Objective: Estimate the coefficients \beta_0, \beta_1, \beta_2 and the error variance \sigma^2.
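The estimation objective above can be sketched in code. In the course this is what R's lm() does; the following is a stdlib-only Python illustration that solves the normal equations (X'X)\beta = X'y directly (production code would use QR factorization for numerical stability). The toy data are invented and noise-free, so the true coefficients are recovered exactly.

```python
# Least-squares estimation of beta in y = b0 + b1*x1 + b2*x2 + e,
# via the normal equations (X'X) beta = X'y. Illustrative sketch only;
# R's lm() (or numpy.linalg.lstsq) solves the same problem more stably.

def ols(X, y):
    """Solve (X'X) beta = X'y by Gaussian elimination with partial pivoting."""
    n, p = len(X), len(X[0])
    # Form X'X and X'y.
    A = [[sum(X[k][i] * X[k][j] for k in range(n)) for j in range(p)]
         for i in range(p)]
    b = [sum(X[k][i] * y[k] for k in range(n)) for i in range(p)]
    # Forward elimination.
    for i in range(p):
        piv = max(range(i, p), key=lambda r: abs(A[r][i]))
        A[i], A[piv] = A[piv], A[i]
        b[i], b[piv] = b[piv], b[i]
        for r in range(i + 1, p):
            f = A[r][i] / A[i][i]
            for c in range(i, p):
                A[r][c] -= f * A[i][c]
            b[r] -= f * b[i]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        beta[i] = (b[i] - sum(A[i][c] * beta[c] for c in range(i + 1, p))) / A[i][i]
    return beta

# Invented, noise-free data: y = 1 + 2*x1 + 3*x2, so OLS recovers (1, 2, 3).
x1 = [0, 1, 2, 3, 4]
x2 = [1, 0, 2, 1, 3]
X = [[1.0, a, b] for a, b in zip(x1, x2)]   # design matrix with intercept column
y = [1 + 2 * a + 3 * b for a, b in zip(x1, x2)]
beta = ols(X, y)
print([round(v, 6) for v in beta])  # -> [1.0, 2.0, 3.0]
```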

Key Characteristics of the Linear Model

  1. Responses Y_i are normally distributed.
  2. Mean \mu_i is a linear combination of predictors.
  3. Variance \sigma^2 is constant across observations.
  4. Independence of observations.

Generalized Linear Models (GLMs)

  • Include normal, binomial, and Poisson regressions as well as others (gamma, inverse-Gaussian, etc.).
  • Structure:
    • A probability distribution from a natural exponential family.
    • A link function that relates response to predictors.

Key Mathematical Functions

  • Logit Transform:

    \text{logit}(p) = \log \left( \frac{p}{1 - p} \right)

  • Inverse Logit (Logistic Function):

    p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}
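The two transforms above are inverses of each other, which a short sketch makes concrete (the coefficient values below are invented for illustration):

```python
import math

def logit(p):
    """Log-odds: logit(p) = log(p / (1 - p)), defined for 0 < p < 1."""
    return math.log(p / (1 - p))

def inv_logit(x):
    """Logistic function: maps any real x back to a probability in (0, 1)."""
    return 1 / (1 + math.exp(-x))

# Round-trip check: the logistic function undoes the logit transform.
print(round(inv_logit(logit(0.3)), 10))  # -> 0.3

# A linear predictor on the logit scale yields a probability on the data scale.
b0, b1, X = -1.0, 0.5, 2.0   # illustrative values, not from the notes
print(round(inv_logit(b0 + b1 * X), 4))  # -> 0.5
```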

Concepts of Causation

  • Regression used for prediction and causal inference.
  • Confounding: A third variable influences both the exposure and the outcome, potentially obscuring or distorting their true relationship.
  • Stratification: Adjusting for confounding by separating data into subgroups.
  • Adjustments must consider all common causes of exposure and outcome to avoid biases.
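The effect of stratification can be seen in a small invented dataset: within each stratum of a confounder Z the exposure has no effect on Y, yet the crude (unstratified) comparison suggests one, because Z drives both exposure and outcome.

```python
# Toy confounding illustration with invented data.
# Records are (exposed, z, y): in stratum z=0 everyone has y = 10;
# in stratum z=1 everyone has y = 20. Exposure is concentrated in z=1.
records = (
    [(0, 0, 10)] * 8 + [(1, 0, 10)] * 2 +   # z = 0: mostly unexposed
    [(0, 1, 20)] * 2 + [(1, 1, 20)] * 8     # z = 1: mostly exposed
)

def mean(vals):
    return sum(vals) / len(vals)

def effect(recs):
    """Difference in mean outcome, exposed minus unexposed."""
    return (mean([y for e, z, y in recs if e == 1])
            - mean([y for e, z, y in recs if e == 0]))

print("crude effect:", effect(records))                              # -> 6.0
print("effect in z=0:", effect([r for r in records if r[1] == 0]))   # -> 0.0
print("effect in z=1:", effect([r for r in records if r[1] == 1]))   # -> 0.0
```

The crude comparison shows a difference of 6, entirely attributable to Z; stratifying on Z recovers the true (null) effect in each subgroup.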

Statistical Inference and Model Coefficients

  • Inference focused on:
    • Estimating model coefficients.
    • Understanding response values given predictors.
  • Standard inference methods include confidence intervals and hypothesis testing based on estimated coefficients.
  • Added-variable F-test: Compares nested models to evaluate the contribution of additional variables.
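The added-variable F-test can be computed by hand from the residual sums of squares of two nested models, which is what R's anova() reports when given two lm() fits. The sketch below compares the intercept-only model against intercept + x on invented data; q is the number of added parameters and p the number of parameters in the larger model.

```python
# Added-variable F-test on invented data:
#   F = ((RSS0 - RSS1) / q) / (RSS1 / (n - p))
# where RSS0 is from the smaller (intercept-only) model and RSS1 from
# the larger (intercept + x) model. Mirrors R's anova(fit0, fit1).

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

# Closed-form simple-regression fit for the larger model.
b1 = (sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
      / sum((xi - xbar) ** 2 for xi in x))
b0 = ybar - b1 * xbar

rss0 = sum((yi - ybar) ** 2 for yi in y)                          # intercept only
rss1 = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))    # intercept + x

q, p = 1, 2
F = ((rss0 - rss1) / q) / (rss1 / (n - p))
print("F statistic:", round(F, 1))   # large F: x contributes strongly
```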

Practical Example: Catheter Length

  • Data includes patient height and weight as predictors for catheter length (response).
  • Model fitted using R's lm() function; interpreting the coefficients means assessing what the slopes and the intercept represent in context.

Predictions and Confidence Intervals

  • Use predict() to estimate expected values and provide confidence/prediction intervals.
  • Important to consider variability and relationship strength among variables when interpreting results.
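The distinction between the two intervals can be made concrete. The following stdlib-only sketch (of what R's predict(fit, interval = "confidence") and interval = "prediction" return for simple regression) uses the same invented data as earlier; the prediction interval adds an extra 1 inside the square root to account for the variability of a single new observation, so it is always wider than the confidence interval for the mean response.

```python
import math

# Invented data; fit y = b0 + b1*x by least squares.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
s = math.sqrt(rss / (n - 2))   # residual standard error
t = 3.182                      # t quantile at 0.975, df = n - 2 = 3

x0 = 3.5                       # new predictor value
yhat = b0 + b1 * x0
se_mean = s * math.sqrt(1 / n + (x0 - xbar) ** 2 / sxx)       # for the mean response
se_pred = s * math.sqrt(1 + 1 / n + (x0 - xbar) ** 2 / sxx)   # for a new observation

print("CI:", (round(yhat - t * se_mean, 2), round(yhat + t * se_mean, 2)))
print("PI:", (round(yhat - t * se_pred, 2), round(yhat + t * se_pred, 2)))
```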

Conclusion

  • Understanding linear regression models is crucial for making accurate predictions and causal conclusions in statistics and research.