STATS 762 Notes on Linear Regression Models
Essentials of Linear Regression Models
Course Overview
- STATS 762: Graduate course equivalent to STATS 330.
- Focuses on applied regression models with rigorous theoretical underpinnings.
- Main software: R and R Markdown for assignments and in-class tutorials.
Administrative Details
- Lecturers: Alain Vandal (first half) and Kate Lee (second half).
- Schedule:
- Lectures: Monday 9:00–10:00, Wednesday 14:00–16:00.
- Tutorials: Thursday 9:00–10:00 or 10:00–11:00.
- Office Hours: Thursdays 13:00–14:00 (Alain), others TBA.
Models Covered in Previous Courses
- Ordinary (Normal) Regression: Continuous response.
- Logistic Regression: Binary response.
- Poisson Regression: Count data.
- Aim of STATS 762: Deepen understanding of these models and others.
Course Outline
- Review linear models and generalized linear models (GLMs).
- Topics: estimation, diagnostics, inference.
- Purposes of fitting regression models:
- Prediction, understanding, control.
- Model choice for prediction and causal inference.
- Introduction to modern regression methods (e.g., lasso, quantile regression).
The Linear Model
- Most commonly used statistical model; encompasses various types (ANOVA, ANCOVA, etc.).
- Generalized Linear Model (GLM) is an extension of the linear model.
Linear Regression Basics
General equation:
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \epsilon_i
- Where:
- Y_i : response variable.
- X_{1i}, X_{2i} : explanatory variables.
- \epsilon_i : error term (normally distributed).
Objective: Estimate coefficients \beta_0, \beta_1, \beta_2 and variance \sigma^2 of the error terms.
Key Characteristics of the Linear Model
- Responses Y_i are normally distributed.
- Mean \mu_i is a linear combination of predictors.
- Variance \sigma^2 is constant across observations.
- Independence of observations.
Generalized Linear Models (GLMs)
- Include normal, binomial, and Poisson regressions as well as others (gamma, inverse-Gaussian, etc.).
- Structure:
- A probability distribution from a natural exponential family.
- A link function that relates response to predictors.
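These two components map directly onto R's `glm()` call: the `family` argument names the distribution and the `link` argument the link function. A minimal sketch, assuming a hypothetical data frame `dat` with response `y` and predictor `x`:

```r
# Binary response: binomial family with the logit link (logistic regression).
fit <- glm(y ~ x, family = binomial(link = "logit"), data = dat)

# Count response: Poisson family with the log link (Poisson regression).
# fit <- glm(y ~ x, family = poisson(link = "log"), data = dat)

# Continuous response: gaussian family with the identity link,
# equivalent to lm(y ~ x, data = dat).
# fit <- glm(y ~ x, family = gaussian(link = "identity"), data = dat)
```

Ordinary normal regression is thus the special case of a GLM with a Gaussian family and identity link.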
Key Mathematical Functions
Logit Transform:
\text{logit}(p) = \log \left( \frac{p}{1 - p} \right)
Inverse Logit (Logistic Function):
p = \frac{\exp(\beta_0 + \beta_1 X)}{1 + \exp(\beta_0 + \beta_1 X)}
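In R, these two transforms are available as the quantile and distribution functions of the standard logistic distribution, `qlogis()` and `plogis()`, or can be written out directly:

```r
# Logit and its inverse, written from the definitions above.
logit <- function(p) log(p / (1 - p))
inv_logit <- function(eta) exp(eta) / (1 + exp(eta))

logit(0.5)      # 0: probability 1/2 maps to log-odds 0
inv_logit(0)    # 0.5: log-odds 0 maps back to probability 1/2

# Built-in equivalents:
all.equal(logit(0.3), qlogis(0.3))      # TRUE
all.equal(inv_logit(1.2), plogis(1.2))  # TRUE
```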
Concepts of Causation
- Regression used for prediction and causal inference.
- Confounding: A third variable influences both dependent and independent variables, potentially obscuring true relationships.
- Stratification: Adjusting for confounding by separating data into subgroups.
- Adjustments must consider all common causes of exposure and outcome to avoid biases.
Statistical Inference and Model Coefficients
- Inference focused on:
- Estimating model coefficients.
- Understanding response values given predictors.
- Standard inference methods include confidence intervals and hypothesis testing based on estimated coefficients.
- Added Variable F-Test: Compares models to evaluate contribution of additional variables.
Practical Example: Catheter Length
- Data includes patient height and weight as predictors for catheter length (response).
- Model fitted using R's lm() function; interpretation of coefficients includes assessing the meaning of the slope and intercept.
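A minimal fitting sketch, assuming a data frame `cath` with columns `height`, `weight`, and `length` (the column names are hypothetical):

```r
# Fit catheter length on height and weight.
fit <- lm(length ~ height + weight, data = cath)

summary(fit)   # coefficient estimates, standard errors, t-tests, R-squared
confint(fit)   # 95% confidence intervals for the coefficients

# Added variable F-test: does weight contribute beyond height alone?
fit0 <- lm(length ~ height, data = cath)
anova(fit0, fit)
```

The `anova()` comparison of nested models is the added variable F-test mentioned above: it tests whether the extra predictor significantly reduces the residual sum of squares.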
Predictions and Confidence Intervals
- Use predict() to estimate expected values and to obtain confidence/prediction intervals.
- Important to consider variability and the strength of relationships among variables when interpreting results.
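Continuing the (hypothetical) catheter fit, `predict()` produces both kinds of interval via its `interval` argument:

```r
# New patient at hypothetical predictor values.
new_obs <- data.frame(height = 120, weight = 25)

# Confidence interval: uncertainty about the MEAN response at these predictors.
predict(fit, newdata = new_obs, interval = "confidence", level = 0.95)

# Prediction interval: wider, since it also includes the error variance
# sigma^2 of a single new observation.
predict(fit, newdata = new_obs, interval = "prediction", level = 0.95)
```

The prediction interval is always wider than the confidence interval at the same predictor values, because a new observation carries its own error term on top of the uncertainty in the estimated mean.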
Conclusion
- Understanding linear regression models is crucial for making accurate predictions and causal conclusions in statistics and research.