Lecture 19 Poisson regression and Generalised Linear Models
MS4215/MS6061 Lecture 19: Poisson Regression
Page 1: Introduction to Poisson Regression
Page 2: Overview
Poisson Regression: A statistical method used for count data.
Maximum Likelihood Function & Estimation: Technique for estimating the parameters of a statistical model.
Generalised Linear Models (GLMs): Framework that includes various models, including Poisson regression.
Link Functions: Functions that connect the mean of the distribution to the linear predictors.
Page 3: Poisson Regression Model for Count Data
Observed vs Expected Hits (number of districts by hits per district):
Hits per district    0      1      2     3     4    5+
Observed districts   229    211    93    35    7    1
Expected districts   226.7  211.4  98.6  30.6  7.1  1.6
Calculation: Mean hits per district = 0.9288, Number of districts = 576
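Probability Mass Function (PMF):
$P(X = x) = e^{-\lambda}\lambda^{x}/x!$ for $x = 0, 1, 2, \ldots$; each expected count is the number of districts multiplied by $P(X = x)$.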
Assumption: The response variable follows a Poisson distribution, $Y \sim P(\lambda)$, with $E[Y] = \mathrm{Var}[Y] = \lambda$.
Example: Number of flying-bomb hits in London.
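The expected counts can be reproduced approximately from the mean and the number of districts quoted above. The following is a minimal Python sketch (the lecture itself works in R); the variable names are illustrative, and the "5+" bucket is taken to be whatever probability remains after 0 to 4 hits. The values it prints should be close to the expected counts listed above, though not necessarily identical, since those depend on how the mean was rounded.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

n_districts = 576    # number of districts, from the notes
mean_hits = 0.9288   # mean hits per district, from the notes

# Expected number of districts with exactly 0, 1, 2, 3 and 4 hits
expected = [n_districts * poisson_pmf(x, mean_hits) for x in range(5)]
# "5+" bucket: the districts not accounted for by 0 to 4 hits
expected.append(n_districts - sum(expected))

labels = ["0", "1", "2", "3", "4", "5+"]
for label, value in zip(labels, expected):
    print(f"{label:>2} hits: {value:6.1f} districts")
```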
Page 4: Poisson Distribution Visualization
Graphical representation of Poisson distribution with varying parameters 'a' = 1, 4, 10.
Page 5: Applications of Poisson Regression
Common Examples:
Number of credit cards owned per individual.
Number of customers in line at a shop, influenced by items on discount and special events.
Number of doctor visits by patients in a month.
Page 6: Modelling the Mean of a Poisson Response Variable
Model Initialization:
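$\lambda_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$ (a linear model for the mean can produce negative values, so the log of the mean is modelled instead)
Transformation: $\ln(\lambda_i) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}$
Mean: $\lambda_i = \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})$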
Likelihood Function:
$L(\beta; y) = \prod_{i=1}^{n} \dfrac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}$, where $\lambda_i = \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi})$.
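In practice the likelihood is maximised on the log scale. As a worked step, taking logs of the likelihood above and writing $\eta_i$ for the linear predictor (a shorthand introduced here, not in the slides) gives:

```latex
\ell(\beta; y) = \ln L(\beta; y)
  = \sum_{i=1}^{n} \left[ y_i \ln \lambda_i - \lambda_i - \ln(y_i!) \right]
  = \sum_{i=1}^{n} \left[ y_i \eta_i - e^{\eta_i} - \ln(y_i!) \right],
\qquad
\eta_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi}.
```

The term $\ln(y_i!)$ does not involve $\beta$, so it has no effect on the maximum likelihood estimates.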
Page 7: Parameter Estimates Interpretation
Single Predictor Model: $\ln(\lambda_i) = \beta_0 + \beta_1 x_i$
Interpretation of Coefficients:
$\exp(\beta_1)$ is the multiplicative effect on the mean of Y for each one-unit increase in X.
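The multiplicative interpretation follows directly from the single-predictor model. A short derivation, comparing the mean at $x + 1$ with the mean at $x$:

```latex
\frac{\lambda(x + 1)}{\lambda(x)}
  = \frac{\exp\left(\beta_0 + \beta_1 (x + 1)\right)}{\exp\left(\beta_0 + \beta_1 x\right)}
  = \exp(\beta_1).
```

So a positive $\beta_1$ multiplies the mean by a factor greater than 1, a negative $\beta_1$ by a factor less than 1, and $\beta_1 = 0$ leaves the mean unchanged.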
Page 8: Comparing Doctor Visits Data
Data Overview: Lect19DrVisits.xlsx includes patient visit data, illness types, and age.
Research Interest: Compare doctor visits among different illnesses (1, 2, 3) controlling for age.
Page 9: Regression Analysis in R
R Code:
Convert illness to a factor with factor() and fit the Poisson regression model using glm() with family = poisson.
Use summary() for the coefficient table and anova() for the analysis of deviance.
Page 10: Regression Coefficients Example
Output Coefficients:
(Intercept): -5.24712, Age: 0.07015, Illness Type 2: 1.08386, Illness Type 3: 0.36981
Statistical Significance: The significance codes in the summary() output flag which coefficients differ significantly from zero; AIC is used to compare candidate models.
Null deviance: 287.67, Residual deviance: 189.45, AIC: 373.5
Questions: Use the output to answer questions such as how the illness types compare and what the predicted mean number of visits is for a given patient.
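Questions of this kind can be answered by exponentiating the reported coefficients. The sketch below is in Python rather than the R used in the lecture, and rests on two assumptions: that illness type 1 is the reference level (it has no coefficient of its own in the output), and that the model contains only age and illness type. The age of 60 is a hypothetical value chosen purely for illustration.

```python
import math

# Coefficients on the log scale, as reported in the regression output
coef = {
    "intercept": -5.24712,
    "age": 0.07015,
    "illness2": 1.08386,
    "illness3": 0.36981,
}

# exp(coefficient) = multiplicative effect on the mean number of visits
rate_ratios = {name: math.exp(b) for name, b in coef.items() if name != "intercept"}

def predicted_mean(age, illness):
    """Predicted mean visits; illness is 1, 2 or 3 (1 assumed to be the reference)."""
    eta = coef["intercept"] + coef["age"] * age
    if illness == 2:
        eta += coef["illness2"]
    elif illness == 3:
        eta += coef["illness3"]
    return math.exp(eta)

for name, ratio in rate_ratios.items():
    print(f"{name}: mean multiplied by {ratio:.3f}")

example_age = 60  # hypothetical patient age, for illustration only
for illness in (1, 2, 3):
    print(f"illness {illness}, age {example_age}: {predicted_mean(example_age, illness):.3f}")
```

Because the model is multiplicative, the ratio of predicted means between two illness types is the same at every age.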
Page 11: Analysis of Deviance
Deviance Table: anova() reports the reduction in deviance as each term is added to the model in sequence.
Deviance Calculation:
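The deviance of a GLM is $D = 2[\ell(\text{saturated model}) - \ell(\text{fitted model})]$, where $\ell$ is the log-likelihood. A smaller deviance means the fitted model is closer to the data, so deviance plays the role in GLM fitting that the residual sum of squares plays in linear regression.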
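As an illustration of how deviance is used, the drop from the null deviance to the residual deviance in the doctor visits output can be compared with a chi-square distribution. The Python sketch below assumes the fitted model has three parameters beyond the intercept (age and two illness indicators), giving 3 degrees of freedom; the helper `chi2_sf_df3` is written here for that case only and is not a library function.

```python
import math

def chi2_sf_df3(x):
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df.

    Uses the closed form available for 3 degrees of freedom.
    """
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

null_deviance = 287.67       # from the regression output
residual_deviance = 189.45   # from the regression output

drop = null_deviance - residual_deviance  # reduction in deviance due to the predictors
df = 3                                    # assumption: age + two illness indicators
p_value = chi2_sf_df3(drop)

print(f"Drop in deviance: {drop:.2f} on {df} df")
print(f"p-value: {p_value:.3g}")
```

A very small p-value indicates that the predictors, taken together, improve substantially on the intercept-only model.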
Page 12: Characteristics of Poisson Regression
Model Type: A GLM with a Poisson response and a logarithmic link function.
Distribution Properties: The Poisson distribution has mean equal to variance. Overdispersion (variance greater than the mean) is a sign that the Poisson model is inadequate.
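A quick informal check of the mean-equals-variance property is to compare the sample mean and sample variance of the counts. The Python sketch below applies this to the flying-bomb data from earlier in the lecture, under the assumption that the single "5+" district is counted as exactly 5 hits; a ratio near 1 is consistent with a Poisson distribution, while a ratio well above 1 points to overdispersion.

```python
# Observed flying-bomb data: hits per district -> number of districts.
# Assumption: the one district in the "5+" group is treated as exactly 5 hits.
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 5: 1}

n = sum(observed.values())
mean = sum(x * f for x, f in observed.items()) / n
# Sample variance with the usual n - 1 divisor
variance = sum(f * (x - mean) ** 2 for x, f in observed.items()) / (n - 1)
ratio = variance / mean

print(f"n = {n}, mean = {mean:.4f}, variance = {variance:.4f}")
print(f"variance / mean = {ratio:.3f}")
```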
Page 13: Overdispersion Example
Data Analysis Example: The variance slightly exceeds the mean, indicating mild overdispersion in the doctor visits data.
Page 14: Absenteeism Dataset Overview
Dataset: Lect19DaysAbsent.xlsx focuses on student absenteeism alongside demographic and academic variables.
Page 15: Regression Model for School Absences
Fitting Model: Fit a Poisson regression with the number of days absent as the response variable.
Note the estimated coefficients and which predictors are significant.
Observed Results: The variance of days absent far exceeds the mean, showing strong overdispersion.
Page 16: Comparing Regression Models
Negative Binomial Regression: More appropriate than Poisson regression for overdispersed data.
Model Fitting Comparison: The negative binomial model has the lower AIC, indicating the better fit.
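The reason the negative binomial model copes with overdispersion can be seen from its variance. Under one common parameterisation (assumed here; the notes do not state which one the lecture uses), with mean $\mu_i$ and dispersion parameter $\theta > 0$:

```latex
\ln(\mu_i) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_p x_{pi},
\qquad
\mathrm{Var}[Y_i] = \mu_i + \frac{\mu_i^{2}}{\theta}.
```

The extra term $\mu_i^{2}/\theta$ lets the variance exceed the mean, and as $\theta \to \infty$ the variance tends to $\mu_i$, which recovers the Poisson case.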
Page 17: Generalised Linear Models Summary
GLM Framework: A GLM relates the mean of the response to a linear combination of the predictors through a link function, extending ordinary linear regression to non-normal responses.
Functionality: The variance of the response is allowed to depend on its predicted mean, which makes the framework more flexible.
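To make the two ingredients concrete, the Python sketch below lists three standard link functions with their inverses, together with the variance function of the distribution each is usually paired with. The dictionary names and the value of the linear predictor are illustrative choices, not taken from the lecture.

```python
import math

# Each entry: (link g, inverse link); the mean is recovered as mu = inverse(eta).
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "log": (math.log, math.exp),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
}

# Variance as a function of the mean for the distribution usually paired with each link.
VARIANCE = {
    "identity": lambda mu: 1.0,         # normal: constant (up to a scale factor)
    "log": lambda mu: mu,               # Poisson: variance equals the mean
    "logit": lambda mu: mu * (1 - mu),  # binomial, per trial
}

eta = -0.5  # hypothetical value of the linear predictor
for name, (link, inverse) in LINKS.items():
    mu = inverse(eta)
    print(f"{name:>8}: mean = {mu:.4f}, variance function = {VARIANCE[name](mu):.4f}")
```

With the identity link the variance does not change with the mean, whereas for the log and logit links it does, which is the flexibility described above.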