Lecture 19 Poisson regression and Generalised Linear Models

MS4215/MS6061 Lecture 19: Poisson Regression

Page 1: Introduction to Poisson Regression

Page 2: Overview

  • Poisson Regression: A statistical method used for count data.

  • Maximum Likelihood Function & Estimation: Technique for estimating the parameters of a statistical model.

  • Generalised Linear Models (GLMs): Framework that includes various models, including Poisson regression.

  • Link Functions: Functions that connect the mean of the distribution to the linear predictors.

Page 3: Poisson Regression Model for Count Data

  • Observed and expected numbers of districts, by number of hits per district:

      Hits per district     0       1       2      3      4     5+
      Observed districts    229     211     93     35     7     1
      Expected districts    226.7   211.4   98.6   30.6   7.1   1.6

  • Calculation: Mean hits per district = 0.9288, Number of districts = 576

  • Probability Mass Function (PMF):

    • $P(X = x) = e^{-\lambda} \lambda^{x} / x!$ for $x = 0, 1, 2, \dots$; the expected number of districts with $x$ hits is the number of districts multiplied by $P(X = x)$.

  • Assumption: Response variable follows a Poisson distribution, $Y \sim \text{Poisson}(\lambda)$, with $E[Y] = \text{Var}[Y] = \lambda$.

  • Example: Number of flying-bomb hits in London.
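The expected counts can be recomputed from the observed counts with a short Python sketch (standard library only). It assumes the open-ended "5+" class counts as exactly 5 hits when estimating the mean, so the results may differ slightly from the tabulated values depending on how that class was handled in the lecture.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# Observed number of districts by hit count, from the notes above.
# Assumption: the open-ended "5+" class is treated as exactly 5 hits.
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 5: 1}

n_districts = sum(observed.values())
total_hits = sum(hits * count for hits, count in observed.items())
lam_hat = total_hits / n_districts  # sample mean = ML estimate of lambda

# Expected districts for 0..4 hits; the last class absorbs the upper tail.
expected = {x: n_districts * poisson_pmf(x, lam_hat) for x in range(5)}
expected["5+"] = n_districts - sum(expected.values())

for key, value in expected.items():
    print(f"{key}: expected {value:.1f}")
```

Taking the last class as "everything above 4" keeps the expected counts summing to the number of districts.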

Page 4: Poisson Distribution Visualization

  • Plots of the Poisson distribution for parameter values 'a' = 1, 4, 10, where 'a' plays the role of the mean $\lambda$.

Page 5: Applications of Poisson Regression

  • Common Examples:

    • Number of credit cards owned per individual.

    • Number of customers in line at a shop, influenced by items on discount and special events.

    • Number of doctor visits by patients in a month.

Page 6: Modelling the Mean of a Poisson Response Variable

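  • Model Initialization:

    $\lambda_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}$ (a linear predictor can be negative, whereas a Poisson mean must be positive, so the mean is modelled on the log scale instead)

    • Transformation: $\ln(\lambda_i) = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki}$

    • Mean: $\lambda_i = \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki})$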

  • Likelihood Function:

    • $L(\beta; y) = \prod_{i=1}^{n} \dfrac{e^{-\lambda_i} \lambda_i^{y_i}}{y_i!}$ where $\lambda_i = \exp(\beta_0 + \beta_1 x_{1i} + \cdots + \beta_k x_{ki})$.
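In practice the logarithm of the likelihood is maximised, because the product becomes a sum. The following Python sketch evaluates that log-likelihood for given coefficients. The function name and the toy data are illustrative assumptions, not part of the lecture.

```python
import math

def poisson_log_likelihood(beta, X, y):
    """Log-likelihood of a Poisson regression with log link.

    beta: coefficients [b0, b1, ..., bk]
    X:    rows of predictors [x1, ..., xk] (the intercept is handled here)
    y:    observed counts
    """
    total = 0.0
    for row, y_i in zip(X, y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))  # ln(lambda_i)
        lam = math.exp(eta)
        # ln of e^{-lam} lam^y / y!, using lgamma(y + 1) = ln(y!)
        total += -lam + y_i * eta - math.lgamma(y_i + 1)
    return total

# Hypothetical toy data, for illustration only.
X_toy = [[1.0], [2.0], [3.0]]
y_toy = [1, 2, 4]
print(poisson_log_likelihood([0.0, 0.4], X_toy, y_toy))
```

Maximum likelihood estimation searches for the coefficients that make this value as large as possible.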

Page 7: Parameter Estimates Interpretation

  • Single Predictor Model: $\ln(\lambda_i) = \beta_0 + \beta_1 x_i$

  • Interpretation of Coefficients:

    • $\exp(\beta_1)$ is the multiplicative change in the mean of Y for each one-unit increase in X.
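The multiplicative interpretation follows directly from the single-predictor model. Writing $\lambda(x)$ for the mean at predictor value $x$ (notation introduced here for convenience), the ratio of means one unit apart is:

```latex
\frac{\lambda(x+1)}{\lambda(x)}
  = \frac{\exp\{\beta_0 + \beta_1 (x+1)\}}{\exp(\beta_0 + \beta_1 x)}
  = \exp(\beta_1)
```

So $\exp(\beta_1) > 1$ means the mean increases with X, $\exp(\beta_1) < 1$ means it decreases, and $\exp(\beta_1) = 1$ means no effect.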

Page 8: Comparing Doctor Visits Data

  • Data Overview: Lect19DrVisits.xlsx includes patient visit data, illness types, and age.

  • Research Interest: Compare doctor visits among different illnesses (1, 2, 3) controlling for age.

Page 9: Regression Analysis in R

  • R Code:

    • Convert illness to a factor and fit the Poisson regression model using glm() with family = poisson.

    • Use anova() and summary() to examine the fitted model.
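To show what such a fit does internally, here is a minimal Python sketch (standard library only) that dummy-codes a factor and maximises the Poisson likelihood by Newton-Raphson. It is not the lecture's R code: the helper names and the toy data are hypothetical, and illness type 1 is assumed to be the reference level.

```python
import math

def dummy_code(levels, reference):
    """One 0/1 indicator column per non-reference level, as a factor would give."""
    others = sorted(set(levels) - {reference})
    return [[1.0 if lev == other else 0.0 for other in others] for lev in levels]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def fit_poisson(X, y, iterations=25):
    """ML fit of ln(lambda_i) = b0 + sum_j b_j x_ji by Newton-Raphson."""
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)  # start at the overall mean
    for _ in range(iterations):
        lam = [math.exp(sum(b * x for b, x in zip(beta, r))) for r in rows]
        # Score vector X'(y - lambda) and information matrix X' diag(lambda) X
        score = [sum(r[j] * (yi - li) for r, yi, li in zip(rows, y, lam))
                 for j in range(p)]
        info = [[sum(r[j] * r[k] * li for r, li in zip(rows, lam))
                 for k in range(p)] for j in range(p)]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta

# Hypothetical toy data (not the Lect19DrVisits.xlsx values).
illness = [1, 1, 1, 2, 2, 2, 3, 3, 3]
visits = [0, 1, 2, 3, 4, 5, 1, 2, 3]
X = dummy_code(illness, reference=1)
beta_hat = fit_poisson(X, visits)
print([round(b, 4) for b in beta_hat])
```

With a factor as the only predictor, the fitted means reproduce the group averages. A numeric covariate such as age would be appended as an extra column of X.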

Page 10: Regression Coefficients Example

  • Output Coefficients:

    • (Intercept): -5.24712, Age: 0.07015, Illness Type 2: 1.08386, Illness Type 3: 0.36981

  • Statistical Significance: The significance codes in the R output flag which coefficients differ significantly from zero; AIC is used for model selection.

    • Null deviance: 287.67, Residual deviance: 189.45, AIC: 373.5

  • Questions: Use the fitted model to answer questions such as how the illness types compare and what the predicted mean number of visits is.
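Both kinds of question can be answered by exponentiating the reported coefficients. The Python sketch below assumes illness type 1 is the reference level (it has no coefficient of its own) and uses age 50 purely as an illustrative value. The rate ratios come out at roughly 2.96 for illness type 2 and 1.45 for type 3, each relative to type 1, and about 1.07 per additional year of age.

```python
import math

# Coefficients as reported in the output above (log scale).
coef = {"intercept": -5.24712, "age": 0.07015,
        "illness2": 1.08386, "illness3": 0.36981}

# exp(coefficient) = multiplicative effect on the mean number of visits.
rate_ratio = {name: math.exp(b) for name, b in coef.items() if name != "intercept"}

def predicted_mean(age, illness):
    """Predicted mean visits for a given age and illness type (1 = assumed baseline)."""
    eta = coef["intercept"] + coef["age"] * age
    if illness in (2, 3):
        eta += coef[f"illness{illness}"]
    return math.exp(eta)

for name, rr in rate_ratio.items():
    print(f"{name}: rate ratio {rr:.3f}")

# Age 50 is an arbitrary illustrative value, not taken from the data.
for illness in (1, 2, 3):
    print(f"illness {illness}, age 50: predicted mean {predicted_mean(50, illness):.3f}")
```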

Page 11: Analysis of Deviance

  • Deviance Table: Shows the reduction in deviance as terms are added to the model sequentially.

  • Deviance Calculation:

    • The deviance of a GLM measures how far the fitted model is from the saturated model, which fits the data perfectly; a smaller deviance indicates a better fit.
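For a Poisson model the deviance has the standard closed form below, where $\hat{\lambda}_i$ denotes the fitted mean for observation $i$ and the term $y_i \ln(y_i / \hat{\lambda}_i)$ is taken as 0 when $y_i = 0$. The second line applies the same idea to the null and residual deviances reported earlier for the doctor visits model.

```latex
D = 2 \sum_{i=1}^{n} \left[ y_i \ln\!\left( \frac{y_i}{\hat{\lambda}_i} \right)
      - \left( y_i - \hat{\lambda}_i \right) \right]

D_{\text{null}} - D_{\text{residual}} = 287.67 - 189.45 = 98.22
  \quad \text{on 3 degrees of freedom (age and two illness indicators)}
```

Under the hypothesis that none of the predictors matters, this drop is approximately chi-squared with 3 degrees of freedom, and 98.22 is far above the 5% critical value of 7.81.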

Page 12: Characteristics of Poisson Regression

  • Model Type: A GLM with a logarithmic link function.

  • Distribution Properties: The Poisson distribution has mean equal to variance. Overdispersion (variance greater than the mean) is a sign that the Poisson model is inadequate.
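A quick check of the mean-variance property is to compare the sample variance of the counts with the sample mean. The Python sketch below does this for the flying-bomb counts from earlier in the lecture, assuming the "5+" class counts as exactly 5 hits. A ratio near 1 is consistent with a Poisson model, while a ratio well above 1 points to overdispersion.

```python
# Number of districts by hit count, taken from the flying-bomb example.
# Assumption: the "5+" class is treated as exactly 5 hits.
observed = {0: 229, 1: 211, 2: 93, 3: 35, 4: 7, 5: 1}

n = sum(observed.values())
mean = sum(x * f for x, f in observed.items()) / n
variance = sum(f * (x - mean) ** 2 for x, f in observed.items()) / (n - 1)
dispersion = variance / mean  # close to 1 under a Poisson model

print(f"mean = {mean:.4f}, variance = {variance:.4f}, ratio = {dispersion:.3f}")
```

This raw comparison ignores predictors. In a regression setting the same idea is usually applied to the residuals of the fitted model.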

Page 13: Overdispersion Example

  • Data Analysis Example: In the visits data the variance slightly exceeds the mean, indicating mild overdispersion.

Page 14: Absenteeism Dataset Overview

  • Dataset: Lect19DaysAbsent.xlsx focuses on student absenteeism alongside demographic and academic variables.

Page 15: Regression Model for School Absences

  • Fitting Model: Fit a Poisson regression with the number of days absent as the response variable.

    • Record the estimated coefficients and identify which variables have a significant effect.

  • Observed Results: The variance of days absent is much larger than the mean, showing strong overdispersion.

Page 16: Comparing Regression Models

  • Negative Binomial Regression: More appropriate than Poisson regression for overdispersed data.

  • Model Fitting Comparison: The negative binomial model has the lower AIC, indicating the better fit.
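The reason the negative binomial copes with overdispersion can be seen from its variance. In one common parameterisation (an assumption here, since the lecture does not state which one it uses), with mean $\mu$ and dispersion parameter $\theta > 0$:

```latex
\text{Var}[Y] = \mu + \frac{\mu^{2}}{\theta}
  \qquad \text{(Poisson: } \text{Var}[Y] = \mu \text{)}

\text{AIC} = -2 \ln \hat{L} + 2p
```

Here $\hat{L}$ is the maximised likelihood and $p$ the number of estimated parameters. The extra term $\mu^2 / \theta$ lets the variance exceed the mean, and the model approaches the Poisson as $\theta \to \infty$. The negative binomial pays a penalty of one additional parameter in the AIC, so a lower AIC means the gain in likelihood outweighs that cost.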

Page 17: Generalised Linear Models Summary

  • GLM Framework: Relates the predictors to the mean of the response variable through a link function, extending traditional linear regression.

  • Functionality: Allows the variance of the response to depend on its predicted mean, which makes the model more flexible.
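The two ingredients of a GLM, a link function and a variance function, can be laid out side by side for three familiar families. This Python sketch is illustrative only: the dictionary layout and function name are assumptions, and each family is shown with its canonical link.

```python
import math

# Each family pairs a link g (mean -> linear predictor), its inverse, and a
# variance function V(mu) describing how the variance changes with the mean.
FAMILIES = {
    "gaussian": {
        "link": lambda mu: mu,
        "inverse": lambda eta: eta,
        "variance": lambda mu: 1.0,  # constant: ordinary linear regression
    },
    "poisson": {
        "link": math.log,
        "inverse": math.exp,
        "variance": lambda mu: mu,  # variance equals the mean
    },
    "binomial": {
        "link": lambda mu: math.log(mu / (1.0 - mu)),
        "inverse": lambda eta: 1.0 / (1.0 + math.exp(-eta)),
        "variance": lambda mu: mu * (1.0 - mu),
    },
}

def predict_mean(family, beta, x):
    """Mean response: inverse link applied to b0 + b1*x1 + ... + bk*xk."""
    eta = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return FAMILIES[family]["inverse"](eta)

# Hypothetical coefficients, for illustration only.
for family in FAMILIES:
    mu = predict_mean(family, [0.2, 0.1], [3.0])
    print(family, round(mu, 4), round(FAMILIES[family]["variance"](mu), 4))
```

The same linear predictor gives a different mean and a different variance under each family, which is the flexibility the framework adds over a single linear regression model.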