Poisson Distributed Data

Refresher on logarithms:

  • $100 = 10^2$, so $\log_{10}(100) = 2$

  • $0.5 \approx e^{-0.693}$, so $\log_e(0.5) = \ln(0.5) \approx -0.693$

  • Conversely, $e^{-0.693} \approx 0.5$

  • $\log(A \times B) = \log(A) + \log(B)$
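The identities above can be checked numerically with Python's standard `math` module. This is only a quick sanity check; the values A = 3 and B = 7 are arbitrary choices, not from the notes.

```python
import math

# log10(100) = 2 because 100 = 10**2
print(math.log10(100))

# ln(0.5) is about -0.693, and exponentiating undoes the natural log
print(round(math.log(0.5), 3))
print(round(math.exp(-0.693), 3))

# The log of a product is the sum of the logs (A and B are arbitrary)
A, B = 3.0, 7.0
print(math.isclose(math.log(A * B), math.log(A) + math.log(B)))
```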

If we can model data by making the parameters of a normal distribution functions of explanatory variables, can we do the same for other distributions?

Poisson distribution:

  • Because it is defined only on non-negative integers, the Poisson distribution is often used to model count data

  • It has only one parameter, the mean λ, which is always equal to the variance

  • $Y_i \sim \mathrm{Pois}(\lambda_i)$

  • Often describes counts per unit area or counts per unit time

  • Arises when events occur at a fixed rate per unit time or at a fixed density per unit area

  • Hence it is often used for survey data
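A small sketch of the mean-variance property, using only Python's standard library. The helper names `poisson_pmf` and `poisson_moments` are our own, and the sum is truncated at 200 counts, which is ample for the means tried here.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for Y ~ Pois(lam), computed on the log scale for stability."""
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def poisson_moments(lam, max_count=200):
    """Mean and variance obtained by summing over counts 0..max_count."""
    probs = [poisson_pmf(y, lam) for y in range(max_count + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var

# Whatever lambda is, the mean and the variance come out equal to it
for lam in (0.5, 4.0, 20.0):
    mean, var = poisson_moments(lam)
    print(f"lambda={lam}: mean={mean:.4f}, variance={var:.4f}")
```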

If you specify that your data are Poisson distributed, the GLM models the natural log of the mean of the data (the log link):

$\ln(\lambda_i) = m x_i + c$

$Y_i \sim \mathrm{Pois}\big(\exp(m x_i + c)\big)$
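To make the log-link model concrete, here is a minimal sketch of fitting $\ln(\lambda_i) = m x_i + c$ by maximum likelihood. GLM software such as R's `glm(..., family = poisson)` uses iteratively reweighted least squares; for the Poisson log link that is equivalent to the Newton-Raphson iteration below. The function name `fit_poisson_glm` and the example counts are hypothetical, for illustration only.

```python
import math

def fit_poisson_glm(x, y, max_iter=100, tol=1e-10):
    """Fit ln(lambda_i) = m*x_i + c by maximum likelihood (Newton-Raphson)."""
    m, c = 0.0, math.log(sum(y) / len(y))  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(m * xi + c) for xi in x]
        # Score (gradient of the log-likelihood) with respect to c and m
        s_c = sum(yi - mi for yi, mi in zip(y, mu))
        s_m = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix [[i_cc, i_cm], [i_cm, i_mm]]
        i_cc = sum(mu)
        i_cm = sum(xi * mi for xi, mi in zip(x, mu))
        i_mm = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i_cc * i_mm - i_cm ** 2
        # Newton step: solve information @ step = score for the 2x2 system
        d_c = (i_mm * s_c - i_cm * s_m) / det
        d_m = (i_cc * s_m - i_cm * s_c) / det
        c += d_c
        m += d_m
        if abs(d_c) + abs(d_m) < tol:
            break
    return m, c

# Hypothetical counts that grow roughly geometrically with x
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 4, 7, 12, 20]
m, c = fit_poisson_glm(x, y)
print(f"m = {m:.3f}, c = {c:.3f}")
# On the response scale, a one-unit increase in x multiplies the mean by exp(m)
print(f"multiplicative effect of x: {math.exp(m):.2f}")
```

Because the model is linear on the log scale, the slope acts multiplicatively on the expected count rather than additively.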


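$\ln(\lambda_i) = m x_i + \alpha_j + \beta_k + c$

$Y_i \sim \mathrm{Pois}\big(\exp(m x_i + \alpha_j + \beta_k + c)\big)$

where $\alpha_j$ and $\beta_k$ are the effects of level $j$ and level $k$ of two categorical explanatory variables (factors).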
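A short sketch of how factor effects enter this model. The coefficient values, level names and the helper `expected_count` are hypothetical. It assumes treatment contrasts, where the first level of each factor is the baseline with effect 0. The point is that, on the count scale, a factor effect multiplies the mean by $\exp(\alpha_j)$, whatever the values of $x$ and the other factor.

```python
import math

# Hypothetical coefficients (treatment contrasts: baseline levels have effect 0)
m, c = 0.3, 1.0
alpha = {"control": 0.0, "treated": 0.5}                # first factor, levels j
beta = {"site_A": 0.0, "site_B": -0.2, "site_C": 0.4}  # second factor, levels k

def expected_count(x, j, k):
    """lambda = exp(m*x + alpha_j + beta_k + c)."""
    return math.exp(m * x + alpha[j] + beta[k] + c)

# The treated/control ratio is exp(alpha_treated) at every x and every site
for x in (0.0, 2.0):
    for k in beta:
        ratio = expected_count(x, "treated", k) / expected_count(x, "control", k)
        print(f"x={x}, {k}: ratio={ratio:.4f}")
print(f"exp(alpha_treated) = {math.exp(alpha['treated']):.4f}")
```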


Differences between normal and Poisson data:

  • [Figure: normal data structure]

  • [Figure: Poisson data structure]

There are three different forms of count distribution: regular Poisson, over-dispersed and zero-inflated.

  • In the over-dispersed graph, note that the right-hand tail is longer than in the regular Poisson graph

  • In the zero-inflated graph, note the spike at 0 and the slight shift of the remaining counts to the right, which keeps the mean at 4
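The three shapes can be compared numerically by summing their probability mass functions. This is a sketch under stated assumptions. The mean of 4 comes from the notes. The negative binomial dispersion k = 2 and the zero-inflation probability π = 0.2 are arbitrary illustrative values. The negative binomial uses the mean/dispersion form with variance μ + μ²/k. For the zero-inflated Poisson, the Poisson mean is raised to 4/(1 − π) so that the overall mean stays at 4.

```python
import math

def poisson_pmf(y, lam):
    return math.exp(y * math.log(lam) - lam - math.lgamma(y + 1))

def negbin_pmf(y, mu, k):
    """Negative binomial with mean mu and dispersion k (variance mu + mu**2/k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zip_pmf(y, lam, pi):
    """Zero-inflated Poisson: an extra (structural) zero with probability pi."""
    return (pi if y == 0 else 0.0) + (1 - pi) * poisson_pmf(y, lam)

def summarise(pmf, max_count=300):
    """Mean, variance and P(0), summing the pmf over 0..max_count."""
    probs = [pmf(y) for y in range(max_count + 1)]
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return mean, var, probs[0]

MEAN, K, PI = 4.0, 2.0, 0.2  # K and PI are illustrative choices
models = {
    "regular Poisson": lambda y: poisson_pmf(y, MEAN),
    "over-dispersed (negative binomial)": lambda y: negbin_pmf(y, MEAN, K),
    "zero-inflated Poisson": lambda y: zip_pmf(y, MEAN / (1 - PI), PI),
}
results = {}
for name, pmf in models.items():
    results[name] = summarise(pmf)
    mean, var, p0 = results[name]
    print(f"{name}: mean={mean:.3f}, variance={var:.3f}, P(0)={p0:.3f}")
```

All three share a mean of 4, but only the regular Poisson has variance equal to the mean. The other two have variance greater than the mean, through a long right tail or through excess zeros.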

We can assess whether a model fits satisfactorily using the residual deviance

  • If the model fits, the residual deviance is approximately chi-squared distributed, with degrees of freedom equal to the residual degrees of freedom
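A stdlib-only sketch of this goodness-of-fit check. It computes the Poisson residual deviance, $2\sum[y_i \ln(y_i/\mu_i) - (y_i - \mu_i)]$, and the upper-tail chi-squared probability. The tail probability comes from the regularized incomplete gamma function, using the standard series and continued-fraction method. In practice a library routine such as `scipy.stats.chi2.sf` would replace `chi2_sf`. The counts, fitted values and the two-parameter model in the usage lines are hypothetical.

```python
import math

def poisson_deviance(y, mu):
    """2 * sum[y*ln(y/mu) - (y - mu)], taking 0*ln(0) as 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi > 0:
            total += yi * math.log(yi / mi)
        total -= yi - mi
    return 2.0 * total

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-squared(df), via the regularized incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    log_prefactor = -z + a * math.log(z) - math.lgamma(a)
    if z < a + 1:
        # Series for the lower tail P(a, z); return its complement
        term = total = 1.0 / a
        ap = a
        for _ in range(10_000):
            ap += 1
            term *= z / ap
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return 1.0 - total * math.exp(log_prefactor)
    # Continued fraction (Lentz's method) for the upper tail Q(a, z)
    tiny = 1e-300
    b = z + 1 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 10_000):
        an = -i * (i - a)
        b += 2
        d = an * d + b
        d = tiny if abs(d) < tiny else d
        c = b + an / c
        c = tiny if abs(c) < tiny else c
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1) < 1e-15:
            break
    return math.exp(log_prefactor) * h

# Hypothetical observed counts and fitted means from a 2-parameter model
y_obs = [2, 5, 3, 8, 4, 6]
mu_fit = [3.1, 4.2, 3.8, 6.5, 4.9, 5.5]
dev = poisson_deviance(y_obs, mu_fit)
df_resid = len(y_obs) - 2
print(f"deviance = {dev:.3f} on {df_resid} df, p = {chi2_sf(dev, df_resid):.3f}")
```

A small p-value suggests lack of fit. The chi-squared approximation is rough when the counts are small.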

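The most common solution to over-dispersion is to use a negative binomial distribution

  • A negative binomial distribution is a Poisson distribution whose mean parameter is itself gamma distributed (a gamma-Poisson mixture)

$Y \sim \mathrm{Pois}(\lambda)$

  • But λ is not a constant: it is drawn from a gamma distribution

$Y_i \sim \mathrm{NegBin}(\mu_i, k)$

where $\mu_i$ is the mean and $k$ is the dispersion parameter. The variance is $\mu_i + \mu_i^2/k$, so smaller values of $k$ mean stronger over-dispersion.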
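The gamma-Poisson construction can be checked by simulation. The sketch draws λ from a gamma distribution with shape k and scale μ/k, giving mean μ and variance μ²/k, and then draws Y ~ Pois(λ). The sample mean should be close to μ and the sample variance close to μ + μ²/k. The values μ = 4, k = 2, the seed and the helper names are illustrative assumptions. `rpois` uses Knuth's multiplication method, which is adequate only for moderate means.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def rnegbin_gamma_poisson(mu, k, rng):
    """Draw lambda ~ Gamma(shape=k, scale=mu/k), then Y ~ Pois(lambda)."""
    lam = rng.gammavariate(k, mu / k)
    return rpois(lam, rng)

rng = random.Random(42)
MU, K, N = 4.0, 2.0, 100_000  # illustrative values
draws = [rnegbin_gamma_poisson(MU, K, rng) for _ in range(N)]
mean = sum(draws) / N
var = sum((d - mean) ** 2 for d in draws) / (N - 1)
print(f"sample mean {mean:.3f} (theory {MU})")
print(f"sample variance {var:.3f} (theory mu + mu^2/k = {MU + MU ** 2 / K})")
```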


  • If the ratio of the residual deviance to the residual degrees of freedom is much greater than 1, the data are over-dispersed relative to a Poisson model, and you should consider a negative binomial distribution instead
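A rough simulation of this diagnostic. It fits an intercept-only Poisson model, whose fitted mean is simply the sample mean, to two simulated data sets and reports residual deviance divided by residual degrees of freedom. One data set is genuinely Poisson. The other is gamma-Poisson with k = 1, so its variance is μ + μ². The sample size, mean, k and seed are arbitrary choices for illustration. No particular cut-off is implied.

```python
import math
import random

def rpois(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def intercept_only_deviance_ratio(y):
    """Residual deviance / residual df for an intercept-only Poisson GLM."""
    mu = sum(y) / len(y)  # the MLE of a common mean is the sample mean
    dev = 0.0
    for yi in y:
        if yi > 0:
            dev += yi * math.log(yi / mu)
        dev -= yi - mu
    return 2.0 * dev / (len(y) - 1)

rng = random.Random(1)
N, MU = 1000, 4.0
poisson_y = [rpois(MU, rng) for _ in range(N)]
# Over-dispersed data: lambda ~ Gamma(shape=1, scale=MU), i.e. k = 1
overdispersed_y = [rpois(rng.gammavariate(1.0, MU), rng) for _ in range(N)]
ratio_pois = intercept_only_deviance_ratio(poisson_y)
ratio_over = intercept_only_deviance_ratio(overdispersed_y)
print(f"Poisson data:        deviance/df = {ratio_pois:.2f}")
print(f"over-dispersed data: deviance/df = {ratio_over:.2f}")
```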

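Once you have arrived at your preferred model by whatever means you judge appropriate, you can test the significance of each explanatory variable with a likelihood ratio test (LRT). Drop the term, refit, and compare the log-likelihood (LL) of the reduced model with that of the model that retains it. The statistic 2ΔLL is compared with a chi-squared distribution whose degrees of freedom equal the number of parameters removed.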

  • To report this result:

– treatment had a positive effect on the number of leverets per hare (likelihood ratio test: 2ΔLL = 8.423, df = 2, p = 0.014)
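A minimal sketch of turning 2ΔLL into a p-value. For an even number of degrees of freedom, the chi-squared upper tail has the closed form $e^{-x/2}\sum_{i=0}^{df/2-1}(x/2)^i/i!$, which is all the df = 2 example above needs. For odd df, a library routine such as `scipy.stats.chi2.sf(stat, df)` is the usual choice. The function names are our own, and the log-likelihoods in the last line are hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-squared(df), valid only for positive even df."""
    if df <= 0 or df % 2:
        raise ValueError("this closed form only covers positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def likelihood_ratio_test(ll_full, ll_reduced, df):
    """Return (2*deltaLL, p) for nested models that differ by df parameters."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf_even_df(stat, df)

# The statistic quoted in the example report: 2*deltaLL = 8.423 on 2 df
p = chi2_sf_even_df(8.423, 2)
print(f"p = {p:.4f}")

# Hypothetical log-likelihoods for a full and a reduced model
print(likelihood_ratio_test(ll_full=-120.0, ll_reduced=-123.5, df=2))
```

For the quoted statistic this gives p ≈ 0.0148, matching the reported value to within rounding.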