Poisson Distributed Data

Refresher on logarithms:

100 = 10², so log₁₀(100) = 2
0.5 ≈ e^-0.693, so log_e(0.5) ≈ -0.693
log(A × B) = log(A) + log(B)
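
These identities can be checked numerically. The following is a minimal Python sketch using only the standard library; the values chosen for A and B are arbitrary illustrations, not taken from any data set.

```python
import math

# Base-10 log: 100 = 10^2, so log10(100) = 2
log10_of_100 = math.log10(100)

# Natural log: 0.5 is roughly e^-0.693, so ln(0.5) is roughly -0.693
ln_of_half = math.log(0.5)

# Product rule: log(A x B) = log(A) + log(B); A and B are arbitrary here
A, B = 3.0, 7.0
lhs = math.log(A * B)
rhs = math.log(A) + math.log(B)

print(log10_of_100, round(ln_of_half, 3), math.isclose(lhs, rhs))
```

The product rule is the reason that terms added together on the log scale act multiplicatively on the original scale.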

If we can model data by making the parameters of a normal distribution a function of explanatory variables, can we do the same thing for other distributions?

Poisson distribution:

A Poisson distribution takes only non-negative integer values, so it is often used to model count data
It has only one parameter: the mean λ, which is always equal to the variance
Y_i ~ pois(λ_i)
Often describes counts per unit area or counts per unit time
Arises when events occur independently at a fixed rate per unit time or a fixed density per unit area
Hence often used for survey data
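
The claim that the mean equals the variance can be checked directly from the Poisson probability mass function. The sketch below assumes an illustrative mean of 4 and truncates the sum at 100, beyond which the probabilities are negligible for this mean; `poisson_pmf` is a helper written for this example.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0           # illustrative mean, e.g. 4 counts per unit area
ks = range(0, 100)  # truncation point; the tail beyond this is negligible for lam = 4
probs = [poisson_pmf(k, lam) for k in ks]

# Mean and variance computed from the probabilities themselves
mean = sum(k * p for k, p in zip(ks, probs))
variance = sum((k - mean) ** 2 * p for k, p in zip(ks, probs))

print(round(mean, 6), round(variance, 6))
```

Both quantities come out equal to λ, which is why a single parameter is enough to describe the distribution.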

If you define your data to be Poisson distributed, the GLM (with the usual log link) will model the natural log of the mean of the data as a linear function of the explanatory variables:

ln(λ_i) = mx_i + c

Y_i ~ pois(exp(mx_i + c))

ln(λ_i) = mx_i + ɑ_j + β_k + c

Y_i ~ pois(exp(mx_i + ɑ_j + β_k + c))
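
To see what modelling ln(λ_i) = mx_i + c implies on the count scale, the sketch below back-transforms a linear predictor into expected counts. The slope m = 0.3 and intercept c = 1.0 are hypothetical values picked for illustration only.

```python
import math

m, c = 0.3, 1.0  # hypothetical slope and intercept on the log scale

def expected_count(x):
    """Mean of the Poisson distribution at x: lambda = exp(m*x + c)."""
    return math.exp(m * x + c)

for x in [0, 1, 2, 3]:
    print(x, round(expected_count(x), 3))

# A one-unit increase in x multiplies the expected count by exp(m)
ratio = expected_count(2) / expected_count(1)
print(round(ratio, 3), round(math.exp(m), 3))
```

Two consequences follow from the exponential: the expected count is always positive whatever the value of the linear predictor, and effects that are additive on the log scale are multiplicative on the count scale.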

Differences between normal and Poisson:

Normal data structure

Poisson data structure

There are three different forms of count distribution – regular Poisson, over-dispersed, zero-inflated:

Note that in the over-dispersed graph the right-hand tail is extended relative to the regular Poisson
Note that in the zero-inflated graph there is a spike at 0, and the rest of the distribution shifts slightly to the right to maintain the mean of 4
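
The rightward shift in the zero-inflated case can be illustrated numerically. The sketch below assumes one common formulation of zero inflation, in which a proportion pi of observations are extra zeros and the rest are Poisson; the value pi = 0.2 and the helper names are assumptions made for this example and may differ from what produced the graph.

```python
import math

def poisson_pmf(k, lam):
    """P(Y = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def zip_pmf(k, lam, pi):
    """Zero-inflated Poisson: an extra zero with probability pi, else Poisson(lam)."""
    extra_zero = pi if k == 0 else 0.0
    return extra_zero + (1 - pi) * poisson_pmf(k, lam)

target_mean = 4.0
pi = 0.2                         # assumed proportion of extra zeros
lam_zi = target_mean / (1 - pi)  # Poisson part shifted right so the overall mean stays 4

ks = range(0, 120)               # truncation point; the tail beyond is negligible
zi_mean = sum(k * zip_pmf(k, lam_zi, pi) for k in ks)
zi_var = sum((k - zi_mean) ** 2 * zip_pmf(k, lam_zi, pi) for k in ks)

print(round(lam_zi, 3), round(zi_mean, 3), round(zi_var, 3))
print(round(poisson_pmf(0, target_mean), 4), round(zip_pmf(0, lam_zi, pi), 4))
```

Under these assumptions the Poisson component needs a mean of 5 to keep the overall mean at 4, the probability of a zero is far higher than for a regular Poisson with mean 4, and the variance exceeds the mean.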

We can determine whether a model fit is satisfactory using the residual deviance

If the model fits adequately, the residual deviance can be assumed to be approximately chi-squared distributed, with degrees of freedom equal to the residual degrees of freedom

The most common solution to over-dispersion is to use a negative binomial distribution

A negative binomial distribution is a Poisson distribution whose mean λ is itself gamma distributed

Y ~ pois(λ)

But λ is not a constant – it comes from a gamma distribution

Y_i ~ Negbin(mu_i, k)
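
The gamma-Poisson construction can be simulated to show where the extra variance comes from. This is a rough sketch, assuming that k is the shape parameter of the gamma distribution (so that the variance is mu + mu²/k); the values mu = 4 and k = 2, the seed, and the helper functions are all illustrative choices. The Poisson sampler uses Knuth's multiplication method because the Python standard library has no built-in Poisson generator.

```python
import math
import random
import statistics

def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count

def sample_negbin(mu, k, rng):
    """Gamma-Poisson mixture: lambda ~ Gamma(shape=k, scale=mu/k), then Y ~ Poisson(lambda)."""
    lam = rng.gammavariate(k, mu / k)
    return sample_poisson(lam, rng)

rng = random.Random(1)  # fixed seed so the run is repeatable
mu, k = 4.0, 2.0        # hypothetical mean and dispersion parameter
draws = [sample_negbin(mu, k, rng) for _ in range(20000)]

sample_mean = statistics.fmean(draws)
sample_var = statistics.variance(draws)
print(round(sample_mean, 2), round(sample_var, 2), mu + mu**2 / k)
```

The sample mean should sit close to mu, while the sample variance should sit close to mu + mu²/k, well above the value a plain Poisson with the same mean would give.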

If the ratio of the residual deviance to the residual degrees of freedom is much greater than one, the data are over-dispersed relative to the Poisson model, and you should consider a negative binomial distribution instead
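
The ratio check can be sketched by hand for the simplest possible case. The example below assumes an intercept-only Poisson model, whose fitted mean is just the sample mean, and uses a made-up vector of counts; in practice the fitted means and residual degrees of freedom would come from your fitted GLM.

```python
import math

def poisson_deviance(observed, fitted):
    """Poisson deviance: 2 * sum(y * ln(y / mu) - (y - mu)), with y * ln(y / mu) = 0 when y = 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total

counts = [0, 1, 0, 12, 3, 0, 7, 1, 15, 2, 0, 9]  # hypothetical count data
n_params = 1                                     # intercept only

# For an intercept-only Poisson model the fitted mean is the sample mean
fitted = [sum(counts) / len(counts)] * len(counts)

residual_deviance = poisson_deviance(counts, fitted)
residual_df = len(counts) - n_params
dispersion_ratio = residual_deviance / residual_df

print(round(residual_deviance, 2), residual_df, round(dispersion_ratio, 2))
```

For these made-up counts the ratio is well above one, which is the pattern that points towards a negative binomial model.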

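Having arrived at your preferred model through any means you judge appropriate, you can then test the significance of each explanatory variable with a likelihood ratio test (LRT): drop the term and compare the log-likelihood (LL) of the reduced model with that of the model that retains it. Twice the difference in LL is compared against a chi-squared distribution with degrees of freedom equal to the number of parameters dropped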

To report this result:

– treatment had a positive effect on the number of leverets per hare (likelihood ratio test: 2ΔLL = 8.423, df = 2, p = 0.015)
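
The p-value in such a report comes from comparing 2ΔLL with a chi-squared distribution. The sketch below implements the chi-squared upper tail for whole-number degrees of freedom using only the standard library; the function names and the two log-likelihood values are hypothetical, with the log-likelihoods chosen so that 2ΔLL matches the 8.423 quoted above.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-squared variable with positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        a, tail = 1.0, math.exp(-half)             # closed form for df = 2
    else:
        a, tail = 0.5, math.erfc(math.sqrt(half))  # closed form for df = 1
    # Step up two degrees of freedom at a time:
    # Q(a + 1, h) = Q(a, h) + h^a * e^-h / Gamma(a + 1)
    while 2 * a < df:
        tail += math.exp(a * math.log(half) - half - math.lgamma(a + 1))
        a += 1.0
    return tail

def likelihood_ratio_test(ll_full, ll_reduced, df):
    """Return (2 * delta LL, p-value) for dropping df parameters from the full model."""
    stat = 2.0 * (ll_full - ll_reduced)
    return stat, chi2_sf(stat, df)

# Hypothetical log-likelihoods, chosen so that 2 * delta LL = 8.423
stat, p_value = likelihood_ratio_test(ll_full=-120.0, ll_reduced=-124.2115, df=2)
print(round(stat, 3), round(p_value, 4))
```

With these inputs the p-value comes out at about 0.0148. The degrees of freedom equal the number of parameters removed when the term is dropped, so a three-level treatment factor gives df = 2.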