Poisson Distributed Data
Refresher on logarithms:
100 = 10^2, so log10(100) = 2
0.5 = e^(-0.693), so loge(0.5) = -0.693
e^(-0.693) = 0.5
log(A × B) = log(A) + log(B)
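The identities in the refresher can be checked numerically. This is a minimal sketch using Python's standard math module; the values of A and B are arbitrary choices for illustration.

```python
import math

# 100 = 10^2, so the base-10 log of 100 is 2
log10_of_100 = math.log10(100)
print(log10_of_100)

# 0.5 = e^(-0.693), so the natural log of 0.5 is about -0.693
ln_half = math.log(0.5)
print(round(ln_half, 3))

# Exponentiating undoes the log and recovers 0.5
print(math.exp(ln_half))

# The log of a product equals the sum of the logs
# (A and B are arbitrary positive numbers chosen for illustration)
A, B = 3.0, 7.0
product_rule_holds = math.isclose(math.log(A * B), math.log(A) + math.log(B))
print(product_rule_holds)
```

The product rule is what makes a model that is additive on the log scale multiplicative on the scale of the original counts.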
If we can model data by making the parameters of a normal distribution a function of explanatory variables, can we do the same thing for other distributions?
Poisson distribution:
A Poisson distribution takes only non-negative integer values and is therefore often used to model count data
It has only one parameter: the mean λ, which is always equal to the variance
Y_i ~ Pois(λ_i)
Often describes counts per unit area or counts per unit time
Arises when observations are counts of events that occur at a fixed rate per unit time or a fixed density per unit area
Hence often used for survey data
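The claim that the mean equals the variance can be checked directly from the Poisson probability formula. The sketch below assumes an illustrative mean of 4 and truncates the sum at 100, beyond which the remaining probability is negligible; the function name poisson_pmf is a label chosen here, not one from the notes.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean lam (computed on the log scale)."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

lam = 4.0            # illustrative mean, an assumption for this example
ys = range(0, 100)   # counts are non-negative integers; the tail beyond 100 is negligible here
probs = [poisson_pmf(y, lam) for y in ys]

total = sum(probs)
mean = sum(y * p for y, p in zip(ys, probs))
variance = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))

print(f"total probability = {total:.6f}")
print(f"mean = {mean:.6f}, variance = {variance:.6f}")
```

Changing lam moves the mean and the variance together, which is why a single parameter is enough to describe the distribution.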
If you specify that your data are Poisson distributed, the GLM will model the natural log of the mean of the data (the log link):
ln(λ_i) = m x_i + c
Y_i ~ Pois(exp(m x_i + c))
ln(λ_i) = m x_i + α_j + β_k + c
Y_i ~ Pois(exp(m x_i + α_j + β_k + c))
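To make the log link concrete, the sketch below simulates counts from the simplest form of the model (one slope m and an intercept c) and then recovers both by maximising the Poisson log-likelihood with Newton-Raphson. It is a hand-rolled illustration using only the standard library, not the routine a GLM package uses internally; the function names, the true values 0.3 and 0.5, the sample size and the range of x are all assumptions made for the example.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method (suitable for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def fit_poisson_loglinear(x, y, iterations=25):
    """Maximise the Poisson log-likelihood for ln(lambda) = m*x + c by Newton-Raphson."""
    c, m = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(iterations):
        mu = [math.exp(c + m * xi) for xi in x]
        # Gradient of the log-likelihood with respect to c and m
        g_c = sum(yi - mi for yi, mi in zip(y, mu))
        g_m = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Entries of the negated Hessian (a 2 x 2 symmetric matrix)
        h_cc = sum(mu)
        h_cm = sum(xi * mi for xi, mi in zip(x, mu))
        h_mm = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h_cc * h_mm - h_cm ** 2
        # Newton step: solve the 2 x 2 system explicitly
        c += (h_mm * g_c - h_cm * g_m) / det
        m += (h_cc * g_m - h_cm * g_c) / det
    return c, m

rng = random.Random(42)
true_m, true_c = 0.3, 0.5                      # assumed values for the simulation
x = [rng.uniform(0, 5) for _ in range(2000)]   # hypothetical explanatory variable
y = [rpois(math.exp(true_m * xi + true_c), rng) for xi in x]

c_hat, m_hat = fit_poisson_loglinear(x, y)
print(f"slope estimate {m_hat:.3f} (true {true_m}), intercept estimate {c_hat:.3f} (true {true_c})")
```

Because the model is linear on the log scale, a one-unit increase in x multiplies the expected count by exp(m) rather than adding a fixed amount.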
Differences between normal and Poisson:
[Figure: normal data structure compared with Poisson data structure]
There are three different forms of count distribution: regular Poisson, over-dispersed and zero-inflated
In the over-dispersed graph, note that the right-hand tail is extended relative to the regular Poisson
In the zero-inflated graph, note the spike at 0 and the slight shift of the remaining counts to the right, which maintains the mean of 4
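The three forms can be compared numerically by holding the mean at 4 and looking at the variance and the probability of a zero. In this sketch the over-dispersed case is represented by a negative binomial with size k = 2, and the zero-inflated case adds extra zeros with probability 0.2; both values are assumptions chosen for illustration and need not match the graphs the notes refer to.

```python
import math

def poisson_pmf(y, lam):
    """P(Y = y) for a Poisson distribution with mean lam."""
    return math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))

def negbin_pmf(y, mu, k):
    """Negative binomial with mean mu and size k (variance mu + mu^2 / k)."""
    log_p = (math.lgamma(y + k) - math.lgamma(k) - math.lgamma(y + 1)
             + k * math.log(k / (k + mu)) + y * math.log(mu / (k + mu)))
    return math.exp(log_p)

def zip_pmf(y, lam, pi_zero):
    """Zero-inflated Poisson: an extra zero with probability pi_zero, otherwise Poisson(lam)."""
    base = (1 - pi_zero) * poisson_pmf(y, lam)
    return pi_zero + base if y == 0 else base

target_mean = 4.0
k = 2.0                                  # assumed size parameter for the over-dispersed case
pi_zero = 0.2                            # assumed proportion of extra zeros
zip_lam = target_mean / (1 - pi_zero)    # Poisson part shifts right (to 5) so the overall mean stays 4
ys = range(0, 400)                       # tail beyond 400 is negligible for all three

def summarise(pmf):
    """Return (mean, variance, P(Y = 0)) for a probability function over ys."""
    probs = [pmf(y) for y in ys]
    mean = sum(y * p for y, p in zip(ys, probs))
    var = sum((y - mean) ** 2 * p for y, p in zip(ys, probs))
    return mean, var, probs[0]

summaries = {
    "regular Poisson": summarise(lambda y: poisson_pmf(y, target_mean)),
    "over-dispersed": summarise(lambda y: negbin_pmf(y, target_mean, k)),
    "zero-inflated": summarise(lambda y: zip_pmf(y, zip_lam, pi_zero)),
}
for name, (mean, var, p0) in summaries.items():
    print(f"{name:16s} mean={mean:.3f} variance={var:.3f} P(0)={p0:.3f}")
```

All three share the same mean, but only the regular Poisson has a variance equal to it; the other two differ in where the extra variance comes from (a longer right-hand tail versus a spike at zero).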
We can determine whether a model fit is satisfactory using the residual deviance
If the model fits adequately, the residual deviance can be assumed to be chi-squared distributed, with degrees of freedom equal to the residual degrees of freedom
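A rough sketch of this check, assuming the simplest possible model (an intercept only, so every fitted mean is the sample mean) and simulated Poisson data with a mean of 4. The Poisson deviance used is 2 Σ [y ln(y/μ) − (y − μ)]. The chi-squared tail probability here uses the Wilson-Hilferty normal approximation, swapped in to keep the example within the standard library; a statistics package would use the exact chi-squared distribution. All function names are hypothetical.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method (suitable for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def poisson_deviance(y, mu):
    """Deviance 2 * sum[y ln(y/mu) - (y - mu)], with y ln(y/mu) taken as 0 when y = 0."""
    total = 0.0
    for yi, mi in zip(y, mu):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def chi2_sf_approx(x, df):
    """Approximate upper-tail chi-squared probability (Wilson-Hilferty normal approximation)."""
    z = ((x / df) ** (1 / 3) - (1 - 2 / (9 * df))) / math.sqrt(2 / (9 * df))
    return 0.5 * math.erfc(z / math.sqrt(2))

rng = random.Random(1)
y = [rpois(4.0, rng) for _ in range(200)]       # simulated counts, assumed mean of 4

mu_hat = sum(y) / len(y)                        # intercept-only fit: fitted mean is the sample mean
deviance = poisson_deviance(y, [mu_hat] * len(y))
resid_df = len(y) - 1                           # one parameter estimated
p_value = chi2_sf_approx(deviance, resid_df)

print(f"residual deviance = {deviance:.1f} on {resid_df} df")
print(f"approximate p-value = {p_value:.3f}")
```

A residual deviance far above its degrees of freedom (a very small p-value) would suggest that the model does not describe the data adequately.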
The most common solution to over-dispersion is to use a negative binomial distribution
A negative binomial distribution is a Poisson distribution whose mean λ is itself gamma distributed
Y ~ Pois(λ)
But λ is not a constant: it comes from a gamma distribution
Y_i ~ NegBin(μ_i, k)
If the ratio of the residual deviance to the residual degrees of freedom is much greater than one, the Poisson model is probably not capturing the over-dispersion, and you should consider a negative binomial distribution instead
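The gamma mixture and the dispersion ratio can both be illustrated by simulation. The sketch below draws each λ from a gamma distribution with mean 4 and shape k = 2 (assumed values), draws a Poisson count from each λ, and then fits an intercept-only Poisson model to the resulting counts. The names and the sample size are assumptions for the example.

```python
import math
import random

def rpois(lam, rng):
    """Draw one Poisson(lam) count with Knuth's multiplication method (suitable for small lam)."""
    limit = math.exp(-lam)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(7)
mu, k, n = 4.0, 2.0, 5000      # assumed mean, size parameter and sample size

# A gamma with shape k and scale mu / k has mean mu and variance mu^2 / k
lambdas = [rng.gammavariate(k, mu / k) for _ in range(n)]
# Each count is Poisson given its own lambda, so marginally the counts are negative binomial
y = [rpois(lam, rng) for lam in lambdas]

sample_mean = sum(y) / n
sample_var = sum((yi - sample_mean) ** 2 for yi in y) / (n - 1)
print(f"mean = {sample_mean:.2f}, variance = {sample_var:.2f}, "
      f"mu + mu^2/k = {mu + mu ** 2 / k:.2f}")

# Residual deviance of an intercept-only Poisson fit to these counts
deviance = 2.0 * sum(
    (yi * math.log(yi / sample_mean) if yi > 0 else 0.0) - (yi - sample_mean)
    for yi in y
)
ratio = deviance / (n - 1)
print(f"residual deviance / residual df = {ratio:.2f}")
```

The variance exceeds the mean by roughly μ²/k, and the Poisson fit shows a deviance-to-df ratio well above one, which is the symptom described above. Smaller k means stronger over-dispersion; as k grows large the negative binomial approaches the Poisson.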
Having arrived at your preferred model by whatever means you judge appropriate, you can then test the significance of each explanatory variable with a likelihood ratio test (LRT): drop the term and compare the log-likelihood (LL) with that of the model that retains it
To report this result:
– treatment had a positive effect on the number of leverets per hare (likelihood ratio test: 2ΔLL = 8.423, df = 2, p = 0.014)
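The p-value in a report like this comes from comparing 2ΔLL with a chi-squared distribution whose degrees of freedom equal the number of parameters dropped. As a minimal sketch, the tail probability for an even number of degrees of freedom has a simple closed form, which is enough to check the example above (df = 2); the function name is hypothetical and odd df would need a statistics library.

```python
import math

def chi2_sf_even_df(x, df):
    """Exact upper-tail chi-squared probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

statistic = 8.423   # 2 * (LL of full model - LL of reduced model), taken from the reported example
df = 2              # number of parameters dropped, taken from the reported example
p_value = chi2_sf_even_df(statistic, df)

print(f"2*dLL = {statistic}, df = {df}, p = {p_value:.4f}")
```

For df = 2 the formula reduces to exp(-x/2), which gives a p-value close to the one reported.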