Fisher's Exact Test Notes
3.7 Fisher's Exact Test
- This method calculates an exact probability for the observed table instead of relying on the \chi^2 approximation.
- Usually used for 2x2 tables, but it can be applied to larger tables.
Fisher's Exact Test for 2x2 Cases
- The test relies on the hypergeometric distribution.
The Hypergeometric Distribution
Suppose a box contains b blue marbles and r red marbles.
We perform n trials where a marble is chosen, its color noted, and then the marble is placed back (sampling with replacement).
Let X denote the number of blue marbles chosen in n trials; then, using the binomial distribution:
P(X = x) = \binom{n}{x} p^x q^{n-x} = \binom{n}{x} \left(\frac{b}{b+r}\right)^x \left(\frac{r}{b+r}\right)^{n-x} (3.6)
- where p = \frac{b}{b+r} and q = 1-p = \frac{r}{b+r}
- x=0,1,…,n
If we modify this so that we have sampling without replacement, then the probability of "success" on each trial differs, and we have:
P(X=x) = \frac{\binom{b}{x} \binom{r}{n-x}}{\binom{b+r}{n}} (3.7)
x = \max(0, n-r), …, \min(n, b)
which is the probability mass function (PMF) of the hypergeometric distribution.
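The two mass functions can be compared side by side with a minimal Python sketch. The function names `binom_pmf` and `hypergeom_pmf` and the box of 5 blue and 4 red marbles are illustrative assumptions, not part of these notes.

```python
from math import comb

def binom_pmf(x, b, r, n):
    """P(X = x) when n marbles are drawn WITH replacement (binomial)."""
    p = b / (b + r)          # chance of blue, the same on every trial
    q = 1 - p
    return comb(n, x) * p**x * q**(n - x)

def hypergeom_pmf(x, b, r, n):
    """P(X = x) when n marbles are drawn WITHOUT replacement (hypergeometric)."""
    # Outside max(0, n - r) .. min(n, b) the event is impossible.
    if x < max(0, n - r) or x > min(n, b):
        return 0.0
    return comb(b, x) * comb(r, n - x) / comb(b + r, n)

# Hypothetical box: 5 blue, 4 red, 3 draws.
b, r, n = 5, 4, 3
for x in range(n + 1):
    print(x, round(binom_pmf(x, b, r, n), 4), round(hypergeom_pmf(x, b, r, n), 4))
```

Both columns sum to 1 over x = 0, …, n, but the individual probabilities differ because the chance of blue changes after each draw when marbles are not replaced.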
Let N = b + r denote the total number of marbles; then b = Np and r = Nq.
P(X=x) = \frac{\binom{Np}{x} \binom{Nq}{n-x}}{\binom{N}{n}} (3.8)
- with \mu = np and \sigma^2 = npq\left(\frac{N-n}{N-1}\right) (3.9)
Then we say X \sim hypergeometric(N, b, n).
If N is large compared with n, (3.8) reduces to (3.6) and (3.9) reduces to \mu = np and \sigma^2 = npq, respectively.
I.e., under these conditions, sampling without replacement becomes the same as sampling with replacement.
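A small numerical sketch of this limit, writing N for the population size b + r and n for the number of draws. The values n = 10, p = 0.4, x = 3 and the population sizes tried are arbitrary choices for illustration.

```python
from math import comb

def hypergeom_pmf(x, N, b, n):
    """Sampling without replacement: b blue marbles among N, n drawn."""
    return comb(b, x) * comb(N - b, n - x) / comb(N, n)

def binom_pmf(x, n, p):
    """Sampling with replacement: n trials, success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p, x = 10, 0.4, 3
gaps = []
for N in (20, 200, 2000, 20000):
    b = round(N * p)                      # keep the proportion of blue fixed
    gap = abs(hypergeom_pmf(x, N, b, n) - binom_pmf(x, n, p))
    fpc = (N - n) / (N - 1)               # variance correction factor
    gaps.append(gap)
    print(N, gap, fpc)
```

As N grows with n held fixed, the gap between the two probabilities shrinks and the factor (N - n)/(N - 1) approaches 1, so the hypergeometric variance approaches npq.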
Example 3.8
Box of 20 marbles, 12 red, 8 blue. Ten are taken without replacement. Find P(3 \text{ blue marbles}) .
Let X denote the number of blue marbles; then X \sim hypergeometric(N, b, n).
Note that
b = 8
r = 12
n = 10
N = b + r = 20
Np = N \cdot \frac{b}{N} = 20 \cdot \frac{8}{20} = 8
Nq = N \cdot \frac{r}{N} = 20 \cdot \frac{12}{20} = 12
Then
P(X=3) = \frac{\binom{Np}{x} \binom{Nq}{n-x}}{\binom{N}{n}} = \frac{\binom{8}{3} \binom{12}{7}}{\binom{20}{10}}
= \frac{\frac{8!}{3!5!} \frac{12!}{7!5!}}{\frac{20!}{10!10!}} = \frac{56 \times 792}{184756} = \frac{44352}{184756} \approx 0.24
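The arithmetic in Example 3.8 can be checked with a few lines of Python. This is only a verification sketch; the variable names are chosen here for illustration.

```python
from fractions import Fraction
from math import comb

# Example 3.8: 8 blue, 12 red, 10 drawn without replacement, 3 blue wanted.
b, r, n, x = 8, 12, 10, 3

numerator = comb(b, x) * comb(r, n - x)   # ways to pick 3 blue and 7 red
denominator = comb(b + r, n)              # ways to pick any 10 of the 20
prob = Fraction(numerator, denominator)   # exact rational value

print(numerator, denominator, float(prob))
```

Using `Fraction` keeps the result exact until the final conversion to a decimal.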
Fisher's Exact Test for the 2x2 Case
- Hypotheses:
- H₀: there is no association between the two variables.
- H₁: depends on the question.
| | Cat. A₁ | Cat. A₂ | Totals |
|---|---|---|---|
| Cat. B₁ | a | b | a + b |
| Cat. B₂ | c | d | c + d |
| Totals | a + c | b + d | n |
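- Fisher's exact test is used when the corresponding expected frequencies do not satisfy the assumption of the \chi^2 test (commonly, that every expected frequency is at least 5).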
- Assuming H₀ and the given totals, the probability of obtaining this particular arrangement of data is calculated using the hypergeometric distribution, i.e.,
- P(a, b, c, d) = \frac{\binom{a+b}{a} \binom{c+d}{c}}{\binom{n}{a+c}} = \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{a!b!c!d!n!}
- Fisher's Exact Test: use this formula to calculate the exact probability of the observed outcome and of every outcome more extreme than observed (which outcomes count as more extreme depends on H₁). The sum of these probabilities is the p-value.
- Note: "More extreme outcomes" must satisfy the row and column totals.
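The procedure can be sketched in Python for a one-sided alternative. The function names `table_prob` and `fisher_one_sided`, the `alternative` argument, and the 2x2 counts in the usage lines are hypothetical and chosen only for illustration; "more extreme" is taken here to mean a larger (or smaller) value of cell a with all margins held fixed.

```python
from math import comb

def table_prob(a, b, c, d):
    """Hypergeometric probability of one 2x2 table, given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_one_sided(a, b, c, d, alternative="greater"):
    """Sum the probabilities of the observed table and more extreme ones."""
    row1, row2, col1 = a + b, c + d, a + c
    lo = max(0, col1 - row2)       # smallest a compatible with the margins
    hi = min(row1, col1)           # largest a compatible with the margins
    if alternative == "greater":
        candidates = range(a, hi + 1)
    elif alternative == "less":
        candidates = range(lo, a + 1)
    else:
        raise ValueError("alternative must be 'greater' or 'less'")
    total = 0.0
    for k in candidates:
        # Fixing a = k determines the other three cells from the margins.
        total += table_prob(k, row1 - k, col1 - k, row2 - (col1 - k))
    return total

# Hypothetical observed table: a = 3, b = 1, c = 1, d = 3.
p_greater = fisher_one_sided(3, 1, 1, 3, alternative="greater")
print(p_greater)
```

For these hypothetical counts the only tables at least as extreme in the "greater" direction are a = 3 and a = 4, so the p-value is 16/70 + 1/70 = 17/70. A two-sided version needs an extra convention for what counts as "more extreme" and is left out of this sketch.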