Fisher's Exact Test Notes

3.7 Fisher's Exact Test

  • This method calculates an exact p-value directly, rather than relying on the large-sample \chi^2 approximation.
  • Usually used for 2x2 tables, but it can be applied to larger tables.

Fisher's Exact Test for 2x2 Cases

  • The test relies on the hypergeometric distribution.

The Hypergeometric Distribution

  • Suppose a box contains b blue marbles and r red marbles.

  • We perform n trials where a marble is chosen, its color noted, and then the marble is placed back (sampling with replacement).

  • Let X denote the number of blue marbles chosen in n trials; then, using the binomial distribution:
    P(X = x) = \binom{n}{x} p^x q^{n-x}

  • P(X=x) = \binom{n}{x} \left(\frac{b}{b+r}\right)^x \left(\frac{r}{b+r}\right)^{n-x} (3.6)

    • where p = \frac{b}{b+r} and q = 1-p = \frac{r}{b+r}
    • x=0,1,…,n
  • If we modify this so that we have sampling without replacement, then the probability of "success" on each trial differs, and we have:
    P(X=x) = \frac{\binom{b}{x} \binom{r}{n-x}}{\binom{b+r}{n}} (3.7)
    x = \max(0, n-r), …, \min(n, b)

  • Which is the probability mass function (PMF) of the hypergeometric distribution.

  • Let N = b + r; then b = Np and r = Nq.

  • P(X=x) = \frac{\binom{Np}{x} \binom{Nq}{n-x}}{\binom{N}{n}} (3.8)

    • with \mu = np and \sigma^2 = npq\left(\frac{N-n}{N-1}\right) (3.9)
  • Then we say X \sim hypergeometric(N, b, n).

  • If N is large compared with n, (3.8) reduces to (3.6) and (3.9) reduces to \mu = np and \sigma^2 = npq.

  • I.e., under these conditions, sampling without replacement becomes approximately the same as sampling with replacement.
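
The relationship between the two sampling schemes can be checked numerically. The following is a minimal sketch using only Python's standard library; the function names (`binomial_pmf`, `hypergeom_pmf`, `hypergeom_moments`, `max_gap`) and the choice p = 0.4, n = 10 are illustrative assumptions, not part of the notes. It writes N = b + r for the box size, evaluates the with-replacement and without-replacement PMFs, and shows the largest gap between them as N grows with p and n held fixed.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) when sampling with replacement (binomial PMF)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def hypergeom_pmf(x, b, r, n):
    """P(X = x) when sampling without replacement (hypergeometric PMF)."""
    if x < max(0, n - r) or x > min(n, b):
        return 0.0  # outside the support
    return comb(b, x) * comb(r, n - x) / comb(b + r, n)

def hypergeom_moments(b, r, n):
    """Mean and variance computed directly from the PMF."""
    xs = range(max(0, n - r), min(n, b) + 1)
    mean = sum(x * hypergeom_pmf(x, b, r, n) for x in xs)
    var = sum((x - mean) ** 2 * hypergeom_pmf(x, b, r, n) for x in xs)
    return mean, var

def max_gap(N, n=10):
    """Largest |hypergeometric - binomial| over x, with p = 2/5 (assumed)."""
    b = N * 2 // 5          # blue marbles, so p = b / N = 0.4
    r = N - b               # red marbles
    p = b / N
    return max(abs(hypergeom_pmf(x, b, r, n) - binomial_pmf(x, n, p))
               for x in range(n + 1))

if __name__ == "__main__":
    for N in (20, 200, 20000):
        print(N, max_gap(N))
    # Compare the PMF-based moments with np and npq(N-n)/(N-1) for N = 20.
    print(hypergeom_moments(8, 12, 10))
```

The gap shrinks as N increases, and the finite-population factor (N-n)/(N-1) in the variance moves toward 1 at the same time.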

Example 3.8

  • Box of 20 marbles, 12 red, 8 blue. Ten are taken without replacement. Find P(3 \text{ blue marbles}).

  • Let X denote the number of blue marbles; then X \sim hypergeometric(N, b, n), where N = b + r is the total number of marbles.

  • Note that
    b = 8
    r = 12
    n = 10
    N = 20
    p = \frac{b}{N} = \frac{8}{20}, so Np = 8
    q = \frac{r}{N} = \frac{12}{20}, so Nq = 12

  • Then
    P(X=3) = \frac{\binom{8}{3} \binom{12}{7}}{\binom{20}{10}}
    = \frac{\frac{8!}{3!5!} \frac{12!}{7!5!}}{\frac{20!}{10!10!}} = \frac{56 \times 792}{184756} = \frac{44352}{184756} \approx 0.24
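
The arithmetic in Example 3.8 can be reproduced with a few lines of Python. This is only a quick check using the standard library; the variable names are chosen here for illustration.

```python
from math import comb

b, r, n = 8, 12, 10   # blue marbles, red marbles, marbles drawn
x = 3                 # number of blue marbles of interest

numerator = comb(b, x) * comb(r, n - x)   # ways to pick 3 blue and 7 red
denominator = comb(b + r, n)              # ways to pick any 10 of the 20
prob = numerator / denominator

print(numerator, denominator, round(prob, 4))
```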

Fisher's Exact Test for the 2x2 Case

  • Hypotheses:
    • H₀: there is no association between the two variables.
    • H₁: depends on the question.
              Cat. A₁   Cat. A₂   Totals
    Cat. B₁   a         b         a + b
    Cat. B₂   c         d         c + d
    Totals    a + c     b + d     n
  • Used when the corresponding expected frequencies do not satisfy the assumption of the \chi^2 test (commonly stated as every expected frequency being at least 5).
  • Assuming H₀ and the given totals, the probability of obtaining this particular arrangement of data is calculated using the hypergeometric distribution, i.e.,
    • P(a, b, c, d) = \frac{\binom{a+b}{a} \binom{c+d}{c}}{\binom{n}{a+c}}
      P(a, b, c, d) = \frac{(a+b)!(c+d)!(a+c)!(b+d)!}{a!b!c!d!n!}
  • Fisher's Exact Test: use this formula to calculate the exact probability of the observed table and of every table more extreme than it (which tables count as more extreme depends on H₁). The sum of these probabilities is the p-value.
  • Note: "More extreme outcomes" must have the same row and column totals as the observed table.
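
The procedure can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names `table_prob` and `fisher_exact_2x2` are made up here, the example table (3, 1, 1, 3) is hypothetical data, and the two-sided rule used (sum the tables that are no more probable than the observed one) is one common convention among several.

```python
from math import comb

def table_prob(a, b, c, d):
    """Hypergeometric probability of a 2x2 table given its margins."""
    n = a + b + c + d
    return comb(a + b, a) * comb(c + d, c) / comb(n, a + c)

def fisher_exact_2x2(a, b, c, d):
    """Return (p_less, p_greater, p_two_sided) for the observed table.

    Every candidate table keeps the observed row and column totals,
    so it is fully determined by the count k in the top-left cell.
    """
    row1, row2, col1 = a + b, c + d, a + c
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    probs = {}
    for k in range(k_min, k_max + 1):
        c_k = col1 - k                  # bottom-left cell
        probs[k] = table_prob(k, row1 - k, c_k, row2 - c_k)
    p_obs = probs[a]
    p_less = sum(p for k, p in probs.items() if k <= a)      # H1: smaller a
    p_greater = sum(p for k, p in probs.items() if k >= a)   # H1: larger a
    # Assumed two-sided rule: tables no more probable than the observed
    # one; the small tolerance guards against floating-point ties.
    p_two = sum(p for p in probs.values() if p <= p_obs * (1 + 1e-9))
    return p_less, p_greater, p_two

if __name__ == "__main__":
    # Hypothetical table: a = 3, b = 1, c = 1, d = 3.
    print(fisher_exact_2x2(3, 1, 1, 3))
```

Which of the three returned values is the p-value depends on H₁: a one-sided alternative uses `p_less` or `p_greater`, and a two-sided alternative uses `p_two`.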