L5 - Hypothesis Testing, Power, and P-Values in Statistics

Secrets to Success and Student Resources

Acknowledging Challenges: It is common for students to find certain aspects of the course difficult. Based on Quiz 2 reflections, three primary challenges identified by students include:
- Learning R and RStudio and keeping up with the fast pace of the labs.
- Understanding the differences between various probability distributions and determining when each is most relevant.
- Figuring out which variables are the response variables and which are the predictors.
Support Tools for R and RStudio:
- Lab Videos: These are available for students to code along at their own pace. Students are encouraged to slow down, pause, rewatch sections, and compare their code to the solutions posted on Learn.
- Debugging: Read error messages carefully to understand what they mean. Use Google and the "long list of tips and tricks" available on the BIO 209 Learn site.
Course Support Items:
- Lecture Notes: Posted alongside each lecture.
- Readings: Relevant chapters in the textbook and practice problems associated with each content set.
- Q&A Forum: Available for posting questions regarding any "muddiest point."
- Optional Tutorials: Wednesdays at 4:00 PM.
- Office Hours: Tuesday at 3:00 PM.

Learning Objectives

Understanding P-values and what they represent.
Learning the different types of statistical error: Type I ( $\alpha$ ) and Type II ( $\beta$ ).
Understanding statistical power and effect sizes ( $\delta$ ).
Learning how to test for statistical power.
Using distributions to test hypotheses.
Gaining familiarity with the R interface for hypothesis testing.

The Hypothesis Testing Framework

Role of Distributions: Probability distributions underlie the entire framework of hypothesis testing. For every null hypothesis ( $H_0$ ), there is a corresponding null distribution.
Components Needed to Test a Null Hypothesis:
- The Null Hypothesis ( $H_0$ ): The baseline assumption being tested.
- The Null Distribution: The probability distribution of results expected if the null hypothesis is true. Common distributions include the Normal (Gaussian) and the t-distribution.
- The Significance Level ( $\alpha$ ): Set to determine the acceptable false positive rate. The corresponding confidence level is $1 - \alpha$ .
- Data: Measurements of the variables in a sample.
- Test Statistic: Relates the gathered data to the null distribution.
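The five components above can be seen together in a short base R sketch of a one-sample t-test. The data values, the null value `mu0`, and the variable names are made up for illustration and do not come from the course materials.

```r
# Hypothetical sample: 12 measurements (made-up values for illustration only)
x <- c(5.1, 4.8, 5.6, 5.3, 4.9, 5.7, 5.2, 5.0, 5.5, 5.4, 4.7, 5.8)

mu0   <- 5.0    # value stated by the null hypothesis, H0: mu = 5 (assumed)
alpha <- 0.05   # acceptable false positive rate

# Test statistic: distance of the sample mean from mu0, in standard-error units
n      <- length(x)
t_stat <- (mean(x) - mu0) / (sd(x) / sqrt(n))

# Null distribution: t with n - 1 degrees of freedom; two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = n - 1)

# Built-in equivalent, for comparison with the manual calculation
builtin <- t.test(x, mu = mu0)

# Decision rule: reject H0 when the p-value falls below alpha
reject_h0 <- p_value < alpha
```

Computing the statistic by hand next to `t.test()` is only meant to show where each component enters; in practice the built-in function is sufficient.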

Understanding the Null Distribution and P-Values

The Null Distribution Graph:
- The x-axis represents the set of possible results.
- The y-axis represents the probability (or probability density) of each possible result.
- The Peak: Represents the value expected under the null hypothesis; the most likely value to observe if $H_0$ is true.
- The Tails: Represent very unlikely observations, denoted as extreme values.
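As a rough illustration of the graph described above, the sketch below builds a null distribution for an assumed example: the number of heads in 10 flips of a fair coin. The 10-flip setup and the 0.01 cut-off used to mark the tails are arbitrary choices for this sketch.

```r
# Null distribution: number of heads in 10 flips of a fair coin (assumed example)
n_flips  <- 10
outcomes <- 0:n_flips                                      # x-axis: possible results
probs    <- dbinom(outcomes, size = n_flips, prob = 0.5)   # y-axis: probability of each

peak  <- outcomes[which.max(probs)]   # most likely result if H0 is true
tails <- outcomes[probs < 0.01]       # very unlikely results (arbitrary 0.01 cut-off)

barplot(probs, names.arg = outcomes,
        xlab = "Number of heads", ylab = "Probability under H0")
```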
Calculating the P-Value:
- The p-value is the probability of the observed value plus the probabilities of all values more extreme than the observation, assuming the null hypothesis is true.
- "Extreme" refers to values further away from the center (peak) of the distribution.
- If the total sum of these probabilities (the p-value) is very small (e.g., $0.0000001$ ), it suggests the observation is unlikely under $H_0$ , providing evidence to reject the null.
P-Value Definition: The cumulative probability of the observation or more extreme observations under the null distribution.
Coin Analogy: If you suspect a coin is weighted and flip it $110$ times getting more heads than tails:
- The p-value does not tell you if the coin is fair.
- It tells you the probability that you would get at least as many heads as you did if the coin were fair.
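The coin analogy can be sketched in R with the binomial distribution. The notes do not say how many heads were observed, so the count of 65 below is a hypothetical value chosen only to make the calculation concrete.

```r
n_flips <- 110   # number of flips, as in the analogy
heads   <- 65    # hypothetical count; the notes do not give the actual number

# P(X >= heads) for a fair coin: the observed count plus everything more extreme
p_upper <- pbinom(heads - 1, size = n_flips, prob = 0.5, lower.tail = FALSE)

# The same quantity, obtained by summing the individual probabilities
p_sum <- sum(dbinom(heads:n_flips, size = n_flips, prob = 0.5))
```

Note the `heads - 1`: with `lower.tail = FALSE`, `pbinom()` returns the probability of values strictly greater than its first argument, so subtracting one keeps the observed count itself in the total.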

Alpha, Beta, and Types of Error

Significance Level ( $\alpha$ ): Also known as the acceptable false positive rate; the corresponding confidence level is $1 - \alpha$ . It defines the threshold for what is considered an "unlikely" observation.
- If p-value < $\alpha$ , we reject the null hypothesis ( $H_0$ ).
- If we set $\alpha = 0.05$ , we accept being wrong $5\%$ of the time when the null hypothesis is actually true.
Type I Error ( $\alpha$ ): A false positive. The incorrect rejection of a true null hypothesis.
Type II Error ( $\beta$ ): A false negative. Incorrectly retaining (failing to reject) a null hypothesis that is actually false.
Analogy (Pregnancy):
- General: Type I is telling someone they are pregnant when they cannot be; Type II is telling someone who is eight months pregnant that they are not.
- Seahorses (Biological Context): Type I would be telling a female seahorse she is pregnant; Type II would be telling a visibly pregnant male seahorse he is not.
The Conflict Table:
- Correct Outcome 1: Truth is no association, study detects no association.
- Correct Outcome 2 (Power): Truth is association, study detects association.
- Type I Error: Truth is no association, but study detects an association.
- Type II Error: Truth is an association, but study detects no association.
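The four outcomes in the table can be explored with a small simulation, offered here as a sketch rather than a course-provided method. The sample size of 30, the true mean of 0.5 in the "association exists" case, and the helper `sim_p` are all assumptions made for illustration.

```r
set.seed(1)
alpha  <- 0.05
n      <- 30     # observations per simulated study (assumed)
n_sims <- 4000   # number of simulated studies per scenario

# p-value of a one-sample t-test (H0: mean = 0) on one simulated study
sim_p <- function(true_mean) t.test(rnorm(n, mean = true_mean), mu = 0)$p.value

# Truth: no effect (mean really is 0). Every rejection is a Type I error.
p_null     <- replicate(n_sims, sim_p(0))
type1_rate <- mean(p_null < alpha)

# Truth: an effect exists (mean is 0.5). Every non-rejection is a Type II error.
p_alt      <- replicate(n_sims, sim_p(0.5))
type2_rate <- mean(p_alt >= alpha)
sim_power  <- 1 - type2_rate
```

Under these assumptions `type1_rate` should land close to `alpha`, while `type2_rate` depends on the sample size and the size of the true effect.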

Statistical Power and Effect Size

Statistical Power: Defined as $1 - \beta$ . It is the probability that a test will correctly reject the null hypothesis when it is actually false.
- High power reduces the rate of Type II errors.
Effect Size ( $\delta$ ): The actual magnitude of the correlation or difference being tested.
Factors Influencing Power:
- Sample Size ( $n$ ): Power increases as sample size increases, eventually reaching a plateau. For example, to reach $80\%$ power ( $0.8$ ) in a specific test, a sample size of nearly $200$ might be needed.
- The Magnitude of the Effect: If two means are very close (small effect size), a larger sample is required to detect a difference. If they are very different (large effect size, such as comparing the height of toddlers to the height of adults), a smaller sample size can still result in high power.
Importance in Design: Statistics must be considered during the experimental design phase to ensure the study has sufficient power to detect the effects being investigated.
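To see how sample size and effect size interact, the sketch below tabulates power with `power.t.test()` from base R's stats package. The sample sizes, the effect sizes of 0.2 and 0.8, and the helper `power_for` are illustrative choices; setting `sd = 1` lets `delta` act as a standardized effect size.

```r
# Power of a two-sided one-sample t-test at alpha = 0.05 for a given n and effect size
power_for <- function(n, d) {
  power.t.test(n = n, delta = d, sd = 1, sig.level = 0.05,
               type = "one.sample", alternative = "two.sided")$power
}

sample_sizes <- c(10, 25, 50, 100, 200)
small_effect <- sapply(sample_sizes, power_for, d = 0.2)   # means close together
large_effect <- sapply(sample_sizes, power_for, d = 0.8)   # means far apart

print(data.frame(n = sample_sizes,
                 power_small = round(small_effect, 3),
                 power_large = round(large_effect, 3)))
```

With the large effect, power saturates near 1 at modest sample sizes, which is the plateau mentioned above; with the small effect it climbs much more slowly.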

Power Analysis in R

Library Requirement: The pwr package is used for conducting power analysis.
One-Sample T-test: Compares a sample mean to a known expected value.
R Code Example (Power Calculation):
- Loading the library: library(pwr)
- Running the analysis for a sample of $50$ with an effect size of $0.3$ : pwr.t.test(n = 50, d = 0.3, sig.level = 0.05, type = "one.sample", alternative = "two.sided")
- This yields a power of approximately $0.547$ , meaning there is less than a $55\%$ chance of correctly detecting a true effect of this size.
R Code Example (Required Sample Size):
- To find the sample size required for $80\%$ power: pwr.t.test(power = 0.8, d = 0.3, sig.level = 0.05, type = "one.sample", alternative = "two.sided")
- The output indicates a required sample size ( $n$ ) of just over $89$ , which is rounded up to $90$ because a sample must contain a whole number of observations.
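As a cross-check that does not need the pwr package, the same quantities can be approximated with base R. This is a sketch under stated assumptions: the data are simulated as normal with standard deviation 1, so a true mean of 0.3 corresponds to an effect size of 0.3, and `power.t.test()` from the stats package stands in for `pwr.t.test()`.

```r
set.seed(42)
n <- 50; d <- 0.3; alpha <- 0.05

# Simulate studies in which the true mean sits d standard deviations from H0 (mu = 0)
p_values <- replicate(5000, t.test(rnorm(n, mean = d, sd = 1), mu = 0)$p.value)
simulated_power <- mean(p_values < alpha)   # share of studies that reject H0

# Analytic power from base R; strict = TRUE counts rejections in both tails
analytic_power <- power.t.test(n = n, delta = d, sd = 1, sig.level = alpha,
                               type = "one.sample", alternative = "two.sided",
                               strict = TRUE)$power

# Sample size needed for 80% power, rounded up to a whole observation
required_n <- ceiling(power.t.test(power = 0.8, delta = d, sd = 1,
                                   sig.level = alpha, type = "one.sample",
                                   alternative = "two.sided")$n)
```

The simulated and analytic values should agree closely, differing only by simulation noise.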

Application: Plastic in Fish

Scenario: $1$ in $3$ fish caught for human consumption contain plastic.
Hypotheses:
- Null ( $H_0$ ): Fish contain plastic at random ( $p = 0.5$ ).
- Alternative ( $H_1$ ): Fish are more or less likely to contain plastic than random chance predicts.
Binomial Functions in R:
- dbinom(x, size, prob): Finds the probability of a specific observed value ( $d$ stands for density).
- pbinom(q, size, prob, lower.tail): Finds the cumulative probability ( $p$ stands for probability, i.e. the cumulative distribution function).
- qbinom(): Finds the quantile of the distribution.
- rbinom(): Randomly generates numbers under the binomial distribution.
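The four functions can be tried side by side. The sketch below uses the three-fish setting with a null probability of 0.5; the quantile level of 0.6 and the random seed are arbitrary choices for illustration.

```r
size <- 3; prob <- 0.5   # three fish, null probability of containing plastic

p_exactly_one <- dbinom(1, size, prob)                      # P(X = 1)
p_at_most_one <- pbinom(1, size, prob)                      # P(X <= 1), lower tail
p_more_than_1 <- pbinom(1, size, prob, lower.tail = FALSE)  # P(X > 1), upper tail
q_60          <- qbinom(0.6, size, prob)   # smallest x with P(X <= x) >= 0.6

set.seed(7)
draws <- rbinom(5, size, prob)   # five random samples of three fish each
```

The lower and upper tails from `pbinom()` always add up to 1, because `lower.tail = FALSE` excludes the value passed as the first argument.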
Calculating the P-Value for 1 in 3 Fish:
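- P-value (lower tail) = Probability of $1$ fish + Probability of $0$ fish. This is a one-tailed value; the two-sided alternative above would also count the equally extreme results in the upper tail.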
- Code: dbinom(x = 1, size = 3, prob = 0.5) + dbinom(x = 0, size = 3, prob = 0.5) results in $0.5$ .
- Cumulative version: pbinom(q = 1, size = 3, prob = 0.5, lower.tail = TRUE) also results in $0.5$ .
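Because the alternative hypothesis is two-sided, a two-sided p-value can also be sketched. The mirror-image approach below works only because the null distribution is symmetric at a probability of 0.5; `binom.test()` from base R is shown as a general-purpose check. Variable names are illustrative.

```r
x <- 1; n <- 3; p0 <- 0.5

# Lower tail: P(X <= 1), the value computed above
lower_tail <- pbinom(x, size = n, prob = p0)

# Mirror image in the upper tail: P(X >= n - x) = P(X >= 2)
upper_mirror <- pbinom(n - x - 1, size = n, prob = p0, lower.tail = FALSE)

# Two-sided p-value, capped at 1
two_sided_manual <- min(1, lower_tail + upper_mirror)

# Exact binomial test from base R (two-sided by default)
two_sided_builtin <- binom.test(x, n, p = p0)$p.value
```

With only three fish, every possible result is at least as extreme as the one observed, so the two-sided p-value reaches 1: a sample this small cannot provide evidence against the null.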
Homework Challenge: Calculate the p-value for $20$ out of $60$ fish containing plastic.