Experimental Biology: Inferential Statistics and Hypothesis Testing

Introduction to Statistics and the Inferential Framework

Statistics is defined as a branch of mathematics concerned with the collection, organization, analysis, interpretation, and presentation of numerical data. It is divided into two primary categories:

Descriptive Statistics: This involves collecting, organizing, and describing data from a specific target population.
Inferential Statistics: This practice uses data from a sample to infer characteristics about a larger population. Because it is usually impossible to measure every individual in a population, inferential statistics provides the tools to draw conclusions or make predictions about the whole based on a subset. This process inherently involves hypothesis testing.

The Inferential Framework (Underwood 1990, 1991)

The inferential framework is a structured series of logical components used in research programmes to build scientific knowledge. The components include:

Observations: Identifying patterns in space and time (e.g., "Elephant species X seems to have larger ears in the warmer North than in the cooler South").
Models: Developing possible explanations or theories for the observations. These might consider factors like predation, temperature, or vegetation. Researchers often start with the most logical model based on existing knowledge.
Hypotheses: Predictions based on the models. A hypothesis is an idea or explanation tested through study, experimentation, and data analysis.
Null Hypotheses ( $H_0$ ): The specific statement put to the test, asserting that no difference or effect exists.
Experimental Design: The sampling scheme and statistical methods used to gather data.
Interpretation: Determining whether to falsify the null hypothesis or fail to reject it.

Conceptualizing Hypotheses with Examples

The Elephant Ear Example

Observation: Elephant populations of Species X have bigger ears in the warmer North than the cooler South. (Note: Larger surface area allows for more heat loss/cooling in warmer climates).
Research Question: Does average temperature (climate) affect the evolution of elephant ear size in different habitat zones?
Hypothesis: IF temperature affects ear size, THEN populations moving to and evolving in the same climate zone as other populations will evolve to have similar-sized ears.
Null Hypothesis ( $H_0$ ): IF average temperature DOES NOT affect ear size, THEN populations moving to a different climate zone will not evolve different ear sizes. ( $H_0$ assumes No Difference).
Alternative Hypothesis ( $H_1$ ): IF average temperature DOES affect ear size, THEN populations moving from a warmer to a cooler climate will evolve smaller ears (or vice-versa).

The Hedgehog Example (Village A vs. Village B)

Observation: Hedgehogs in Village A appear smaller than those in Village B.
Potential Bias/Models: Could differences be due to trap type (only catching small ones) or trap placement timing (catching young in Spring)?
Research Question: Are hedgehogs from Village A smaller than those from neighboring Village B?
Improved Hypothesis: IF we survey six sites in each village using the same traps and repeat this in each season for one year, THEN the measured size of hedgehogs will not differ between villages.
Statistical Hypotheses:
$H_0$ (Null): Size Village A $=$ Size Village B.
$H_1$ (Alternative): Size Village A $<$ Size Village B.
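One way to put the one-sided hedgehog hypotheses to the test is a permutation test, sketched below. This technique is not named in these notes and is only an illustration: the body masses are invented, and the function name `one_sided_permutation_p` is hypothetical. The idea is that if $H_0$ holds, the village labels are interchangeable, so shuffling them shows how often a difference as negative as the observed one arises by chance.

```python
import random
from statistics import mean

def one_sided_permutation_p(a, b, n_perm=5000, seed=0):
    """Estimate P(mean(a) - mean(b) <= observed) when group labels are shuffled."""
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:len(a)]) - mean(pooled[len(a):])
        if diff <= observed:  # as small as, or smaller than, what we saw
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)

# Hypothetical body masses in grams (invented for illustration only).
village_a = [610, 655, 590, 640, 620, 600]
village_b = [720, 695, 740, 680, 710, 730]

p_value = one_sided_permutation_p(village_a, village_b)
print(f"one-sided P = {p_value:.4f}")
```

With these invented numbers every Village A animal is lighter than every Village B animal, so very few shuffles match the observed difference and $H_0$ would be rejected at the $0.05$ level.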

Specificity in Scientific Questions

A crucial part of the inferential framework is transforming poor, vague questions into better, specific questions that are practical and realistic to measure:

Poor: Are male and female mosquitoes different?
Better: Does the body size in mosquitoes differ between sexes?
Poor: Do enzymes work better at higher temperatures?
Better: Does the activity of DNA polymerase enzyme increase when temperature increases?
Poor: Do seagrasses promote biodiversity?
Better: Is fish species diversity higher in denser seagrasses?

Falsifiability and Hypothesis Testing Principles

The Law-Justice Analogy

Testing the Null Hypothesis ( $H_0$ ) first follows the principle of "innocent until proven guilty."

Scenario: Testing a new cancer drug.
Assumption: Start with the assumption ( $H_0$ ) that there is no difference between the drug and the control group.
The Trial: The statistical test acts as the jury. If evidence in favor of the drug is "convincing beyond a reasonable doubt," we reject the null hypothesis and accept the alternative ( $H_1$ ).
Threshold: "Beyond a reasonable doubt" is statistically represented by a significance level ( $\alpha$ ) of $0.05$: the null is rejected when the p-value is $0.05$ or less. There is still a small chance that a result this extreme arose by chance alone, but if that chance is small enough, we reject the null.

Key Definitions

Null Hypothesis ( $H_0$ ): States that no difference or effect exists between two or more populations; any difference seen in the samples is attributed to chance.
Alternative Hypothesis ( $H_1$ or $H_a$ ): States that a phenomenon is occurring due to non-random causes.

Statistical Errors

Type I Error (False Positive): The null hypothesis is actually true, but the statistical test incorrectly indicates a difference exists. This is often considered more problematic (e.g., concluding a drug is effective when it is not, leading to deleterious effects).
Type II Error (False Negative): The null hypothesis is actually false, but the test fails to determine the difference as significant. This often occurs with small sample sizes.
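The two error types can be made concrete with a small simulation. The sketch below is illustrative only and rests on several assumptions not in these notes: data are drawn from a normal distribution with known standard deviation $1$, the test is a two-sided z-test of $H_0$: mean $= 0$, and the names `z_test_p` and `rejection_rate` are hypothetical.

```python
import random
from statistics import NormalDist, fmean

ALPHA = 0.05
STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def z_test_p(sample, sigma=1.0):
    """Two-sided P-value for H0: population mean = 0, with known sigma."""
    z = fmean(sample) * len(sample) ** 0.5 / sigma
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def rejection_rate(true_mean, n, trials=2000, seed=1):
    """Fraction of simulated experiments in which H0 is rejected."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        if z_test_p(sample) < ALPHA:
            rejections += 1
    return rejections / trials

# H0 is true (mean really is 0): every rejection is a Type I error.
type_1_rate = rejection_rate(true_mean=0.0, n=20)

# H0 is false (mean is 0.5): every failure to reject is a Type II error.
type_2_small = 1 - rejection_rate(true_mean=0.5, n=5)
type_2_large = 1 - rejection_rate(true_mean=0.5, n=50)

print(f"Type I rate:          {type_1_rate:.3f}")
print(f"Type II rate (n=5):   {type_2_small:.3f}")
print(f"Type II rate (n=50):  {type_2_large:.3f}")
```

Under these assumptions the Type I rate should sit close to the chosen threshold of $0.05$, while the Type II rate is much higher for the small sample than for the large one.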

Probability in Statistics

Foundations of Probability

Probability is the mathematical machinery used to analyze chance and quantify uncertainty. It provides the bridge between descriptive data and inferential interpretation.

Probability Range: A number between $0$ (impossibility) and $1$ (certainty).
Random Variable: A mathematical object representing the outcome of a random event (e.g., assigning $1$ for heads and $0$ for tails in a coin toss).

Probabilistic vs. Statistical Reasoning

Probabilistic Reasoning: Knowing the population and predicting a sample (e.g., knowing the proportion of shark species in a reef and calculating the probability that the first shark seen is specifically a White Tip).
Statistical Reasoning: Observing a random sample to estimate the proportions of the whole unknown population.
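The two directions of reasoning can be sketched side by side. The reef composition below is invented for illustration (the species proportions are assumptions, not data): the first part reasons from a known population to a sample, the second from a sample back to the population.

```python
import random
from collections import Counter

# Hypothetical reef composition (assumed proportions, for illustration only).
reef = {"white tip": 0.5, "black tip": 0.3, "grey reef": 0.2}

# Probabilistic reasoning: the population is known, so the chance that
# the first shark seen is a White Tip is simply its proportion.
p_white_tip = reef["white tip"]

# Statistical reasoning: pretend the proportions are unknown, observe a
# random sample of sightings, and estimate the proportions from it.
rng = random.Random(42)
sightings = rng.choices(list(reef), weights=list(reef.values()), k=200)
counts = Counter(sightings)
estimates = {species: counts[species] / len(sightings) for species in reef}

print("P(first shark is a White Tip):", p_white_tip)
print("Estimated proportions from 200 sightings:", estimates)
```

The estimates will not match the true proportions exactly; the gap between them is the sampling uncertainty that inferential statistics has to quantify.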

P-Values

A statistical test calculates a P-value: the probability of obtaining data at least as extreme as those observed if the null hypothesis were true. It is not the probability that the null hypothesis itself is true.

If the probability ( $P$ ) is low, $H_0$ is rejected.
Standard cut-off: P < 0.05 (a result this extreme would be expected less than $5\%$ of the time if the null were true).
Lower values ( $P = 0.001$ or $0.1\%$ ) indicate stronger evidence against the null than higher values ( $P = 0.1$ or $10\%$ ).
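A P-value can be computed exactly for a simple case. The sketch below reuses the coin-toss random variable from earlier and assumes $H_0$: the coin is fair. It sums the probabilities, under $H_0$, of every outcome that is no more likely than the one observed. The function names and the example counts ($15$ and $12$ heads in $20$ tosses) are assumptions made for illustration.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def two_sided_p(heads, tosses, p_null=0.5):
    """Exact two-sided P-value: total probability, under H0, of all
    outcomes that are at most as likely as the observed one."""
    observed = binom_pmf(heads, tosses, p_null)
    tolerance = 1 + 1e-9  # guards against floating-point ties
    return sum(
        binom_pmf(k, tosses, p_null)
        for k in range(tosses + 1)
        if binom_pmf(k, tosses, p_null) <= observed * tolerance
    )

p_15 = two_sided_p(15, 20)  # about 0.041: below 0.05, so H0 is rejected
p_12 = two_sided_p(12, 20)  # about 0.50: no grounds to reject H0

print(f"15 heads in 20 tosses: P = {p_15:.4f}")
print(f"12 heads in 20 tosses: P = {p_12:.4f}")
```

Note that the calculation never asks how likely the coin is to be fair; it only asks how surprising the data would be if it were.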

Variance and Expectation

Expectation

The expectation of a random variable captures the center of its distribution. It is defined as the probability-weighted sum of all possible values, and it is the value that the average of many independent samples approaches.
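As a minimal sketch of both descriptions, the example below uses a fair six-sided die (an assumed example, not one from these notes). It computes the probability-weighted sum exactly and then compares it with the average of many simulated rolls.

```python
import random
from fractions import Fraction

# Each face of a fair die has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

# Expectation as the probability-weighted sum of all possible values.
expectation = sum(value * prob for value, prob in die.items())

# Average of many independent samples; it should land near the expectation.
rng = random.Random(7)
rolls = [rng.randint(1, 6) for _ in range(10_000)]
sample_average = sum(rolls) / len(rolls)

print("Expectation:", expectation)        # 7/2
print("Sample average:", sample_average)  # close to 3.5, not exactly equal
```

The expectation ( $3.5$ ) is not itself a possible roll, which is a reminder that it describes the distribution rather than any single outcome.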

Variance

Variance is the measurement of the spread or dispersion of data around the sample's mean.

Significance: Smaller variance indicates consistent data and stronger evidence for differences. Larger variance makes it harder to identify meaningful differences between groups.
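Visualizing Significance: As a rough visual guide, if the mean of Population B sits outside the range containing $95\%$ of the observations of Population A, a value that extreme would be expected less than $5\%$ of the time (P < 0.05) if it had come from Population A. A formal test is still needed to confirm the difference.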

Causes of Variance

Randomness: Chance events.
Biological Variation: Genetics, environment, etc.
Measurement Error: Inaccuracy from pipetting, imaging, or instruments.
Systematic Error (Bias): A consistent difference between the recorded value and the true value.

Sampling and Quality Control

Sample Size: More sampling units give more precise estimates and greater statistical power, though this must be balanced with practicality (time, money).
Quality over Quantity: Poor-quality data increases error and reduces statistical power.
Representativeness: Samples must be random and representative of the total population to avoid bias. Tools include quadrats, random positioning, or systematic line sampling.
Timing: Sampling must account for season, time of day, and weather.
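Random positioning of quadrats can be sketched as follows. This is one possible approach under stated assumptions: the study plot is a rectangle divided into a grid of quadrat-sized cells, and the function name `random_quadrats`, the plot dimensions, and the number of quadrats are all hypothetical.

```python
import random

def random_quadrats(plot_width_m, plot_height_m, quadrat_size_m, n_quadrats, seed=None):
    """Choose distinct grid cells uniformly at random and return the
    (x, y) position of each chosen quadrat's lower-left corner in metres."""
    rng = random.Random(seed)
    cols = int(plot_width_m // quadrat_size_m)
    rows = int(plot_height_m // quadrat_size_m)
    cells = [(col, row) for col in range(cols) for row in range(rows)]
    chosen = rng.sample(cells, n_quadrats)  # sampling without replacement
    return [(col * quadrat_size_m, row * quadrat_size_m) for col, row in chosen]

# Hypothetical 20 m x 10 m plot sampled with twelve 1 m quadrats.
positions = random_quadrats(20, 10, 1, n_quadrats=12, seed=3)
for x, y in positions:
    print(f"Place quadrat at x = {x} m, y = {y} m")
```

Letting a random number generator pick the positions removes the temptation to place quadrats where organisms look most abundant, which is one common source of bias.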

Control Groups in Research

A control group is a group examined in parallel to the treatment group to "remove" the effect of all factors except the one being investigated.

Experimental Controls: Controlling the environment (e.g., keeping light, temperature, and humidity constant while varying only $CO_2$ levels).
Procedural Controls: Ensuring the control group undergoes the exact same procedures as the treatment group, minus the active treatment itself.
Example: Group 1 (No drug), Group 2 (Drug), Group 3 (Placebo/dummy drug to control for the effect of the injection or tablet itself).
Statistical Controls: Instead of fixing factors, the alternate factors are measured and accounted for during analysis (e.g., measuring environmental temperature and using it in calculations to isolate the treatment effect).
Historical Controls: Comparing the current treatment group to previously compiled historical data. This is used when only one physical group exists, so no concurrent control group is available.

Mathematical Calculation of Variance

To calculate the variance of a sample as an estimate for a population, follow these steps:

Calculate the Mean ( $\bar{x}$ ).
Find the Difference between each data point ( $x_i$ ) and the mean.
Square each difference (this ensures all values are positive and emphasizes larger deviations).
Sum the squared differences ( $\sum (x_i - \bar{x})^2$ ).
Divide by the number of data points minus one ( $n - 1$ ).

Formula for Sample Variance ( $s^2$ ; the symbol $\sigma^2$ is reserved for the population variance that $s^2$ estimates):

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Visual Example Calculation: Data: $8, 5, 4, 3$

Mean ( $\bar{x}$ ): $\frac{8+5+4+3}{4} = 5$
Differences: $(8-5)=+3, (5-5)=0, (4-5)=-1, (3-5)=-2$
Squared Differences: $3^2 = 9, 0^2 = 0, (-1)^2 = 1, (-2)^2 = 4$
Sum of Squares ( $SS$ ): $9 + 0 + 1 + 4 = 14$
Variance ( $n-1$ ): $\frac{14}{4-1} = \frac{14}{3} \approx 4.67$
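The five steps above can be sketched in a few lines and checked against the worked example. The function name `sample_variance` is an assumption, and the use of exact fractions is a convenience that presumes integer data; the comparison uses `statistics.variance` from the Python standard library, which also divides by $n - 1$.

```python
from fractions import Fraction
from statistics import variance

def sample_variance(data):
    """Sample variance following the five steps, using exact fractions.
    Assumes integer data so the mean can be stored exactly."""
    n = len(data)
    mean = Fraction(sum(data), n)                    # step 1: mean
    differences = [x - mean for x in data]           # step 2: differences
    squared = [d**2 for d in differences]            # step 3: square each
    sum_of_squares = sum(squared)                    # step 4: sum
    return sum_of_squares / (n - 1)                  # step 5: divide by n - 1

data = [8, 5, 4, 3]
result = sample_variance(data)

print("Sample variance:", result, "which is about", round(float(result), 2))
print("statistics.variance:", variance(data))
```

Both approaches give $\frac{14}{3} \approx 4.67$ for this data set, matching the hand calculation.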