Statistical analysis for ecological design

Context

This is not a statistics module — stats are used here as a tool to support survey design.
The aim is to design and implement a scientifically sound survey and report, not to perform advanced mathematical analysis.
Statistical methods are used to test hypotheses and interpret ecological data.

🔑 Core message: Think about how you will analyse your data during the design stage — don’t wait until after data collection.

1. Statistical Tools for Ecological Analysis

Allowed Techniques

You’ll use familiar and straightforward methods:

ANOVA (Analysis of Variance) – compare means between groups.
Correlation – assess strength of relationship between two variables.
Regression – model relationships (one variable predicts another).

You can combine these methods if appropriate.

Most analyses will use:

One-way ANOVA
Two-way ANOVA (multi-factor)
Linear regression
Correlation (Pearson/Spearman)

2. Single-Factor vs Multi-Factor Designs

Single-Factor (One-Way)

One predictor variable (factor).
Example:
Predictor = Shore height (3 levels: low, mid, high).
Response = Algal diversity or abundance.

Goal → Compare means between levels (e.g. low vs mid vs high shore).

Multi-Factor (Two-Way)

Two predictor variables.
Example:
- Factor 1 = Shore height (low, mid, high).
- Factor 2 = Rock pools (inside vs outside).

Now the survey examines:

Main effects of each factor (shore height, pool presence).
Interaction effects (how one factor modifies the effect of the other).

→ Analysed using Two-Way ANOVA.
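
A minimal sketch of what an interaction looks like in the data, before any formal test. The counts below are made up for illustration, and the names `data`, `cell_means` and `pool_effect` are hypothetical. It computes the mean for every height-by-pool combination, then the effect of the pool factor separately at each shore height.

```python
import statistics

# Hypothetical algal species counts per quadrat (illustrative only),
# keyed by (shore height, position relative to rock pool)
data = {
    ("low", "inside"):   [12, 14, 13],
    ("low", "outside"):  [6, 5, 7],
    ("mid", "inside"):   [10, 11, 9],
    ("mid", "outside"):  [8, 9, 10],
    ("high", "inside"):  [7, 8, 6],
    ("high", "outside"): [7, 6, 8],
}

# Mean response in each cell of the design
cell_means = {cell: statistics.fmean(values) for cell, values in data.items()}

# Effect of the pool factor at each shore height: inside minus outside
pool_effect = {
    height: cell_means[(height, "inside")] - cell_means[(height, "outside")]
    for height in ("low", "mid", "high")
}
print(pool_effect)  # {'low': 7.0, 'mid': 1.0, 'high': 0.0}
```

If the pool effect were the same at every height there would be no interaction. Here it shrinks from low to high shore, which is the pattern a Two-Way ANOVA interaction term would test formally.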

3. Comparing Averages

Examples of response variables:

Mean abundance of a species.
Mean size (e.g., of limpets).
Mean diversity index (e.g., Shannon diversity).
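
For reference, a short sketch of how a Shannon diversity value could be calculated for one quadrat, using the standard formula H' = -Σ p_i ln(p_i), where p_i is the proportion of individuals belonging to species i. The function name and the counts are hypothetical.

```python
import math

def shannon_diversity(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over species with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("quadrat contains no individuals")
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)

# Hypothetical counts of four species in one quadrat (illustrative only)
quadrat_counts = [10, 10, 10, 10]
h = shannon_diversity(quadrat_counts)
print(round(h, 3))  # four equally common species give ln(4), about 1.386
```

One value per quadrat then becomes the response variable, and the mean across replicate quadrats is what the tests compare.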

Examples of predictor variables:

Habitat type (sheltered vs exposed).
Shore height (low, mid, high).
Substrate heterogeneity.

When to Use Which Test

Situation	Test
Compare 2 means	t-test
Compare >2 means (1 factor)	One-way ANOVA
Compare means across 2 factors	Two-way ANOVA
Data not normal / variances unequal	Non-parametric alternatives
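
The table above can be read as a simple decision rule. The sketch below is only a planning aid, not part of any statistics package. The function name `choose_test` and its parameters are hypothetical, and the fallback for two-factor designs is an assumption based on the transform-and-retest advice in these notes.

```python
def choose_test(n_factors, n_groups, assumptions_met=True):
    """Map a categorical-predictor design onto the table above.

    n_factors: number of categorical predictors (1 or 2)
    n_groups: number of levels of the factor (used for one-factor designs)
    assumptions_met: True if normality / equal-variance checks are acceptable
    """
    if n_factors == 1 and n_groups == 2:
        return "t-test" if assumptions_met else "Mann-Whitney U test"
    if n_factors == 1 and n_groups > 2:
        return "One-way ANOVA" if assumptions_met else "Kruskal-Wallis test"
    if n_factors == 2:
        # The notes list no non-parametric two-factor alternative,
        # so transforming the data is the first thing to try.
        return "Two-way ANOVA" if assumptions_met else "Transform data and retest"
    raise ValueError("design not covered by the table")

print(choose_test(1, 3))                         # One-way ANOVA
print(choose_test(1, 2, assumptions_met=False))  # Mann-Whitney U test
```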

4. Parametric vs Non-Parametric Tests

Parametric Tests

Require assumptions:
1. Normal distribution of data.
2. Homogeneous variances (equal variance between groups).
3. Independence of samples.
Examples: t-test, ANOVA.
Compare means.
More powerful (detect smaller effects).

Non-Parametric Tests

No assumption about data distribution (“distribution-free”).
Compare medians (use data ranks).
Examples:
- Mann–Whitney U test (instead of t-test)
- Kruskal–Wallis test (instead of one-way ANOVA)
Less powerful, but often more appropriate for biological data (which is messy).
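
To show what "use data ranks" means in practice, here is a hand-rolled sketch of the Mann–Whitney U statistic. The data and function names are hypothetical. It only computes U. Judging significance still needs a critical-value table or a statistics package.

```python
def midranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample_a, sample_b):
    """Return (U_a, U_b); the smaller value is compared against a table."""
    ranks = midranks(list(sample_a) + list(sample_b))
    n_a, n_b = len(sample_a), len(sample_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return u_a, u_b

# Hypothetical limpet counts per quadrat; note the outlier (30) in "exposed"
sheltered = [3, 5, 4, 6, 2]
exposed = [9, 7, 30, 8, 12]
u_sheltered, u_exposed = mann_whitney_u(sheltered, exposed)
print(u_sheltered, u_exposed)  # 0.0 25.0 (the two samples do not overlap at all)
```

Because only the ranks are used, the outlier of 30 has no more influence than any other largest value, which is why these tests cope well with messy data.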

5. Assumptions of ANOVA

Independence
- Each sample must be independent.
- Example: don’t place quadrats right next to each other.
- Violating independence invalidates tests.
Normality
- Data should roughly follow a normal distribution.
- Lecturer notes: “I’ve never tested for normality — ANOVA is robust to slight non-normality” (following advice of Tony Underwood).
- With small sample sizes (n < 100), focus on homogeneity rather than strict normality.
Homogeneity of Variance
- Variances between groups must be similar.
- Test using:
  - Cochran’s test
  - Levene’s test (simpler)
- If variances unequal → transform data (square root for counts, log for strongly right-skewed data, arcsine for proportions).
- Retest. If still unequal → use non-parametric test.
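
A rough sketch of the check, transform, retest loop using Cochran's C statistic (largest group variance divided by the sum of all group variances). The counts are invented, the names are hypothetical, and log(x + 1) is assumed as the transformation so that zero counts can be handled. The code does not look up critical values. Those still come from a Cochran's table or a statistics package.

```python
import math
import statistics

def cochran_c(groups):
    """Cochran's C = largest group variance / sum of all group variances."""
    variances = [statistics.variance(g) for g in groups]
    return max(variances) / sum(variances)

# Hypothetical limpet counts per quadrat (illustrative, not real data)
counts = {
    "low":  [2, 4, 3, 5, 1],
    "mid":  [10, 14, 8, 16, 12],
    "high": [40, 90, 25, 60, 110],
}

raw_c = cochran_c(counts.values())

# Assumed transformation: log(x + 1), then retest
logged = [[math.log(x + 1) for x in g] for g in counts.values()]
log_c = cochran_c(logged)

print(f"raw C = {raw_c:.3f}, log(x+1) C = {log_c:.3f}")
```

C close to 1 means a single group dominates the total variance. In this made-up example the transformation pulls C down considerably, which is the behaviour the retest step is looking for.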

6. What ANOVA Actually Does

Conceptual Breakdown

ANOVA compares variation between group means vs variation within groups (residuals).

Example:

Two treatments:
- Group A mean = 10
- Group B mean = 15
If little variation within groups → differences between groups are likely real.
If large variation within groups → may not be a true difference.

ANOVA calculates an F statistic:

$$F = \frac{\text{Mean Square (factor)}}{\text{Mean Square (residual)}}$$

High F → greater difference between means relative to internal variation → significant result.
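
The example above (group means of 10 and 15) can be worked through by hand. This sketch uses invented replicate values and a hypothetical function name. It computes the F ratio only. The p-value comes from an F table or a statistics package.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_factor, df_residual) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(all_values)
    # Between-group (factor) sum of squares
    ss_factor = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (residual) sum of squares
    ss_residual = sum((x - statistics.fmean(g)) ** 2 for g in groups for x in g)
    df_factor = len(groups) - 1
    df_residual = len(all_values) - len(groups)
    f_ratio = (ss_factor / df_factor) / (ss_residual / df_residual)
    return f_ratio, df_factor, df_residual

# Same group means (10 and 15), different within-group spread
tight_a, tight_b = [9, 10, 11], [14, 15, 16]
loose_a, loose_b = [4, 10, 16], [9, 15, 21]

f_tight, _, _ = one_way_anova_f([tight_a, tight_b])
f_loose, _, _ = one_way_anova_f([loose_a, loose_b])
print(f_tight, round(f_loose, 2))  # 37.5 1.04
```

The difference between means is identical in both cases. Only the residual variation changes, and that alone moves F from 37.5 to about 1.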

Interpreting Output

ANOVA table includes:

Source of Variation	df	Mean Square	F	p-value
Factor (e.g. Shore height)	–	–	–	–
Residual (Error)	–	–	–	–

Significant p (<0.05) → factor affects response.
If the factor is significant and has more than 2 levels, use a post-hoc test (e.g. Tukey’s test) to identify which groups differ.

7. Two-Way (Multi-Factor) ANOVA

Example

Factors:
- Shore height (low, mid, high)
- Wave exposure (sheltered vs exposed)
Response: Limpet density (number/m²)

Analysis

Tests:
1. Effect of shore height
2. Effect of wave exposure
3. Interaction effect (does effect of exposure depend on height?)
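
A sketch of how the three F ratios could be computed by hand, assuming a balanced design (equal replication in every cell) with both factors fixed. The limpet densities are invented and the function name is hypothetical. p-values are not calculated here.

```python
import statistics
from itertools import product

def two_way_anova(data, levels_a, levels_b):
    """F ratios for a balanced two-way design with fixed factors.

    data maps (level_a, level_b) -> list of replicate values; every cell
    must hold the same number of replicates (n >= 2).
    """
    cells = list(product(levels_a, levels_b))
    n = len(data[cells[0]])
    if n < 2 or any(len(data[c]) != n for c in cells):
        raise ValueError("design must be balanced with n >= 2 per cell")
    a, b = len(levels_a), len(levels_b)
    mean = statistics.fmean
    grand = mean([x for c in cells for x in data[c]])
    mean_a = {i: mean([x for j in levels_b for x in data[(i, j)]]) for i in levels_a}
    mean_b = {j: mean([x for i in levels_a for x in data[(i, j)]]) for j in levels_b}
    cell = {c: mean(data[c]) for c in cells}

    # Sums of squares for each main effect, the interaction and the residual
    ss_a = b * n * sum((mean_a[i] - grand) ** 2 for i in levels_a)
    ss_b = a * n * sum((mean_b[j] - grand) ** 2 for j in levels_b)
    ss_ab = n * sum((cell[(i, j)] - mean_a[i] - mean_b[j] + grand) ** 2 for i, j in cells)
    ss_res = sum((x - cell[c]) ** 2 for c in cells for x in data[c])

    ms_res = ss_res / (a * b * (n - 1))
    return {
        "A": (ss_a / (a - 1)) / ms_res,
        "B": (ss_b / (b - 1)) / ms_res,
        "A x B": (ss_ab / ((a - 1) * (b - 1))) / ms_res,
    }

# Hypothetical limpet densities (number/m²): exposure matters only on the low shore
density = {
    ("low", "sheltered"): [10, 12, 11], ("low", "exposed"): [30, 32, 31],
    ("mid", "sheltered"): [20, 21, 22], ("mid", "exposed"): [21, 20, 22],
    ("high", "sheltered"): [15, 16, 14], ("high", "exposed"): [14, 15, 16],
}
f_ratios = two_way_anova(density, ["low", "mid", "high"], ["sheltered", "exposed"])
print({name: round(f, 1) for name, f in f_ratios.items()})
# {'A': 72.0, 'B': 200.0, 'A x B': 200.0}
```

Here A is shore height and B is wave exposure. The large interaction F reflects the fact that exposure changes density on the low shore but not elsewhere.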

Interpretation:

If an interaction is significant, it means:
The effect of one variable depends on the other.
e.g. Wave exposure affects limpet density only on the low shore, not mid/high.

Always report interactions first in your results section.

Example phrasing for report:

“There was a significant interaction between shore height and wave exposure on limpet density (Two-Way ANOVA, p<0.05). This was due to a strong effect of exposure on the low shore, but not at mid or high shore (Fig. 2).”

8. Continuous Relationships – Correlation & Regression

When to Use

Instead of comparing categories, sometimes you study relationships between continuous variables.

Examples:

% algal cover vs limpet density
Limpet size vs detachment strength
Rock pool size vs algal diversity
Seaweed length vs bladder number

Choosing Between Correlation and Regression

Question	Use	Description
Do both variables vary freely (no clear cause/effect)?	Correlation	Measures strength of relationship between two variables. No predictor/response distinction.
Is one variable clearly dependent on another (predictor → response)?	Regression	Models directional relationship. Allows prediction.
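
A small sketch of the two correlation coefficients mentioned earlier, computed by hand on invented data. Function names are hypothetical. Spearman's coefficient is taken here as Pearson's r calculated on the ranks, and the simple ranking helper assumes there are no tied values.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length samples."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Ranks 1..n for values with no ties (enough for this sketch)."""
    order = sorted(values)
    return [order.index(v) + 1 for v in values]

# Hypothetical paired measurements from six quadrats
algal_cover = [5, 20, 35, 50, 80, 95]      # % cover
limpet_density = [40, 30, 22, 15, 12, 10]  # ind/m²

r = pearson_r(algal_cover, limpet_density)
rho = pearson_r(ranks(algal_cover), ranks(limpet_density))
print(round(r, 2), round(rho, 2))
```

Swapping the two arguments gives the same coefficient, which is the "no predictor/response distinction" point in the table: correlation is symmetric.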

In regression:

X = independent/predictor variable
Y = dependent/response variable

Example:

Does rock pool size (X) affect algal diversity (Y)?

Regression Output

Report:

Significance (p-value) – tests if slope ≠ 0
Equation (e.g. Y = 3.2X + 5)
R² value – proportion of variation in Y explained by X (0–1 scale)

Rules:

Only report regression equation if significant.
Higher R² → stronger predictive relationship.
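
A minimal least-squares sketch showing where the equation and R² come from. The pool sizes and species counts are invented, and `linear_fit` is a hypothetical helper. The significance test for the slope is not included and would come from a statistics package.

```python
import statistics

def linear_fit(x, y):
    """Least-squares slope, intercept and R² for Y = slope * X + intercept."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_total = sum((b - my) ** 2 for b in y)
    ss_residual = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_residual / ss_total  # proportion of variation in Y explained
    return slope, intercept, r_squared

# Hypothetical data: rock pool area (m²) and number of algal species
pool_area = [1, 2, 3, 4, 5]
algal_species = [8, 11, 15, 17, 21]

slope, intercept, r_squared = linear_fit(pool_area, algal_species)
print(f"Y = {slope:.1f}X + {intercept:.1f}, R² = {r_squared:.2f}")
# Y = 3.2X + 4.8, R² = 0.99
```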

9. Multi-Factor Regressions (Advanced Concepts)

ANCOVA (Analysis of Covariance)

Tests if two regression lines differ significantly in slope or intercept.
Used when comparing relationships across categories.

Example:

Does wave exposure affect the relationship between rock pool size and grazer density?

Covariate (continuous predictor): pool size
Factor (categorical): exposure
Response: grazer density

Simplified approach for coursework:
Plot both regressions (e.g. exposed vs sheltered) and compare qualitatively — no formal ANCOVA test required.
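
One way to support the qualitative comparison is to fit a separate line for each exposure category and put the slopes side by side. This is a sketch on invented data with hypothetical names. It is not an ANCOVA and gives no test of whether the slopes differ.

```python
import statistics

def slope_intercept(x, y):
    """Least-squares slope and intercept of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

# Hypothetical data: pool size (m²) and grazer density (ind/m²) per site type
pool_size = [1, 2, 3, 4, 5]
grazers = {
    "sheltered": [4, 6, 8, 10, 12],
    "exposed": [3, 4, 5, 6, 7],
}

fits = {site: slope_intercept(pool_size, y) for site, y in grazers.items()}
for site, (slope, intercept) in fits.items():
    print(f"{site}: density = {slope:.1f} * size + {intercept:.1f}")
```

In this made-up case the sheltered line is twice as steep as the exposed one, so the report would describe grazer density rising faster with pool size at sheltered sites, with both lines shown on the same labelled axes.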

10. Practical & Reporting Guidance

Design Advice

Choose two predictor variables where possible for a richer analysis.
Plan for ANOVA or regression during survey design.
Ensure units are always labelled on axes (e.g. “Density (ind/m²)”).

Interpretation

Always check for interactions in multi-factor designs.
Avoid overcomplicating analysis — aim for clarity and coherence.

Key Phrases to Use

“Significant effect of [factor] on [response variable] (ANOVA, p<0.05).”
“No significant interaction between [factors] (Two-Way ANOVA, p>0.05).”
“Regression between [X] and [Y] was significant (p<0.05, R²=0.62).”

11. Summary Table

Goal	Test	Data Type	Notes
Compare 2 means	t-test	Categorical predictor	Parametric
Compare >2 means	One-way ANOVA	Categorical predictor	Use post-hoc tests
Compare means across 2 factors	Two-way ANOVA	Two categorical predictors	Check for interactions
Data non-normal / variances unequal	Mann–Whitney / Kruskal–Wallis	Categorical predictor	Non-parametric; compares medians (ranks)
Relationship (no cause)	Correlation	Continuous variables	No direction implied
Relationship (predictive)	Regression	Continuous predictor	Report R² and equation
Two regressions	ANCOVA (optional)	Continuous + categorical	Compare slopes qualitatively

Lecturer’s Key Advice

Focus on designing good surveys, not on complex maths.
Always think about your analysis plan before collecting data.
Keep methods simple but robust.
Avoid pseudoreplication and label all axes with units.
Two-factor designs and visual interpretation of interactions earn stronger marks.