Statistical analysis for ecological design
Context
This is not a statistics module — stats are used here as a tool to support survey design.
The aim is to design and implement a scientifically sound survey and report, not to perform advanced mathematical analysis.
Statistical methods are used to test hypotheses and interpret ecological data.
🔑 Core message: Think about how you will analyse your data during the design stage — don’t wait until after data collection.
1. Statistical Tools for Ecological Analysis
Allowed Techniques
You’ll use familiar and straightforward methods:
ANOVA (Analysis of Variance) – compare means between groups.
Correlation – assess strength of relationship between two variables.
Regression – model relationships (one variable predicts another).
You can combine these methods if appropriate.
Most analyses will use:
One-way ANOVA
Two-way ANOVA (multi-factor)
Linear regression
Correlation (Pearson/Spearman)
2. Single-Factor vs Multi-Factor Designs
Single-Factor (One-Way)
One predictor variable (factor).
Example:
Predictor = Shore height (3 levels: low, mid, high).
Response = Algal diversity or abundance.
Goal → Compare means between levels (e.g. low vs mid vs high shore).
Multi-Factor (Two-Way)
Two predictor variables.
Example:
Factor 1 = Shore height (low, mid, high).
Factor 2 = Rock pools (inside vs outside).
Now the survey examines:
Main effects of each factor (shore height, pool presence).
Interaction effects (how one factor modifies the effect of the other).
→ Analysed using Two-Way ANOVA.
3. Comparing Averages
Examples of response variables:
Mean abundance of a species.
Mean size (e.g., of limpets).
Mean diversity index (e.g., Shannon diversity).
Examples of predictor variables:
Habitat type (sheltered vs exposed).
Shore height (low, mid, high).
Substrate heterogeneity.
When to Use Which Test
| Situation | Test |
|---|---|
| Compare 2 means | t-test |
| Compare >2 means (1 factor) | One-way ANOVA |
| Compare means across 2 factors | Two-way ANOVA |
| Data not normal / variances unequal | Non-parametric alternatives |
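As a rough illustration of the simplest row in the table, the sketch below computes a two-sample t statistic by hand. The shell lengths are invented, and the function name `pooled_t` is an assumption made for this example. It returns only t and the degrees of freedom; a statistics package would be needed for the p-value.

```python
from math import sqrt
from statistics import mean

# Hypothetical limpet shell lengths (mm); invented for illustration only.
sheltered = [21, 23, 22, 24, 20]
exposed = [27, 25, 26, 28, 24]

def pooled_t(a, b):
    """Return (t, df) for a two-sample t-test assuming equal variances."""
    ss_a = sum((x - mean(a)) ** 2 for x in a)  # sum of squares within group a
    ss_b = sum((x - mean(b)) ** 2 for x in b)  # sum of squares within group b
    df = len(a) + len(b) - 2
    pooled_var = (ss_a + ss_b) / df
    se = sqrt(pooled_var * (1 / len(a) + 1 / len(b)))
    return (mean(a) - mean(b)) / se, df

t_stat, df = pooled_t(sheltered, exposed)
print(f"t = {t_stat:.2f}, df = {df}")
```

A large absolute t (relative to the critical value for the given df) suggests the two means differ.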
4. Parametric vs Non-Parametric Tests
Parametric Tests
Require assumptions:
Normal distribution of data.
Homogeneous variances (equal variance between groups).
Independence of samples.
Examples: t-test, ANOVA.
Compare means.
More powerful (detect smaller effects).
Non-Parametric Tests
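No assumption that data follow a normal distribution (“distribution-free”); samples must still be independent.
Work on the ranks of the data rather than the raw values, so results are usually reported as differences in medians.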
Examples:
Mann–Whitney U test (instead of t-test)
Kruskal–Wallis test (instead of one-way ANOVA)
Less powerful, but often more appropriate for biological data (which is messy).
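To show what "using data ranks" means in practice, here is a minimal sketch of the Kruskal–Wallis H statistic computed by hand. The counts are invented, the helper names are assumptions for this example, and the tie correction and p-value are left out for brevity.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return the Kruskal-Wallis H statistic (no tie correction)."""
    pooled = [v for g in groups for v in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

# Hypothetical algal counts per quadrat; note the extreme value of 90.
low, mid, high = [2, 3, 5], [8, 9, 12], [15, 20, 90]
h_stat = kruskal_wallis_h([low, mid, high])

# Replacing the outlier with 21 leaves every rank, and therefore H, unchanged.
h_no_outlier = kruskal_wallis_h([low, mid, [15, 20, 21]])
print(h_stat, h_no_outlier)
```

Because only the ranks enter the calculation, a single extreme quadrat cannot dominate the result, which is one reason rank-based tests suit messy field data.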
5. Assumptions of ANOVA
Independence
Each sample must be independent.
Example: don’t place quadrats right next to each other.
Violating independence invalidates tests.
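One way to avoid adjacent quadrats is to randomise positions along a transect while enforcing a minimum gap. The sketch below is only an illustration: the function name `place_quadrats`, the 30 m transect and the 2 m spacing are all assumptions for this example, not values from the lecture.

```python
import random

def place_quadrats(n, transect_length_m, min_spacing_m, seed=None, max_tries=10_000):
    """Return n sorted random positions (m) that are at least min_spacing_m apart."""
    rng = random.Random(seed)
    positions = []
    tries = 0
    while len(positions) < n:
        tries += 1
        if tries > max_tries:
            raise ValueError("could not place quadrats; reduce n or the spacing")
        candidate = rng.uniform(0, transect_length_m)
        # Accept the candidate only if it is far enough from every accepted position.
        if all(abs(candidate - p) >= min_spacing_m for p in positions):
            positions.append(candidate)
    return sorted(positions)

# Hypothetical design: 5 quadrats on a 30 m transect, at least 2 m apart.
positions = place_quadrats(5, 30, 2, seed=1)
print([round(p, 1) for p in positions])
```

Fixing the seed makes the layout reproducible, which is useful when the sampling plan has to be written up in the methods.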
Normality
Data should roughly follow a normal distribution.
Lecturer notes: “I’ve never tested for normality — ANOVA is robust to slight non-normality” (following advice of Tony Underwood).
With small sample sizes (n < 100), focus on homogeneity of variance rather than strict normality.
Homogeneity of Variance
Variances between groups must be similar.
Test using:
Cochran’s test
Levene’s test (simpler)
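If variances are unequal → transform the data (square root for counts, log for strongly right-skewed data, arcsine for proportions or percentages).
Retest the transformed data. If variances are still unequal → use a non-parametric test.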
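The test-transform-retest loop can be sketched as follows. This computes the mean-centred Levene statistic W by hand (a one-way ANOVA on absolute deviations from each group mean). The counts are invented and deliberately constructed so that spread increases with the mean, and `levene_w` is a name assumed for this example. Converting W to a p-value needs an F distribution from a statistics package.

```python
import math
from statistics import mean

def levene_w(groups):
    """Return Levene's W (mean-centred) for a list of groups."""
    # Absolute deviation of each observation from its own group mean.
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    n_total = sum(len(g) for g in z)
    k = len(z)
    z_grand = mean([d for g in z for d in g])
    between = sum(len(g) * (mean(g) - z_grand) ** 2 for g in z)
    within = sum((d - mean(g)) ** 2 for g in z for d in g)
    return (n_total - k) / (k - 1) * between / within

# Hypothetical counts: the exposed group is ten times larger AND ten times more spread out.
sheltered = [2, 4, 8, 16]
exposed = [20, 40, 80, 160]

w_raw = levene_w([sheltered, exposed])
w_log = levene_w([[math.log(x) for x in sheltered],
                  [math.log(x) for x in exposed]])
print(f"W raw = {w_raw:.2f}, W after log transform = {w_log:.2f}")
```

A large W on the raw data and a W near zero after the log transform is the pattern you hope to see: the transform has made the group variances comparable.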
6. What ANOVA Actually Does
Conceptual Breakdown
ANOVA compares variation between group means vs variation within groups (residuals).
Example:
Two treatments:
Group A mean = 10
Group B mean = 15
If little variation within groups → differences between groups are likely real.
If large variation within groups → may not be a true difference.
ANOVA calculates an F statistic:
$$F = \frac{\text{Mean Square (factor)}}{\text{Mean Square (residual)}}$$
High F → greater difference between means relative to internal variation → significant result.
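A small worked sketch may help here. It uses the group means from the example (A = 10, B = 15) with invented replicate values, once with little within-group variation and once with a lot. The function name `one_way_anova_f` is an assumption for this illustration, and the p-value is again left to a statistics package.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_factor, df_residual) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    # Factor sum of squares: group means compared with the grand mean.
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Residual sum of squares: observations compared with their own group mean.
    ss_residual = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_factor = len(groups) - 1
    df_residual = len(all_values) - len(groups)
    ms_factor = ss_factor / df_factor
    ms_residual = ss_residual / df_residual
    return ms_factor / ms_residual, df_factor, df_residual

# Hypothetical data: same means (10 and 15), different within-group spread.
tight_a, tight_b = [9, 10, 11, 10, 10], [14, 15, 16, 15, 15]
noisy_a, noisy_b = [4, 16, 7, 13, 10], [9, 21, 12, 18, 15]

f_tight, df1, df2 = one_way_anova_f([tight_a, tight_b])
f_noisy, _, _ = one_way_anova_f([noisy_a, noisy_b])
print(f"tight groups: F = {f_tight:.1f}; noisy groups: F = {f_noisy:.2f}")
```

The difference between the means is identical in both cases; only the residual variation changes, and F falls sharply when the groups are noisy.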
Interpreting Output
ANOVA table includes:
| Source of Variation | df | Mean Square | F | p-value |
|---|---|---|---|---|
| Factor (e.g. Shore height) | – | – | – | – |
| Residual (Error) | – | – | – | – |
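Significant p (<0.05) → the factor affects the response (at least one group mean differs from the others).
If the factor has more than 2 levels and the ANOVA is significant, use post-hoc tests (e.g. Tukey’s test) to identify which groups differ.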
7. Two-Way (Multi-Factor) ANOVA
Example
Factors:
Shore height (low, mid, high)
Wave exposure (sheltered vs exposed)
Response: Limpet density (number/m²)
Analysis
Tests:
Effect of shore height
Effect of wave exposure
Interaction effect (does effect of exposure depend on height?)
Interpretation:
If an interaction is significant, it means:
The effect of one variable depends on the other.
e.g. Wave exposure affects limpet density only on the low shore, not mid/high.
Always report interactions first in your results section: when an interaction is significant, the main effects cannot be interpreted on their own.
Example phrasing for report:
“There was a significant interaction between shore height and wave exposure on limpet density (Two-Way ANOVA, p<0.05). This was due to a strong effect of exposure on the low shore, but not at mid or high shore (Fig. 2).”
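The partitioning behind a two-way ANOVA can be sketched for a balanced design (equal replicates in every cell, both factors fixed). All densities below are invented so that exposure matters only on the low shore, and `two_way_anova` is a name assumed for this example. It returns F ratios only; p-values would come from a statistics package.

```python
from statistics import mean

# Hypothetical limpet densities (number/m^2), 3 quadrats per cell; invented data.
data = {
    ("low", "sheltered"): [10, 12, 11],
    ("low", "exposed"): [30, 32, 31],
    ("mid", "sheltered"): [20, 21, 19],
    ("mid", "exposed"): [20, 22, 21],
    ("high", "sheltered"): [15, 14, 16],
    ("high", "exposed"): [15, 16, 14],
}

def two_way_anova(data):
    """Return F ratios for a balanced two-way ANOVA with replication."""
    a_levels = sorted({a for a, _ in data})
    b_levels = sorted({b for _, b in data})
    n = len(next(iter(data.values())))  # replicates per cell (assumed equal)
    grand = mean([x for cell in data.values() for x in cell])
    cell_mean = {key: mean(values) for key, values in data.items()}
    a_mean = {a: mean([cell_mean[(a, b)] for b in b_levels]) for a in a_levels}
    b_mean = {b: mean([cell_mean[(a, b)] for a in a_levels]) for b in b_levels}
    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_res = sum((x - cell_mean[key]) ** 2 for key, values in data.items() for x in values)
    df_a, df_b = len(a_levels) - 1, len(b_levels) - 1
    df_ab = df_a * df_b
    df_res = len(a_levels) * len(b_levels) * (n - 1)
    ms_res = ss_res / df_res
    return {
        "shore_height": (ss_a / df_a) / ms_res,
        "wave_exposure": (ss_b / df_b) / ms_res,
        "interaction": (ss_ab / df_ab) / ms_res,
    }

f_ratios = two_way_anova(data)
# Exposure effect (exposed minus sheltered mean) within each shore height.
exposure_effect = {
    h: mean(data[(h, "exposed")]) - mean(data[(h, "sheltered")])
    for h in ("low", "mid", "high")
}
print(f_ratios)
print(exposure_effect)
```

The per-height exposure effects make the interaction concrete: a large difference on the low shore and almost none at mid or high shore is exactly the pattern described in the example phrasing.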
8. Continuous Relationships – Correlation & Regression
When to Use
Instead of comparing categories, sometimes you study relationships between continuous variables.
Examples:
% algal cover vs limpet density
Limpet size vs detachment strength
Rock pool size vs algal diversity
Seaweed length vs bladder number
Choosing Between Correlation and Regression
| Question | Use | Description |
|---|---|---|
| Do both variables vary freely (no clear cause/effect)? | Correlation | Measures strength of relationship between two variables. No predictor/response distinction. |
| Is one variable clearly dependent on another (predictor → response)? | Regression | Models directional relationship. Allows prediction. |
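For the correlation case, a minimal sketch of Pearson's r and Spearman's rho (Pearson applied to ranks) is given below. The algal cover and limpet density values are invented, and the simple `ranks` helper assumes there are no tied values.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """Ranks from 1..n; assumes no tied values, for brevity."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r computed on the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Hypothetical paired measurements from five quadrats.
algal_cover_pct = [10, 20, 30, 40, 50]
limpet_density = [40, 30, 25, 12, 2]

r = pearson_r(algal_cover_pct, limpet_density)
rho = spearman_rho(algal_cover_pct, limpet_density)
print(f"Pearson r = {r:.3f}, Spearman rho = {rho:.3f}")
```

Neither variable is treated as the predictor here: swapping the two lists gives the same coefficients.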
In regression:
X = independent/predictor variable
Y = dependent/response variable
Example:
Does rock pool size (X) affect algal diversity (Y)?
Regression Output
Report:
Significance (p-value) – tests if slope ≠ 0
Equation (e.g. Y = 3.2X + 5)
R² value – proportion of variation in Y explained by X (0–1 scale)
Rules:
Only report the regression equation if the regression is significant.
Higher R² → stronger predictive relationship.
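The equation and R² can be obtained with an ordinary least-squares fit, sketched here by hand. The pool sizes and diversity values are invented (chosen so the fitted line comes out close to the Y = 3.2X + 5 example), and `fit_line` is a name assumed for this illustration. The p-value for the slope is not computed.

```python
from statistics import mean

def fit_line(x, y):
    """Return (slope, intercept, r_squared) for a least-squares line y = slope*x + intercept."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - (residual variation / total variation in Y).
    ss_res = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data: rock pool size (X) and algal diversity (Y).
pool_size = [1, 2, 3, 4, 5]
algal_diversity = [8, 12, 14, 18, 21]

slope, intercept, r_squared = fit_line(pool_size, algal_diversity)
print(f"Y = {slope:.1f}X + {intercept:.1f}, R^2 = {r_squared:.2f}")
```

Unlike correlation, the roles of X and Y matter: swapping them gives a different line.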
9. Multi-Factor Regressions (Advanced Concepts)
ANCOVA (Analysis of Covariance)
Tests if two regression lines differ significantly in slope or intercept.
Used when comparing relationships across categories.
Example:
Does wave exposure affect the relationship between rock pool size and grazer density?
Covariate (continuous predictor): pool size
Factor (categorical predictor): exposure
Response: grazer density
Simplified approach for coursework:
Plot both regressions (e.g. exposed vs sheltered) and compare qualitatively — no formal ANCOVA test required.
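The simplified approach amounts to fitting one line per category and comparing the fitted slopes and intercepts side by side. A sketch with invented data follows; the helper name `fit_line` and all values are assumptions for illustration, and no formal test of the slope difference is performed.

```python
from statistics import mean

def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return slope, my - slope * mx

# Hypothetical pool sizes (m^2) and grazer densities, recorded separately per exposure.
surveys = {
    "exposed": ([1, 2, 3, 4], [2.1, 3.9, 6.2, 7.8]),
    "sheltered": ([1, 2, 3, 4], [5.2, 5.9, 7.1, 7.8]),
}

fits = {name: fit_line(x, y) for name, (x, y) in surveys.items()}
for name, (slope, intercept) in fits.items():
    print(f"{name}: density = {slope:.2f} * pool size + {intercept:.2f}")
```

Plotting both fitted lines on the same axes, with units labelled, then lets the reader judge whether the relationship differs between exposures.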
10. Practical & Reporting Guidance
Design Advice
Choose two predictor variables where possible for a richer analysis.
Plan for ANOVA or regression during survey design.
Ensure units are always labelled on axes (e.g. “Density (ind/m²)”).
Interpretation
Always check for interactions in multi-factor designs.
Avoid overcomplicating analysis — aim for clarity and coherence.
Key Phrases to Use
“Significant effect of [factor] on [response variable] (ANOVA, p<0.05).”
“No significant interaction between [factors] (Two-Way ANOVA, p>0.05).”
“Regression between [X] and [Y] was significant (p<0.05, R²=0.62).”
11. Summary Table
| Goal | Test | Data Type | Notes |
|---|---|---|---|
| Compare 2 means | t-test | Categorical predictor | Parametric |
| Compare >2 means | One-way ANOVA | Categorical predictor | Use post-hoc tests |
| Compare means across 2 factors | Two-way ANOVA | Two categorical predictors | Check for interactions |
| Data non-normal or variances unequal | Mann–Whitney / Kruskal–Wallis | Categorical predictor | Non-parametric; use medians |
| Relationship (no cause) | Correlation | Continuous variables | No direction implied |
| Relationship (predictive) | Regression | Continuous predictor | Report R² and equation |
| Two regressions | ANCOVA (optional) | Continuous + categorical | Compare slopes qualitatively |
Lecturer’s Key Advice
Focus on designing good surveys, not on complex maths.
Always think about your analysis plan before collecting data.
Keep methods simple but robust.
Avoid pseudoreplication and label all axes with units.
Two-factor designs and visual interpretation of interactions earn stronger marks.