Statistical analysis for ecological design

Context

  • This is not a statistics module — stats are used here as a tool to support survey design.

  • The aim is to design and implement a scientifically sound survey and report, not to perform advanced mathematical analysis.

  • Statistical methods are used to test hypotheses and interpret ecological data.

🔑 Core message: Think about how you will analyse your data during the design stage — don’t wait until after data collection.


1. Statistical Tools for Ecological Analysis

Allowed Techniques

You’ll use familiar and straightforward methods:

  • ANOVA (Analysis of Variance) – compare means between groups.

  • Correlation – assess strength of relationship between two variables.

  • Regression – model relationships (one variable predicts another).

You can combine these methods if appropriate.

Most analyses will use:

  • One-way ANOVA

  • Two-way ANOVA (multi-factor)

  • Linear regression

  • Correlation (Pearson/Spearman)


2. Single-Factor vs Multi-Factor Designs

Single-Factor (One-Way)

  • One predictor variable (factor).

  • Example:
    Predictor = Shore height (3 levels: low, mid, high).
    Response = Algal diversity or abundance.

Goal → Compare means between levels (e.g. low vs mid vs high shore).

Multi-Factor (Two-Way)

  • Two predictor variables.

  • Example:

    • Factor 1 = Shore height (low, mid, high).

    • Factor 2 = Rock pools (inside vs outside).

Now the survey examines:

  • Main effects of each factor (shore height, pool presence).

  • Interaction effects (how one factor modifies the effect of the other).

→ Analysed using Two-Way ANOVA.
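
As a rough illustration of planning the analysis at the design stage, the sketch below lists every cell of the two-factor design described above and the quadrats each cell needs. The replicate count of 5 per cell is an assumed, illustrative value, not a course requirement.

```python
from itertools import product

# Factor levels taken from the example above.
shore_heights = ["low", "mid", "high"]
pool_positions = ["inside", "outside"]

# Assumed number of replicate quadrats per cell (hypothetical value).
replicates_per_cell = 5

# Every combination of the two factors is one "cell" of the design.
cells = list(product(shore_heights, pool_positions))

# One row per quadrat: (shore height, pool position, replicate number).
sampling_plan = [
    (height, pool, rep)
    for height, pool in cells
    for rep in range(1, replicates_per_cell + 1)
]

print(len(cells))          # number of factor combinations
print(len(sampling_plan))  # total quadrats to sample
```

Writing the plan out like this makes it easy to see whether every combination is sampled equally, which a two-way ANOVA handles most cleanly.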


3. Comparing Averages

Examples of response variables:

  • Mean abundance of a species.

  • Mean size (e.g., of limpets).

  • Mean diversity index (e.g., Shannon diversity).
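
For reference, a minimal sketch of how a Shannon diversity value could be calculated from species counts in one quadrat. It assumes the common natural-log form H' = -Σ p ln p (the notes do not state which log base to use), and the counts are made up.

```python
import math

def shannon_diversity(counts):
    """Return Shannon diversity H' = -sum(p_i * ln(p_i)) for species counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    h = 0.0
    for c in counts:
        if c > 0:  # absent species contribute nothing to the index
            p = c / total
            h -= p * math.log(p)
    return h

# Hypothetical counts of four species in one quadrat.
quadrat_counts = [10, 10, 10, 10]
print(round(shannon_diversity(quadrat_counts), 3))
```

Each quadrat gives one H' value, and the mean of those values per group is the response variable you would compare.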

Examples of predictor variables:

  • Habitat type (sheltered vs exposed).

  • Shore height (low, mid, high).

  • Substrate heterogeneity.


When to Use Which Test

Situation                              Test
Compare 2 means                        t-test
Compare >2 means (1 factor)            One-way ANOVA
Compare 2+ factors                     Two-way ANOVA
Data not normal / variances unequal    Non-parametric alternatives
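
The table above can be read as a simple decision rule. The helper below is a hypothetical sketch of that rule (the function name and arguments are invented for illustration). For two-factor designs that fail the assumptions it returns the "transform and retest" step from these notes, because no rank-based alternative is named here.

```python
def choose_test(n_factors, n_levels, assumptions_met=True):
    """Suggest a test name following the table above (illustrative only).

    n_factors: number of categorical predictors (1 or 2).
    n_levels: number of levels of the factor (used for one-factor designs).
    assumptions_met: True if the parametric assumptions look reasonable.
    """
    if n_factors not in (1, 2):
        raise ValueError("these notes only cover one- and two-factor designs")
    if n_factors == 2:
        if assumptions_met:
            return "Two-way ANOVA"
        return "Transform data and retest assumptions"
    if n_levels < 2:
        raise ValueError("a factor needs at least two levels to compare")
    if n_levels == 2:
        return "t-test" if assumptions_met else "Mann-Whitney U test"
    return "One-way ANOVA" if assumptions_met else "Kruskal-Wallis test"

# Shore height with three levels, assumptions satisfied.
print(choose_test(n_factors=1, n_levels=3))
```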


4. Parametric vs Non-Parametric Tests

Parametric Tests

  • Require assumptions:

    1. Normal distribution of data.

    2. Homogeneous variances (equal variance between groups).

    3. Independence of samples.

  • Examples: t-test, ANOVA.

  • Compare means.

  • More powerful (detect smaller effects).

Non-Parametric Tests

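  • No assumption that the data follow a normal distribution (“distribution-free”); samples must still be independent.

  • Work on the ranks of the data, so results are usually described as comparing medians.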

  • Examples:

    • Mann–Whitney U test (instead of t-test)

    • Kruskal–Wallis test (instead of one-way ANOVA)

  • Less powerful, but often more appropriate for biological data (which is messy).
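
To show what "using data ranks" means in practice, here is a minimal sketch of the Mann–Whitney U statistic worked out by hand. The data are hypothetical, and the sketch stops at the statistic: a p-value would come from tables or a statistics package.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return the smaller of the two U statistics for two samples."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    rank_sum_a = sum(ranks[:n_a])
    u_a = rank_sum_a - n_a * (n_a + 1) / 2
    u_b = n_a * n_b - u_a
    return min(u_a, u_b)

# Hypothetical limpet counts per quadrat.
sheltered = [3, 5, 4, 6]
exposed = [8, 9, 7, 10]
print(mann_whitney_u(sheltered, exposed))
```

Because only the rank order matters, one extreme value has far less influence here than it would on a mean.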


5. Assumptions of ANOVA

  1. Independence

    • Each sample must be independent.

    • Example: don’t place quadrats right next to each other.

    • Violating independence invalidates tests.

  2. Normality

    • Data should roughly follow a normal distribution.

    • Lecturer notes: “I’ve never tested for normality — ANOVA is robust to slight non-normality” (following advice of Tony Underwood).

    • With small sample sizes (n < 100), focus on homogeneity rather than strict normality.

  3. Homogeneity of Variance

    • Variances between groups must be similar.

    • Test using:

      • Cochran’s test

      • Levene’s test (simpler)

    • If variances unequal → transform data (square root, log, or arcsine).

    • Retest. If still unequal → use non-parametric test.
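
A minimal sketch of the check-transform-retest loop, using Cochran's C statistic (largest group variance divided by the sum of all group variances). The counts are hypothetical and chosen so that a square-root transform evens out the variances. Whether C is significant still has to be judged against a critical value from tables or software, and the statistic assumes equal sample sizes per group.

```python
import math
from statistics import variance

def cochran_c(groups):
    """Cochran's C: largest group variance divided by the sum of variances."""
    variances = [variance(g) for g in groups]  # sample variances (n - 1)
    return max(variances) / sum(variances)

# Hypothetical limpet counts per quadrat at three shore heights.
low = [1, 4, 9, 16, 25]
mid = [4, 9, 16, 25, 36]
high = [25, 36, 49, 64, 81]

raw_c = cochran_c([low, mid, high])

# Square-root transform, one of the options listed above.
transformed = [[math.sqrt(x) for x in g] for g in (low, mid, high)]
sqrt_c = cochran_c(transformed)

print(round(raw_c, 3), round(sqrt_c, 3))
```

With three groups, C is 1/3 when all variances are equal and approaches 1 when a single group dominates, so a lower value after transforming means the variances have become more similar.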


6. What ANOVA Actually Does

Conceptual Breakdown

  • ANOVA compares variation between group means vs variation within groups (residuals).

Example:
  • Two treatments:

    • Group A mean = 10

    • Group B mean = 15

  • If little variation within groups → differences between groups are likely real.

  • If large variation within groups → may not be a true difference.

ANOVA calculates an F statistic:

\[ F = \frac{\text{Mean Square (factor)}}{\text{Mean Square (residual)}} \]

  • High F → greater difference between means relative to internal variation → significant result.
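
The calculation can be sketched by hand for a one-way design. The data below are hypothetical, built to have the group means used above (10 and 15), once with little within-group variation and once with a lot. The sketch stops at F; the p-value comes from the F distribution via tables or software.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_factor, df_residual) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    # Between-group (factor) sum of squares.
    ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group (residual) sum of squares.
    ss_residual = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_factor = len(groups) - 1
    df_residual = len(all_values) - len(groups)

    ms_factor = ss_factor / df_factor
    ms_residual = ss_residual / df_residual
    return ms_factor / ms_residual, df_factor, df_residual

# Little variation within groups.
group_a = [9, 10, 11]
group_b = [14, 15, 16]
f_tight, df1, df2 = one_way_anova_f([group_a, group_b])

# Same means, much more variation within groups.
group_a_noisy = [2, 10, 18]
group_b_noisy = [7, 15, 23]
f_noisy, _, _ = one_way_anova_f([group_a_noisy, group_b_noisy])

print(f_tight, f_noisy)
```

The difference between the means is identical in both cases; only the residual mean square changes, and F falls sharply when the within-group variation is large.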


Interpreting Output

ANOVA table includes:

Source of Variation           df    Mean Square    F    p-value
Factor (e.g. Shore height)
Residual (Error)

  • Significant p (<0.05) → factor affects response.

  • If more than 2 levels, use post-hoc tests (e.g. Tukey’s test) to identify which groups differ.


7. Two-Way (Multi-Factor) ANOVA

Example

  • Factors:

    • Shore height (low, mid, high)

    • Wave exposure (sheltered vs exposed)

  • Response: Limpet density (number/m²)

Analysis

  • Tests:

    1. Effect of shore height

    2. Effect of wave exposure

    3. Interaction effect (does effect of exposure depend on height?)

Interpretation:

  • If an interaction is significant, it means:

    The effect of one variable depends on the other.

    e.g. Wave exposure affects limpet density only on the low shore, not mid/high.

Always report interactions first in your results section.

Example phrasing for report:

“There was a significant interaction between shore height and wave exposure on limpet density (Two-Way ANOVA, p<0.05). This was due to a strong effect of exposure on the low shore, but not at mid or high shore (Fig. 2).”
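
To see what an interaction looks like in the data, it can help to tabulate the cell means before running the test. The sketch below uses made-up limpet densities arranged to mimic the pattern described above. It is not the two-way ANOVA itself, only a look at the pattern that the interaction term tests.

```python
from statistics import mean

# Hypothetical limpet densities (number/m²), three quadrats per cell.
densities = {
    ("low", "sheltered"): [30, 32, 34],
    ("low", "exposed"): [10, 12, 14],
    ("mid", "sheltered"): [20, 22, 24],
    ("mid", "exposed"): [20, 21, 25],
    ("high", "sheltered"): [8, 10, 12],
    ("high", "exposed"): [9, 10, 11],
}

cell_means = {cell: mean(values) for cell, values in densities.items()}

# Exposure effect at each shore height: sheltered mean minus exposed mean.
exposure_effect = {
    height: cell_means[(height, "sheltered")] - cell_means[(height, "exposed")]
    for height in ("low", "mid", "high")
}

for height, effect in exposure_effect.items():
    print(height, effect)
```

If the exposure effect were the same at every shore height there would be no interaction. Here it is large on the low shore and absent elsewhere, which shows up as non-parallel lines on an interaction plot.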


8. Continuous Relationships – Correlation & Regression

When to Use

Instead of comparing categories, sometimes you study relationships between continuous variables.

Examples:

  • % algal cover vs limpet density

  • Limpet size vs detachment strength

  • Rock pool size vs algal diversity

  • Seaweed length vs bladder number


Choosing Between Correlation and Regression

Question                                                                Use           Description
Do both variables vary freely (no clear cause/effect)?                  Correlation   Measures strength of relationship between two variables. No predictor/response distinction.
Is one variable clearly dependent on another (predictor → response)?    Regression    Models directional relationship. Allows prediction.
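
As a rough sketch of the correlation side, the functions below compute Pearson's r and Spearman's rho (Pearson's r applied to ranks) for one of the example pairs. The values are hypothetical and the helper names are illustrative. Significance would still need tables or a statistics package.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    ordered = sorted(values)
    n = len(ordered)
    return [
        (ordered.index(v) + 1 + n - ordered[::-1].index(v)) / 2
        for v in values
    ]

def spearman_rho(x, y):
    """Spearman rank correlation: Pearson's r on the ranked data."""
    return pearson_r(ranks(x), ranks(y))

algal_cover = [10, 20, 30, 40, 50]    # hypothetical % cover
limpet_density = [40, 30, 25, 8, 5]   # hypothetical number/m²

print(round(pearson_r(algal_cover, limpet_density), 3))
print(spearman_rho(algal_cover, limpet_density))
```

Swapping the two arguments gives the same answer, which is the sense in which correlation has no predictor/response distinction.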

In regression:

  • X = independent/predictor variable

  • Y = dependent/response variable

Example:

Does rock pool size (X) affect algal diversity (Y)?


Regression Output

Report:

  1. Significance (p-value) – tests if slope ≠ 0

  2. Equation (e.g. Y = 3.2X + 5)

  3. R² value – proportion of variation in Y explained by X (0–1 scale)

Rules:

  • Only report regression equation if significant.

  • Higher R² → stronger predictive relationship.
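
A minimal least-squares sketch showing where the equation and R² come from. The data are hypothetical, chosen so that the fitted line matches the example equation Y = 3.2X + 5. The p-value for the slope is not computed here and would come from a statistics package.

```python
def linear_regression(x, y):
    """Least-squares fit of y = slope * x + intercept; also returns R²."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R² = 1 - (unexplained variation / total variation in Y).
    ss_total = sum((b - mean_y) ** 2 for b in y)
    ss_residual = sum((b - (slope * a + intercept)) ** 2 for a, b in zip(x, y))
    r_squared = 1 - ss_residual / ss_total
    return slope, intercept, r_squared

pool_size = [1, 2, 3, 4, 5]            # hypothetical X values
algal_diversity = [8, 12, 14, 18, 21]  # hypothetical Y values

slope, intercept, r_squared = linear_regression(pool_size, algal_diversity)
print(round(slope, 2), round(intercept, 2), round(r_squared, 3))
```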


9. Multi-Factor Regressions (Advanced Concepts)

ANCOVA (Analysis of Covariance)

  • Tests if two regression lines differ significantly in slope or intercept.

  • Used when comparing relationships across categories.

Example:

Does wave exposure affect the relationship between rock pool size and grazer density?

  • Covariate (continuous predictor): pool size

  • Factor (categorical): exposure

  • Response: grazer density

Simplified approach for coursework:
Plot both regressions (e.g. exposed vs sheltered) and compare qualitatively — no formal ANCOVA test required.
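
The simplified approach might look like the sketch below: fit one line per exposure category and compare the slopes and intercepts by eye. The data are hypothetical and deliberately tidy, and no formal ANCOVA test is performed.

```python
def fit_line(x, y):
    """Least-squares slope and intercept for y against x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

pool_size = [1, 2, 3, 4]  # hypothetical pool sizes, same at both exposures

# Hypothetical grazer densities for each exposure category.
grazers = {
    "sheltered": [5, 9, 13, 17],
    "exposed": [4, 5, 6, 7],
}

fits = {exposure: fit_line(pool_size, y) for exposure, y in grazers.items()}

for exposure, (slope, intercept) in fits.items():
    print(exposure, slope, intercept)
```

Clearly different slopes, as in this made-up case, suggest that exposure changes the relationship between pool size and grazer density. Plotting both lines on one figure makes that comparison easy to report.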


10. Practical & Reporting Guidance

Design Advice

  • Choose two predictor variables where possible for a richer analysis.

  • Plan for ANOVA or regression during survey design.

  • Ensure units are always labelled on axes (e.g. “Density (ind/m²)”).

Interpretation

  • Always check for interactions in multi-factor designs.

  • Avoid overcomplicating analysis — aim for clarity and coherence.

Key Phrases to Use

  • “Significant effect of [factor] on [response variable] (ANOVA, p<0.05).”

  • “No significant interaction between [factors] (Two-Way ANOVA, p>0.05).”

  • “Regression between [X] and [Y] was significant (p<0.05, R²=0.62).”


11. Summary Table

Goal                        Test                             Data Type                    Notes
Compare 2 means             t-test                           Categorical predictor        Parametric
Compare >2 means            One-way ANOVA                    Categorical predictor        Use post-hoc tests
Compare 2+ factors          Two-way ANOVA                    Two categorical predictors   Check for interactions
Data non-normal             Mann–Whitney / Kruskal–Wallis    Non-parametric               Use medians
Relationship (no cause)     Correlation                      Continuous variables         No direction implied
Relationship (predictive)   Regression                       Continuous predictor         Report R² and equation
Two regressions             ANCOVA (optional)                Continuous + categorical     Compare slopes qualitatively


Lecturer’s Key Advice

  • Focus on designing good surveys, not on complex maths.

  • Always think about your analysis plan before collecting data.

  • Keep methods simple but robust.

  • Avoid pseudoreplication and label all axes with units.

  • Two-factor designs and visual interpretation of interactions earn stronger marks.