Experimental Design Flashcards

Announcements

  • No lab classes on Thursday and Friday next week due to the Easter long weekend (Good Friday).
  • Tutorials are also postponed to week seven (after the mid-semester break).
  • Project two will be released next week.
    • It builds upon the current projects.
    • It is worth more.
    • It will involve real (messy) data.
    • Starting early is recommended due to its complexity.

Experimental Design

  • Focus: Understanding basic experimental designs and statistical tests for analysis.
  • Goal: Equip students with tools for project two.
  • Approach: Going a bit backwards to move forward. We revisit foundational concepts in order to make progress.

Key Concepts

Experimental Units vs. Sampling Units

  • Importance: Crucial for determining replication, sample size, and correct data analysis.
  • Experimental Unit: The smallest unit to which a treatment is applied.
    • Examples: single animal, plants in a square meter plot, a farm, a catchment.
  • Sampling Unit: The smallest unit from which data (observations, Y variable) are collected.
  • Note: These can be the same, but often differ in complex designs.
    • If you misidentify the experimental and sampling units, the replication is wrong, and so is the analysis of the whole experiment.

Replication (n/Sample Size)

  • Needed to estimate variation within the system.
  • Enables calculation of means and their standard errors.
  • Increasing replicates generally leads to smaller standard errors and more precise results.
  • Trade-off: Higher replication increases costs, resources, and time.
  • Balance: Finding the equilibrium between accuracy and practicality.
  • Replication is necessary to test every hypothesis, so every hypothesis you pose needs replication with which to test it.

How Many Replicates?

  • Answer depends on the system being studied, costs, and desired statistical power.
  • Use the provided shiny app (interactive R code) to explore the relationship between replication and standard error.
  • The link to the shiny app is provided.

Shiny App Demonstration

  • Population Parameters: Assuming a normal distribution, defined by mean and variance.
  • Sliders: Adjust standard deviation (spread of data) and number of observations (replicates).
  • Observation: As the number of observations increases, the standard error of the mean (pink bar) decreases.
  • With larger sample sizes, more data points go into the histogram, so it gives a better estimate of what the population looks like.
  • There are diminishing returns from increasing sample sizes, because of the square root term.
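The app's behaviour can be approximated offline. The sketch below is not the app's code: it assumes a normal population with a hypothetical mean of 50 and standard deviation of 10, repeats an "experiment" of n observations many times, and reports the spread of the resulting sample means, which is what the standard error describes.

```python
import random
import statistics


def simulated_sem(n, mean=50.0, sd=10.0, n_experiments=2000, seed=1):
    """Empirical standard error: the SD of sample means over repeated experiments."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sample_means = []
    for _ in range(n_experiments):
        sample = [rng.gauss(mean, sd) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))
    return statistics.stdev(sample_means)


if __name__ == "__main__":
    # Population values and sample sizes are illustrative assumptions.
    for n in (5, 20, 80):
        print(f"n = {n:3d}  simulated SEM = {simulated_sem(n):.2f}")
```

Each fourfold increase in n should roughly halve the simulated value, matching the pink bar shrinking in the app.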

Standard Error of the Mean Equation

  • The standard error of the mean is calculated as: SEM = \frac{\sigma}{\sqrt{n}}, where \sigma is the population standard deviation and n is the sample size (number of replicates).
  • Diminishing Returns: Each additional replicate reduces the standard error by less than the one before, because n enters the denominator through a square root. Halving the SEM requires four times as many replicates.
  • Pilot Studies: Conduct small-scale experiments to estimate population variation before designing the full experiment.
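As a small worked example of the equation, the sketch below evaluates SEM = sigma / sqrt(n) for a hypothetical population standard deviation of 10 (an assumed value, not taken from the course data) and shows how quadrupling n halves the SEM.

```python
import math


def sem(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return sigma / math.sqrt(n)


if __name__ == "__main__":
    sigma = 10.0  # hypothetical population SD, for illustration only
    previous = None
    for n in (4, 16, 64, 256):
        current = sem(sigma, n)
        note = "" if previous is None else f"  (x{current / previous:.2f} of previous)"
        print(f"n = {n:3d}  SEM = {current:.3f}{note}")
        previous = current
```

Going from 4 to 16 replicates takes the SEM from 5 to 2.5; a further halving to 1.25 costs 64 replicates.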

Key Takeaways from Shiny App Exploration

  • With the number of replicates fixed (e.g., 10), increasing the variation in the population increases the standard error of the mean.
  • Increasing the number of replicates improves precision: the standard error of the mean decreases.
  • A highly variable population benefits substantially from high replication.
  • A population with little variation can be studied with fewer replicates.
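These takeaways can be turned around to ask how many replicates a given precision needs. Rearranging SEM = sigma / sqrt(n) gives n = (sigma / SEM)^2. The sketch below assumes the standard deviation comes from a pilot study; the function name and the example numbers are hypothetical.

```python
import math


def replicates_needed(pilot_sd, target_sem):
    """Smallest n for which pilot_sd / sqrt(n) is at most target_sem."""
    if pilot_sd <= 0 or target_sem <= 0:
        raise ValueError("pilot_sd and target_sem must be positive")
    return math.ceil((pilot_sd / target_sem) ** 2)


if __name__ == "__main__":
    target = 2.0  # desired SEM, in the units of the response (assumed value)
    for pilot_sd in (4.0, 12.0):  # a low-variation and a high-variation population
        n = replicates_needed(pilot_sd, target)
        print(f"pilot SD = {pilot_sd:4.1f}  ->  n = {n}")
```

Tripling the standard deviation multiplies the required replication by nine (4 versus 36 here), which is why variable populations call for high replication. This is only a rough guide, since the pilot SD is itself an estimate.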

Pseudo Replication and Confounding

  • Pseudo replication: Artificially inflating the number of replicates, often due to mismatching experimental and sampling units. You end up looking like you have more replicates than you actually do.
  • Confounding: Inability to distinguish treatment effects from other uncontrolled factors.
  • Both problems are avoided at the design stage: the experimental design must account for them before any data are collected.

Examples: Experimental Units vs. Sampling Units

Example 1: Organic vs. Conventional Agriculture and Sheep Weight Gain

  • Hypothesis: Organic agriculture leads to greater weight gain in sheep compared to conventional agriculture.
  • Setup: 40 animals divided into two groups of 20 on separate farms (one organic, one conventional).
  • Sampling Unit: Individual sheep (weight measurement).
  • Initial Experimental Unit: Individual sheep (based on limited info).
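  • Problem: The treatment (farming system) is applied to the whole farm, so the farm is the true experimental unit. With one farm per treatment there is no true replication, and any difference between the two farms is confounded with the treatment.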
    • Possible Confounding Factors: Different farm management, soil moisture, rainfall, fire histories, fertilizer applications, etc.
    • Solution: Design experiment with multiple farms per treatment to represent true replication.

Example 2: Glasshouse Irrigation Rates and Nitrogen Concentration in Plants

  • Hypothesis: Different irrigation rates lead to different nitrogen concentrations in plants.
  • Setup: Glasshouse experiment with different water levels for irrigation.
  • Sampling Unit: Individual leaf (nitrogen concentration measurement).
  • Experimental Unit: Whole plant (treatment applied to the plant, leaves on the same plant are correlated).
    • Taking five leaves from each plant gives a better measure of the nitrogen variability within each plant, but the leaves are subsamples of the plant, not extra replicates.
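One common way to keep leaves from being counted as replicates is to average them to one value per plant before analysis. The sketch below assumes that approach; the nitrogen values, plant labels, and treatment names are invented for illustration and are not real data.

```python
import statistics

# Hypothetical leaf nitrogen (%), five leaves per plant, keyed by (treatment, plant).
leaf_nitrogen = {
    ("low water", "plant 1"): [2.1, 2.3, 2.0, 2.2, 2.4],
    ("low water", "plant 2"): [2.6, 2.5, 2.7, 2.4, 2.8],
    ("low water", "plant 3"): [2.2, 2.1, 2.3, 2.2, 2.2],
    ("high water", "plant 4"): [3.0, 3.1, 2.9, 3.2, 3.0],
    ("high water", "plant 5"): [3.4, 3.3, 3.5, 3.2, 3.6],
    ("high water", "plant 6"): [2.9, 3.0, 3.1, 2.8, 3.0],
}


def unit_means(measurements):
    """Collapse sampling units (leaves) to one mean per experimental unit (plant)."""
    return {unit: statistics.fmean(values) for unit, values in measurements.items()}


def replicates_per_treatment(means):
    """Count experimental units, not individual measurements, per treatment."""
    counts = {}
    for treatment, _plant in means:
        counts[treatment] = counts.get(treatment, 0) + 1
    return counts


if __name__ == "__main__":
    means = unit_means(leaf_nitrogen)
    n_leaves = sum(len(v) for v in leaf_nitrogen.values())
    print(f"{n_leaves} leaves measured, replicates: {replicates_per_treatment(means)}")
```

Thirty leaves were measured, yet each treatment has n = 3 because the treatment was applied to three plants. Treating the 15 leaves per treatment as replicates would be pseudo replication.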

Example 3: Animal House, Fish Growth Rate, and Types of Food

  • Hypothesis: Fish food affects growth rate.
  • Setup: 100 fish divided into two tanks (50 each), one tank gets food A, and the other tank gets food B.
  • Sampling Unit: Individual fish (growth rate measurement).
  • Initial Experimental Unit: Individual fish (setup as described).
  • Potential Confounding: Tank environment (pH, temperature gradients) affecting growth rates.
  • Improvement: Use multiple tanks per treatment, randomly distributed to account for environmental variations.

Randomization

  • Ensures independence of error terms and avoids bias.
  • Without randomization, treatment effects cannot be disentangled from bias.

Example: Drug Company, New Pesticide, and Nematode Control

  • Scenario: Testing a new pesticide (nematode control) on farms.
  • Bias: List of farms is ordered by age (older farms at the top).
  • Non-Random Assignment: Dividing the list in half leads to treatment A being applied to older farms and treatment B to newer farms.
  • Consequence: Confounding between pesticide effect and farm age/nematode load.
  • Solution: Randomize the list of farms before assigning treatments.

Implementing Randomization

  • Humans are poor randomizers; use tools like coin flips, random number tables, or computer-generated random numbers.
  • R packages can generate a randomized experimental design and print it out. Take the printout to the lab and set up the experiment from it.
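A computer-generated randomization for the farm example might look like the sketch below. It is written in Python rather than R, and the farm names, seed, and function name are hypothetical. The list is shuffled first and only then split, so farm age no longer lines up with treatment.

```python
import random


def assign_treatments(units, treatments, seed=None):
    """Shuffle the units, then deal the treatments out in equal-sized groups."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)  # record the seed so the plan can be reproduced
    shuffled = list(units)     # copy, so the caller's ordered list is untouched
    rng.shuffle(shuffled)
    per_treatment = len(shuffled) // len(treatments)
    return {unit: treatments[i // per_treatment] for i, unit in enumerate(shuffled)}


if __name__ == "__main__":
    farms = [f"farm_{i:02d}" for i in range(1, 11)]  # hypothetical list, oldest first
    plan = assign_treatments(farms, ["A", "B"], seed=2024)
    for farm in farms:
        print(farm, plan[farm])
```

Printing the plan in the original list order gives a sheet that can be taken into the field.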

Controlling Sources of Variation

  • Goal: Explain variation through treatments and control other potential sources of variation.
  • Unexplained variation ends up in the residual (error) term.
  • Controlling variation leads to stronger, more powerful experimental designs.

Completely Randomized Designs

  • Simplest design: Randomly allocate treatments among experimental units.
  • Sampling unit = experimental unit.
  • Limitations: Requires similar experimental units; otherwise, precision decreases, and conclusions may be incorrect.
  • If there is another source of variation, it will end up in the residual (error) term.

ANOVA Tables

  • The ANOVA table captures all of this information: how the total variation is split between the treatments and the residual.

Bigger Residuals and Smaller F Ratios

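  • When a source of variation is left unaccounted for, the residual sum of squares in the ANOVA table is larger.
  • The residual mean square is therefore larger.
  • F ratios are smaller. Equation: F = \frac{MSTR}{MSE}, where MSTR is the treatment mean square and MSE is the residual (error) mean square.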
  • Weight of evidence gets reduced.
  • P values are larger.
  • Power of the experiment decreases.
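The chain above can be checked by hand. The sketch below computes F = MSTR / MSE for a one-way (completely randomized) layout using two invented data sets that share the same treatment means but differ in residual spread; all values are hypothetical.

```python
import statistics


def one_way_f(groups):
    """Return (F, df_treatment, df_residual) for a dict of treatment -> observations."""
    all_values = [y for ys in groups.values() for y in ys]
    grand_mean = statistics.fmean(all_values)
    # Between-treatment variation: group means around the grand mean.
    ss_treatment = sum(
        len(ys) * (statistics.fmean(ys) - grand_mean) ** 2 for ys in groups.values()
    )
    # Residual variation: observations around their own group mean.
    ss_residual = sum(
        (y - statistics.fmean(ys)) ** 2 for ys in groups.values() for y in ys
    )
    df_treatment = len(groups) - 1
    df_residual = len(all_values) - len(groups)
    mstr = ss_treatment / df_treatment
    mse = ss_residual / df_residual
    return mstr / mse, df_treatment, df_residual


if __name__ == "__main__":
    # Same treatment means (11, 15, 19); only the within-group spread differs.
    tight = {"A": [10, 11, 12], "B": [14, 15, 16], "C": [18, 19, 20]}
    noisy = {"A": [6, 11, 16], "B": [10, 15, 20], "C": [14, 19, 24]}
    for name, data in (("tight", tight), ("noisy", noisy)):
        f_ratio, df1, df2 = one_way_f(data)
        print(f"{name}: F({df1}, {df2}) = {f_ratio:.2f}")
```

With identical treatment differences, the F ratio falls from 48 to 1.92 once the residual mean square grows from 1 to 25.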

Example: Paddock Experiment and Crop Yield

  • Setup: Randomly assign three treatments (A, B, C) across a field.
  • Unaccounted Variation: Moisture gradient due to a creek on one side of the field.
  • Consequence: Treatment A (by chance) is located near the creek, confounding the treatment effect with moisture content.

Blocking

  • Technique to control for known sources of variation.

Example: Moisture Gradient and Blocking

  • Concept: Divide the field into blocks, where each block is expected to have similar moisture content.
  • Randomized Complete Block Design: Within each block, randomly assign all three treatments (A, B, C).
  • Complete Block: Every block contains all three treatments.
  • Moisture should be similar within a block, so the moisture differences fall between blocks rather than within them.
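A layout for a randomized complete block design can be generated by shuffling the full set of treatments separately inside each block. The sketch below assumes four blocks and a fixed seed, both arbitrary choices for illustration.

```python
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """Return one independently shuffled copy of the treatments per block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)  # every block gets every treatment (complete block)
        rng.shuffle(order)        # randomization happens within the block
        layout.append(order)
    return layout


if __name__ == "__main__":
    # Block 1 is taken to be the block nearest the creek (an assumption).
    layout = rcbd_layout(["A", "B", "C"], n_blocks=4, seed=7)
    for i, order in enumerate(layout, start=1):
        print(f"Block {i}: {' '.join(order)}")
```

Unlike a completely randomized layout, no treatment can end up concentrated near the creek by chance, because each block holds each treatment exactly once.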

ANOVA Model

  • Total Variation = Treatment Sums of Squares + Residual Sums of Squares (for completely randomized designs).
  • Total Variation = Treatment Sums of Squares + Block Sums of Squares + Residual Sums of Squares (for blocking designs).
    • Blocking term explains variation between blocks (e.g., due to soil moisture).
    • Residual sums of squares represent variation within each block after accounting for treatment and blocking effects.
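The two partitions can be compared on one data set. The sketch below uses invented yields for three treatments in three blocks (labelled wet, mid, and dry to echo the moisture gradient) with one observation per treatment per block, and computes each sum of squares directly.

```python
import statistics


def rcbd_sums_of_squares(table):
    """table[block][treatment] = one observation. Returns the SS components."""
    blocks = list(table)
    treatments = list(table[blocks[0]])
    values = [table[b][t] for b in blocks for t in treatments]
    grand = statistics.fmean(values)
    ss_total = sum((y - grand) ** 2 for y in values)
    treatment_means = [statistics.fmean([table[b][t] for b in blocks]) for t in treatments]
    block_means = [statistics.fmean([table[b][t] for t in treatments]) for b in blocks]
    ss_treatment = len(blocks) * sum((m - grand) ** 2 for m in treatment_means)
    ss_block = len(treatments) * sum((m - grand) ** 2 for m in block_means)
    return {
        "total": ss_total,
        "treatment": ss_treatment,
        "block": ss_block,
        "residual": ss_total - ss_treatment - ss_block,
    }


if __name__ == "__main__":
    # Hypothetical yields; wet blocks yield more regardless of treatment.
    yields = {
        "wet": {"A": 19, "B": 20, "C": 21},
        "mid": {"A": 13, "B": 15, "C": 17},
        "dry": {"A": 7, "B": 10, "C": 13},
    }
    ss = rcbd_sums_of_squares(yields)
    print("ignoring blocks, residual SS =", ss["total"] - ss["treatment"])
    print("with blocks,     residual SS =", ss["residual"])
```

For these numbers the total of 178 splits into 24 (treatment), 150 (block), and 4 (residual). Ignoring the blocks leaves 154 in the residual, so the same treatment effect would be tested against a much larger error term.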

Fish Blocking Example

  • Scenario: The tanks sit along a gradient from warm to cool. Put a diet one tank and a diet two tank next to each other to form a block, and continue in the same way through the blocks. It gets cooler as we walk away from the warm end, but the two tanks within a block are under the same temperature regime.
  • Concept: Blocking controls for the change in temperature along the gradient.