No lab classes on Thursday and Friday next week due to the Easter long weekend (Good Friday).
Tutorials are also postponed to week seven (after the mid-semester break).
Project two will be released next week.
It builds upon the current projects.
It is worth more.
It will involve real (messy) data.
Starting early is recommended due to its complexity.
Experimental Design
Focus: Understanding basic experimental designs and statistical tests for analysis.
Goal: Equip students with tools for project two.
Going a bit backwards to move forward: sometimes we need to revisit foundational concepts to make progress.
Key Concepts
Experimental Units vs. Sampling Units
Importance: Crucial for determining replication, sample size, and correct data analysis.
Experimental Unit: The smallest unit to which a treatment is applied.
Examples: a single animal, the plants in a square meter plot, a farm, a catchment.
Sampling Unit: The smallest unit from which data (observations, Y variable) are collected.
Note: These can be the same, but often differ in complex designs.
If you confuse experimental units with sampling units, your whole experiment is wrong.
Replication (n/Sample Size)
Needed to estimate variation within the system.
Enables calculation of means and their standard errors.
Increasing replicates generally leads to smaller standard errors and more precise results.
Trade-off: Higher replication increases costs, resources, and time.
Balance: Finding the equilibrium between accuracy and practicality.
Replication is necessary to test any hypothesis, so every hypothesis you make needs a design with enough replication to test it.
How Many Replicates?
Answer depends on the system being studied, costs, and desired statistical power.
Use the provided Shiny app (interactive R code; the link is provided) to explore the relationship between replication and standard error.
Shiny App Demonstration
Population Parameters: The population is assumed to be normally distributed, so it is defined by its mean and variance.
Sliders: Adjust standard deviation (spread of data) and number of observations (replicates).
Observation: As the number of observations increases, the standard error of the mean (pink bar) decreases.
With larger sample sizes, there are more data points for the histogram, which estimates what the population data look like.
There are diminishing returns from increasing sample size, because the standard error shrinks only with the square root of the number of observations.
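The Shiny app itself is R code; the following is only a rough Python analogue of what it demonstrates, not the app's implementation. It assumes a normal population with an arbitrary sigma = 2, repeats a hypothetical experiment of n replicates many times, and compares the spread of the resulting sample means with sigma / sqrt(n). The helper name `empirical_sem` is made up for illustration.

```python
import random
import statistics


def empirical_sem(sigma, n, n_experiments=2000, mu=0.0, seed=1):
    """Simulate many experiments of n replicates drawn from Normal(mu, sigma)
    and return the standard deviation of the resulting sample means, i.e. an
    empirical standard error of the mean."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(mu, sigma) for _ in range(n))
        for _ in range(n_experiments)
    ]
    return statistics.stdev(means)


if __name__ == "__main__":
    sigma = 2.0  # arbitrary population standard deviation
    for n in (5, 10, 20, 40, 80):
        theory = sigma / n ** 0.5
        print(f"n={n:3d}  simulated SEM={empirical_sem(sigma, n):.3f}  "
              f"sigma/sqrt(n)={theory:.3f}")
```

Each doubling of n shrinks the simulated SEM by only a factor of about 1.41, which is the diminishing return the app shows.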
Standard Error of the Mean Equation
The standard error of the mean is calculated as: SEM = \frac{\sigma}{\sqrt{n}}, where \sigma is the population standard deviation and n is the sample size (number of replicates).
Diminishing Returns: Increasing replication has less impact on standard error due to the square root of n in the denominator.
Pilot Studies: Conduct small-scale experiments to estimate population variation (the sample standard deviation, used in place of \sigma) before designing the full experiment.
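As a sketch of how a pilot study feeds into the design, the SEM formula can be rearranged to n = s^2 / SEM^2, with the pilot's sample standard deviation s standing in for \sigma. The helper `replicates_for_target_sem` and the pilot numbers below are hypothetical, for illustration only.

```python
import math
import statistics


def replicates_for_target_sem(pilot_data, target_sem):
    """Smallest n with s / sqrt(n) <= target_sem, where s is the sample
    standard deviation of the pilot data (an estimate of sigma).
    Rearranging SEM = s / sqrt(n) gives n = s**2 / SEM**2."""
    s_squared = statistics.variance(pilot_data)  # sample variance s^2
    return math.ceil(s_squared / target_sem ** 2)


if __name__ == "__main__":
    # Hypothetical pilot measurements, for illustration only
    pilot = [12.1, 9.8, 11.4, 10.2, 13.0, 10.9]
    for target in (1.0, 0.5, 0.25):
        n = replicates_for_target_sem(pilot, target)
        print(f"target SEM {target}: about {n} replicates")
```

Halving the target SEM roughly quadruples the replicates required. A small pilot gives only a rough estimate of s, so the result is a guide rather than an exact answer.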
Key Takeaways from Shiny App Exploration
When the number of replicates is fixed (e.g. 10), increasing the variation in the population increases the standard error of the mean.
Increasing the number of replicates improves precision, so the standard error of the mean decreases.
If the population is highly variable, aiming for high replication is likely to pay off considerably.
If the population has little variation, you can get away with fewer replicates.
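Both takeaways can be read off the formula SEM = \sigma / \sqrt{n}. This small table uses arbitrary sigma and n values (assumptions, not course data); note that doubling sigma needs four times as many replicates to keep the same SEM.

```python
def sem(sigma, n):
    """Standard error of the mean: sigma / sqrt(n)."""
    return sigma / n ** 0.5


if __name__ == "__main__":
    sigmas = (1.0, 2.0, 4.0)  # arbitrary population standard deviations
    ns = (5, 10, 20, 40)      # numbers of replicates
    print("sigma".ljust(7) + "".join(f"n={n}".rjust(9) for n in ns))
    for sigma in sigmas:
        # Reading across a row: more replicates, smaller SEM.
        # Reading down a column: more variation, larger SEM.
        print(f"{sigma:<7}" + "".join(f"{sem(sigma, n):9.3f}" for n in ns))
```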
Pseudo Replication and Confounding
Pseudo replication: Artificially inflating the number of replicates, often by mismatching experimental and sampling units, so the experiment appears to have more replicates than it actually does.
Confounding: Inability to distinguish treatment effects from other uncontrolled factors.
Experimental designs must account for these issues; the way to avoid them is to plan for them when designing the experiment.
Examples: Experimental Units vs. Sampling Units
Example 1: Organic vs. Conventional Agriculture and Sheep Weight Gain
Hypothesis: Organic agriculture leads to greater weight gain in sheep compared to conventional agriculture.
Setup: 40 animals divided into two groups of 20 on separate farms (one organic, one conventional).
Example 2: Plants and Leaf Nitrogen Concentration
Experimental Unit: Whole plant (the treatment is applied to the plant, and leaves on the same plant are correlated).
Sampling Unit: Individual leaf; five leaves are taken from each plant to better measure the variability of nitrogen concentration within each individual plant.
Example 3: Animal House, Fish Growth Rate, and Types of Food
Hypothesis: Fish food affects growth rate.
Setup: 100 fish divided into two tanks (50 each), one tank gets food A, and the other tank gets food B.
Sampling Unit: Individual fish (growth rate measurement).
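Initial Experimental Unit: Individual fish, as the setup might suggest. However, food is applied to a whole tank, so the tank is the actual experimental unit, leaving only one replicate per treatment.
Potential Confounding: The tank environment (pH, temperature gradients) affects growth rates and cannot be separated from the food effect.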
Improvement: Use multiple tanks per treatment, randomly distributed to account for environmental variations.
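The cost of treating fish as replicates in the two-tank setup can be illustrated with a simulation sketch. It assumes there is no real difference between foods, that each tank adds its own normally distributed "tank effect" shared by all its fish, and that fish vary around their tank's level; the standard deviations are arbitrary. Each fish is (wrongly) treated as an independent replicate in a two-sample t statistic, compared against roughly 1.98, the two-sided 5% critical value for about 98 degrees of freedom. The helper names are hypothetical.

```python
import random
import statistics


def fish_t_statistic(rng, fish_per_tank=50, tank_sd=1.0, fish_sd=1.0):
    """One simulated experiment with ONE tank per food and NO true food effect.
    Every fish in a tank shares that tank's random effect (pH, temperature...).
    Returns a two-sample t statistic that treats each fish as a replicate."""
    tank_a = rng.gauss(0, tank_sd)
    tank_b = rng.gauss(0, tank_sd)
    food_a = [tank_a + rng.gauss(0, fish_sd) for _ in range(fish_per_tank)]
    food_b = [tank_b + rng.gauss(0, fish_sd) for _ in range(fish_per_tank)]
    diff = statistics.fmean(food_a) - statistics.fmean(food_b)
    se = (statistics.variance(food_a) / fish_per_tank
          + statistics.variance(food_b) / fish_per_tank) ** 0.5
    return diff / se


def false_positive_rate(n_sims=1000, seed=42, **kwargs):
    """Fraction of simulated experiments declaring a 'significant' food effect
    even though none exists."""
    rng = random.Random(seed)
    critical = 1.98  # approx. two-sided 5% t critical value, ~98 df
    hits = sum(abs(fish_t_statistic(rng, **kwargs)) > critical
               for _ in range(n_sims))
    return hits / n_sims


if __name__ == "__main__":
    print("no tank effect:  ", false_positive_rate(tank_sd=0.0))
    print("with tank effect:", false_positive_rate(tank_sd=1.0))
```

Without a tank effect the rate stays near the nominal 5%; with one, it is far higher, because the 50 fish in a tank are not independent replicates of the food treatment.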
Randomization
Ensures independence of error terms and avoids bias.
Without randomization, treatment effects cannot be disentangled from bias.
Example: Drug Company, New Pesticide, and Nematode Control
Scenario: Testing a new pesticide (nematode control) on farms.
Bias: List of farms is ordered by age (older farms at the top).
Non-Random Assignment: Dividing the list in half leads to treatment A being applied to older farms and treatment B to newer farms.
Consequence: Confounding between pesticide effect and farm age/nematode load.
Solution: Randomize the list of farms before assigning treatments.
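Randomizing the farm list before assigning treatments can be sketched in a few lines. The farm IDs are hypothetical and listed oldest first, mirroring the biased list in the example; the helper `assign_treatments` is made up for illustration.

```python
import random


def assign_treatments(farms, treatments=("A", "B"), seed=None):
    """Randomly assign treatments to farms.

    The farm list is copied and shuffled first, then treatments are dealt out
    in turn, so each treatment gets an equal share (when the counts divide
    evenly) and the original ordering (here, farm age) no longer decides
    which farm gets which treatment."""
    shuffled = list(farms)
    random.Random(seed).shuffle(shuffled)
    return {farm: treatments[i % len(treatments)]
            for i, farm in enumerate(shuffled)}


if __name__ == "__main__":
    # Hypothetical farm IDs, listed oldest first as in the example
    farms_by_age = [f"farm_{i:02d}" for i in range(1, 21)]
    plan = assign_treatments(farms_by_age, seed=7)
    for farm in farms_by_age:
        print(farm, plan[farm])
```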
Implementing Randomization
Humans are poor randomizers; use tools like coin flips, random number tables, or computer-generated random numbers.
R packages can generate a randomized experimental design and print it out; take the printout to the lab and set up the experiment from it.
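The course uses R packages for this; purely as a Python illustration of the same idea (not those packages), the sketch below uses the standard `random` module to generate a completely randomized layout and print it as a lab sheet. Recording the seed means the same plan can be regenerated later. Treatment labels, replicate count, and seed are arbitrary assumptions.

```python
import random


def randomized_layout(treatments, reps, seed):
    """Completely randomized layout: each treatment appears `reps` times and
    the unit order is shuffled with a recorded seed, so the plan is
    reproducible."""
    units = [t for t in treatments for _ in range(reps)]
    random.Random(seed).shuffle(units)
    return units


def print_plan(layout):
    """Print a simple sheet to take to the lab: unit number and treatment."""
    print("Unit  Treatment")
    for unit, treatment in enumerate(layout, start=1):
        print(f"{unit:>4}  {treatment}")


if __name__ == "__main__":
    print_plan(randomized_layout(["A", "B", "C"], reps=4, seed=2024))
```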
Controlling Sources of Variation
Goal: Explain variation through treatments and control other potential sources of variation.
Unexplained variation ends up in the residual (error) term.
Controlling variation leads to stronger, more powerful experimental designs.
Completely Randomized Designs
Simplest design: Randomly allocate treatments among experimental units.
Sampling unit = experimental unit.
Limitations: Requires similar experimental units; otherwise, precision decreases, and conclusions may be incorrect.
If there is another source of variation, it ends up in the residual (error) term.
ANOVA Tables
ANOVA tables capture all of this information: how the total variation is split between the treatment and residual (error) terms.
Bigger Residuals and Smaller F Ratios
In the ANOVA table, the residual sum of squares gets bigger.
The residual mean square (MSE) is larger.
F ratios are smaller: F = \frac{MSTR}{MSE}, where MSTR is the treatment mean square and MSE is the residual (error) mean square.
Weight of evidence gets reduced.
P values are larger.
Power of the experiment decreases.
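The chain above can be checked with a minimal one-way (completely randomized) ANOVA sketch. The hypothetical helper `one_way_anova` computes the treatment and residual sums of squares and F = MSTR / MSE. The two toy data sets are illustrative only: both have the same treatment means (2 and 5), but the second has extra within-treatment spread from an unaccounted source of variation.

```python
import statistics


def one_way_anova(groups):
    """Return (SS_treatment, SS_residual, F) for a one-way ANOVA.
    groups: one list of observations per treatment."""
    all_obs = [y for g in groups for y in g]
    grand_mean = statistics.fmean(all_obs)
    # Treatment SS: spread of treatment means around the grand mean
    ss_tr = sum(len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups)
    # Residual SS: spread of observations around their own treatment mean
    ss_res = sum((y - statistics.fmean(g)) ** 2 for g in groups for y in g)
    df_tr = len(groups) - 1
    df_res = len(all_obs) - len(groups)
    f_ratio = (ss_tr / df_tr) / (ss_res / df_res)  # MSTR / MSE
    return ss_tr, ss_res, f_ratio


if __name__ == "__main__":
    tight = [[1, 2, 3], [4, 5, 6]]
    noisy = [[-1, 2, 5], [6, 5, 4]]  # same means, extra unexplained spread
    for label, data in (("tight", tight), ("noisy", noisy)):
        ss_tr, ss_res, f = one_way_anova(data)
        print(f"{label}: SS_treatment={ss_tr:.1f}  "
              f"SS_residual={ss_res:.1f}  F={f:.2f}")
```

The treatment sum of squares is identical in both cases, but the larger residual sum of squares cuts F from 13.5 to 2.7. In practice a library routine such as `scipy.stats.f_oneway` also returns the corresponding p value.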
Example: Paddock Experiment and Crop Yield
Setup: Randomly assign three treatments (A, B, C) across a field.
Unaccounted Variation: Moisture gradient due to a creek on one side of the field.
Consequence: Treatment A (by chance) is located near the creek, confounding the treatment effect with moisture content.
Blocking
Technique to control for known sources of variation.
Example: Moisture Gradient and Blocking
Concept: Divide the field into blocks, where each block is expected to have similar moisture content.
Randomized Complete Block Design: Within each block, randomly assign all three treatments (A, B, C).
Complete Block: Each of the three treatments appears in every block.
Each block should have a similar moisture content.
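A randomized complete block layout differs from a completely randomized one in that treatments are shuffled separately within each block. The sketch below assumes three treatments and four blocks, numbered hypothetically from the creek side outwards; the helper `rcbd_layout` is made up for illustration.

```python
import random


def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomized complete block design: every block receives every
    treatment once, and the treatment order is re-randomized independently
    within each block."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(treatments)
        rng.shuffle(order)  # randomize only within this block
        layout[block] = order
    return layout


if __name__ == "__main__":
    # Assumed: block 1 is nearest the creek (wettest), block 4 the driest
    for block, order in rcbd_layout(["A", "B", "C"], n_blocks=4, seed=1).items():
        print(f"Block {block}: {' '.join(order)}")
```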
ANOVA Model
Total Variation = Treatment Sums of Squares + Residual Sums of Squares (for completely randomized designs).
Total Variation = Treatment Sums of Squares + Blocking Term + Residual Sums of Squares (for blocking designs).
Blocking term explains variation between blocks (e.g., due to soil moisture).
Residual sums of squares represent variation within each block after accounting for treatment and blocking effects.
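The two decompositions above can be compared on the same data with a sketch like the following, for one observation per treatment-block combination. The hypothetical helper `rcbd_anova` splits the total sum of squares into treatment, block, and residual terms. The simulated paddock uses assumed treatment effects and an assumed moisture effect that declines away from the creek; these are illustrative numbers, not field data.

```python
import random
import statistics


def rcbd_anova(y):
    """Sums of squares for a randomized complete block design.
    y[i][j] is the response for treatment i in block j (one value per cell)."""
    t, b = len(y), len(y[0])
    grand = statistics.fmean(v for row in y for v in row)
    trt_means = [statistics.fmean(row) for row in y]
    blk_means = [statistics.fmean(y[i][j] for i in range(t)) for j in range(b)]
    ss_total = sum((v - grand) ** 2 for row in y for v in row)
    ss_tr = b * sum((m - grand) ** 2 for m in trt_means)
    ss_blk = t * sum((m - grand) ** 2 for m in blk_means)
    ss_res = ss_total - ss_tr - ss_blk  # what is left after both terms
    return {"SS_total": ss_total, "SS_treatment": ss_tr,
            "SS_block": ss_blk, "SS_residual": ss_res}


if __name__ == "__main__":
    rng = random.Random(3)
    treatment_effect = [0.0, 1.0, 2.0]     # assumed effects of A, B, C
    moisture_effect = [3.0, 2.0, 1.0, 0.0]  # assumed: wettest block by creek
    y = [[10 + te + me + rng.gauss(0, 0.5) for me in moisture_effect]
         for te in treatment_effect]
    ss = rcbd_anova(y)
    t, b = len(y), len(y[0])
    ms_tr = ss["SS_treatment"] / (t - 1)
    # Ignoring blocks: block variation stays in the residual (df = tb - t)
    f_crd = ms_tr / ((ss["SS_block"] + ss["SS_residual"]) / (t * b - t))
    # Blocked analysis: residual df = (t - 1)(b - 1)
    f_rcbd = ms_tr / (ss["SS_residual"] / ((t - 1) * (b - 1)))
    print(ss)
    print(f"F ignoring blocks = {f_crd:.2f}, F with blocks = {f_rcbd:.2f}")
```

When the moisture gradient is real, moving its variation out of the residual and into the block term makes the MSE smaller and the F ratio for treatments larger.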
Fish Blocking Example
Scenario: In a room that runs from warm to cool, place a diet one tank and a diet two tank next to each other and continue this pairing through the blocks. It gets cooler as we walk away from the warm end, but the two tanks within each pair share the same temperature regime.
Concept: Blocking controls for the temperature gradient.