Experiment Units vs. Sampling Units: Understanding the distinction is crucial; continue practicing to master this concept.
Replication: Its importance and considerations for determining the number of replicates in an experiment.
Randomization: Essential for reducing biases during data collection.
Control of Natural Variation: Methods to manage environmental variability to avoid confounding results.
Experimental Designs: Introduction to completely randomized designs and blocking designs (randomized complete block designs).
Stratified sampling and blocking aim to control variation and maintain uniform experimental conditions.
Stratification is viewed as a sampling problem, addressing confounding through sampling design.
Blocking includes a term in the model to measure captured variation.
Blocking involves dividing a field into sections (blocks) and randomly assigning treatment levels within each block.
This helps account for variation across the field, animal house, or glass house.
Example: An experiment with three treatments (A, B, C) set up across a field with a moisture gradient.
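A minimal sketch of such a layout in R, assuming four blocks laid across the moisture gradient (the notes do not give the number of blocks, and rcbd_layout is a hypothetical helper). Each block receives every treatment once, in its own random order.

```r
# Hypothetical helper: randomize every treatment once within each block.
rcbd_layout <- function(treatments, n_blocks) {
  do.call(rbind, lapply(seq_len(n_blocks), function(b) {
    data.frame(
      block = b,
      plot = seq_along(treatments),
      treatment = sample(treatments)  # fresh random order for this block
    )
  }))
}

set.seed(2024)  # only so the sketch is reproducible
layout <- rcbd_layout(c("A", "B", "C"), n_blocks = 4)  # 4 blocks is an assumption
print(layout)

# Each treatment should appear exactly once in every block.
print(table(layout$block, layout$treatment))
```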
Total variation (sums of squares) is captured during the experiment.
This variation is explained by:
Treatment sums of squares.
Residual term (unexplained variation).
Blocking designs capture additional natural variation.
By accounting for other sources of variation, blocking increases the power of the experiment, separating signal from noise.
Comparing irrigation methods (Y, Z) on a field divided into 100 plots (10x10 array) using a completely randomized design.
25 plots are randomly assigned to each treatment level.
Data collected: Percentage moisture content in the soil.
Sampling unit: Each individual plot where soil moisture is measured.
Experimental unit: Equivalent to the sampling unit in a completely randomized design.
A one-way ANOVA is performed, and a significant result is obtained.
Residuals are checked for assumptions:
Equal variance: No fanning in the fitted values versus residual plot.
Normality: QQ plot indicates normality.
Plotting residuals as spatial coordinates across the field reveals trends.
In the given example, there's a moisture gradient (drier on one side).
Randomly assigning treatments without considering the moisture gradient can lead to confounding effects.
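A rough simulation of how such a gradient shows up in the residuals. Everything here is made up for illustration: the moisture values, the size of the gradient, the method effect, and the reading that only 50 of the 100 plots are used (25 per method).

```r
set.seed(1)  # reproducibility of the sketch only

# 10 x 10 array of plots; col runs along the assumed moisture gradient
field <- expand.grid(row = 1:10, col = 1:10)

# Completely randomized assignment: 25 plots per method, the rest unused (assumption)
field$method <- sample(rep(c("Y", "Z", NA), times = c(25, 25, 50)))
used <- field[!is.na(field$method), ]
used$method <- factor(used$method)

# Simulated moisture (%): gradient across columns + made-up method effect + noise
used$moisture <- 20 + 0.8 * used$col +
  ifelse(used$method == "Z", 2, 0) + rnorm(nrow(used))

# One-way ANOVA that ignores the gradient
fit <- aov(moisture ~ method, data = used)
used$res <- residuals(fit)

# Mean residual by column: a steady drift signals an unmodelled gradient
col_trend <- tapply(used$res, used$col, mean)
print(round(col_trend, 2))
```

The column means of the residuals should drift from negative to positive across the field, which is the spatial trend that the usual residual plots do not reveal.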
Randomly assigning plots in a field or animal house vs. using a randomized block design.
Divide the area into three blocks, pairing two treatments together.
ANOVA tables are modified to include the blocking term.
Degrees of freedom calculations are essential for verifying model correctness.
Treatments: Number of treatments minus one.
Blocking term: Number of blocks minus one.
Experimental units: When using a blocking design, the block becomes the experimental unit.
Replication: In the example, replication is three (three blocks).
Comparing species richness between burnt and unburnt habitats.
Fires don't burn randomly, making randomization challenging.
Set up 10 sites, each with a burnt and an unburnt area close together.
This minimizes spatial confounding and accounts for environmental gradients such as rainfall, soil, and vegetation changes.
Potential issue: A plot surrounded by burnt landscape may be an outlier.
General equation: $$y = \text{overall mean} + \text{blocking effect} + \text{treatment effect} + \text{error}$$ where $$y$$ is the observed data.
In the species richness example: Species richness is the dependent variable, site is the blocking factor, and fire treatment (burnt/unburnt) is the treatment.
This model acknowledges site effects and identifies variation explained by the fire treatment.
ANOVA function in R: fire.aov <- aov(species_richness ~ site + fire_treatment, data = fire)
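The fire data set itself is not shown in these notes, so the sketch below builds a simulated stand-in with the same column names (10 sites, one burnt and one unburnt area each) and fits the same model. All values and effect sizes are invented.

```r
set.seed(10)  # reproducibility of the sketch only

# Simulated stand-in for the `fire` data: 10 sites x 2 fire treatments
fire <- expand.grid(
  site = factor(1:10),
  fire_treatment = factor(c("burnt", "unburnt"))
)
site_effect <- rnorm(10, sd = 4)  # made-up site-to-site differences
fire$species_richness <- 20 + site_effect[as.integer(fire$site)] +
  ifelse(fire$fire_treatment == "burnt", -5, 0) +  # made-up fire effect
  rnorm(nrow(fire), sd = 1.5)

# Blocking term (site) first, then the treatment
fire.aov <- aov(species_richness ~ site + fire_treatment, data = fire)
fire_table <- summary(fire.aov)[[1]]
print(fire_table)
```

With 10 sites and 2 treatment levels, the Df column should read 9 for site, 1 for fire_treatment, and 9 for the residuals, which is a quick check that the model was specified as intended.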
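In R, the blocking term should be the first factor entered in the model, because aov computes sequential sums of squares: each term is assessed after the terms listed before it.
In a balanced design, the order of terms does not change the ANOVA table.
If the design is unbalanced, the order of terms can change the results, so the blocking term must go first.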
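A small simulated check of the ordering point, using invented data with two plots per site and treatment. The helper treatment_ss is hypothetical and simply pulls the treatment sum of squares out of the aov table.

```r
set.seed(42)  # reproducibility of the sketch only

# Simulated balanced data: 5 sites x 2 treatments x 2 plots each
d <- expand.grid(
  site = factor(1:5),
  fire_treatment = factor(c("burnt", "unburnt")),
  rep = 1:2
)
d$species_richness <- 10 + 2 * as.integer(d$site) +
  ifelse(d$fire_treatment == "burnt", -3, 0) + rnorm(nrow(d))

# Hypothetical helper: treatment sum of squares from a fitted aov table
treatment_ss <- function(formula, data) {
  tab <- summary(aov(formula, data = data))[[1]]
  tab[trimws(rownames(tab)) == "fire_treatment", "Sum Sq"]
}

block_first <- species_richness ~ site + fire_treatment
block_last  <- species_richness ~ fire_treatment + site

# Balanced: the order of terms makes no difference
balanced <- c(treatment_ss(block_first, d), treatment_ss(block_last, d))

# Unbalanced: drop one burnt plot from each of sites 1 to 3, then compare again
d_unbal <- d[-(1:3), ]
unbalanced <- c(treatment_ss(block_first, d_unbal),
                treatment_ss(block_last, d_unbal))

print(rbind(balanced, unbalanced))
```

The two columns should agree in the balanced row and disagree in the unbalanced row. Only the block-first ordering gives the treatment sum of squares adjusted for site.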
Within a block, similarity is key to controlling variation.
Domain knowledge is essential, especially in fields like agriculture, animal science, or ecology.
A pilot study can help identify sources of variation and refine the blocking structure.
When moving away from completely randomized designs, sampling units and experimental units differ.
Identify these units to properly apply and interpret analyses.
Evaluate experimental designs in literature by identifying units and replicates.
Paired t-tests control variation when subjects or locations are not independent.
It's a special case of blocking with two treatment levels.
A paired t-test is equivalent to a one-sample t-test on the differences of the pairs.
Trialing two wheat varieties (A and B) across eight farms to measure yield.
Farm effect is removed by taking the differences between treatments on each farm.
Calculate the difference between the yields of variety A and variety B on each farm.
Perform a one-sample t-test on the differences to determine if there's a significant difference in yield between the two varieties.
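The equivalence can be checked directly in R. The yields below are hypothetical numbers for eight farms, not the data behind the figures in these notes.

```r
# Hypothetical yields for eight farms; these are NOT the data from the notes
variety_A <- c(52.1, 48.3, 55.0, 50.2, 47.9, 53.4, 49.8, 51.6)
variety_B <- c(50.0, 47.5, 52.1, 50.6, 45.8, 51.0, 49.9, 49.2)

# Route 1: paired t-test on the two columns
paired_fit <- t.test(variety_A, variety_B, paired = TRUE)

# Route 2: one-sample t-test on the per-farm differences
diff_fit <- t.test(variety_A - variety_B, mu = 0)

# Same statistic, degrees of freedom, and p-value from both routes
print(c(paired = unname(paired_fit$statistic),
        one_sample = unname(diff_fit$statistic)))
print(c(paired = paired_fit$p.value, one_sample = diff_fit$p.value))
```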
Mean of the differences: $$\approx 1.5$$ kilos (variety A yields about 1.5 kilos more than B).
Standard deviation of the differences: Roughly equal to the mean difference.
Sample size: $$n = 8$$ farms.
Standard error of the mean: Calculated from sample size and standard deviation.
T-test statistic: 2.84.
Degrees of freedom: $$df = 8 - 1 = 7$$.
P-value: Around 0.025.
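The arithmetic behind these figures can be retraced as follows. The standard deviation of 1.5 is an assumption, read from the statement that it is about the same size as the mean difference. The exact value is not given.

```r
mean_diff <- 1.5   # mean of the differences, from the notes
sd_diff   <- 1.5   # assumption: about the same size as the mean difference
n         <- 8     # farms

se_diff <- sd_diff / sqrt(n)    # standard error of the mean difference
t_stat  <- mean_diff / se_diff  # test statistic against a mean of zero
df      <- n - 1

# Two-sided p-value for the t statistic reported in the notes (2.84)
p_value <- 2 * pt(-abs(2.84), df)

print(round(c(se = se_diff, t = t_stat, df = df, p = p_value), 3))
```

With these rounded inputs the statistic lands close to the reported 2.84, and the two-sided p-value on 7 degrees of freedom is close to 0.025.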
Use the t.test function in R to perform a paired t-test.
Code example: t.test(variety_A, variety_B, paired = TRUE)
The output includes the test statistic, degrees of freedom, and p-value.
Statistical conclusion: Determining significance based on the p-value.
Scientific conclusion: Providing more detail, such as the mean difference in yield.
Always present a measure of variation (standard errors or confidence intervals) when reporting means.
Ensure interpretations make sense for the system under study.
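One way to attach a measure of variation to the wheat result is a 95% confidence interval for the mean difference. This is a sketch under the same assumption as before (a standard deviation of about 1.5 for the differences, which the notes only describe loosely).

```r
mean_diff <- 1.5             # mean difference, from the notes
se_diff   <- 1.5 / sqrt(8)   # assumes the SD of the differences is about 1.5
df        <- 7

# 95% confidence interval: mean +/- critical t value * standard error
t_crit <- qt(0.975, df)
ci <- mean_diff + c(-1, 1) * t_crit * se_diff
print(round(ci, 2))
```

The interval excludes zero, which agrees with a p-value below 0.05, and it gives the reader a sense of how precisely the yield difference is estimated.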
When more than two treatment levels are present, ANOVA with blocking terms is used.
Each block should contain all treatment levels, randomized at least once.
With four treatment levels, each block will have a minimum of four randomized plots.
Partition the total variation into grand mean, blocking, treatment, and unexplained (residual) components.
The ANOVA table includes all of these sources of variance.
The blocking variable should come first when coding the model.
The blocking, treatment, and residual sums of squares add up to the total sum of squares.
Degrees of freedom: Number of blocks minus one for blocking and number of treatments minus one for the treatments.
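Written out, with $$b$$ blocks and $$t$$ treatments (symbols introduced here for illustration) and assuming one plot per treatment in each block, the partition and its degrees of freedom are:

$$SS_{\text{total}} = SS_{\text{block}} + SS_{\text{treatment}} + SS_{\text{residual}}$$

$$bt - 1 = (b - 1) + (t - 1) + (b - 1)(t - 1)$$

The last term on the right is the residual degrees of freedom. If the terms in a fitted table do not add up this way, the model has probably been specified incorrectly.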
Enter the terms in the correct order (blocking first) so that the F statistic is calculated correctly.
The F statistic and p-value for the blocking term may not always be of interest; it depends on your hypothesis and the question you are trying to answer.
The more plots there are for each treatment level within a block, the better the variation is estimated.
Experiment combining different fertilizer rates (five nitrogen levels) and three rice varieties.
Fifteen treatment combinations in total.
A blocking design is used because of potential gradients in the paddock.
The paddock is divided into four blocks, each containing all 15 treatment combinations, randomized.
Rice yield is measured.
Sampling unit: the plot of rice.
Experimental unit: the block.
Replication: four blocks are used.
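A sketch of how this layout could be generated in R. The level labels (N1 to N5, V1 to V3) are placeholders, and one plot per treatment combination per block is assumed.

```r
set.seed(7)  # reproducibility of the sketch only

# 5 nitrogen levels x 3 varieties = 15 treatment combinations (labels are made up)
combos <- expand.grid(nitrogen = paste0("N", 1:5), variety = paste0("V", 1:3))

# Randomize the 15 combinations independently inside each of the 4 blocks
rice_layout <- do.call(rbind, lapply(1:4, function(b) {
  shuffled <- combos[sample(nrow(combos)), ]
  data.frame(block = b, plot = seq_len(nrow(shuffled)), shuffled,
             row.names = NULL)
}))

print(head(rice_layout))
print(nrow(rice_layout))  # 4 blocks x 15 combinations
```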
One-way ANOVA with blocking is defined as:
$$\text{observed data} = \text{overall mean} + \text{blocking effect} + \text{treatment effect} + \text{error}$$
where the blocking effect should be entered as the first term in the model, to be safe.
Effect of the $$i^{th}$$ block: mean in that block $$-$$ overall mean.
Treatment effect: treatment mean $$-$$ overall mean.
With an overall mean yield of 3 tons, variety one at 2.2 tons is underperforming by 0.8 tons (effect of $$-0.8$$), while another variety yields 0.6 tons above the mean (effect of $$+0.6$$).
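The effect calculations can be sketched with a small made-up data set. The yields below are invented so that the overall mean is 3 tons and variety V1 averages 2.2 tons. Nitrogen is ignored for simplicity, and the third variety is purely illustrative.

```r
# Made-up yields (tons) for 3 varieties in 4 blocks, one plot each
rice <- data.frame(
  block   = factor(rep(1:4, each = 3)),
  variety = factor(rep(c("V1", "V2", "V3"), times = 4)),
  yield   = c(2.0, 3.5, 3.1,  2.3, 3.8, 3.2,  2.1, 3.4, 3.3,  2.4, 3.7, 3.2)
)

overall_mean <- mean(rice$yield)

# Effect = group mean minus overall mean
block_effects     <- tapply(rice$yield, rice$block, mean) - overall_mean
treatment_effects <- tapply(rice$yield, rice$variety, mean) - overall_mean

print(round(overall_mean, 2))
print(round(block_effects, 2))
print(round(treatment_effects, 2))
```

In a balanced design the effects within each factor sum to zero, which is a handy check on the arithmetic.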
Null hypothesis: All treatment means are equal, so every treatment effect is zero.
Alternative hypothesis: Not all treatment means are equal, so at least one treatment effect is nonzero.
With 4 blocks there should be three degrees of freedom.
With 15 treatments, there should be 14.
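The full degrees-of-freedom budget for the rice example can be tallied as below, assuming one plot per treatment combination in each block (60 plots in total).

```r
n_blocks     <- 4
n_treatments <- 15
n_plots      <- n_blocks * n_treatments  # assumes one plot per combination per block

df_total     <- n_plots - 1
df_block     <- n_blocks - 1
df_treatment <- n_treatments - 1
df_residual  <- df_total - df_block - df_treatment

# Without a blocking term, its degrees of freedom return to the residual
df_residual_no_block <- df_total - df_treatment

print(c(block = df_block, treatment = df_treatment,
        residual = df_residual, total = df_total,
        residual_no_block = df_residual_no_block))
```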
Without blocking, the variation that the blocking term would have captured is folded into the residual, so the residual sum of squares is higher.
Imposing the blocking structure reduced the residual sum of squares by about one-third, which is a substantial improvement in the model.
The F statistics also change because degrees of freedom shift between terms in the model.
Without the blocking term, the treatment F statistic went from about 20.7 to 15.3, and the residual degrees of freedom changed slightly as well, going to 45.
The proportion of variance captured by blocking is the blocking sum of squares divided by the total sum of squares.
Blocking captures $$5.3\%$$ of the variance, and treatment captures almost $$83\%$$. This information can be calculated from the ANOVA table.
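A sketch of that calculation from an aov table. The data are simulated (4 blocks, 5 treatments, invented effects), so the percentages printed will not match the 5.3% and 83% quoted for the rice example.

```r
set.seed(3)  # reproducibility of the sketch only

# Simulated blocked experiment: 4 blocks x 5 treatments, one plot each (all made up)
d <- expand.grid(block = factor(1:4), treatment = factor(LETTERS[1:5]))
d$yield <- 3 + 0.2 * as.integer(d$block) + 0.5 * as.integer(d$treatment) +
  rnorm(nrow(d), sd = 0.3)

tab <- summary(aov(yield ~ block + treatment, data = d))[[1]]

# Share of the total sum of squares attributed to each row of the table
prop_ss <- tab[["Sum Sq"]] / sum(tab[["Sum Sq"]])
names(prop_ss) <- trimws(rownames(tab))
print(round(100 * prop_ss, 1))
```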
A lower residual term leads to a narrower confidence interval.
Post hoc tests are still applicable, as always.
If the F statistic you calculate is equal to or greater than the critical value, the result is significant.
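The critical value can be looked up with qf. The sketch assumes a 5% significance level and the rice example's degrees of freedom: 14 for treatments, with 42 residual degrees of freedom when blocking is in the model and 45 when it is not (both figures follow from assuming 60 plots).

```r
alpha        <- 0.05       # assumed significance level
df_treatment <- 14         # 15 treatment combinations minus one
df_residual  <- c(with_blocking = 42, without_blocking = 45)  # assumes 60 plots

# Critical F value for each residual df; an observed F at or above it is significant
f_crit <- qf(1 - alpha, df_treatment, df_residual)
names(f_crit) <- names(df_residual)
print(round(f_crit, 2))
```

Both F statistics quoted in these notes (about 20.7 and 15.3) are well above either critical value.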
Blocking can have diminishing returns: it uses up degrees of freedom, so the residual term has more degrees of freedom when blocking is not used.