1.3 video Experimental Design: CRD, RBD, and Matched Pairs — Concepts, Steps, and Examples
Characteristics of an Experiment
- An experiment is a controlled study conducted to determine the effect of varying one or more explanatory variables (also called factors) on a response variable (outcome).
- Explanatory variable / factor: the variable(s) whose effect we want to study; a factor can be set at specific levels.
- Response variable / outcome: what we measure in the experiment to assess the effect of the factors.
- Treatment: a specific combination of levels of the explanatory variables (a particular set of factor values).
- Experimental unit: the person, object, or other item to which a treatment is applied. Examples: a person, a car, a metal specimen, etc.
- Control group: the group of experimental units that receives a baseline treatment, used for comparison with the other treatments. Often receives a placebo in medical studies.
- Placebo: an inert substance with no therapeutic effect that resembles the actual treatment; used to blind participants and reduce bias. The placebo should look/smell/taste like the active treatment so participants cannot easily tell which they receive.
- Blinding (nondisclosure of treatment): reduces bias by preventing subjects or researchers from knowing which treatment is being administered.
- Single blind: experimental units (participants) do not know which treatment they receive.
- Double blind: both the experimental units and the researchers/experimenters do not know which treatment is being administered to whom.
- Experimental design helps control for confounding variables by using randomization and/or blocking to balance unknown influences across treatments.
- Example framework: testing whether a new drug has a different effect compared with placebo while controlling for other factors.
Key concepts from the English department online vs traditional course example
- Experimental units: students in the English department course sections.
- Population: all students in the English department at the community college.
- Treatments: traditional course vs online course (two treatments).
- Response variable: student performance (e.g., final grades or test scores).
- Blinding in this context: not feasible to blind students to online vs traditional format because the delivery mode is obvious; thus, this study cannot be blinded.
Step-by-step process in designing an experiment
1) Identify the problem to be solved
- Explicitly state the problem and the experimental claim (hypothesis).
- Identify the response variable to be measured and the population to be studied.
2) Determine factors that affect the response
- List potential factors that could influence the response variable (e.g., temperature, humidity, driving style).
- Classify factors as controllable (can be fixed at a level) or uncontrollable.
- Decide when a factor can be fixed versus when it must vary (control vs natural conditions).
3) Determine the number of experimental units
- Decide how many experimental units are needed in total and how to allocate them among treatments.
- General rule of thumb: use as many units as time and money allow.
- Note: methods exist to estimate required sample size given certain information.
4) Determine the level(s) of explanatory variables
- For each factor, decide the number of levels and the values to use.
- Two options:
- Hold a factor at a single level throughout the experiment.
- Vary the factor across multiple levels to study its effect (e.g., different temperatures).
5) Randomize
- Randomize experimental units to treatments to minimize the effect of uncontrolled variables.
- Randomization helps ensure that unknown or uncontrollable factors are balanced across treatment groups.
6) Replication
- Replicate the experiment by repeating it on multiple experimental units (or multiple times on the same unit, when appropriate).
- Replication helps estimate experimental error and reveals whether observed effects are consistent.
7) Collect and analyze data; test the claim
- Measure the response variable for each replication.
- Differences in the response are attributed to differences in treatment, after accounting for random variation.
- Use inferential statistics to test the claim and draw conclusions about the population.
- A substantial portion of this analysis is covered in later chapters (e.g., Chapter 9/10).
Note on confounding: randomization helps to average out effects from variables that cannot be controlled or measured; think about potential confounders and how blocking or randomization mitigates them.
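The claim that randomization "averages out" uncontrolled variables can be checked with a small simulation. The sketch below is illustrative only: `car_age_years` is a hypothetical unmeasured variable, and the function name, seed, and repetition count are assumptions, not part of the course material.

```python
import random
import statistics

def mean_imbalance(hidden, n_reps=2000, seed=0):
    """Repeatedly split the units at random into two equal groups and record
    the gap between the groups' means of a hidden variable.
    Returns (average gap, largest absolute gap)."""
    rng = random.Random(seed)
    half = len(hidden) // 2
    gaps = []
    for _ in range(n_reps):
        shuffled = list(hidden)
        rng.shuffle(shuffled)
        gap = statistics.mean(shuffled[:half]) - statistics.mean(shuffled[half:])
        gaps.append(gap)
    return statistics.mean(gaps), max(abs(g) for g in gaps)

# Hypothetical unmeasured variable for 12 experimental units.
car_age_years = list(range(1, 13))
avg_gap, worst_gap = mean_imbalance(car_age_years)
print(f"average gap over randomizations: {avg_gap:.3f}")
print(f"largest gap in a single randomization: {worst_gap:.3f}")
```

Across many randomizations the average gap should sit near zero, yet a single randomization can still be noticeably unbalanced. That is one reason replication and blocking matter alongside randomization.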
Types of experimental designs
- There are three basic designs:
- Completely Randomized Design (CRD): experimental units are randomly assigned to treatments.
- Randomized Block Design (RBD): experimental units are grouped into homogeneous blocks, and randomization occurs within each block.
- Matched Pairs Design (MPD): experimental units are paired (matched) based on similarity or pre-conditions, and each pair receives different treatments.
Completely Randomized Design (CRD)
- Definition: Random assignment of experimental units to treatments with no blocking.
- Example: An engineer tests whether fuel octane level affects miles per gallon (MPG).
- Response variable: MPG.
- Factors (explanatory variables): octane level (three levels: 87, 89, 92), engine size (fixed at one level), outside temperature (uncontrolled but assumed similar across units), driving style/conditions (controlled to be the same).
- Experimental units: 12 cars of the same model/year.
- Levels and treatments: octane levels as three treatments: A = 87, B = 89, C = 92.
- Randomization: randomly assign the 12 cars to the three octane treatments (e.g., 4 cars per treatment).
- Replication: each car receives a single octane level; replication comes from applying each treatment to several cars of the same model/year (4 per treatment).
- Handling potential confounders: randomization is used to balance effects from uncontrolled variables (e.g., minor differences between cars).
- Visual description: 12 experimental units (cars) -> randomized allocation to three treatment groups (4 cars per group) -> measure MPG for each car.
- Important consideration: Is there potential confounding from a variable not accounted for? Pause and consider possible confounders.
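The random allocation in the octane example could be carried out along these lines. This is a minimal sketch: the car labels, the helper name `assign_crd`, and the seed are hypothetical, and any properly random mechanism would do equally well.

```python
import random

def assign_crd(units, treatments, seed=None):
    """Randomly assign units to treatments in equal-sized groups (CRD)."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    per_group = len(shuffled) // len(treatments)
    # Consecutive slices of the shuffled list become the treatment groups.
    return {
        trt: sorted(shuffled[k * per_group:(k + 1) * per_group])
        for k, trt in enumerate(treatments)
    }

cars = [f"car{n:02d}" for n in range(1, 13)]   # hypothetical labels for the 12 cars
octane = {"A": 87, "B": 89, "C": 92}           # treatment labels from the example
crd_assignment = assign_crd(cars, list(octane), seed=2024)
for label, group in crd_assignment.items():
    print(f"Treatment {label} (octane {octane[label]}): {group}")
```

Fixing the seed only makes the allocation reproducible for record keeping; it does not make the assignment any less random with respect to the cars.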
Randomized Block Design (RBD)
- Definition: Group similar experimental units into blocks (homogeneous groups) and randomly assign treatments within each block.
- Blocks reduce variability due to the blocking factor and improve precision.
- Example: English department study with potential gender differences in performance under two course formats (online vs traditional).
- Blocking factor: gender (men vs women).
- Blocks: two blocks – a block of men and a block of women.
- Within each block, randomly assign students to the online or traditional course (in the example, apparently 30 men to each format and 35 women to each format; the exact counts in the transcript are uncertain).
- Compare performance (test scores) within each block, not across blocks.
- Key concept: Blocking creates homogeneous groups to control for the blocking variable's effect on the response.
- Comparison rule: treatment effects are assessed within blocks, and block effects are accounted for in the analysis.
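Randomizing within blocks can be sketched as follows. The student IDs and block sizes here are made up for illustration and are not the counts from the English department example; `assign_rbd` is a hypothetical helper name.

```python
import random

def assign_rbd(blocks, treatments, seed=None):
    """Randomize treatments separately within each block (RBD)."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)
        # Deal shuffled units round-robin so group sizes differ by at most one.
        assignment[block_name] = {
            trt: shuffled[k::len(treatments)]
            for k, trt in enumerate(treatments)
        }
    return assignment

# Hypothetical student IDs; block sizes are illustrative only.
blocks = {
    "men": [f"M{n}" for n in range(1, 9)],
    "women": [f"W{n}" for n in range(1, 11)],
}
rbd_assignment = assign_rbd(blocks, ["online", "traditional"], seed=7)
for block_name, groups in rbd_assignment.items():
    for trt, students in groups.items():
        print(block_name, trt, students)
```

Because shuffling happens separately inside each block, every block contributes units to both treatments, so the blocking factor cannot line up with the treatment.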
Matched Pairs Design (MPD)
- Definition: Experimental units are paired or matched so that each pair is similar on relevant characteristics.
- Two levels of treatment in matched pairs (e.g., before/after, left/right, twins, or same-location comparisons).
- Examples:
- Before-and-after: measure a variable (e.g., glucose level) in the same person before and after a treatment.
- Left arm vs right arm: measure arm length on each arm of the same person.
- Twins: measure a trait in twin siblings and compare under different conditions.
- In MPD, there are typically two treatments and each pair provides one difference score: D_i = Y_{i1} - Y_{i2}.
- Analysis focuses on the pairwise differences to estimate the treatment effect τ, often using the average difference:
\tau \approx \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i
- The variance of D_i is used to assess significance.
- Practical example from the transcript (Xylitol study): 75 Peruvian children receive both milk with Xylitol and milk without Xylitol, in randomized order to mitigate order effects. This is a cross-over design, a form of matched-pairs design in which each child serves as their own control.
- Treatments: milk with Xylitol vs milk without Xylitol (two levels).
- Experimental units: the 75 children.
- Randomization: the order of receiving the two milk types is randomized to avoid order effects.
- Blinding: suggested as double-blind to avoid bias from both participants and researchers.
- Important note: cross-over designs assume no carryover effects from the first period to the second; washout periods are often used in practice to mitigate this.
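The paired-difference analysis described above can be sketched in a few lines. The measurements below are invented purely for illustration (they are not data from the Xylitol study or any other study), and `paired_summary` is a hypothetical helper; the t statistic shown is the standard paired t with n - 1 degrees of freedom.

```python
import math
import statistics

def paired_summary(y1, y2):
    """Return (mean difference, standard error, t statistic) for paired data."""
    if len(y1) != len(y2) or len(y1) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(y1, y2)]
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)              # sample SD of the differences
    se = s_d / math.sqrt(len(diffs))
    t_stat = d_bar / se if se > 0 else float("nan")
    return d_bar, se, t_stat

# Made-up measurements for 5 pairs (illustrative only).
treatment_a = [12.0, 15.0, 11.0, 14.0, 13.0]
treatment_b = [10.0, 14.0, 10.0, 11.0, 10.0]
d_bar, se, t_stat = paired_summary(treatment_a, treatment_b)
print(f"mean difference = {d_bar:.2f}, SE = {se:.3f}, "
      f"t = {t_stat:.2f} (df = {len(treatment_a) - 1})")
```

Only the within-pair differences enter the calculation, which is how pairing removes pair-to-pair variation from the comparison.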
Notation and simple formulas for the designs
Completely Randomized Design (CRD)
- Model:
Y_{ij} = \mu + \tau_i + \varepsilon_{ij}
- i indexes treatments (i = 1, 2, …, t); j indexes experimental units within treatment.
- Components:
- \mu: overall mean
- \tau_i: effect of the i-th treatment
- \varepsilon_{ij}: random error (assumed ~ N(0, \sigma^2))
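To make the CRD model terms concrete, the sketch below estimates \mu by the grand mean and each \tau_i by the treatment mean minus the grand mean. It assumes a balanced design and the usual sum-to-zero convention for the \tau_i, neither of which is stated in the notes, and the MPG readings are made up.

```python
import statistics

def crd_effects(data):
    """Estimate mu and tau_i for a balanced CRD from {treatment: [responses]}."""
    all_values = [y for ys in data.values() for y in ys]
    grand_mean = statistics.mean(all_values)
    # Treatment effect = treatment mean minus grand mean.
    effects = {trt: statistics.mean(ys) - grand_mean for trt, ys in data.items()}
    return grand_mean, effects

# Made-up MPG readings, 4 cars per octane level (illustrative only).
mpg = {
    "A": [28.0, 30.0, 29.0, 29.0],
    "B": [30.0, 31.0, 29.0, 30.0],
    "C": [31.0, 32.0, 30.0, 31.0],
}
mu_hat, tau_hat = crd_effects(mpg)
print("mu_hat =", mu_hat)
print("tau_hat =", tau_hat)
```

Whatever variation remains within each treatment group after subtracting these estimates plays the role of \varepsilon_{ij}.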
Randomized Block Design (RBD)
- Model:
Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}
- i indexes treatments; j indexes blocks.
- \beta_j: effect of the j-th block (block effect)
- Other terms as in CRD
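The RBD model adds a block term, which can be estimated the same way as the treatment term. The sketch below assumes one value per treatment-block cell (for instance a cell mean) and sum-to-zero constraints on both \tau_i and \beta_j; the scores are hypothetical.

```python
import statistics

def rbd_effects(table):
    """Estimate mu, tau_i, beta_j from {treatment: {block: response}},
    with exactly one response per treatment-block cell."""
    treatments = list(table)
    block_names = list(next(iter(table.values())))
    grand = statistics.mean([table[t][b] for t in treatments for b in block_names])
    tau = {
        t: statistics.mean([table[t][b] for b in block_names]) - grand
        for t in treatments
    }
    beta = {
        b: statistics.mean([table[t][b] for t in treatments]) - grand
        for b in block_names
    }
    return grand, tau, beta

# Hypothetical cell means of test scores (illustrative only).
scores = {
    "online": {"men": 70.0, "women": 74.0},
    "traditional": {"men": 76.0, "women": 80.0},
}
mu_rbd, tau_rbd, beta_rbd = rbd_effects(scores)
print("mu_hat =", mu_rbd)
print("tau_hat =", tau_rbd)
print("beta_hat =", beta_rbd)
```

In this made-up table the treatment gap is the same within each block, so the block effect shifts both treatments equally and does not distort the treatment comparison.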
Matched Pairs / Cross-Over (two-period) design (two treatments A and B within pairs)
- One convenient way to express: for pair i, two observations Y_{i1} and Y_{i2} corresponding to treatments A and B.
- Pairwise difference:
D_i = Y_{i1} - Y_{i2}
- Estimated treatment effect:
\hat{\tau} = \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i
- Alternative model for a two-period cross-over:
Y_{ik} = \mu + \tau_k + \pi_i + \varepsilon_{ik}
- where k ∈ {A,B}, \pi_i is the effect for subject i, capturing the paired nature.
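A short derivation shows why differencing controls for subject-to-subject variation under the cross-over model. It assumes the errors have mean zero and that there are no period or carryover effects, which the model as written also omits.

```latex
\begin{aligned}
D_i &= Y_{iA} - Y_{iB} \\
    &= (\mu + \tau_A + \pi_i + \varepsilon_{iA}) - (\mu + \tau_B + \pi_i + \varepsilon_{iB}) \\
    &= (\tau_A - \tau_B) + (\varepsilon_{iA} - \varepsilon_{iB}), \\
E[D_i] &= \tau_A - \tau_B .
\end{aligned}
```

The overall mean \mu and the subject effect \pi_i cancel in the difference, so \bar{D} estimates the treatment difference \tau_A - \tau_B without the subject-to-subject variation.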
Quick reference: terminology recap
- Population vs Experimental Unit
- Population: the larger group to which we want to generalize.
- Experimental unit: the smallest unit to which a treatment is applied (e.g., a single car, a person, a child).
- Treatment vs Level
- Treatment: combination of factor values; e.g., (octane level = 87, engine size fixed, etc.).
- Level: a specific value that a factor can take (e.g., octane level = 87).
- Control and Placebo
- Control: baseline condition for comparison.
- Placebo: inert treatment designed to resemble the active treatment.
- Blinding
- Single blind: participants don’t know which treatment they receive.
- Double blind: neither participants nor researchers know which treatment is given to which unit.
Real-world implications and design considerations
- Ethical and practical considerations with blinding and placebos
- Blinding reduces bias in outcomes and assessments.
- Use of placebos should be justified and conducted with informed consent; double-blind designs should be used where feasible to minimize bias.
- Choice of design depends on context
- If units are highly similar and randomization is feasible, CRD is simple and effective.
- If there is a known source of variability (e.g., gender, age, baseline measurements), blocking (RBD) helps reduce this variability and improve precision.
- If each unit can be observed under multiple treatments (within-subjects), MPD or cross-over designs leverage that by using paired differences to control for unit-specific variation.
- Connections to foundational principles
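- Randomization (random assignment of units to treatments) is essential for unbiased comparisons between treatments; how far the conclusions generalize also depends on how the experimental units were selected from the population.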
- Blocking and pairing are strategies to control known sources of variability and increase the power of the study.
- Replication and sample size are critical for reliable inference and estimation of treatment effects.
Applications and takeaways
- When planning an experiment, start by clearly identifying the problem, the response variable, and the population.
- Determine the factors and their levels, decide on the experimental unit, and plan randomization and replication.
- Choose a design (CRD, RBD, or MPD) based on the context and the presence of potential confounders.
- Use the appropriate model to analyze data and test the claim, recognizing how the design influences the interpretation of treatment effects.
Summary of the three designs (quick cheatsheet)
- Completely Randomized Design (CRD)
- Random assignment of units to treatments; no blocks.
- Model: Y_{ij} = \mu + \tau_i + \varepsilon_{ij}
- Randomized Block Design (RBD)
- Units grouped into blocks; randomization within each block.
- Model: Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}
- Matched Pairs / Cross-Over (MPD)
- Pairs of units matched; each pair receives different treatments; analyze using within-pair differences.
- For two-period cross-over: Y_{ik} = \mu + \tau_k + \pi_i + \varepsilon_{ik}