1.3 video Experimental Design: CRD, RBD, and Matched Pairs — Concepts, Steps, and Examples

Characteristics of an Experiment

  • An experiment is a controlled study conducted to determine the effect of varying one or more explanatory variables (also called factors) on a response variable (outcome).
  • Explanatory variable / factor: a variable whose effect on the response we want to study. In an experiment, the factors of interest are set at specific levels.
  • Response variable / outcome: what we measure in the experiment to assess the effect of the factors.
  • Treatment: a specific combination of levels of the explanatory variables (a particular set of factor values).
  • Experimental unit: the person, object, or other item to which a treatment is applied. Examples: a person, a car, a metal specimen, etc.
  • Control group: the group that receives a baseline treatment, used for comparison with the other treatments. In medical studies the baseline treatment is often a placebo.
  • Placebo: an inert substance with no therapeutic effect that resembles the actual treatment; used to blind participants and reduce bias. The placebo should look/smell/taste like the active treatment so participants cannot easily tell which they receive.
  • Blinding (nondisclosure of treatment): reduces bias by preventing subjects or researchers from knowing which treatment is being administered.
    • Single blind: experimental units (participants) do not know which treatment they receive.
    • Double blind: both the experimental units and the researchers/experimenters do not know which treatment is being administered to whom.
  • Experimental design helps control for confounding variables by using randomization and/or blocking to balance unknown influences across treatments.
  • Example framework: testing whether a new drug has a different effect compared with placebo while controlling for other factors.

Key concepts from the English department online vs traditional course example

  • Experimental units: students in the English department course sections.
  • Population: all students in the English department at the community college.
  • Treatments: traditional course vs online course (two treatments).
  • Response variable: student performance (e.g., final grades or test scores).
  • Blinding in this context: not feasible to blind students to online vs traditional format because the delivery mode is obvious; thus, this study cannot be blinded.

Step-by-step process in designing an experiment

1) Identify the problem to be solved

  • Explicitly state the problem and the experimental claim (hypothesis).
  • Identify the response variable to be measured and the population to be studied.

2) Determine factors that affect the response

  • List potential factors that could influence the response variable (e.g., temperature, humidity, driving style).
  • Classify factors as controllable (can be fixed at a level) or uncontrollable.
  • Decide when a factor can be fixed versus when it must vary (control vs natural conditions).

3) Determine the number of experimental units

  • Decide how many experimental units are needed in total and how to allocate them among treatments.
  • General rule of thumb: use as many units as time and money allow.
  • Note: methods exist to estimate required sample size given certain information.

4) Determine the level(s) of explanatory variables

  • For each factor, decide the number of levels and the values to use.
  • Two options:
    • Hold a factor at a single level throughout the experiment.
    • Vary the factor across multiple levels to study its effect (e.g., different temperatures).

5) Randomize

  • Randomize experimental units to treatments to minimize the effect of uncontrolled variables.
  • Randomization helps ensure that unknown or uncontrollable factors are balanced across treatment groups.

6) Replication

  • Replicate the experiment by repeating it on multiple experimental units (or multiple times on the same unit, when appropriate).
  • Replication helps estimate experimental error and reveals whether observed effects are consistent.

7) Collect and analyze data; test the claim

  • Measure the response variable for each replication.
  • Differences in the response are attributed to differences in treatment, after accounting for random variation.
  • Use inferential statistics to test the claim and draw conclusions about the population.
  • A substantial portion of this analysis is covered in later chapters (e.g., Chapter 9/10).
  • Note on confounding: randomization helps to average out effects from variables that cannot be controlled or measured; think about potential confounders and how blocking or randomization mitigates them.
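
The claim that randomization averages out uncontrolled variables can be checked with a small simulation. The sketch below is illustrative only: the hidden variable, the group sizes, and the function name mean_imbalance are assumptions, not part of the lecture. It repeatedly assigns 20 units to two groups at random and reports the average gap in the hidden variable between the groups, which should be close to zero.

```python
import random
import statistics

def mean_imbalance(covariate, n_treated, n_repeats, seed=0):
    """Average (treated mean - control mean) of a covariate over many random assignments."""
    rng = random.Random(seed)
    units = list(range(len(covariate)))
    diffs = []
    for _ in range(n_repeats):
        # Draw a fresh random assignment each time.
        treated = set(rng.sample(units, n_treated))
        treated_vals = [covariate[u] for u in units if u in treated]
        control_vals = [covariate[u] for u in units if u not in treated]
        diffs.append(statistics.mean(treated_vals) - statistics.mean(control_vals))
    return statistics.mean(diffs)

# Hypothetical uncontrolled variable (for example, car age in months) for 20 units.
hidden = list(range(20))
avg_gap = mean_imbalance(hidden, n_treated=10, n_repeats=2000)
print(f"average gap between groups: {avg_gap:.3f}")
```

Any single assignment can still be unbalanced by chance; the balance holds on average, which is why replication matters as well.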


Types of experimental designs

  • There are three basic designs:
    • Completely Randomized Design (CRD): experimental units are randomly assigned to treatments.
    • Randomized Block Design (RBD): experimental units are grouped into homogeneous blocks, and randomization occurs within each block.
    • Matched Pairs Design (MPD): experimental units are paired (matched) based on similarity or pre-conditions, and the two members of each pair receive different treatments.

Completely Randomized Design (CRD)

  • Definition: Random assignment of experimental units to treatments with no blocking.
  • Example: An engineer tests whether fuel octane level affects miles per gallon (MPG).
    • Response variable: MPG.
    • Factors (explanatory variables): octane level (three levels: 87, 89, 92), engine size (fixed at one level), outside temperature (uncontrolled but assumed similar across units), driving style/conditions (controlled to be the same).
    • Experimental units: 12 cars of the same model/year.
    • Levels and treatments: octane levels as three treatments: A = 87, B = 89, C = 92.
    • Randomization: randomly assign the 12 cars to the three octane treatments (e.g., 4 cars per treatment).
    • Replication: each car receives a single octane level; assigning several cars of the same model/year to each level provides replication across units.
    • Handling potential confounders: randomization is used to balance effects from uncontrolled variables (e.g., minor differences between cars).
  • Visual description: 12 experimental units (cars) -> randomized allocation to three treatment groups (4 cars per group) -> measure MPG for each car.
  • Important consideration: Is there potential confounding from a variable not accounted for? Pause and consider possible confounders.
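
The allocation in the octane example can be sketched in a few lines of Python. This is a minimal illustration, not the engineer's actual procedure: the car labels and the function name completely_randomized are made up, and equal group sizes are assumed.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Randomly split units into equal-sized groups, one per treatment."""
    if len(units) % len(treatments) != 0:
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)  # every ordering of the units is equally likely
    size = len(shuffled) // len(treatments)
    return {
        t: sorted(shuffled[k * size:(k + 1) * size])
        for k, t in enumerate(treatments)
    }

cars = [f"car_{n:02d}" for n in range(1, 13)]  # hypothetical labels for the 12 cars
octane_levels = [87, 89, 92]                    # treatments A, B, C
assignment = completely_randomized(cars, octane_levels, seed=1)
for octane, group in assignment.items():
    print(octane, group)
```

Fixing the seed only makes the allocation reproducible for record keeping; in a real study the seed would be chosen before looking at the units.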

Randomized Block Design (RBD)

  • Definition: Group similar experimental units into blocks (homogeneous groups) and randomly assign treatments within each block.
  • Blocks reduce variability due to the blocking factor and improve precision.
  • Example: English department study with potential gender differences in performance under two course formats (online vs traditional).
    • Blocking factor: gender (men vs women).
    • Blocks: two blocks – a block of men and a block of women.
    • Within each block, randomly assign students to the online or the traditional course (in the example, 30 men to each format, and 35 women to online and 35 to traditional).
    • Compare performance (test scores) within each block, not across blocks.
  • Key concept: Blocking creates homogeneous groups to control for the blocking variable's effect on the response.
  • Comparison rule: treatment effects are assessed within blocks, and block effects are accounted for in the analysis.
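
Randomizing within blocks can be sketched as below. The student IDs and the function name randomized_block are hypothetical, and the block sizes (60 men, 70 women) assume the example splits 30 men and 35 women into each course format.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomize treatments separately inside each block."""
    rng = random.Random(seed)
    plan = {}
    for block_name, units in blocks.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block_name!r} does not divide evenly")
        shuffled = list(units)
        rng.shuffle(shuffled)  # shuffle only within this block
        size = len(shuffled) // len(treatments)
        plan[block_name] = {
            t: shuffled[k * size:(k + 1) * size]
            for k, t in enumerate(treatments)
        }
    return plan

# Hypothetical student IDs; block sizes are an assumption based on the example.
blocks = {
    "men": [f"M{n:02d}" for n in range(1, 61)],
    "women": [f"W{n:02d}" for n in range(1, 71)],
}
plan = randomized_block(blocks, ["online", "traditional"], seed=7)
for block_name, groups in plan.items():
    print(block_name, {t: len(ids) for t, ids in groups.items()})
```

Because no student ever leaves their block, each format gets the same share of men and of women, so gender cannot be confounded with course format.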

Matched Pairs Design (MPD)

  • Definition: Experimental units are paired or matched so that each pair is similar on relevant characteristics.
  • Two levels of treatment in matched pairs (e.g., before/after, left/right, twins, or same-location comparisons).
  • Examples:
    • Before-and-after: measure a variable (e.g., glucose level) in the same person before and after a treatment.
    • Left arm vs right arm: measure arm length on each arm of the same person.
    • Twins: measure a trait in twin siblings and compare under different conditions.
  • In MPD, there are typically two treatments and each pair provides one difference score: D_i = Y_{i1} - Y_{i2}.
  • Analysis focuses on the pairwise differences to estimate the treatment effect τ, often using the average difference:

    • \tau \approx \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i
    • The variance of D_i is used to assess significance.
  • Practical example from the transcript (Xylitol study): 75 Peruvian children each receive milk with Xylitol and milk without Xylitol, in randomized order to mitigate order effects. This is a cross-over design, a form of matched-pairs design in which each child serves as their own control.
    • Treatments: milk with Xylitol vs milk without Xylitol (two levels).
    • Experimental units: the 75 children.
    • Randomization: the order of receiving the two milk types is randomized to avoid order effects.
    • Blinding: suggested as double-blind to avoid bias from both participants and researchers.
    • Important note: cross-over designs assume no carryover effects from the first period to the second; washout periods are often used in practice to mitigate this.
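
The paired-difference analysis can be sketched as follows. The numbers are made up for illustration and are NOT data from the Xylitol study; the function name paired_summary is also hypothetical. The t statistic shown is the usual paired t statistic, mean difference divided by its standard error, which is one common way the variance of the differences is used to assess significance.

```python
import math
import statistics

def paired_summary(with_treatment, without_treatment):
    """Return (mean difference, sample SD of differences, t statistic) for paired data."""
    if len(with_treatment) != len(without_treatment):
        raise ValueError("each unit needs one observation per treatment")
    diffs = [a - b for a, b in zip(with_treatment, without_treatment)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)          # sample standard deviation of the D_i
    t_stat = d_bar / (s_d / math.sqrt(n))  # compared against a t distribution with n - 1 df
    return d_bar, s_d, t_stat

# Made-up illustrative responses for 6 units under the two treatments.
y_a = [12, 15, 11, 14, 13, 16]
y_b = [10, 14, 11, 11, 12, 13]
d_bar, s_d, t_stat = paired_summary(y_a, y_b)
print(f"mean difference = {d_bar:.3f}, sd = {s_d:.3f}, t = {t_stat:.3f}")
```

Only the within-unit differences enter the calculation, so stable differences between units do not inflate the standard error.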

Notation and simple formulas for the designs

  • Completely Randomized Design (CRD)

    • Model:
      Y_{ij} = \mu + \tau_i + \varepsilon_{ij}
    • i indexes treatments (i = 1, 2, …, t); j indexes experimental units within treatment.
    • Components:
    • \mu: overall mean
    • \tau_i: effect of the i-th treatment
    • \varepsilon_{ij}: random error (assumed ~ N(0, \sigma^2))
  • Randomized Block Design (RBD)

    • Model:
      Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}
    • i indexes treatments; j indexes blocks.
    • \beta_j: effect of the j-th block (block effect)
    • Other terms as in CRD
  • Matched Pairs / Cross-Over (two-period) design (two treatments A and B within pairs)

    • One convenient way to express: for pair i, two observations Y_{i1} and Y_{i2} corresponding to treatments A and B.
    • Pairwise difference:
      D_i = Y_{i1} - Y_{i2}
    • Estimated treatment effect:
      \hat{\tau} = \bar{D} = \frac{1}{n}\sum_{i=1}^{n} D_i
    • Alternative model for a two-period cross-over:
      Y_{ik} = \mu + \tau_k + \pi_i + \varepsilon_{ik}
    • where k ∈ {A,B}, \pi_i is the effect for subject i, capturing the paired nature.
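
A short derivation shows why differencing controls for unit-specific variation under the two-period cross-over model. It is a sketch under assumptions not stated in the notes: the errors are independent with mean zero and common variance \sigma^2, and there are no period or carryover effects. Here Y_{iA} and Y_{iB} play the roles of Y_{i1} and Y_{i2}.

```latex
\begin{align*}
D_i &= Y_{iA} - Y_{iB} \\
    &= (\mu + \tau_A + \pi_i + \varepsilon_{iA}) - (\mu + \tau_B + \pi_i + \varepsilon_{iB}) \\
    &= (\tau_A - \tau_B) + (\varepsilon_{iA} - \varepsilon_{iB}), \\
E[\bar{D}] &= \tau_A - \tau_B, \qquad
\operatorname{Var}(D_i) = 2\sigma^2 .
\end{align*}
```

The subject effect \pi_i and the overall mean \mu cancel, so \bar{D} estimates the treatment difference \tau = \tau_A - \tau_B without any contribution from subject-to-subject variation.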

Quick reference: terminology recap

  • Population vs Experimental Unit
    • Population: the larger group to which we want to generalize.
    • Experimental unit: the smallest unit to which a treatment is applied (e.g., a single car, a person, a child).
  • Treatment vs Level
    • Treatment: combination of factor values; e.g., (octane level = 87, engine size fixed, etc.).
    • Level: a specific value that a factor can take (e.g., octane level = 87).
  • Control and Placebo
    • Control: baseline condition for comparison.
    • Placebo: inert treatment designed to resemble the active treatment.
  • Blinding
    • Single blind: participants don’t know which treatment they receive.
    • Double blind: neither participants nor researchers know which treatment is given to which unit.

Real-world implications and design considerations

  • Ethical and practical considerations with blinding and placebos
    • Blinding reduces bias in outcomes and assessments.
    • Use of placebos should be justified and conducted with informed consent; double-blind designs should be implemented to minimize bias.
  • Choice of design depends on context
    • If units are highly similar and randomization is feasible, CRD is simple and effective.
    • If there is a known source of variability (e.g., gender, age, baseline measurements), blocking (RBD) helps reduce this variability and improve precision.
    • If each unit can be observed under multiple treatments (within-subjects), MPD or cross-over designs leverage that by using paired differences to control for unit-specific variation.
  • Connections to foundational principles
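    • Random assignment is essential for unbiased comparisons between treatments; how far the conclusions generalize also depends on how the experimental units were selected from the population.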
    • Blocking and pairing are strategies to control known sources of variability and increase the power of the study.
    • Replication and sample size are critical for reliable inference and estimation of treatment effects.

Applications and takeaways

  • When planning an experiment, start by clearly identifying the problem, the response variable, and the population.
  • Determine the factors and their levels, decide on the experimental unit, and plan randomization and replication.
  • Choose a design (CRD, RBD, or MPD) based on the context and the presence of potential confounders.
  • Use the appropriate model to analyze data and test the claim, recognizing how the design influences the interpretation of treatment effects.

Summary of the three designs (quick cheatsheet)

  • Completely Randomized Design (CRD)
    • Random assignment of units to treatments; no blocks.
    • Model: Y_{ij} = \mu + \tau_i + \varepsilon_{ij}
  • Randomized Block Design (RBD)
    • Units grouped into blocks; randomization within each block.
    • Model: Y_{ij} = \mu + \tau_i + \beta_j + \varepsilon_{ij}
  • Matched Pairs / Cross-Over (MPD)
    • Units matched into pairs; the two members of each pair receive different treatments; analyze using within-pair differences.
    • For two-period cross-over: Y_{ik} = \mu + \tau_k + \pi_i + \varepsilon_{ik}