Study Notes: Experimental Design, Random Assignment, and Sampling Techniques

Experimental Design and Random Assignment

  • Primary goal in experiments: determine if a treatment has an effect by comparing a treatment group to a baseline control group. The control group receives no treatment (or standard treatment) so that everything else is held constant across groups.

    • If the treated group outperforms the control group by more than chance variation would explain, the treatment is considered effective.
  • Random assignment (randomization)

    • Random assignment helps create groups that are roughly equivalent at the start of the experiment, reducing confounding (where other factors could explain observed differences).
    • How it works (example): you have 100 participants and want two groups of 50. Use a random number generator to assign 50 to Group A and the remaining 50 to Group B.
    • This process prevents self-selection into groups and helps ensure comparability.
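
The 100-participant example above can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the function name `randomly_assign`, the ID range, and the fixed seed are all assumptions made for the example.

```python
import random

def randomly_assign(participant_ids, seed=None):
    """Shuffle the IDs and split them into two equal-sized groups."""
    rng = random.Random(seed)      # seeded only so the example is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

participants = range(1, 101)       # hypothetical participant IDs 1..100
group_a, group_b = randomly_assign(participants, seed=42)
print(len(group_a), len(group_b))  # 50 50
```

Because the shuffle decides group membership, neither the participants nor the researcher chooses who ends up in which group.
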
  • Key example: longhand note-taking study (Mueller & Oppenheimer)

    • Compare handwritten notes with alternative note-taking methods to assess the effect on learning or recall.
    • Control for confounders: self-reported measures such as participants’ study hours can be biased, so the design aims to ensure that the main difference between groups is the note-taking method, not other factors.
    • Important principle: well-designed experiments avoid confounding by ensuring groups are treated the same except for the treatment.
  • Pfizer COVID-19 vaccine randomized trial (illustrative, real-world example)

    • Population size: 43,548 volunteers.
    • Randomization: half were randomly assigned to receive two vaccine doses 21 days apart; the other half were randomly assigned to receive two saline placebo shots to mimic vaccine administration.
    • Outcome: after several months, the vaccine was determined to be $95\%$ effective in preventing COVID-19.
    • Purpose of random assignment in this trial: ensure that any differences in infection rates are due to the vaccine, not other factors (e.g., health status, exposure, or behavior).
    • Interpretation: if the vaccine group shows substantially lower infection rates than the placebo group, the difference can be attributed to the vaccine rather than to other variables.
    • How random assignment could be implemented in practice: assign each volunteer an ID (e.g., 1 to 43,548), use a random number generator to select IDs for the vaccine group, and assign the rest to placebo.
    • Conceptual takeaway: randomization creates roughly equivalent groups at baseline, enabling causal inference about the treatment effect.
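
To make the efficacy figure concrete, here is a small Python sketch of the calculation, using the glossary definition (efficacy = 1 - RR). The case counts and group sizes below are hypothetical numbers chosen for illustration, not the trial's reported data, and the function name `vaccine_efficacy` is an assumption.

```python
def vaccine_efficacy(cases_vaccine, n_vaccine, cases_placebo, n_placebo):
    """Return efficacy = 1 - RR, where RR is the ratio of the two attack rates."""
    risk_vaccine = cases_vaccine / n_vaccine    # attack rate in the treated group
    risk_placebo = cases_placebo / n_placebo    # attack rate in the control group
    relative_risk = risk_vaccine / risk_placebo
    return 1 - relative_risk

# Hypothetical counts for illustration only (not the trial's actual data)
efficacy = vaccine_efficacy(cases_vaccine=5, n_vaccine=20_000,
                            cases_placebo=100, n_placebo=20_000)
print(f"{efficacy:.0%}")  # 95%
```

With equal group sizes, a 95% efficacy means the vaccine group had about one twentieth as many cases as the placebo group.
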
  • Observational studies vs experiments

    • Observational studies often suffer from confounding because the treatment is not randomly assigned.
    • Well-designed experiments mitigate confounding by ensuring that all aspects except the treatment are the same across groups.
    • In observational settings, researchers may rely on self-reported data or natural variation, but causal claims are weaker due to potential confounders.
  • Random assignment: deeper understanding

    • Random assignment is a statistical technique (often using random number generators) to allocate participants to groups.
    • Goal: produce roughly equivalent groups at the outset to isolate the effect of the treatment.
    • Simple illustrative example (numbers): with 100 individuals, randomly assign 50 to the treatment and 50 to the control using a RNG.
  • Sampling vs experimentation context (transition to sampling techniques)

    • Population vs sample: the population is the entire group of interest; a sample is a subset used to make inferences about the population.
    • Why sample? It’s often too costly or impossible to study everyone.
    • Random sampling allows inference about the population if the sample is representative and large enough.
    • Non-random sampling can introduce bias and limit the generalizability of results.
  • When random sampling is not possible or ethical

    • Some topics involve vulnerable populations or potential harm, making random assignment or random sampling inappropriate (e.g., research with rape victims or incest survivors).
    • In such cases, researchers rely on voluntary participation or convenience samples, which may bias results and limit population-level inferences.
  • Research ethics and seminars (context from the course)

    • A mandatory research ethics seminar is mentioned as part of the course.
    • Ethical considerations guide when randomization or certain sampling methods are appropriate or inappropriate.
  • Sampling strategies overview

    • Non-random sampling methods (potential bias):
    • Convenience sampling: select individuals who are easy to reach; may bias results if the sample is not representative.
    • Voluntary response (self-selected samples): people choose to participate; can bias estimates toward the views of those who strongly feel a certain way.
    • Caution on systematic sampling: it is usually treated as a probability method (see below), but it behaves like a non-random method if the starting point is not chosen at random or if the list order has a hidden periodic pattern.
    • Random sampling methods (probability sampling):
    • Simple random sampling: every individual in the population has an equal chance of being selected; every possible sample of a given size has equal probability.
    • Stratified random sampling: divide the population into strata (subgroups) based on a characteristic (e.g., gender, age, race), then perform random sampling within each stratum.
      • Proportional allocation: the sample from each stratum is proportional to its size in the population (e.g., if 50% male and 50% female in the population, aim for 50/50 in the sample).
      • Purpose: ensure representation of key subgroups and improve accuracy.
    • Systematic sampling: choose a random start and then select every kth element (k is the sampling interval, typically N/n).
      • Example method: list the population in a natural order (e.g., alphabetical), choose a random start r between 1 and k, then select elements r, r+k, r+2k, etc., until n elements are obtained.
    • Cluster sampling: randomly select whole groups (clusters), then include all members of the selected clusters; useful when the population is spread out geographically.
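
Systematic and cluster sampling, as listed above, can be sketched in Python. This is a rough illustration under stated assumptions: the population of IDs 1..1000, the ten made-up "sites", and the function names are all hypothetical, and the interval is floored when N/n is not a whole number.

```python
import random

def systematic_sample(population, n, rng):
    """Pick a random start in the first interval, then every k-th element."""
    k = len(population) // n          # sampling interval k = N/n (floored)
    start = rng.randrange(k)          # 0-based index, i.e. position 1..k
    return [population[start + i * k] for i in range(n)]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(0)                # seeded only for reproducibility
population = list(range(1, 1001))     # hypothetical IDs 1..1000
sys_sample = systematic_sample(population, 50, rng)

# Hypothetical clusters: 10 sites with 100 people each
clusters = {f"site_{j}": population[j * 100:(j + 1) * 100] for j in range(10)}
clu_sample = cluster_sample(clusters, 3, rng)

print(len(sys_sample), len(clu_sample))  # 50 300
```

In the systematic sample only the start is random; every later pick is fixed by the interval, which is why a periodic pattern in the list can bias the result.
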
  • Practical demonstrations and calculations

    • Simple random sampling activity (conceptual):
    • Draw a sample of size n from a population of size N using a random number generator to select IDs.
    • Compute the sample mean: $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$, where $X_i$ are the observed values in the sample.
    • Compare sample mean to population mean $\mu$; expect some variability due to sampling.
    • Sampling variability
    • Different random samples from the same population can yield different sample means.
    • This variability shrinks as the sample size grows, which is why larger samples tend to yield estimates closer to the population parameter.
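
A short simulation can illustrate sampling variability. The sketch below is only an assumption-laden example: the population is 10,000 made-up values drawn from a normal distribution, and the sample sizes and number of repeats are arbitrary choices.

```python
import random
import statistics

rng = random.Random(1)                # seeded only for reproducibility
# Hypothetical population of 10,000 values (e.g., test scores)
population = [rng.gauss(70, 10) for _ in range(10_000)]
mu = statistics.fmean(population)     # population mean

def sample_means(n, repeats=500):
    """Draw `repeats` simple random samples of size n; return their means."""
    return [statistics.fmean(rng.sample(population, n)) for _ in range(repeats)]

spreads = {}
for n in (10, 100, 1000):
    means = sample_means(n)
    spreads[n] = statistics.stdev(means)   # how much the sample means vary
    print(f"n={n:>4}: spread of sample means = {spreads[n]:.2f} (mu = {mu:.2f})")
```

Each sample mean differs from $\mu$, but the spread of the sample means gets smaller as n increases.
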
    • Stratified sampling in practice
    • If the population has known proportions by strata (e.g., gender, race), the sample should reflect those proportions to avoid bias.
    • Example: if a population is 50% male and 50% female, stratified sampling with proportional allocation would aim for roughly equal representation in the sample.
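
Proportional allocation can be sketched in Python as follows. The 60/40 population split, the labels, and the function name `stratified_sample` are hypothetical choices for illustration; a 50/50 population as in the example above would work the same way.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n, rng):
    """Proportional allocation: sample each stratum in proportion to its size."""
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / len(units))   # this stratum's share of n
        sample.extend(rng.sample(members, n_h))      # simple random sample within stratum
    return sample

rng = random.Random(7)                # seeded only for reproducibility
# Hypothetical population: 600 people labelled "F", 400 labelled "M"
people = [(i, "F") for i in range(600)] + [(i, "M") for i in range(600, 1000)]
sample = stratified_sample(people, lambda p: p[1], n=100, rng=rng)
counts = {s: sum(1 for _, g in sample if g == s) for s in ("F", "M")}
print(counts)  # {'F': 60, 'M': 40}
```

When stratum shares do not divide evenly, the rounding step can make the total differ slightly from n, so a real sampling plan would need a rule for handling the remainder.
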
  • Real-world applications and examples

    • National surveys and statistics agencies (e.g., Statistics Canada, the U.S. Bureau of Labor Statistics) use stratified and other probability sampling methods to obtain representative samples.
    • In healthcare, sampling plans for patient satisfaction, discharge studies, or other health metrics may use stratified or cluster designs to capture variation across hospitals or patient groups.
  • Key concepts to remember for exams

    • Population vs sample; sampling frame; sampling bias.
    • Random assignment (causal inference) vs random sampling (generalization).
    • Confounding and how randomization mitigates it.
    • Types of sampling methods and their biases: non-random (convenience, voluntary response) vs random (simple random, stratified, systematic, cluster).
    • Sampling variability and the idea that the sample mean is an estimator of the population mean: $\bar{X} \approx \mu$, with variability across samples.
    • Ethical considerations that constrain what sampling and assignment methods can be used in certain research contexts.
  • Quick glossary (from the lecture content)

    • Confounding: when an outside factor influences both the treatment and the outcome, biasing the estimated treatment effect.
    • Placebo: an inert treatment used to blind participants and control for expectations.
    • Efficacy: the proportional reduction in disease incidence among the treated group relative to the control group, often expressed as $\text{efficacy} = 1 - \text{RR}$, where RR is the relative risk.
    • Sampling frame: the actual list or mechanism used to define the population from which a sample is drawn.
    • Sampling interval: the fixed gap between selected elements in systematic sampling, $k = N/n$ for population size $N$ and sample size $n$.
    • Sampling variability: the natural variation in a statistic (e.g., the sample mean) from one random sample to another.
    • Stratified sampling vs. proportional allocation: stratified sampling divides the population into strata and samples within each one; proportional allocation is one way to set the stratum sample sizes, mirroring the population proportions.
  • Final notes for exam readiness

    • Be able to describe, in your own words, why random assignment helps establish causality.
    • Be able to differentiate between random sampling and random assignment and explain their respective purposes.
    • Be able to outline a basic sampling plan (which method you would use and why) given a hypothetical research question.
    • Be able to compute and interpret the sample mean and discuss how sampling variability affects estimates.
    • Be prepared to discuss ethical considerations that might prevent randomization or random sampling in sensitive topics.