Study Notes: Experimental Design, Random Assignment, and Sampling Techniques
Experimental Design and Random Assignment
Primary goal in experiments: determine if a treatment has an effect by comparing a treatment group to a baseline control group. The control group receives no treatment (or standard treatment) so that everything else is held constant across groups.
- If the treated group outperforms the control group by more than chance alone would explain, the treatment is considered effective.
Random assignment (randomization)
- Random assignment helps create groups that are roughly equivalent at the start of the experiment, reducing confounding (where other factors could explain observed differences).
- How it works (example): you have 100 participants and want two groups of 50. Use a random number generator to assign 50 to Group A and the remaining 50 to Group B.
- This process prevents self-selection into groups and helps ensure comparability.
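The 100-participant example above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the participant IDs 1 to 100, the function name `randomly_assign`, and the seed are all assumptions made for the example.

```python
import random

def randomly_assign(participant_ids, seed=None):
    """Shuffle the IDs and split them into two equal-sized groups."""
    rng = random.Random(seed)      # seeded generator so the split is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)          # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical participant IDs 1..100 (an assumption for illustration).
participants = range(1, 101)
group_a, group_b = randomly_assign(participants, seed=42)

print(len(group_a), len(group_b))  # 50 50
```

Because the shuffle, not the participant, decides group membership, nobody can self-select into Group A or Group B.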
Key example: longhand note-taking study (Mueller and Oppenheimer)
- Compares handwritten notes with alternative note-taking methods to assess the effect on learning or recall.
- Control for confounders: participants’ self-reported study hours can be biased, so the design ensures that the main difference between groups is the note-taking method, not other factors.
- Important principle: well-designed experiments avoid confounding by ensuring groups are treated the same except for the treatment.
Pfizer COVID-19 vaccine randomized trial (illustrative, real-world example)
- Number of participants: 43,548 volunteers.
- Randomization: half were randomly assigned to receive two vaccine doses 21 days apart; the other half were randomly assigned to receive two saline placebo shots to mimic vaccine administration.
- Outcome: after several months, the vaccine was determined to be $95\%$ effective in preventing COVID-19.
- Purpose of random assignment in this trial: ensure that any differences in infection rates are due to the vaccine, not other factors (e.g., health status, exposure, or behavior).
- Interpretation: if the vaccine group shows substantially lower infection rates than the placebo group, the difference is attributable to the vaccine rather than to other variables.
- How random assignment could be implemented in practice: assign each volunteer an ID (e.g., 1 to 43,548), use a random number generator to select IDs for the vaccine group, and assign the rest to placebo.
- Conceptual takeaway: randomization creates roughly equivalent groups at baseline, enabling causal inference about the treatment effect.
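To see why randomization yields roughly equivalent groups at baseline, the ID-based procedure described above can be simulated. The sketch below is illustrative only: the ages are simulated values, not trial data, and the age distribution, seed, and variable names are assumptions.

```python
import random
import statistics

rng = random.Random(2020)  # arbitrary seed for reproducibility

n_volunteers = 43_548
all_ids = range(1, n_volunteers + 1)

# Hypothetical baseline covariate: a simulated age per volunteer ID
# (an assumption for illustration, NOT data from the trial).
ages = {vid: rng.gauss(50, 15) for vid in all_ids}

# Randomly select half of the IDs for the vaccine arm; the rest get placebo.
vaccine_ids = set(rng.sample(all_ids, n_volunteers // 2))
placebo_ids = [vid for vid in all_ids if vid not in vaccine_ids]

mean_age_vaccine = statistics.mean(ages[vid] for vid in vaccine_ids)
mean_age_placebo = statistics.mean(ages[vid] for vid in placebo_ids)

print(f"vaccine arm: n={len(vaccine_ids)}, mean age={mean_age_vaccine:.2f}")
print(f"placebo arm: n={len(placebo_ids)}, mean age={mean_age_placebo:.2f}")
```

With groups this large, the two mean ages should differ only slightly. The same logic applies to characteristics nobody measured, which is what makes the causal comparison credible.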
Observational studies vs experiments
- Observational studies often suffer from confounding because the treatment is not randomly assigned.
- Well-designed experiments mitigate confounding by ensuring that all aspects except the treatment are the same across groups.
- In observational settings, researchers may rely on self-reported data or natural variation, but causal claims are weaker due to potential confounders.
Random assignment: deeper understanding
- Random assignment is a statistical technique (often using random number generators) to allocate participants to groups.
- Goal: produce roughly equivalent groups at the outset to isolate the effect of the treatment.
- Simple illustrative example (numbers): with 100 individuals, randomly assign 50 to the treatment and 50 to the control using a RNG.
Sampling vs experimentation context (transition to sampling techniques)
- Population vs sample: the population is the entire group of interest; a sample is a subset used to make inferences about the population.
- Why sample? It’s often too costly or impossible to study everyone.
- Random sampling allows inference about the population if the sample is representative and large enough.
- Non-random sampling can introduce bias and limit the generalizability of results.
When random sampling is not possible or ethical
- Some topics involve vulnerable populations or a risk of harm, making random assignment or random sampling inappropriate (e.g., research with rape victims or incest survivors).
- In such cases, researchers rely on voluntary participation or convenience samples, which may bias results and limit population-level inferences.
Research ethics and seminars (context from the course)
- The course includes a mandatory research ethics seminar.
- Ethical considerations guide when randomization or certain sampling methods are appropriate or inappropriate.
Sampling strategies overview
- Non-random sampling methods (potential bias):
- Convenience sampling: select individuals who are easy to reach; may bias results if the sample is not representative.
- Voluntary response (self-selected samples): people choose to participate; can bias estimates toward the views of those who strongly feel a certain way.
- Systematic sampling (caveat): usually treated as a probability method, but it becomes non-random if the starting point is not chosen at random, and it can be biased if the list has hidden periodic patterns.
- Random sampling methods (probability sampling):
- Simple random sampling: every individual in the population has an equal chance of being selected; every possible sample of a given size has equal probability.
- Stratified random sampling: divide the population into strata (subgroups) based on a characteristic (e.g., gender, age, race), then perform random sampling within each stratum.
- Proportional allocation: the sample from each stratum is proportional to its size in the population (e.g., if 50% male and 50% female in the population, aim for 50/50 in the sample).
- Purpose: ensure representation of key subgroups and improve accuracy.
- Systematic sampling: select a random start and then pick every $k$th element, where $k$ is the sampling interval (typically $k = N/n$).
- Example method: list the population in a natural order (e.g., alphabetical), choose a random start $r$ between 1 and $k$, then select elements $r$, $r + k$, $r + 2k$, and so on until $n$ elements have been selected.
- Cluster sampling: groups (clusters) are sampled and then all members of selected clusters are sampled; useful when the population is spread out geographically.
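The four probability sampling methods above can be compared side by side in code. This is a rough sketch under stated assumptions: the population of 1,000 numbered units, the two strata, the 20 clusters, and all function names are hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, rng):
    """Every possible sample of size n is equally likely."""
    return rng.sample(population, n)

def stratified_sample(strata, n, rng):
    """Proportional allocation: each stratum contributes in proportion to its size.

    Rounding can make the total differ slightly from n when the
    proportions do not divide evenly.
    """
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        n_stratum = round(n * len(members) / total)
        sample.extend(rng.sample(members, n_stratum))
    return sample

def systematic_sample(population, n, rng):
    """Random start, then every k-th element, with k = N // n."""
    k = len(population) // n
    start = rng.randrange(k)  # 0-based index, equivalent to a start between 1 and k
    return [population[start + i * k] for i in range(n)]

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every member of each."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

rng = random.Random(1)
population = list(range(1, 1001))  # hypothetical unit IDs 1..1000

# Hypothetical strata (60% / 40%) and 20 clusters of 50 units each.
strata = {"stratum_a": population[:600], "stratum_b": population[600:]}
clusters = {f"cluster_{c}": population[c * 50:(c + 1) * 50] for c in range(20)}

srs = simple_random_sample(population, 50, rng)
strat = stratified_sample(strata, 50, rng)
syst = systematic_sample(population, 50, rng)
clus = cluster_sample(clusters, 2, rng)

print(len(srs), len(strat), len(syst), len(clus))  # 50 50 50 100
```

Note that the cluster sample's size depends on the sizes of the clusters drawn, whereas the other three methods fix the sample size in advance.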
Practical demonstrations and calculations
- Simple random sampling activity (conceptual):
- Draw a sample of size n from a population of size N using a random number generator to select IDs.
- Compute the sample mean: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, where $X_i$ are the observed values in the sample.
- Compare sample mean to population mean $\mu$; expect some variability due to sampling.
- Sampling variability
- Different random samples from the same population can yield different sample means.
- This variability shrinks as the sample size grows, which is why larger samples tend to yield estimates closer to the population parameter.
- Stratified sampling in practice
- If the population has known proportions by strata (e.g., gender, race), the sample should reflect those proportions to avoid bias.
- Example: if a population is 50% male and 50% female, stratified sampling with proportional allocation would aim for roughly equal representation in the sample.
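The simple random sampling activity and the idea of sampling variability can be explored with a short simulation. The sketch below assumes a made-up population of 10,000 scores; the distribution, sample sizes, seed, and function names are illustrative choices, not part of the lecture.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed for reproducibility

# Hypothetical population of N = 10,000 scores (simulated for illustration).
population = [rng.gauss(70, 10) for _ in range(10_000)]
mu = statistics.mean(population)  # population mean

def sample_mean(population, n, rng):
    """Draw a simple random sample of size n and return its mean."""
    sample = rng.sample(population, n)
    return sum(sample) / n

def spread_of_sample_means(population, n, rng, repeats=500):
    """Standard deviation of the sample mean across repeated samples."""
    means = [sample_mean(population, n, rng) for _ in range(repeats)]
    return statistics.stdev(means)

spread_small = spread_of_sample_means(population, 10, rng)
spread_large = spread_of_sample_means(population, 250, rng)

print(f"population mean: {mu:.2f}")
print(f"one sample mean (n=10): {sample_mean(population, 10, rng):.2f}")
print(f"spread of sample means, n=10:  {spread_small:.2f}")
print(f"spread of sample means, n=250: {spread_large:.2f}")
```

Each run of `sample_mean` gives a slightly different value, and the spread across repeated samples should be noticeably smaller for the larger sample size.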
Real-world applications and examples
- National surveys and statistics agencies (e.g., Statistics Canada, Bureau of Labor Statistics) use stratified and other probability sampling methods to obtain representative samples.
- In healthcare, sampling plans for patient satisfaction, discharge studies, or other health metrics may use stratified or cluster designs to capture variation across hospitals or patient groups.
Key concepts to remember for exams
- Population vs sample; sampling frame; sampling bias.
- Random assignment (causal inference) vs random sampling (generalization).
- Confounding and how randomization mitigates it.
- Types of sampling methods and their biases: non-random (convenience, voluntary response) vs random (simple random, stratified, systematic, cluster).
- Sampling variability and the idea that the sample mean is an estimator of the population mean: $\bar{X} \approx \mu$, with variability across samples.
- Ethical considerations that constrain what sampling and assignment methods can be used in certain research contexts.
Quick glossary (from the lecture content)
- Confounding: when an outside factor influences both the treatment and the outcome, biasing the estimated treatment effect.
- Placebo: an inert treatment used to blind participants and control for expectations.
- Efficacy: the proportional reduction in disease incidence among the treated group relative to the control group, often expressed as $\text{efficacy} = 1 - \text{RR}$, where RR is the relative risk.
- Sampling frame: the actual list or mechanism used to define the population from which a sample is drawn.
- Sampling interval: the fixed gap between selected elements in systematic sampling, $k = N/n$ for population size $N$ and sample size $n$.
- Sampling variability: the natural variation in a statistic (e.g., the sample mean) from one random sample to another.
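- Stratified sampling vs. proportional allocation: stratified sampling divides the population into strata and samples within each one; proportional allocation is one way of setting the stratum sample sizes so that they mirror population proportions.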
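The efficacy definition in the glossary can be turned into a small calculation. The case counts below are hypothetical round numbers chosen to make the arithmetic easy to follow; they are not the trial's actual counts, and the function name `vaccine_efficacy` is an assumption for this example.

```python
def vaccine_efficacy(cases_treated, n_treated, cases_control, n_control):
    """Efficacy = 1 - RR, where RR is the risk in the treated group
    divided by the risk in the control group."""
    risk_treated = cases_treated / n_treated
    risk_control = cases_control / n_control
    relative_risk = risk_treated / risk_control
    return 1 - relative_risk

# Hypothetical counts (illustration only, not real trial data):
# 10 cases among 20,000 treated vs. 200 cases among 20,000 controls.
efficacy = vaccine_efficacy(10, 20_000, 200, 20_000)
print(f"{efficacy:.0%}")  # 95%
```

Here the treated group's risk is one twentieth of the control group's, so RR = 0.05 and efficacy = 0.95. An efficacy of 95% therefore means a 95% reduction in risk relative to the control group, not that 95% of treated people are protected.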
Final notes for exam readiness
- Be able to describe, in your own words, why random assignment helps establish causality.
- Be able to differentiate between random sampling and random assignment and explain their respective purposes.
- Be able to outline a basic sampling plan (which method you would use and why) given a hypothetical research question.
- Be able to compute and interpret the sample mean and discuss how sampling variability affects estimates.
- Be prepared to discuss ethical considerations that might prevent randomization or random sampling in sensitive topics.