Sampling, Quantitative vs Qualitative Data, and the Foundations of Random Sampling
Quantitative vs. Qualitative Measurement
Median survival illustration
• Two arms of a cancer-vaccine trial: one arm had median survival \approx 10\text{ months}, the other \approx 6\text{ months}.
• Quantitative view (actual months lived) revealed a near-significant difference; qualitative view ("responded yes/no") appeared flat.
• Core moral: quantitative variables carry more “information content” than binary/qualitative variables, which increases the statistical power to detect effects.
• Engineer’s aphorism: “If you can measure numerically, do it.”
• Binary outcomes are unavoidable in some contexts (e.g., “had surgery: yes/no”), but whenever a continuum exists, prefer it.
Covariates
• Length of treatment, disease stage, etc., were mentioned as ancillary variables that must be “taken into account” in the analysis.
• Term introduced: covariate = a variable not itself the primary outcome but possibly associated with outcome or exposure.
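Returning to the median-survival illustration: a small Python sketch of why a yes/no outcome can hide a difference that the numeric outcome shows. The survival times and the 5-month “responder” cutoff are invented purely for illustration (they are not the trial’s data); the two arms are constructed to have medians of 10 and 6 months but identical responder rates.

```python
from statistics import median

# Hypothetical survival times in months, invented purely for illustration.
arm_vaccine = [3, 5, 8, 10, 12, 15, 18]
arm_control = [2, 5, 5, 6, 6, 7, 8]

CUTOFF = 5  # hypothetical rule: "responder" = survived at least 5 months


def responder_rate(times, cutoff=CUTOFF):
    """Binary view: fraction of patients classified as responders."""
    return sum(t >= cutoff for t in times) / len(times)


for name, times in [("vaccine", arm_vaccine), ("control", arm_control)]:
    print(f"{name}: median = {median(times)} months, "
          f"responder rate = {responder_rate(times):.3f}")
```

Both arms print a responder rate of 0.857 (6 of 7), while the medians are 10 and 6 months: collapsing the continuum to yes/no discarded exactly the information that separated the arms.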
Populations, Samples, Measurements (Quick Recap)
Identify population ⇒ decide variables ⇒ design study ⇒ collect data.
Data collection begins with sampling.
Why Sampling Quality Matters
Sample = small fraction of population; if it is collected poorly, the inferences drawn from it will be poor.
Ultimate mission: obtain a sample representative of the population.
• Trivial statement, yet operationally non-trivial: “how” is the hard part.
Random (Probability) vs. Non-Random (Non-Probability) Sampling
Two overarching families:
Random / Probability sampling.
• Guiding principle: randomness = unpredictability ⇒ immunizes against manipulation.
• Enables computation of probabilities for sample statistics.
Non-Random / Non-Probability sampling.
• At least one population element has 0 chance of selection.
• Examples: web-site pop-up polls, “clipboard on Bruin Walk” intercept surveys.
Size does not rescue bias
• Millions of web responses still make a huge but bad sample if the selection is biased.
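A minimal Python simulation of “size does not rescue bias”. Everything here is an assumption for illustration: a population of 100,000 in which exactly 30% would answer “yes”, and a pop-up poll in which yes-sayers respond with probability 0.5 but no-sayers only 0.1. The volunteer poll collects tens of thousands of answers yet should land near 0.68 in expectation, while a simple random sample of just 500 should stay close to 0.30.

```python
import random

rng = random.Random(2024)  # fixed seed so the sketch is reproducible

# Hypothetical population of 100,000 people; exactly 30% would answer "yes" (1).
population = [1] * 30_000 + [0] * 70_000
true_p = sum(population) / len(population)

# Non-probability "pop-up poll": assumed response rates depend on the answer,
# 0.5 for yes-sayers and 0.1 for no-sayers (illustrative assumptions).
volunteers = [y for y in population if rng.random() < (0.5 if y else 0.1)]
poll_p = sum(volunteers) / len(volunteers)

# A simple random sample of only 500 people, drawn without replacement.
srs = rng.sample(population, 500)
srs_p = sum(srs) / len(srs)

print(f"true proportion:  {true_p:.3f}")
print(f"volunteer poll:   n = {len(volunteers):,}, estimate = {poll_p:.3f}")
print(f"SRS:              n = {len(srs)}, estimate = {srs_p:.3f}")
```

Increasing the volunteer count would only make the poll estimate more precisely wrong; the bias comes from who selects themselves in, not from how many do.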
Clinical-Trial Reality Check
Clinical studies are inherently non-random samples:
• Geography limits who can attend a study site.
• Informed consent allows refusal.
• Investigator discretion & site capacity limit who is even asked.
Consequence: we must assume the sample is “as if” random (representative) in order to apply statistical inference.
• If that assumption fails, results lose generalizability.
Types of Random Samples (overview)
Simple Random Sample (SRS) ⟶ detailed below.
Stratified random sample.
Cluster (multistage) sample.
Systematic sample.
(Only SRS introduced so far.)
Simple Random Sample (SRS)
Definition: every element has an equal chance of selection (more precisely, every possible sample of size n is equally likely to be the one drawn).
• SRS ⊂ random sampling (where only a “non-zero chance” is required).
Classic metaphor: lottery balls.
• 65 numbered balls bounce in forced air; each ball has a 1/65 chance of being picked on the first draw.
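The lottery metaphor can be mimicked in a few lines of Python. The choice of 5 balls per draw and the seed are assumptions for illustration only; `random.sample` draws without replacement, and the tally checks empirically that each of the 65 balls comes up about equally often (roughly 1/65 of the time) as the first ball.

```python
import random
from collections import Counter

BALLS = range(1, 66)      # 65 numbered balls, as in the lottery metaphor
rng = random.Random(123)  # seeded pseudo-random generator

# One hypothetical draw of 5 balls without replacement (each drawn ball is set aside).
draw = rng.sample(BALLS, 5)
print("draw:", draw)

# Empirical check that every ball is about equally likely on the first draw (1/65).
first_draws = Counter(rng.choice(BALLS) for _ in range(65_000))
print("min/max count per ball:", min(first_draws.values()), max(first_draws.values()))
```

With 65,000 simulated first draws, each ball’s count should hover around 1,000.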
Sampling With vs. Without Replacement
Lottery uses without replacement (ball is set aside).
• Probabilities shift: P(\text{draw}=i \text{ on 2nd} \mid i \text{ not drawn 1st}) = 1/64, etc.
For large populations with small samples, the difference between sampling with and without replacement is negligible ⇒ treat it as with replacement to simplify the probability math.
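To see how small the shift is, this Python sketch compares the chance that a not-yet-drawn element is picked on the last (n-th) draw without replacement, 1/(N - n + 1), with the constant chance 1/N under replacement. The helper `last_draw_ratio` is a hypothetical name introduced here; exact fractions avoid rounding noise.

```python
from fractions import Fraction


def last_draw_ratio(N, n):
    """Ratio of the without-replacement chance for a not-yet-drawn element on the
    last (n-th) draw, 1/(N - n + 1), to the constant with-replacement chance, 1/N."""
    return Fraction(1, N - n + 1) / Fraction(1, N)


for N, n in [(65, 5), (45_000, 100)]:
    ratio = last_draw_ratio(N, n)
    print(f"N = {N:,}, n = {n}: 1/{N - n + 1:,} vs 1/{N:,}, ratio = {float(ratio):.4f}")
```

For the 65-ball lottery the last of 5 draws is about 6.6% more likely than 1/65 (ratio 1.0656), but for N = 45,000 and n = 100 the ratio is only about 1.0022, which is why the with-replacement simplification is harmless there.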
Practical Walk-Through: UCLA Diet Survey
Goal: SRS of n = 100 undergraduates from population N \approx 45{,}000.
Step-by-step:
Create sampling frame: obtain registrar list of all enrolled students.
Assign unique, equal-length IDs: the UID is already nine digits; if you create your own IDs, pad them with leading zeros: 00001,\ldots,45000.
Random-number generation: use an RNG (Excel RAND, calculator RND key, Python random, etc.).
• Algorithm picks digits 0–9 with equal probability to form 9-digit strings.
• If a generated ID is not in the frame (or has already been drawn), discard it and redraw; the selection is still random.
Repeat until 100 valid IDs are gathered.
Contact those students for the diet questionnaire.
Notation reminder:
• n = sample size (here 100).
• N = population size (here \approx 45{,}000).
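A minimal Python sketch of the steps above, assuming a homemade frame of zero-padded 5-digit IDs 00001 to 45000 (the registrar list itself is not reproduced; the frame is just the set of ID strings, and the seed is arbitrary). Digits are drawn one at a time, and candidates outside the frame or already chosen are discarded, mirroring the manual procedure.

```python
import random

N = 45_000  # population size from the example (approximate enrollment)
n = 100     # desired sample size

# Steps 1-2 (hypothetical frame): equal-length, zero-padded IDs 00001 ... 45000.
frame = {f"{i:05d}" for i in range(1, N + 1)}

rng = random.Random(7)  # seeded pseudo-random generator for reproducibility

# Step 3: form 5-digit strings one digit at a time, each digit 0-9 equally likely;
# discard candidates that are not in the frame or have already been drawn.
sample = []
chosen = set()
while len(sample) < n:
    candidate = "".join(rng.choice("0123456789") for _ in range(5))
    if candidate in frame and candidate not in chosen:
        chosen.add(candidate)
        sample.append(candidate)

print(len(sample), sample[:5])
```

In practice, `rng.sample(sorted(frame), n)` draws an SRS without replacement in a single call; the digit-by-digit loop only makes the rejection step visible.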
RNGs & Philosophical Aside
Computers are deterministic; “random” numbers are produced by algorithms (pseudo-random).
Lecturer foreshadows deeper discussion on adequacy of pseudo-randomness for statistical work.
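A short Python illustration of what “deterministic” means here: Python’s `random` module uses the Mersenne Twister algorithm, so two generators given the same seed reproduce exactly the same sequence. By contrast, `random.SystemRandom` draws from the operating system’s entropy source and cannot be seeded. The seed value 42 is arbitrary.

```python
import random

# Two generators given the same seed produce exactly the same "random" numbers:
# the sequence is fully determined by the algorithm and its starting state.
gen_a = random.Random(42)
gen_b = random.Random(42)
seq_a = [gen_a.random() for _ in range(5)]
seq_b = [gen_b.random() for _ in range(5)]
print(seq_a == seq_b)  # True

# random.SystemRandom instead draws from the operating system's entropy source
# (os.urandom); it cannot be seeded, so its output is not reproducible.
sys_rng = random.SystemRandom()
print(sys_rng.random())
```

Reproducibility from a seed is usually a feature in statistical work (a sample can be re-drawn and audited), even though it shows the numbers are not truly random.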
Probability Example (cards)
If the deck is truly shuffled at random and you don’t manipulate it:
• P(\text{all 5 cards red}) = \dfrac{\binom{26}{5}}{\binom{52}{5}}.
• If you secretly arrange the deck (non-random), the probability becomes 1 (forced) or 0, so no meaningful probability can be computed.
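The card probability can be computed exactly with `math.comb` and checked by a Monte Carlo simulation of fairly dealt 5-card hands. The trial count and seed are arbitrary choices for this sketch; the simulated frequency should agree with the exact value to within a few thousandths.

```python
import math
import random

# Exact: choose 5 of the 26 red cards, divided by the number of all 5-card hands.
exact = math.comb(26, 5) / math.comb(52, 5)

# Monte Carlo check: deal 5 cards without replacement from a shuffled deck.
deck = ["red"] * 26 + ["black"] * 26
rng = random.Random(1)
trials = 100_000
hits = sum(all(card == "red" for card in rng.sample(deck, 5)) for _ in range(trials))

print(f"exact = {exact:.5f}, simulated = {hits / trials:.5f}")
```

The exact value is 65,780 / 2,598,960 ≈ 0.02531. The simulation only works because the deal is random; a stacked deck would make every “trial” come out the same way.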
Key Takeaways & Best Practices
Quantitative outcomes > qualitative for detecting effects.
Representativeness > sample size; bias cannot be “averaged out.”
Clinical research relies on optimistic assumption of representativeness; critical readers must scrutinize this claim.
Understand varieties of random sampling; SRS is simplest but not only method.
Random number generators enable practical SRS; awareness of pseudo-random versus true random helps assess rigor.
Formulas & Notation Recap
Median definition (sample): the ordered data value at position \dfrac{n+1}{2} if n is odd, or the average of the two middle values (positions \dfrac{n}{2} and \dfrac{n}{2}+1) if n is even.
Lottery selection probabilities (without replacement):
P_1 = \dfrac{1}{65},\; P_2 = \dfrac{1}{64} \text{ (for a ball not drawn first)}, \ldots
Card example (5 red):
P=\dfrac{\binom{26}{5}}{\binom{52}{5}} = \dfrac{65{,}780}{2{,}598{,}960} \approx 0.0253.
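The median rule from the recap translates directly into code. This Python sketch (the function name `sample_median` is hypothetical) converts the 1-based positions of the definition into 0-based list indices and cross-checks the result against the standard library’s `statistics.median`; the data values are made up.

```python
from statistics import median


def sample_median(values):
    """Median as defined in the recap: the ordered value at position (n+1)/2 for odd n,
    otherwise the average of the values at positions n/2 and n/2 + 1 (1-based)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sample is undefined")
    if n % 2 == 1:
        return ordered[(n + 1) // 2 - 1]  # 1-based position -> 0-based index
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


print(sample_median([7, 1, 4]))       # 4
print(sample_median([10, 2, 6, 8]))   # 7.0, the average of 6 and 8

data = [12, 3, 9, 5, 7, 1]            # made-up values for a cross-check
print(sample_median(data) == median(data))  # True
```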