Stats 2.1 Collecting Sample Data

Section 1-3: Collecting Sample Data

  • Objectives

    • Define and identify a simple random sample.

    • Understand the importance of sound sampling methods and of good experimental design.

  • Key Concept

    • The method used to collect sample data influences the quality of the statistical analysis.

    • The simple random sample is the “gold standard” of sampling methods.

    • In experiments, the “gold standard” is randomization with treatment and placebo groups: subjects are randomly assigned to receive either the treatment or a placebo (e.g., a sugar pill), which has no medicinal effect.

  • Basics of Collecting Data

    • Data sources: observational studies and experiments.

    • Experiment: apply a treatment and observe its effects on subjects (experimental units).

    • Example: A taste test in which students are given sodas A, B, C, D, and E to sample and are then asked to pick their favorite.

    • Observational study: observe and measure characteristics without attempting to modify subjects (e.g., a poll asking preferences).

    • Example (ice cream and drownings): an observational study can show a correlation between ice cream sales and drownings, but a lurking variable (temperature) confounds the relationship.

  • Ice Cream and Drownings: Observational vs Experimental

    • Observational study claim: as ice cream sales increase, drownings increase.

    • Lurking variable: temperature. Higher temperatures increase both ice cream sales and swimming, and more swimming raises the risk of drowning.

    • Correct conclusion requires controlling for temperature (or using an experimental design).

    • Experimental approach (concept): randomly assign people to two groups, one that eats ice cream and one that does not, and compare their drowning rates.

    • Result in the example: drowning rates are similar with and without ice cream, suggesting that ice cream has no effect on drownings. This illustrates why experiments can reveal causality more reliably than observational studies when confounding variables are present.
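The ice cream example can be simulated with a toy model. Everything numeric below is invented for illustration (the temperature range, the sales and risk formulas, the seeds): temperature drives both ice cream sales and drowning risk, and ice cream has no effect at all. The observational data still show a strong correlation, while randomly assigning ice cream leaves the two groups with nearly equal risk.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def observational_study(num_days=1000, seed=0):
    """Record daily sales and drownings; in this toy model temperature drives both."""
    rng = random.Random(seed)
    sales, drownings = [], []
    for _ in range(num_days):
        temp = rng.uniform(10, 35)                        # hypothetical daily temperature
        sales.append(20 * temp + rng.gauss(0, 50))        # sales rise with temperature
        drownings.append(0.1 * temp + rng.gauss(0, 0.5))  # so does drowning risk
    return pearson(sales, drownings)

def randomized_experiment(num_subjects=2000, seed=1):
    """Randomly assign ice cream; return mean risk (ice cream) minus mean risk (none)."""
    rng = random.Random(seed)
    ice_cream, no_ice_cream = [], []
    for _ in range(num_subjects):
        temp = rng.uniform(10, 35)
        risk = 0.1 * temp + rng.gauss(0, 0.5)  # ice cream does not appear in this model
        group = ice_cream if rng.random() < 0.5 else no_ice_cream  # coin-flip assignment
        group.append(risk)
    return sum(ice_cream) / len(ice_cream) - sum(no_ice_cream) / len(no_ice_cream)

print(f"Observational correlation: {observational_study():.2f}")
print(f"Experimental difference in mean risk: {randomized_experiment():+.3f}")
```

Because the coin flip ignores temperature, hot and cool conditions end up spread evenly over both groups; that balance is what removes the lurking variable's influence from the comparison.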

  • Design of Experiments

    • Replication

    • Replication = repetition of an experiment on more than one individual.

    • Adequate replication requires sufficiently large sample sizes to detect treatment effects.

    • Blinding

    • Blinding: the subject does not know whether they are receiving treatment or placebo.

    • Purpose: counter the placebo effect, which occurs when untreated subjects report improvements because they expect to improve.

    • Double-Blind

    • Both the subject and the experimenter are unaware of whether the subject is receiving the treatment or placebo.

    • Randomization

    • Allocation of subjects to different groups is done by random selection.

    • Goal: create comparable groups by chance, reducing selection bias and confounding.
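Random allocation can be sketched in a few lines of Python. The helper name randomize_groups and the 20 labeled subjects are hypothetical; shuffling the list and splitting it in half makes every possible treatment group of 10 subjects equally likely.

```python
import random

def randomize_groups(subjects, seed=None):
    """Randomly split subjects into a treatment group and a placebo group of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)      # every ordering is equally likely
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

subjects = [f"subject_{i}" for i in range(1, 21)]  # 20 hypothetical subjects
treatment, placebo = randomize_groups(subjects, seed=42)
print("Treatment:", treatment)
print("Placebo:  ", placebo)
```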

  • Definitions

    • Simple Random Sample (SRS)

    • Definition (text): A simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen.

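    • Related concept (random sample): In a random sample, members of the population are selected so that each individual member has an equal chance of being selected. Every simple random sample is a random sample, but not every random sample is a simple random sample.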

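    • Formal definition (math): if a sample S of size n is drawn from a population of N individuals, then for every possible sample S_j of size n,

    P(S = S_j) = \frac{1}{\binom{N}{n}}

    where \binom{N}{n} = N!/(n!(N-n)!) is the number of different samples of size n.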
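The SRS definition implies the equal-chance property for individuals. Fix one individual i: the samples containing i are formed by choosing the other n − 1 members from the remaining N − 1 individuals, so

```latex
P(i \in S) = \frac{\binom{N-1}{n-1}}{\binom{N}{n}}
           = \frac{(N-1)!}{(n-1)!\,(N-n)!} \cdot \frac{n!\,(N-n)!}{N!}
           = \frac{n}{N}
```

The converse fails. In a population {a, b, c, d}, choosing either {a, b} or {c, d} with probability 1/2 gives every individual a 1/2 = n/N chance of selection, yet the sample {a, c} can never occur, so equal individual chances alone do not make a sample simple random.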

Example Problems

Problem 1: Calculating the Number of Possible Samples

Question: A small class has 10 students. If you want to select a simple random sample of 3 students to form a study group, how many different simple random samples are possible?

Solution:
To find the number of different simple random samples, we use the combination formula, where N is the total population size and n is the sample size.

\binom{N}{n} = \frac{N!}{n!\,(N-n)!}

Given: N = 10 (total students), n = 3 (students to be selected).

\binom{10}{3} = \frac{10!}{3!\,(10-3)!} = \frac{10!}{3!\,7!}

= \frac{10 \times 9 \times 8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{(3 \times 2 \times 1)(7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1)}

= \frac{10 \times 9 \times 8}{3 \times 2 \times 1} = 10 \times 3 \times 4 = 120

There are 120 different simple random samples of 3 students possible from a class of 10 students.
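The count in Problem 1 can be checked in Python, and a simulation illustrates the SRS definition: when groups are drawn with random.sample, each of the 120 possible study groups should turn up about 1/120 of the time. The seed and number of trials are arbitrary choices for this sketch.

```python
import math
import random
from collections import Counter
from itertools import combinations

N, n = 10, 3
students = list(range(1, N + 1))  # label the 10 students 1..10

# Count the possible samples two ways: the formula and explicit enumeration.
by_formula = math.comb(N, n)
all_samples = list(combinations(students, n))
print(by_formula, len(all_samples))  # both are 120

# Draw many simple random samples and tally how often each one appears.
rng = random.Random(2024)
trials = 120_000
counts = Counter(tuple(sorted(rng.sample(students, n))) for _ in range(trials))
print(len(counts))  # number of distinct samples observed
print(min(counts.values()) / trials, max(counts.values()) / trials)  # both near 1/120 ≈ 0.0083
```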

Problem 2: Identifying a Simple Random Sample

Question: A company has 500 employees. To conduct a survey, they assign a unique number from 1 to 500 to each employee. Then, they use a random number generator to select 50 numbers, and the employees corresponding to those numbers are surveyed. Is this method a simple random sample? Explain why or why not.

Solution:
Yes, this method results in a simple random sample.

Explanation:

  1. Equal Chance for Each Individual: Because each employee has a unique number and the numbers are chosen by a random number generator, every employee has the same chance (50/500 = 1/10) of being selected.

  2. Equal Chance for Each Sample as a Whole: More importantly, every possible group of 50 employees has the same chance of being chosen. The selection is purely random: it neither excludes nor favors any particular group of 50, provided that a number drawn a second time is ignored so that 50 different employees are selected. This matches the definition of a simple random sample (SRS): the n = 50 subjects are chosen so that every possible sample of size 50 has the same chance of being chosen.
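The selection procedure in Problem 2 might look like this in Python. survey_sample is a hypothetical helper; it draws ID numbers with random.randint and skips repeats, which is equivalent to sampling without replacement. Python's random.sample(range(1, 501), 50) does the same job in a single call.

```python
import random

def survey_sample(num_employees=500, sample_size=50, seed=None):
    """Select an SRS of employee ID numbers using a random number generator.

    Repeated numbers are ignored, so the result contains sample_size distinct IDs.
    """
    rng = random.Random(seed)
    chosen = set()
    while len(chosen) < sample_size:
        chosen.add(rng.randint(1, num_employees))  # randint includes both endpoints
    return sorted(chosen)

ids = survey_sample(seed=7)
print(len(ids), ids[:5])
```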