09/03 Stat: Data and Statistical Thinking – Study Designs & Random Sampling
Longitudinal, Case-Control, and Cross-Sectional Studies
Based on the transcript, three study types are introduced:
Longitudinal studies: follow a group of participants over a period of time.
Case-control studies: compare a case group that has a particular attribute with a control group that does not have the attribute.
Cross-sectional studies: analyze a population at a specific point in time.
The transcript emphasizes that cohort studies are a type of longitudinal study, where you follow a group sharing a particular characteristic over time.
A distinction is made (and sometimes confused in the excerpt) between longitudinal and cross-sectional designs:
Longitudinal/cohort: follow the same participants over time to observe changes and outcomes.
Cross-sectional: fix the timepoint and observe several characteristics at that single time.
Example note: the speaker mentions doing a quick Google search for examples. The transcript uses these to illustrate the difference between cross-sectional and longitudinal approaches, but does not give precise details for every example.
Observational Studies: Why and What to Watch For
Observational studies are commonly used in science, medicine, and social sciences due to ethical or practical concerns that prevent traditional experiments.
In observational studies, researchers do not assign treatments or interventions; they observe and measure as events occur.
Risks inherent to observational studies include:
Confounding variables: extraneous variables that change along with the exposure/factor of interest, making it hard to attribute effects to the studied factor.
Observer bias: biases introduced by the observer's expectations or measurement process.
Note: The transcript emphasizes that these observational designs are chosen due to ethical or practical constraints, but they come with the trade-off of potential confounding and observer bias.
Confounding Variables
Definition (as described in the transcript): confounding occurs when other variables are changing together with the exposure or treatment, making it difficult to attribute observed effects to the exposure of interest.
Implication: confounding threatens the validity of causal inferences in non-randomized studies.
Related concept: observer bias, another source of bias in observational settings, arises from how data are collected or interpreted by observers.
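A small simulation can make the definition concrete. This is only a sketch under made-up assumptions: the probabilities, the function names `simulate` and `rate`, and the "lurking variable" are all hypothetical and not from the transcript. The outcome here depends only on the confounder, never on the exposure, yet the exposed group still shows a higher outcome rate because exposure and confounder move together.

```python
import random

def rate(flags):
    """Proportion of True values in a list of booleans."""
    return sum(flags) / len(flags)

def simulate(n=10_000, seed=0):
    """Return (outcome rate among exposed, outcome rate among unexposed)."""
    rng = random.Random(seed)
    exposed_outcomes, unexposed_outcomes = [], []
    for _ in range(n):
        confounder = rng.random() < 0.5  # hypothetical lurking variable
        # Exposure is more likely when the confounder is present.
        exposed = rng.random() < (0.8 if confounder else 0.2)
        # The outcome depends ONLY on the confounder, not on exposure.
        outcome = rng.random() < (0.6 if confounder else 0.2)
        if exposed:
            exposed_outcomes.append(outcome)
        else:
            unexposed_outcomes.append(outcome)
    return rate(exposed_outcomes), rate(unexposed_outcomes)

exposed_rate, unexposed_rate = simulate()
print(f"exposed: {exposed_rate:.2f}, unexposed: {unexposed_rate:.2f}")
```

Under these assumed probabilities the exposed group's rate comes out well above the unexposed group's, even though exposure has no effect at all. That gap is what confounding looks like in observational data.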
Observed vs Designed Experiments
The transcript contrasts observational studies with designed (randomized) experiments.
Designed experiments involve deliberate manipulation of an independent variable (treatment) and random assignment of experimental units to treatment groups.
Observational study example discussed in the transcript:
A scenario involving dropping off toddlers at daycare and observing their reactions (e.g., facial expressions) without interfering with their day-to-day routine.
Conclusion given: this scenario is an observational study because the researcher is merely observing behavior without applying any treatment or manipulation.
Randomized Experiments
In randomized experiments, randomization is used to assign treatment conditions to experimental units, reducing bias and balancing confounding variables across groups.
Random number generators are commonly used to implement randomization.
Tools for randomization include:
Random number tables
Statistical software programs
Hardware/calculator-based methods (e.g., TI-84 calculators)
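As one illustration of the "statistical software" option, random assignment can be sketched in Python. The function name `randomize_assignment`, the group labels, and the ten numbered units are assumptions made for this example, not something described in the transcript.

```python
import random

def randomize_assignment(units, treatments=("treatment", "control"), seed=None):
    """Randomly assign each experimental unit to one treatment group."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)      # random order removes any researcher choice
    # Deal shuffled units round-robin so group sizes differ by at most one.
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

assignment = randomize_assignment(range(1, 11), seed=42)
for unit in sorted(assignment):
    print(unit, assignment[unit])
```

Because the order is shuffled before groups are dealt out, no characteristic of a unit (known or unknown) can influence which group it lands in, which is how randomization balances confounders on average.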
Random Number Generators and How to Use Them
The transcript states that random numbers can be generated using table forms, software, or built-in calculator functions.
TI-84 calculator example for generating random integers:
Access the randInt function: MATH → PRB → randInt.
randInt generates random integers in a specified range.
Procedure described in the transcript:
Lower value: the smallest number in the range (included).
Upper value: the largest number in the range (included).
Number of random numbers to generate: the sample size of random numbers you want.
Usage format (on the TI-84): randInt(lower value, upper value, number of random numbers).
Step-by-step interaction described:
The calculator screen shows randInt followed by empty parentheses.
Enter the lower bound, press the comma key (the comma is located above the 7 key on older models).
Enter the upper bound, press comma again.
Enter the number of random numbers desired, press Enter.
Older calculators required numbers to be separated by a comma; newer calculators display the numbers in a list.
Conceptual takeaway: random number generators are used to assign treatments or to select samples through a known, controllable chance mechanism.
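For comparison with the calculator steps, here is a rough Python analogue of randInt. The helper name `rand_int` and its `seed` parameter are assumptions for this sketch. It relies on the standard library's `random.randint`, which, like the calculator function described above, includes both the lower and upper values.

```python
import random

def rand_int(lower, upper, count, seed=None):
    """Return `count` random integers between lower and upper, inclusive."""
    rng = random.Random(seed)
    return [rng.randint(lower, upper) for _ in range(count)]

# Five random integers from 1 to 6, similar to entering randInt(1,6,5).
draws = rand_int(1, 6, 5, seed=1)
print(draws)
```

Note that values can repeat in this kind of output. When the numbers are used to pick a sample without replacement, duplicates have to be discarded and replaced with fresh draws.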
Simple Random Sample and Sampling Frame
Simple random sampling (SRS) principle described in the transcript:
Each member of the population has an equal chance of being selected.
A sampling frame is created by listing every member (unit) of the population from which the sample will be drawn.
In the example provided:
The neighborhood near a proposed high school site has 711 homes (spoken as "seven eleven" in the transcript).
A simple random sample of 20 households is to be selected from this neighborhood.
A random starting point is used (in the transcript, the starting number was two), and random numbers are generated to select households.
Important operational notes from the transcript:
The sampling frame is the complete list of units from which samples will be drawn.
Randomization ensures each unit has an equal probability of selection, aligning with the concept of SRS.
The transcript ends with an incomplete sentence after mentioning the random starting point, indicating that this portion was cut off mid-example.
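Since the transcript's example is cut off, the selection step can be sketched in Python instead. This assumes the "seven eleven homes" in the transcript means 711 homes, numbered 1 through 711 to form the sampling frame. The function name `simple_random_sample` and the seed value are hypothetical and are not the transcript's procedure.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each equally likely."""
    rng = random.Random(seed)
    # random.sample draws without replacement, so no household repeats.
    return sorted(rng.sample(frame, n))

# Assumed sampling frame: households numbered 1..711.
sampling_frame = list(range(1, 712))
selected = simple_random_sample(sampling_frame, 20, seed=0)
print(selected)
```

Drawing without replacement avoids the duplicate problem that comes up when a list of independent random integers is used to pick households.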
Key formulas and concepts to remember:
Probability a given unit is selected in a simple random sample of size n from a population of size N:
P(\text{unit selected}) = \frac{n}{N}
Number of possible simple random samples (without replacement) of size n from N units:
\text{Number of samples} = \binom{N}{n}
In SRS without replacement, each specific sample has probability 1/\binom{N}{n} of being chosen.
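These two formulas can be checked by brute force on a toy population. The sketch below uses made-up values N = 5 and n = 2 (not from the transcript), lists every possible sample, and counts how many of them contain unit 1.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

N, n = 5, 2  # toy population size and sample size (assumed for illustration)

# Every possible simple random sample of size n, without replacement.
samples = list(combinations(range(1, N + 1), n))
assert len(samples) == comb(N, n)

# Each sample is equally likely, so P(unit 1 selected) is the share of
# samples that contain unit 1.
count_with_unit_1 = sum(1 in s for s in samples)
p_unit_1 = Fraction(count_with_unit_1, len(samples))
assert p_unit_1 == Fraction(n, N)

print(len(samples), count_with_unit_1, p_unit_1)
```

Here there are 10 possible samples, 4 of which include unit 1, giving 4/10 = 2/5 = n/N. The same count holds for any other unit, which is the "equal chance" property of SRS.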
Practical implications:
A proper sampling frame and randomization help achieve representative samples and reduce selection bias.
When selecting a sample, ensure that each unit has an equal and known chance of inclusion to support valid inferences.
The transcript emphasizes the workflow: define population, create sampling frame, use a random mechanism to select the sample, and then analyze results with awareness of potential biases and limitations.
Connections to Foundational Principles and Real-World Relevance
Foundational concepts linked:
Randomization as a core method to reduce bias and balance confounding variables across treatment groups.
The ethical and practical constraints that lead researchers to prefer observational designs when experiments are not feasible or ethical.
The use of sampling frames and simple random sampling as fundamental tools for representativeness in survey research.
Real-world relevance:
Planning studies for public projects (e.g., gauging community opinions near a proposed site) using randomized samples improves the reliability of conclusions.
The choice between observational studies and randomized experiments depends on the feasibility, ethics, and potential impact of interventions.
Practical implications:
Observational designs require careful attention to confounding and observer bias, and may necessitate statistical methods to adjust for confounders.
Randomized designs rely on proper randomization procedures and adequate sample sizes to achieve sufficient power.
Ethical considerations:
Observational studies may be favored when intervening could harm participants or alter natural behavior.
Even in randomized settings, ethical oversight and informed consent are essential when interventions affect participants.
Important formulas and notation from the transcript and standard practice:
Probability of a unit being selected in a simple random sample of size n from N units:
P(\text{unit selected}) = \frac{n}{N}
Number of possible simple random samples (without replacement) of size n:
\binom{N}{n}
If sampling with replacement, the number of possible ordered samples would be N^n (note: this is not the focus of the transcript, which emphasizes simple random sampling without replacement).
Sampling frame: a complete list of individuals or items from which the sample is drawn.
(Note: The transcript contains a few statements that are slightly muddled or cut off, such as a sentence about the random starting number and an incomplete ending about the sampling frame. The notes above preserve the core ideas and the explicit steps described, while clarifying where the transcript shows ambiguity.)