Notes on Survey Sampling, Random Sampling, and Study Design (Lecture Transcript)

Opening point: people react very differently to the same topic depending on their beliefs (e.g., how conservatives react to gun-control topics).
Real-world example of wording effects: when a deep-voiced caller asks for someone on behalf of a generic-sounding organization (a "Police Benevolent Association"), the listener reacts negatively the moment the word "police" is heard. This shows how specific words or cues can act as triggers and shape responses.
Takeaway for surveys/interviews: wording can significantly affect participants’ willingness to engage and their responses; wording considerations are essential for validity and reliability.
Question raised in class: What is the difference between a survey and an interview? The lecturer invites discussion of their value, their accuracy, how much information each can obtain, and how each is administered, as an exercise in analyzing methodological choices.

Implication: the choice between survey and interview affects the type of data collected (typically quantitative for surveys, qualitative for interviews), response rates, and potential biases.

Core idea: random sampling aims to avoid biases that arise from non-representative recruitment.
Illustrative scenario setup: study on sleep and memory asking whether sleep boosts memory; target population is college students aged 18–20 at Medgar Evers College.
Target: recruit at least 250 participants for the sleep study.
Recruitment plan in-class (hypothetical): students in the class act as recruiters; example given: Shamika at a corner of the campus would recruit participants.
Recruitment allocation: each of the 30 students in the class would recruit a portion of the total (250 participants).
- Calculation mentioned: each student would pull in $\frac{250}{30} \approx 8.33$ participants, i.e., about 8–9 participants per student.
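Because 250 does not divide evenly by 30, the fractional 8.33 has to become whole-number quotas. A minimal sketch, assuming the remainder is spread one extra participant at a time across the first recruiters (the function name `recruiter_quotas` is hypothetical, not from the lecture):

```python
def recruiter_quotas(target: int, recruiters: int) -> list[int]:
    """Split a target sample size into whole-number quotas, one per recruiter."""
    base, remainder = divmod(target, recruiters)  # 250 // 30 = 8, remainder 10
    # The first `remainder` recruiters each take one extra participant.
    return [base + 1 if i < remainder else base for i in range(recruiters)]

quotas = recruiter_quotas(250, 30)
print(quotas.count(9), quotas.count(8), sum(quotas))  # → 10 20 250
```

So "8–9 per student" works out concretely as 10 students recruiting 9 and 20 students recruiting 8.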
Practical question raised: what types of people each student would recruit (bias toward younger crowds, friends, etc.).

Early sampling bias: relying on convenience (the people who are near Shamika, or who are friends) risks a non-representative sample.
Described sample composition if recruitment were informal:
- A large group of African Americans
- Spanish-speaking individuals from Puerto Rico and the Dominican Republic
- Central Americans
- One or two people from China or Europe
- A large group from West Africa (mostly French-speaking)
Problem highlighted: studying only 18-year-olds would not represent the 18–20 age range; even within 18–20, sleep patterns may differ by age (18 vs 19 vs 20).
Core conclusion: to obtain a representative group that reflects the bigger picture (the 5,000 students aged 18–20 at Medgar Evers College), recruiting must capture diversity across age within the range, ethnicity, language, and other relevant factors.
Practical implication: if the sample is not representative, study outcomes may not generalize to the broader population.
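The contrast between a convenience sample and a random sample can be illustrated with a small simulation. Everything here is hypothetical: the 5,000-student population uses invented, roughly equal age shares, and the "convenience" recruiter is assumed to reach only 18-year-olds, mirroring the problem described above.

```python
import random
from collections import Counter

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: 5,000 students aged 18-20 (age shares are invented).
population = [{"id": i, "age": random.choice([18, 19, 20])} for i in range(5000)]

def age_shares(sample):
    """Return the proportion of each age (18, 19, 20) in a sample."""
    counts = Counter(p["age"] for p in sample)
    return {age: counts[age] / len(sample) for age in (18, 19, 20)}

# Convenience sample: a recruiter who only reaches 18-year-olds (hypothetical).
convenience = [p for p in population if p["age"] == 18][:250]

# Simple random sample: every student has the same chance of being selected.
srs = random.sample(population, 250)

print("population ", age_shares(population))
print("convenience", age_shares(convenience))
print("random     ", age_shares(srs))
```

The convenience sample contains no 19- or 20-year-olds at all, while the random sample's age shares land close to the population's, which is what "representative" means in practice.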

Proposed remedy: implement random sampling rather than relying on corner recruitment or personal networks.
Systematic/random sampling method described:
- Instead of recruiting through personal networks, researchers should station recruiters at different campus locations.
- Approach every nth person: for example, stop every fifth person who walks by and invite them to participate.
- The starting point is chosen at random; after the start, every kth person is selected, with $k = 5$ (a sampling interval of 5).
- Potential participants outside the eligible age range (younger than 18 or older than 20) are screened out; otherwise, they are included.
Rationale: this approach is more objective and less susceptible to personal biases, helping to preserve diversity and representativeness of the sample.
Summary: systematic/random sampling helps ensure the sample better represents the 18–20-year-old student population and reduces bias introduced by convenience or self-selection.
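The procedure can be sketched as a short simulation. The foot traffic is entirely hypothetical (2,000 passers-by with ages drawn uniformly from 17 to 30), and the function names are illustrative, not part of the lecture; only $k = 5$, the random start, and the 18–20 eligibility rule come from the notes.

```python
import random

def systematic_sample(stream, k, eligible, rng):
    """Approach every k-th passer-by after a random start; keep only eligible people."""
    start = rng.randrange(k)            # random start in 0, 1, ..., k-1
    approached = stream[start::k]       # every k-th person from the start onward
    return approached, [p for p in approached if eligible(p)]

def is_eligible(person):
    # Eligibility criterion from the notes: 18 <= age <= 20.
    return 18 <= person["age"] <= 20

# Hypothetical foot traffic: 2,000 passers-by, ages drawn uniformly from 17 to 30.
rng = random.Random(7)
stream = [{"id": i, "age": rng.randint(17, 30)} for i in range(2000)]

approached, sample = systematic_sample(stream, k=5, eligible=is_eligible, rng=rng)
print(len(approached), "approached,", len(sample), "eligible")
```

One known weakness of systematic sampling: if the stream has a pattern that repeats every k people (for example, students walking past in groups of five), the every-kth rule can systematically over- or under-select certain kinds of people.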

The lecturer introduces the term "metadata" in this context.
Transcript note on metadata: "This is an example of, I think, what we call metadata."
Provided definition (as stated in the transcript):
- "Metadata is where the researcher, they don't want to conduct research. They use researchers' outcomes and use research all the outcomes of all the different scientists to come up with a theory."
Important correction to note (for study design): what the speaker describes is what is commonly called a meta-analysis, the statistical synthesis of results from multiple studies. "Metadata" means data about data (e.g., when, where, and how a dataset was collected), which is a different concept.
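One common way to carry out that synthesis is a fixed-effect, inverse-variance meta-analysis: each study's effect is weighted by the inverse of its squared standard error, so more precise studies count more. The sketch below uses three invented sleep-and-memory studies; the numbers and the function name are hypothetical, and a random-effects model is often preferred when studies differ substantially in design or population.

```python
import math

def fixed_effect_meta(effects, std_errors):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1 / se**2 for se in std_errors]  # more precise studies get more weight
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Hypothetical effect sizes (e.g., standardized memory gains after sleep) from three studies.
effects = [0.30, 0.45, 0.20]
std_errors = [0.10, 0.15, 0.08]

pooled, se = fixed_effect_meta(effects, std_errors)
print(f"pooled effect = {pooled:.3f}, 95% CI = ({pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f})")
```

The pooled standard error is smaller than any single study's, which is the main payoff of combining studies.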

Required sample size and per-recruiter calculation:
- Target sample size: $N = 250$ participants.
- Number of recruiters (class size): $n = 30$.
- Participants per recruiter (approx.): $\frac{N}{n} = \frac{250}{30} \approx 8.33$.
- Practical interpretation: each recruiter would need to recruit about 8–9 participants.
Systematic sampling interval example:
- Sampling interval: $k = 5$ (every 5th person).
Age range for study eligibility: $18 \leq \text{age} \leq 20$.
Because people outside this age range are screened out after being approached, some approaches yield no usable participant, so recruiters must approach more than 250 people to reach the target.
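Screening therefore has a direct cost in recruiting effort. A minimal sketch, assuming a hypothetical eligibility rate (the 60% below is invented, not from the lecture) and assuming every eligible person agrees to take part; refusals would push both numbers higher. The function name `people_needed` is illustrative.

```python
import math

def people_needed(target, eligible_rate, k):
    """Expected approaches and passers-by needed to reach `target` eligible participants."""
    approaches = math.ceil(target / eligible_rate)  # ineligible people are screened out
    passersby = approaches * k                      # only every k-th passer-by is approached
    return approaches, passersby

# Hypothetical: 60% of people approached near campus are aged 18-20.
print(people_needed(250, 0.60, 5))  # → (417, 2085)
```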

External validity and generalizability depend on representative sampling; failure to diversify beyond convenience can threaten generalizability.
Ethical and practical implications:
- Fair inclusion across ethnicities, languages, and religious backgrounds when matching caseworkers with families.
- Avoid bias by ensuring that recruitment does not over-represent one subgroup (e.g., only 18-year-olds or only a single social/ethnic group).
Real-world relevance: the classroom scenario mirrors common field practices in social science and public health research, where researchers must think about how to recruit a sample that accurately reflects the target population.
Philosophical note: language, perception, and triggers in outreach materials affect who participates and how participants respond, raising questions about researcher reflexivity and measurement validity.

Survey vs. interview: different formats, implications for data type and quality; administration matters.
Trigger effects in wording: choice of topics and terms can influence participation and responses.
Random sampling: a method to reduce bias and improve representativeness by selecting participants through a chance-based procedure rather than personal choice (the lecture's example is systematic: every nth person after a random start).
Representativeness: the need to capture diversity across age, ethnicity, language, and other relevant factors to reflect the larger population.
Systematic sampling details: use of a fixed interval (e.g., k = 5) with a random start to avoid bias.
Per-recruiter workload: practical calculation to distribute recruitment effort evenly across a class.
Metadata vs. meta-analysis: transcript touches on a concept akin to meta-analysis, which aggregates results across multiple studies to draw broader conclusions.
Practical example relevance: the sleep-memory study demonstrates how sampling design affects outcomes and their generalizability to a broader student population.