Notes on Sampling, Populations, Bias, and Research Design (Week on Samples and Bias)

Week focus: who, how, and what in social research

Emphasis on integrating the what and the how this week; moving from big-picture questions (what is research, epistemology, power, ethics) to the nitty-gritty of crafting a research design.
Core topics:
- Samples and sampling
- Population and recruitment
- Bias in design and data collection
- Instruments and questions in quantitative and qualitative methods
Applied example used throughout: the Hottest 100 of Australian songs
- Triple J’s Hottest 100 (annual vote for top songs)
- This year’s special focus: Australian songs of all time, tied to Triple J’s 50th birthday celebrations
- Relevance: illustrates sampling, participation, and bias in real-world social research
Announcement of qualitative data collection approach
- Shift from analysis of transcripts from mock focus groups to generating data via a short online qualitative survey
- Purpose: experience the research process from both researcher and participant perspectives
- In-class administration of the survey; anonymity maintained
- Instructions to respondents: answer as if you were someone else (a friend, family member, housemate, colleague, or a fictional person)
- Important caution: avoid basing responses on minoritized or racialized identities; focus on perspective-taking and understanding how different social positions influence responses
- Rationale: practice perspective taking, observe response variability, and learn about survey design effects
Assessment changes announced
- Four reflections now required instead of five
- Reflections 1–2: usual short written reflections (existing submissions acknowledged)
- Reflections 3–4: use a short template and a comment section; function as data-analysis checkpoints for major assignments; evaluated as complete/incomplete
- Qualitative and quantitative reports: shortened to reduce workload; ethics sections removed because ethics content covered in Reflection 2
- Help sessions and tutor support available for questions
Practical and ethical considerations emphasized
- If questions arise, discuss during help sessions or with tutors; information will be clarified on Moodle
Key learning outcomes for this week
- Understand social research terminology related to sampling: population, sample, recruitment
- Distinguish probability vs. non-probability sampling and why it matters for generalizability and bias
- Grasp how sampling design affects what you can say about data and how you interpret results
- Apply ideas about effective question design in qualitative and quantitative methods
- Recognize that research design is akin to a factory line: the input (sample) and the instrument (measurement) determine the outputs (findings and conclusions)
- Develop critical skills to evaluate studies: ask who was sampled, where the data came from, and what questions were asked; statistics can be made to say many things
- Realize the relevance of sampling design for assignments and how it shapes data discussion in reports
Analogy used to frame design decisions
- The “sausage factory” metaphor: the machine (instrument) and input (sample) shape the sausage (results)
- Acknowledgement that the metaphor is imperfect, but aims to emphasize how design decisions constrain conclusions
Core takeaway for research design practice
- Sampling design is central to accuracy, bias reduction, and external validity; affects what conclusions are warranted
- Researchers must aim for diversity and representation in samples to avoid skewed results
- Even large samples do not guarantee representativeness if sampling methods are biased; bias can persist despite a big n
- Law of large numbers: larger samples reduce random noise and produce more stable estimates, but bias can still distort results if the sample is not representative
Quick preview of next steps
- Deeper dive into sampling techniques and how they feed into confidence, margin of error, and generalizability (weeks five and six)

Population, Sample, and Sampling: Core Concepts

Population vs. sample
- Population: the entire group or set of units under study
- Sample: a subset drawn from the population for analysis
- Data are the observations collected from the units in the sample
- Cake analogy: population = whole cake; sample = a slice of the cake
Quantitative vs. qualitative aims and logic
- Qualitative: aims to develop understanding, meaning, and in-depth themes; tends toward depth over breadth
- Quantitative: aims to generalize to the population, estimate prevalence, and infer likelihoods for the broader population
- Both approaches require careful consideration of how the sample represents the population and how the questions are designed
Examples used to illustrate population and sampling
- YouGov LGBTQIA+ UK study:
  - Population: all LGBTQIA+ people in the UK
  - Narrowed to those on the YouGov panel, aged 18+, living in the UK
  - Resulting sample: ~2,000 individuals
- Hottest 100 Australian songs (recent event):
  - Population: all Australians living in the country with internet access
  - Sample: 2,650,000 votes (millions of participants)
Why sampling matters for validity
- Proper sampling methods help determine how well the sample represents the population and whether results can generalize with known confidence
- The design of who is invited, how they are recruited, and what questions are asked shapes the conclusions you can draw
- This is critical when evaluating published studies or planning assignments

Sampling Logic: Probability vs. Non-Probability

Probability sampling (random procedures to enable generalization)
- Goal: make inferences about the population; each unit has a known, non-zero chance of selection
- Key methods:
- Simple random sampling: units are drawn at random from a sampling frame (ideally a complete list of the population), so every unit has an equal chance of selection
  - Example: random numbers from an electoral roll or a complete list of phone numbers
  - Concept: on any single draw, each of the N units in the frame has probability
    $P(\text{selection}) = \frac{1}{N}$
- Systematic sampling: select every k-th unit after a random start
  - Define the sampling interval k = N/n, where N is the frame size and n is the desired sample size
  - Procedure: pick a random starting point among the first k units, then select every k-th unit after it
  - Example: every 200th number on a list
- Stratified sampling: divide population into subgroups (strata) and sample within each subgroup
  - Proportional or disproportionate sampling within strata
  - Example: divide the population by state/territory, then draw a sample within each one
  - Notation: population size $N = \sum_{h=1}^{H} N_h$; sample size $n = \sum_{h=1}^{H} n_h$, where $N_h$ and $n_h$ are the population and sample sizes in stratum h
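The three probability methods above can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the frame of 10,000 unit IDs, the three strata and their sizes, and the function names are all made up for the example, and the stratified version assumes proportional allocation.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Select every k-th unit after a random start within the first k units."""
    k = len(frame) // n        # sampling interval k = N / n (floored)
    start = rng.randrange(k)   # random start index in [0, k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Proportional allocation: n_h is about n * N_h / N, then a simple random sample per stratum."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)  # rounding may not sum exactly to n in general
        sample[name] = rng.sample(units, n_h)
    return sample

rng = random.Random(42)            # fixed seed so the sketch is reproducible
frame = list(range(1, 10_001))     # hypothetical frame of N = 10,000 unit IDs

srs = simple_random_sample(frame, 50, rng)
sys_sample = systematic_sample(frame, 50, rng)   # k = 200, i.e. every 200th unit

# Hypothetical strata with made-up sizes (not real state populations)
strata = {
    "A": list(range(0, 5_000)),
    "B": list(range(5_000, 8_000)),
    "C": list(range(8_000, 10_000)),
}
strat = stratified_sample(strata, 100, rng)
print({name: len(units) for name, units in strat.items()})  # {'A': 50, 'B': 30, 'C': 20}
```

All three functions need a complete frame to draw from, which is why probability sampling is often unavailable for hidden or hard-to-reach populations.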
Non-probability sampling (no random selection; used when frames are unavailable or for qualitative aims)
- Types commonly discussed:
- Snowball sampling: start with one participant, who recruits others; common in hard-to-reach or hidden populations
- Convenience sampling: recruit who is easiest to reach
- Purposive sampling: deliberately select informative cases or participants
- Quota sampling: ensure certain quota characteristics, based on population proportions
- Self-selection: participants opt in (e.g., online polls, social media invitations)
- Implications:
- These methods do not allow generalization to the population with known confidence
- Useful for exploration, theory-building, or accessing hard-to-reach groups
- In the Hottest 100 example, non-probability sampling was used (self-selection and likely snowball processes):
- Participants chose to vote; sharing within networks likely amplified participation
- Potential biases: self-selection bias, nonresponse bias, undercoverage bias
Why these sampling methods matter
- Sampling bias undermines external validity (generalizability) and can threaten internal validity
- In probability sampling, biases are minimized by giving everyone a known probability of selection and by using random selection
- In non-probability sampling, bias is more likely and inferences to the broader population are more tentative

Bias in Sampling and How It Affects Inference

What is bias in research?
- A systematic error in design, conduct, or interpretation that skews findings in a particular direction
- Research bias can arise from the wording of questions, how data are collected, or how observations are recorded
Sampling bias (a form of selection bias)
- Definition: the sample is not representative of the population due to the way participants are selected
- Impacts: threatens external validity (generalizability) and, to some extent, internal validity (e.g., comparisons between groups within the sample)
- Especially problematic for quantitative research, which relies on probability sampling to generalize
Common types of bias in sampling (examples from the lecture)
- Nonresponse bias: differences between respondents and nonrespondents
- Exclusion bias: explicitly excluding certain groups from the sample
- Coverage (under-coverage) bias: some groups are not represented in the sampling frame or sample
- Self-selection bias: certain types of people are more likely to participate
- Small numbers bias: too-small samples may give unstable or non-representative results
The Hottest 100 as a case study for bias
- Self-selection bias: participants who voted may have stronger or more specific musical preferences
- Nonresponse bias: those who did not vote may differ in important ways from voters
- Undercoverage bias: demographic groups (e.g., by gender, age, state) may be under-represented relative to the population
- Even with a large sample, non-probability methods can produce biased results that do not reflect the population
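A small simulation can make the last point concrete. The sketch below is not based on real Hottest 100 data: the population size, the 30% true share preferring a hypothetical "Song X", and the assumption that fans are three times as likely to vote are all invented for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

N = 200_000
TRUE_SHARE = 0.30  # assumed share of the population who prefer "Song X"
population = [rng.random() < TRUE_SHARE for _ in range(N)]

# Self-selection: assume fans of Song X are three times as likely to vote.
P_VOTE_FAN, P_VOTE_OTHER = 0.30, 0.10
voters = [likes for likes in population
          if rng.random() < (P_VOTE_FAN if likes else P_VOTE_OTHER)]

# A much smaller simple random sample drawn from the same population.
srs = rng.sample(population, 1_000)

def share(sample):
    """Proportion of the sample that prefers Song X."""
    return sum(sample) / len(sample)

print(f"population share:           {share(population):.3f}")
print(f"self-selected (n={len(voters)}): {share(voters):.3f}")
print(f"random sample (n=1000):     {share(srs):.3f}")
```

Under these assumptions the self-selected share settles near 0.09 / 0.16 ≈ 0.56 however many people vote, while the random sample of 1,000 lands close to the true 0.30. A larger self-selected sample only estimates the wrong number more precisely.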
Minimizing bias in sampling (two primary strategies for probabilistic studies)
- Build a robust sampling frame: a comprehensive, accurate list from which to sample
- Weighting: adjust the sample to better reflect the population
- Weights compensate for over- or under-representation of groups in the sample
- Common approach: compare sample distribution to census data and apply weights accordingly
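- Example weight calculation (conceptual):
  - Group weight: $w_g = \frac{N_g}{n_g}$, where $N_g$ is the population count and $n_g$ the sample count for group g
  - Weighted estimate: $\hat{p}_{weighted} = \frac{\sum_g w_g \, n_g \, \hat{p}_g}{\sum_g w_g \, n_g} = \sum_g \frac{N_g}{N} \hat{p}_g$, with $N = \sum_g N_g$
  - Then data are adjusted by multiplying each response by its group weight: $y_i^{(weighted)} = w_{g(i)} \, y_i$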
- Important caveat: weighting is a statistical adjustment and relies on having reasonably accurate population (e.g., census) data and a correct grouping structure
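As a worked illustration of weighting, the sketch below computes group weights as population count divided by sample count, then compares the unweighted and weighted estimates. The two age groups, their benchmark counts, and the response shares are hypothetical numbers chosen to make the arithmetic easy to follow, not figures from any census or survey.

```python
# Hypothetical benchmark counts (N_g) and sample counts (n_g) for two age groups.
population_counts = {"18-34": 6_000, "35+": 14_000}
sample_counts = {"18-34": 700, "35+": 300}   # younger group is over-represented
# Hypothetical share of each group's respondents answering "yes".
sample_props = {"18-34": 0.60, "35+": 0.20}

# Group weight w_g = N_g / n_g: how many people each respondent "stands for".
weights = {g: population_counts[g] / sample_counts[g] for g in sample_counts}

# Unweighted estimate: every respondent counts once.
n = sum(sample_counts.values())
unweighted = sum(sample_counts[g] * sample_props[g] for g in sample_counts) / n

# Weighted estimate: every respondent counts w_g times.
numerator = sum(weights[g] * sample_counts[g] * sample_props[g] for g in sample_counts)
denominator = sum(weights[g] * sample_counts[g] for g in sample_counts)
weighted = numerator / denominator

print(round(unweighted, 3), round(weighted, 3))  # 0.48 0.32
```

Here the over-represented younger group pulls the unweighted figure up to 0.48. Weighting restores each group to its population share (30% and 70%) and gives 0.32. The adjustment only corrects for the grouping variable used: if respondents differ from non-respondents within a group, that bias remains.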
Practical implications for research design and reporting
- If sampling methods are biased, you cannot claim that results generalize to the population
- Diversity and representation in the sample are crucial to reducing bias and enhancing validity
- The size of the sample matters, but method and representativeness matter just as much (or more) for inferential claims
- Law of large numbers helps explain why larger samples stabilize estimates, but it does not fix biased sampling

The Law of Large Numbers: Intuition for Large Samples

Intuition presented in the lecture
- With many observations, extreme values tend to average out and the observed frequency approaches the true population probability
- Analogy: online book reviews
- Small sample (e.g., 20 reviews) may be unrepresentative; trust in the average is low
- Large sample (e.g., 3,000 reviews) yields a more reliable estimate of the true average opinion
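The book-review analogy can be simulated as a rough sketch. The "true" distribution of star ratings below is an assumption made up for the example. The code repeatedly draws 20 or 3,000 reviews and measures how much the average rating varies from draw to draw.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is reproducible

STARS = [1, 2, 3, 4, 5]
PROBS = [0.05, 0.10, 0.20, 0.35, 0.30]   # assumed "true" rating distribution
true_mean = sum(s * p for s, p in zip(STARS, PROBS))   # 3.75 under this assumption

def average_rating(n):
    """Average of n reviews drawn independently from the assumed distribution."""
    return statistics.fmean(rng.choices(STARS, weights=PROBS, k=n))

def spread_of_averages(n, repeats=200):
    """Standard deviation of the average rating across repeated samples of size n."""
    return statistics.stdev([average_rating(n) for _ in range(repeats)])

spread_20 = spread_of_averages(20)
spread_3000 = spread_of_averages(3_000)

print(f"true mean rating:              {true_mean:.2f}")
print(f"spread of averages (n = 20):   {spread_20:.3f}")
print(f"spread of averages (n = 3000): {spread_3000:.3f}")
```

The averages from 3,000 reviews cluster far more tightly around the true mean than those from 20 reviews. This assumes reviews are drawn at random from all readers. If only unusually happy or unhappy readers leave reviews, the average converges to the wrong value.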
Application to the Hottest 100 example
- 2,650,000 votes provide a strong signal and reduce random noise in the top songs
- However, due to self-selection and other non-probabilistic methods, the sample is not necessarily representative of all Australians
- Therefore, even with a large sample, one should be cautious about generalizing to the entire nation
Important nuance
- Large samples reduce random noise but do not eliminate bias from the sampling method itself
- If the sampling method fails to capture the population fairly, the results remain biased despite the large n

How to Minimize Bias: Practical Toolkit

Robust sampling frame
- Ensure the list of potential participants is comprehensive and up-to-date
- A flawed frame can introduce coverage bias from the start
Weighting data
- Use population benchmarks (e.g., census) to adjust the sample to reflect population proportions
- Weighting helps compensate for over- or under-representation, improving external validity
Transparent reporting
- Clearly document sampling method, response rates, and potential biases
- Discuss limitations related to bias and generalizability in findings
Balance between depth and generalizability
- Qualitative research prioritizes depth and understanding of processes; small, purposefully selected samples are common
- Quantitative research prioritizes generalizability; large, probabilistic samples are preferred when feasible
Ethical and methodological safeguards
- Be mindful of who is included or excluded; avoid inadvertent harm or misrepresentation
- When instructing tasks (e.g., asking students to respond as fictional others), ensure clarity and ethical alignment

Connections to Real-World Practice and Broader Implications

Critical evaluation of research
- Always ask: who was surveyed, where did data come from, what questions were asked, and what might bias these choices?
- Recognize that statistics can be used to support multiple conclusions; understanding sampling design helps interpret whether results are genuinely representative
Relevance to coursework and professional practice
- Sampling design informs what conclusions can be drawn in assignments and reports
- Practical decisions (e.g., which sampling method to use) should align with research questions, resource constraints, and ethical considerations
Ethical and philosophical reflections
- Perspective-taking in qualitative research requires careful handling of identities and lived experiences
- Researchers must balance realism of representation with ethical safeguards around sensitive identities and communities
Summative takeaway for this week
- Sampling decisions are foundational to research quality
- Bias is a pervasive risk that requires deliberate planning, measurement, and adjustment
- The design choices you make determine the scope and strength of your inferences

Quick References to Key Concepts (LaTeX-formatted)

Probability sampling aims to generalize to the population with known selection probabilities:
- Simple random sampling: on any single draw, each unit has equal probability $P(\text{selection}) = \frac{1}{N}$
- Systematic sampling: sample every k-th unit after a random start
- Stratified sampling: divide into strata and sample within strata; population relation: $N = \sum_{h=1}^{H} N_h, \ n = \sum_{h=1}^{H} n_h$
Law of large numbers (conceptual formulation):
- $\lim_{n \to \infty} \bar{X}_n = \mu \text{ (with probability 1)}$
Weighting for population representativeness (conceptual):
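- Group weights: $w_g = \frac{N_g}{n_g}$, where $N_g$ is the population count and $n_g$ the sample count for group g
- Weighted estimate: $\hat{p}_{weighted} = \frac{\sum_g w_g \, n_g \, \hat{p}_g}{\sum_g w_g \, n_g} = \sum_g \frac{N_g}{N} \hat{p}_g$, with $N = \sum_g N_g$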
