Notes on Sampling, Populations, Bias, and Research Design (Week on Samples and Bias)
Week focus: who, how, and what in social research
Emphasis on integrating the what and the how this week; moving from big-picture questions (what is research, epistemology, power, ethics) to the nitty-gritty of crafting a research design.
Core topics:
Samples and sampling
Population and recruitment
Bias in design and data collection
Instruments and questions in quantitative and qualitative methods
Applied example used throughout: the Hottest 100 of Australian songs
Triple J’s Hottest 100 (annual vote for top songs)
This year’s special focus: Australian songs of all time, tied to JJJ’s 50th birthday celebrations
Relevance: illustrates sampling, participation, and bias in real-world social research
Announcement of qualitative data collection approach
Shift from analysis of transcripts from mock focus groups to generating data via a short online qualitative survey
Purpose: experience the research process from both researcher and participant perspectives
In-class administration of the survey; anonymity maintained
Instructions to respondents: answer as if you were someone else (a friend, family member, housemate, colleague, or a fictional person)
Important caution: avoid basing responses on minoritized or racialized identities; focus on perspective-taking and understanding how different social positions influence responses
Rationale: practice perspective taking, observe response variability, and learn about survey design effects
Assessment changes announced
Four reflections now required instead of five
Reflections 1–2: usual short written reflections (existing submissions acknowledged)
Reflections 3–4: use a short template and a comment section; function as data-analysis checkpoints for major assignments; evaluated as complete/incomplete
Qualitative and quantitative reports: shortened to reduce workload; ethics sections removed because ethics content covered in Reflection 2
Help sessions and tutor support available for questions
Practical and ethical considerations emphasized
If questions arise, discuss during help sessions or with tutors; information will be clarified on Moodle
Key learning outcomes for this week
Understand social research terminology related to sampling: population, sample, recruitment
Distinguish probability vs. non-probability sampling and why it matters for generalizability and bias
Grasp how sampling design affects what you can say about data and how you interpret results
Apply ideas about effective question design in qualitative and quantitative methods
Recognize that research design is akin to a factory line: the input (sample) and the instrument (measurement) determine the outputs (findings and conclusions)
Develop critical skills to evaluate studies: ask who was studied, where the data came from, and what questions were asked; statistics can be made to say many things
Realize the relevance of sampling design for assignments and how it shapes data discussion in reports
Analogy used to frame design decisions
The “sausage factory” metaphor: the machine (instrument) and input (sample) shape the sausage (results)
Acknowledgement that the metaphor is imperfect, but aims to emphasize how design decisions constrain conclusions
Core takeaway for research design practice
Sampling design is central to accuracy, bias reduction, and external validity; affects what conclusions are warranted
Researchers must aim for diversity and representation in samples to avoid skewed results
Even large samples do not guarantee representativeness if sampling methods are biased; bias can persist despite a big n
Law of large numbers: larger samples reduce random noise and produce more stable estimates, but bias can still distort results if the sample is not representative
Quick preview of next steps
Deeper dive into sampling techniques and how they feed into confidence, margin of error, and generalizability (weeks five and six)
Population, Sample, and Sampling: Core Concepts
Population vs. sample
Population: the entire group or set of units under study
Sample: a subset drawn from the population for analysis
Data are the results of sampling from the population
Cake analogy: population = whole cake; sample = a slice of the cake
Quantitative vs. qualitative aims and logic
Qualitative: aims to develop understanding, meaning, and in-depth themes; tends toward depth over breadth
Quantitative: aims to generalize to the population, estimate prevalence, and infer likelihoods for the broader population
Both approaches require careful consideration of how the sample represents the population and how the questions are designed
Examples used to illustrate population and sampling
YouGov LGBTQIA+ UK study:
Population: all LGBTQIA+ people in the UK
Narrowed to those on the YouGov panel, aged 18+, living in the UK
Resulting sample: ~2,000 individuals
Hottest 100 Australian songs (recent event):
Population: all Australians living in the country with internet access
Sample: 2,650,000 votes (millions of participants)
Why sampling matters for validity
Proper sampling methods help determine how well the sample represents the population and whether results can generalize with known confidence
The design of who is invited, how they are recruited, and what questions are asked shapes the conclusions you can draw
This is critical when evaluating published studies or planning assignments
Sampling Logic: Probability vs. Non-Probability
Probability sampling (random procedures to enable generalization)
Goal: make inferences about the population; each unit has a known chance of selection
Key methods:
Simple random sampling: random selection from a sampling frame; equal probability for all units; ideally from a complete list (sampling frame)
Example: random numbers from an electoral roll or a complete list of phone numbers
Concept: on any single draw from a frame of N units, each unit has the same probability of selection:
P(\text{selection}) = \frac{1}{N}
Systematic sampling: select every k-th unit after a random start
Define the sampling interval as k = N / n, where N is the frame size and n is the sample size
Procedure: pick a starting point, then select every k-th item
Example: every 200th number on a list
Stratified sampling: divide population into subgroups (strata) and sample within each subgroup
Proportional or disproportionate sampling within strata
Example: divide the population by state/territory, then sample within each one
Notation: population size N = \sum_{h=1}^{H} N_h; sample size n = \sum_{h=1}^{H} n_h; sampling is carried out separately within each stratum h
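The three probability methods above can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own procedure: the frame of 1,000 unit IDs, the strata labelled A, B and C, and all function names are hypothetical. The stratified version assumes proportional allocation.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit is equally likely to be chosen."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start within the first interval, then take every k-th unit."""
    k = len(frame) // n           # sampling interval k = N / n (integer part)
    start = rng.randrange(k)      # random start in [0, k)
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Proportional allocation: stratum h receives about n * N_h / N units.
    Rounding can make the total differ slightly from n for other inputs."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)
        sample[name] = rng.sample(units, n_h)
    return sample

rng = random.Random(42)                 # fixed seed so the sketch is repeatable
frame = list(range(1, 1001))            # hypothetical sampling frame, N = 1000
srs = simple_random_sample(frame, 50, rng)
sys_s = systematic_sample(frame, 50, rng)

# Hypothetical strata with N_h = 600, 300 and 100
strata = {
    "A": list(range(0, 600)),
    "B": list(range(600, 900)),
    "C": list(range(900, 1000)),
}
strat = stratified_sample(strata, 50, rng)
print({name: len(units) for name, units in strat.items()})
```

With these numbers the systematic interval is k = 1000 / 50 = 20, and the stratified sample sizes follow the strata's share of the frame.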
Non-probability sampling (no random selection; used when frames are unavailable or for qualitative aims)
Types commonly discussed:
Snowball sampling: start with one participant, who recruits others; common in hard-to-reach or hidden populations
Convenience sampling: recruit who is easiest to reach
Purposive sampling: deliberately select informative cases or participants
Quota sampling: ensure certain quota characteristics, based on population proportions
Self-selection: participants opt in (e.g., online polls, social media invitations)
Implications:
These methods do not allow generalization to the population with known confidence
Useful for exploration, theory-building, or accessing hard-to-reach groups
In the Hottest 100 example, non-probability sampling was used (self-selection and likely snowball processes):
Participants chose to vote; sharing within networks likely amplified participation
Potential biases: self-selection bias, nonresponse bias, undercoverage bias
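To make the snowball idea concrete, here is a small sketch of recruitment spreading through a network wave by wave. The friendship network, the names, and the rule that each participant refers all of their contacts are assumptions made for illustration only.

```python
from collections import deque

def snowball_sample(network, seeds, max_waves):
    """Recruit wave by wave: each participant refers everyone they know."""
    recruited = {seed: 0 for seed in seeds}    # person -> wave in which they joined
    queue = deque(seeds)
    while queue:
        person = queue.popleft()
        wave = recruited[person]
        if wave == max_waves:                  # stop referring after the last wave
            continue
        for contact in network.get(person, []):
            if contact not in recruited:
                recruited[contact] = wave + 1
                queue.append(contact)
    return recruited

# Hypothetical friendship network; "fay" and "gus" only know each other.
network = {
    "ana": ["ben", "cy"],
    "ben": ["ana", "dee"],
    "cy": ["ana"],
    "dee": ["ben", "eli"],
    "eli": ["dee"],
    "fay": ["gus"],
    "gus": ["fay"],
}
sample = snowball_sample(network, seeds=["ana"], max_waves=2)
print(sample)
```

In this toy network, "fay" and "gus" can never be reached from the seed however many waves are run, which is one way undercoverage can arise when recruitment travels through existing social ties.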
Why these sampling methods matter
Sampling bias undermines external validity (generalizability) and can threaten internal validity
In probability sampling, biases are minimized by giving everyone a known probability of selection and by using random selection
In non-probability sampling, bias is more likely and inferences to the broader population are more tentative
Bias in Sampling and How It Affects Inference
What is bias in research?
A systematic error in design, conduct, or interpretation that skews findings in a particular direction
Research bias can arise from the wording of questions, how data are collected, or how observations are recorded
Sampling bias (a form of selection bias)
Definition: the sample is not representative of the population due to the way participants are selected
Impacts: threatens external validity (generalizability) and, to some extent, internal validity (differences within the sample)
Especially problematic for quantitative research due to the reliance on probability sampling to generalize
Common types of bias in sampling
Nonresponse bias: differences between respondents and nonrespondents
Exclusion bias: explicitly excluding certain groups from the sample
Coverage (under-coverage) bias: some groups are not represented in the sampling frame or sample
Self-selection bias: certain types of people are more likely to participate
Small numbers bias: too-small samples may give unstable or non-representative results
The Hottest 100 as a case study for bias
Self-selection bias: participants who voted may have stronger or more specific musical preferences
Nonresponse bias: those who did not vote may differ in important ways from voters
Undercoverage bias: demographic groups (e.g., by gender, age, state) may be misrepresented relative to the population
Even with a large sample, non-probability methods can produce biased results that do not reflect the population
Minimizing bias in sampling (two primary strategies for probabilistic studies)
Build a robust sampling frame: a comprehensive, accurate list from which to sample
Weighting: adjust the sample to better reflect the population
Weights compensate for over- or under-representation of groups in the sample
Common approach: compare sample distribution to census data and apply weights accordingly
Example weight calculation (conceptual):
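w_g = \frac{N_g}{n_g}
where N_g is the population count and n_g is the sample count for group g.
The data are then adjusted by multiplying each response by its group weight: y_i^{(weighted)} = w_{g(i)} \, y_i
Weighted estimate: \hat{y}_{weighted} = \frac{\sum_i w_{g(i)} \, y_i}{\sum_i w_{g(i)}}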
Important caveat: weighting is a statistical adjustment and relies on having reasonably accurate population (e.g., census) data and a correct grouping structure
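A small worked example may help show what the weights do. The numbers are invented for illustration: a population split evenly between two groups, and a sample of 100 that over-represents the "young" group. The group labels, counts, and function names are all hypothetical.

```python
def group_weights(population_counts, sample_counts):
    """w_g = N_g / n_g for each group g."""
    return {g: population_counts[g] / sample_counts[g] for g in population_counts}

def weighted_mean(responses, weights):
    """responses is a list of (group, y) pairs; returns sum(w * y) / sum(w)."""
    numerator = sum(weights[g] * y for g, y in responses)
    denominator = sum(weights[g] for g, _ in responses)
    return numerator / denominator

# Hypothetical population: 5,000 "young" and 5,000 "old" people.
population_counts = {"young": 5000, "old": 5000}

# Hypothetical sample: 80 young (60 say yes) and 20 old (5 say yes); yes = 1, no = 0.
responses = (
    [("young", 1)] * 60 + [("young", 0)] * 20
    + [("old", 1)] * 5 + [("old", 0)] * 15
)
sample_counts = {"young": 80, "old": 20}

weights = group_weights(population_counts, sample_counts)
unweighted = sum(y for _, y in responses) / len(responses)
weighted = weighted_mean(responses, weights)
print(f"unweighted share: {unweighted:.2f}, weighted share: {weighted:.2f}")
```

Here 75% of the young and 25% of the old respondents say yes. The unweighted share is pulled toward the over-represented young group, while the weighted share counts each group according to its size in the population. The adjustment is only as good as the population counts and the choice of groups.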
Practical implications for research design and reporting
If sampling methods are biased, you cannot claim that results generalize to the population
Diversity and representation in the sample are crucial to reducing bias and enhancing validity
The size of the sample matters, but method and representativeness matter just as much (or more) for inferential claims
Law of large numbers helps explain why larger samples stabilize estimates, but it does not fix biased sampling
The Law of Large Numbers: Intuition for Large Samples
Intuition presented in the lecture
With many observations, extreme values tend to average out and the observed frequency approaches the true population probability
Analogy: online book reviews
Small sample (e.g., 20 reviews) may be unrepresentative; trust in the average is low
Large sample (e.g., 3,000 reviews) yields a more reliable estimate of the true average opinion
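The book-review intuition can be checked with a short simulation. This is a sketch under assumed numbers: the distribution of star ratings below is invented, and the code compares how much the average rating varies across repeated samples of 20 reviews and of 3,000 reviews.

```python
import random
import statistics

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Hypothetical "true" distribution of star ratings for one book.
stars = [1, 2, 3, 4, 5]
probs = [0.05, 0.10, 0.15, 0.30, 0.40]
true_mean = sum(s * p for s, p in zip(stars, probs))

def average_rating(n):
    """Average star rating across n simulated reviews."""
    draws = rng.choices(stars, weights=probs, k=n)
    return sum(draws) / n

def spread_of_averages(n, repeats=200):
    """Standard deviation of the average across repeated samples of size n."""
    return statistics.pstdev([average_rating(n) for _ in range(repeats)])

spread_small = spread_of_averages(20)
spread_large = spread_of_averages(3000)
print(f"true mean rating: {true_mean:.2f}")
print(f"spread of averages with n=20:   {spread_small:.3f}")
print(f"spread of averages with n=3000: {spread_large:.3f}")
```

The averages from 3,000 reviews cluster far more tightly around the true mean than the averages from 20 reviews. Note that this simulation draws reviews at random from the whole distribution, which real review sites do not guarantee.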
Application to the Hottest 100 example
2,650,000 votes provide a strong signal and reduce random noise in the top songs
However, due to self-selection and other non-probabilistic methods, the sample is not necessarily representative of all Australians
Therefore, even with a large sample, one should be cautious about generalizing to the entire nation
Important nuance
Large samples reduce random noise but do not eliminate bias from the sampling method itself
If the sampling method fails to capture the population fairly, the results remain biased despite the large n
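The nuance above can be illustrated by comparing a modest random sample with a much larger self-selected one. Everything in this sketch is assumed for illustration: a population of 200,000 in which about 30% are fans of a song, and opt-in probabilities of 0.6 for fans and 0.1 for non-fans.

```python
import random

rng = random.Random(1)   # fixed seed so the sketch is repeatable

# Hypothetical population: True means the person is a fan of the song.
population = [rng.random() < 0.30 for _ in range(200_000)]
true_share = sum(population) / len(population)

# Random sample: every person has the same chance of selection.
random_sample = rng.sample(population, 2_000)

# Self-selected sample: fans opt in far more often than non-fans (assumed rates).
self_selected = [fan for fan in population if rng.random() < (0.6 if fan else 0.1)]

random_share = sum(random_sample) / len(random_sample)
self_selected_share = sum(self_selected) / len(self_selected)

print(f"true share of fans:           {true_share:.3f}")
print(f"random sample (n={len(random_sample)}):      {random_share:.3f}")
print(f"self-selected (n={len(self_selected)}):     {self_selected_share:.3f}")
```

Under these assumptions the self-selected sample is many times larger than the random one, yet its estimate sits far from the true share, while the small random sample lands close to it. A larger n makes the self-selected estimate more stable, but stable around the wrong value.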
How to Minimize Bias: Practical Toolkit
Robust sampling frame
Ensure the list of potential participants is comprehensive and up-to-date
A flawed frame can introduce coverage bias from the start
Weighting data
Use population benchmarks (e.g., census) to adjust the sample to reflect population proportions
Weighting helps compensate for over- or under-representation, improving external validity
Transparent reporting
Clearly document sampling method, response rates, and potential biases
Discuss limitations related to bias and generalizability in findings
Balance between depth and generalizability
Qualitative research prioritizes depth and understanding of processes; small, purposefully selected samples are common
Quantitative research prioritizes generalizability; large, probabilistic samples are preferred when feasible
Ethical and methodological safeguards
Be mindful of who is included or excluded; avoid inadvertent harm or misrepresentation
When instructing tasks (e.g., asking students to respond as fictional others), ensure clarity and ethical alignment
Connections to Real-World Practice and Broader Implications
Critical evaluation of research
Always ask: who was surveyed, where did data come from, what questions were asked, and what might bias these choices?
Recognize that statistics can be used to support multiple conclusions; understanding sampling design helps interpret whether results are genuinely representative
Relevance to coursework and professional practice
Sampling design informs what conclusions can be drawn in assignments and reports
Practical decisions (e.g., which sampling method to use) should align with research questions, resource constraints, and ethical considerations
Ethical and philosophical reflections
Perspective-taking in qualitative research requires careful handling of identities and lived experiences
Researchers must balance realism of representation with ethical safeguards around sensitive identities and communities
Summative takeaway for this week
Sampling decisions are foundational to research quality
Bias is a pervasive risk that requires deliberate planning, measurement, and adjustment
The design choices you make determine the scope and strength of your inferences
Quick References to Key Concepts (LaTeX-formatted)
Probability sampling aims to generalize to the population with known selection probabilities:
Simple random sampling: each unit has an equal probability of selection on any single draw, P(\text{selection}) = \frac{1}{N}
Systematic sampling: sample every k-th unit after a random start
Stratified sampling: divide into strata and sample within strata; population relation: N = \sum_{h=1}^{H} N_h, \quad n = \sum_{h=1}^{H} n_h
Law of large numbers (conceptual formulation):
\lim_{n \to \infty} \bar{X}_n = \mu \text{ (with probability 1)}
Weighting for population representativeness (conceptual):
Group weights: w_g = \frac{N_g}{n_g}
Weighted estimate: \hat{p}_{weighted} = \frac{\sum_g w_g \, n_g \, \hat{p}_g}{\sum_g w_g \, n_g} = \sum_g \frac{N_g}{N} \, \hat{p}_g