Notes on Sampling, Populations, Bias, and Research Design (Week on Samples and Bias)

Week focus: who, how, and what in social research

  • Emphasis on integrating the what and the how this week; moving from big-picture questions (what is research, epistemology, power, ethics) to the nitty-gritty of crafting a research design.

  • Core topics:

    • Samples and sampling

    • Population and recruitment

    • Bias in design and data collection

    • Instruments and questions in quantitative and qualitative methods

  • Applied example used throughout: the Hottest 100 of Australian songs

    • Triple J’s Hottest 100 (annual vote for top songs)

    • This year’s special focus: Australian songs of all time, tied to JJJ’s 50th birthday celebrations

    • Relevance: illustrates sampling, participation, and bias in real-world social research

  • Announcement of qualitative data collection approach

    • Shift from analysis of transcripts from mock focus groups to generating data via a short online qualitative survey

    • Purpose: experience the research process from both researcher and participant perspectives

    • In-class administration of the survey; anonymity maintained

    • Instructions to respondents: answer as if you were someone else (a friend, family member, housemate, colleague, or a fictional person)

    • Important caution: avoid basing responses on minoritized or racialized identities; focus on perspective-taking and understanding how different social positions influence responses

    • Rationale: practice perspective taking, observe response variability, and learn about survey design effects

  • Assessment changes announced

    • Four reflections now required instead of five

    • Reflections 1–2: usual short written reflections (existing submissions acknowledged)

    • Reflections 3–4: use a short template and a comment section; function as data-analysis checkpoints for major assignments; evaluated as complete/incomplete

    • Qualitative and quantitative reports: shortened to reduce workload; ethics sections removed because ethics content covered in Reflection 2

    • Help sessions and tutor support available for questions

  • Practical and ethical considerations emphasized

    • If questions arise, discuss during help sessions or with tutors; information will be clarified on Moodle

  • Key learning outcomes for this week

    • Understand social research terminology related to sampling: population, sample, recruitment

    • Distinguish probability vs. non-probability sampling and why it matters for generalizability and bias

    • Grasp how sampling design affects what you can say about data and how you interpret results

    • Apply ideas about effective question design in qualitative and quantitative methods

    • Recognize that research design is akin to a factory line: the input (sample) and the instrument (measurement) determine the outputs (findings and conclusions)

    • Develop critical skills to evaluate studies: ask who was sampled, where the data came from, and what questions were asked; statistics can be made to say many things

    • Realize the relevance of sampling design for assignments and how it shapes data discussion in reports

  • Analogy used to frame design decisions

    • The “sausage factory” metaphor: the machine (instrument) and input (sample) shape the sausage (results)

    • Acknowledgement that the metaphor is imperfect, but aims to emphasize how design decisions constrain conclusions

  • Core takeaway for research design practice

    • Sampling design is central to accuracy, bias reduction, and external validity; affects what conclusions are warranted

    • Researchers must aim for diversity and representation in samples to avoid skewed results

    • Even large samples do not guarantee representativeness if sampling methods are biased; bias can persist despite a big n

    • Law of large numbers: larger samples reduce random noise and produce more stable estimates, but bias can still distort results if the sample is not representative

  • Quick preview of next steps

    • Deeper dive into sampling techniques and how they feed into confidence, margin of error, and generalizability (weeks five and six)

Population, Sample, and Sampling: Core Concepts

  • Population vs. sample

    • Population: the entire group or set of units under study

    • Sample: a subset drawn from the population for analysis

    • Data are the results of sampling from the population

    • Cake analogy: population = whole cake; sample = a slice of the cake

  • Quantitative vs. qualitative aims and logic

    • Qualitative: aims to develop understanding, meaning, and in-depth themes; tends toward depth over breadth

    • Quantitative: aims to generalize to the population, estimate prevalence, and infer likelihoods for the broader population

    • Both approaches require careful consideration of how the sample represents the population and how the questions are designed

  • Examples used to illustrate population and sampling

    • YouGov LGBTQIA+ UK study:

      • Population: all LGBTQIA+ people in the UK

      • Narrowed to those on the YouGov panel, aged 18+, living in the UK

      • Resulting sample: ~2,000 individuals

    • Hottest 100 Australian songs (recent event):

      • Population: all Australians living in the country with internet access

      • Sample: 2,650,000 votes (millions of participants)

  • Why sampling matters for validity

    • Proper sampling methods help determine how well the sample represents the population and whether results can generalize with known confidence

    • The design of who is invited, how they are recruited, and what questions are asked shapes the conclusions you can draw

    • This is critical when evaluating published studies or planning assignments

Sampling Logic: Probability vs. Non-Probability

  • Probability sampling (random procedures to enable generalization)

    • Goal: make inferences about the population; each unit has a known, non-zero chance of selection

    • Key methods:

    • Simple random sampling: random selection from a sampling frame; equal probability for all units; ideally from a complete list (sampling frame)

      • Example: random numbers from an electoral roll or a complete list of phone numbers

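      • Concept: on any single draw, each of the N units in the frame has selection probability
        P(\text{selection}) = \frac{1}{N}; for a sample of n units drawn without replacement, each unit's overall chance of inclusion is \frac{n}{N}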

    • Systematic sampling: select every k-th unit after a random start

      • Define the sampling interval k = N / n, where N is the frame size and n is the desired sample size

      • Procedure: pick a random starting point among the first k units, then select every k-th unit after it

      • Example: every 200th number on a list

    • Stratified sampling: divide population into subgroups (strata) and sample within each subgroup

      • Proportional or disproportionate sampling within strata

      • Example: divide the population by state/territory, then draw a random sample within each one

      • Notation: population size N = \sum_h N_h; sample size n = \sum_h n_h, where N_h and n_h are the population and sample sizes in stratum h

  • Non-probability sampling (no random selection; used when frames are unavailable or for qualitative aims)

    • Types commonly discussed:

      • Snowball sampling: start with one or a few participants, who recruit others; common in hard-to-reach or hidden populations

      • Convenience sampling: recruit whoever is easiest to reach

      • Purposive sampling: deliberately select informative cases or participants

      • Quota sampling: recruit until preset quotas for key characteristics (e.g., age, gender) are filled, usually set to match population proportions; selection within each quota is not random

      • Self-selection: participants opt in (e.g., online polls, social media invitations)

    • Implications:

      • These methods do not allow generalization to the population with known confidence

      • Useful for exploration, theory-building, or accessing hard-to-reach groups

    • In the Hottest 100 example, non-probability sampling was used (self-selection and likely snowball processes):

      • Participants chose to vote; sharing within networks likely amplified participation

      • Potential biases: self-selection bias, nonresponse bias, undercoverage bias

  • Why these sampling methods matter

    • Sampling bias undermines external validity (generalizability) and can threaten internal validity

    • In probability sampling, biases are minimized by giving everyone a known probability of selection and by using random selection

    • In non-probability sampling, bias is more likely and inferences to the broader population are more tentative
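
The three probability methods described above can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the frame of 1,000 unit IDs, the three strata (A, B, C), the function names, and the sample size of 50 are all hypothetical, and the stratified version assumes proportional allocation.

```python
import random

def simple_random_sample(frame, n, rng):
    """Draw n units without replacement; every unit has the same chance of inclusion."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Select every k-th unit after a random start (assumes n <= len(frame))."""
    k = len(frame) // n            # sampling interval k = N / n, rounded down
    start = rng.randrange(k)       # random start among the first k units
    return [frame[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Proportional allocation: each stratum contributes in proportion to its size."""
    total = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n * len(units) / total)   # rounding may shift the total slightly
        sample[name] = rng.sample(units, n_h)
    return sample

rng = random.Random(42)                       # fixed seed so the sketch is repeatable
frame = list(range(1, 1001))                  # hypothetical sampling frame, N = 1000

srs = simple_random_sample(frame, 50, rng)
systematic = systematic_sample(frame, 50, rng)

# Hypothetical strata holding 60%, 30%, and 10% of the population.
strata = {
    "A": list(range(0, 600)),
    "B": list(range(600, 900)),
    "C": list(range(900, 1000)),
}
stratified = stratified_sample(strata, 50, rng)

print(len(srs), len(systematic))
print({name: len(units) for name, units in stratified.items()})
```

In each case the chance of selection is fixed by the procedure rather than by who chooses to take part, which is what separates these designs from convenience or self-selected samples.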

Bias in Sampling and How It Affects Inference

  • What is bias in research?

    • A systematic error in design, conduct, or interpretation that skews findings in a particular direction

    • Research bias can arise from the wording of questions, how data are collected, or how observations are recorded

  • Sampling bias (a form of selection bias)

    • Definition: the sample is not representative of the population due to the way participants are selected

    • Impacts: threatens external validity (generalizability) and, to some extent, internal validity (e.g., comparisons between groups within the sample can be distorted)

    • Especially problematic for quantitative research due to the reliance on probability sampling to generalize

  • Common types of bias in sampling (examples from the lecture)

    • Nonresponse bias: differences between respondents and nonrespondents

    • Exclusion bias: explicitly excluding certain groups from the sample

    • Coverage (under-coverage) bias: some groups are not represented in the sampling frame or sample

    • Self-selection bias: certain types of people are more likely to participate

    • Small numbers bias: too-small samples may give unstable or non-representative results

  • The Hottest 100 as a case study for bias

    • Self-selection bias: participants who voted may have stronger or more specific musical preferences

    • Nonresponse bias: those who did not vote may differ in important ways from voters

    • Undercoverage bias: some demographic groups (e.g., by gender, age, state) may be under-represented, and others over-represented, relative to their share of the population

    • Even with a large sample, non-probability methods can produce biased results that do not reflect the population

  • Minimizing bias in sampling (two primary strategies for probabilistic studies)

    • Build a robust sampling frame: a comprehensive, accurate list from which to sample

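    • Weighting: adjust the sample to better reflect the population

      • Weights compensate for over- or under-representation of groups in the sample

      • Common approach: compare sample distribution to census data and apply weights accordingly

      • Example weight calculation (conceptual), where N_g is the population count and n_g the sample count for group g:
        w_g = \frac{N_g}{n_g}
        Weighted estimate: \hat{p}_{\text{weighted}} = \frac{\sum_g w_g n_g \hat{p}_g}{\sum_g w_g n_g}

      • Each response is multiplied by its group weight, y_i^{(\text{weighted})} = w_{g(i)} \, y_i, where g(i) is the group of respondent i; the weighted responses are then summed and divided by the total weight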

    • Important caveat: weighting is a statistical adjustment and relies on having reasonably accurate population (e.g., census) data and a correct grouping structure

  • Practical implications for research design and reporting

    • If sampling methods are biased, you cannot claim that results generalize to the population

    • Diversity and representation in the sample are crucial to reducing bias and enhancing validity

    • The size of the sample matters, but method and representativeness matter just as much (or more) for inferential claims

    • Law of large numbers helps explain why larger samples stabilize estimates, but it does not fix biased sampling
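
The weighting idea above can be made concrete with a small calculation. The sketch below assumes two hypothetical age groups with made-up population counts, sample counts, and "yes" proportions; none of these numbers come from the lecture. Each respondent in group g carries the weight N_g / n_g, and the weighted estimate is the weight-weighted mean across respondents.

```python
# Hypothetical benchmark counts (e.g., from a census) and sample counts per group.
population = {"under_30": 300, "30_plus": 700}    # N_g
sample_n = {"under_30": 60, "30_plus": 40}        # n_g: under-30s are over-represented
sample_p = {"under_30": 0.80, "30_plus": 0.30}    # share answering "yes" in each group

# Group weight: how many people in the population each respondent stands for.
weights = {g: population[g] / sample_n[g] for g in population}

# Unweighted estimate: every respondent counts equally.
n_total = sum(sample_n.values())
unweighted = sum(sample_n[g] * sample_p[g] for g in sample_n) / n_total

# Weighted estimate: each of the n_g respondents in group g counts w_g times.
weighted = (
    sum(weights[g] * sample_n[g] * sample_p[g] for g in sample_n)
    / sum(weights[g] * sample_n[g] for g in sample_n)
)

print(f"unweighted: {unweighted:.2f}")   # 0.60
print(f"weighted:   {weighted:.2f}")     # 0.45
```

Because the over-sampled group answers "yes" more often, the raw figure overstates the population share; the weights pull it back toward the group mix in the benchmark. The adjustment is only as good as the benchmark counts and the choice of groups.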

The Law of Large Numbers: Intuition for Large Samples

  • Intuition presented in the lecture

    • With many observations, extreme values tend to average out and the observed frequency approaches the true population probability

    • Analogy: online book reviews

      • Small sample (e.g., 20 reviews) may be unrepresentative; trust in the average is low

      • Large sample (e.g., 3,000 reviews) yields a more reliable estimate of the true average opinion

  • Application to the Hottest 100 example

    • 2,650,000 votes provide a strong signal and reduce random noise in the top songs

    • However, due to self-selection and other non-probabilistic methods, the sample is not necessarily representative of all Australians

    • Therefore, even with a large sample, one should be cautious about generalizing to the entire nation

  • Important nuance

    • Large samples reduce random noise but do not eliminate bias from the sampling method itself

    • If the sampling method fails to capture the population fairly, the results remain biased despite the large n
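
A short simulation can illustrate both halves of this point. Everything in the sketch is an assumption for illustration, not Hottest 100 data: a made-up population of 100,000 people in which 30% like a given song, and a self-selection mechanism in which fans are three times as likely to opt in as everyone else.

```python
import random

rng = random.Random(0)   # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = likes the song, 0 = does not; true share is 30%.
population = [1] * 30_000 + [0] * 70_000
rng.shuffle(population)
true_share = sum(population) / len(population)

def random_sample_share(n):
    """Share of fans in a simple random sample of size n (without replacement)."""
    return sum(rng.sample(population, n)) / n

def self_selected_share(n):
    """Share of fans among n opt-in voters, with fans 3x as likely to opt in.

    Uses draws with replacement, which is a simplification.
    """
    opt_in_weights = [3 if likes else 1 for likes in population]
    voters = rng.choices(population, weights=opt_in_weights, k=n)
    return sum(voters) / n

small_random = random_sample_share(100)       # noisy, but not systematically off
large_random = random_sample_share(20_000)    # settles close to the true share
large_biased = self_selected_share(20_000)    # stable, but centred on the wrong value

print(f"true share:          {true_share:.3f}")
print(f"small random sample: {small_random:.3f}")
print(f"large random sample: {large_random:.3f}")
print(f"large self-selected: {large_biased:.3f}")
```

Under these assumptions the self-selected sample converges on roughly 0.56 (90,000 / 160,000) rather than 0.30: increasing n tightens the estimate around the biased value instead of moving it toward the truth.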

How to Minimize Bias: Practical Toolkit

  • Robust sampling frame

    • Ensure the list of potential participants is comprehensive and up-to-date

    • A flawed frame can introduce coverage bias from the start

  • Weighting data

    • Use population benchmarks (e.g., census) to adjust the sample to reflect population proportions

    • Weighting helps compensate for over- or under-representation, improving external validity

  • Transparent reporting

    • Clearly document sampling method, response rates, and potential biases

    • Discuss limitations related to bias and generalizability in findings

  • Balance between depth and generalizability

    • Qualitative research prioritizes depth and understanding of processes; small, purposefully selected samples are common

    • Quantitative research prioritizes generalizability; large, probabilistic samples are preferred when feasible

  • Ethical and methodological safeguards

    • Be mindful of who is included or excluded; avoid inadvertent harm or misrepresentation

    • When instructing tasks (e.g., asking students to respond as fictional others), ensure clarity and ethical alignment

Connections to Real-World Practice and Broader Implications

  • Critical evaluation of research

    • Always ask: who was surveyed, where did data come from, what questions were asked, and what might bias these choices?

    • Recognize that statistics can be used to support multiple conclusions; understanding sampling design helps interpret whether results are genuinely representative

  • Relevance to coursework and professional practice

    • Sampling design informs what conclusions can be drawn in assignments and reports

    • Practical decisions (e.g., which sampling method to use) should align with research questions, resource constraints, and ethical considerations

  • Ethical and philosophical reflections

    • Perspective-taking in qualitative research requires careful handling of identities and lived experiences

    • Researchers must balance realism of representation with ethical safeguards around sensitive identities and communities

  • Summative takeaway for this week

    • Sampling decisions are foundational to research quality

    • Bias is a pervasive risk that requires deliberate planning, measurement, and adjustment

    • The design choices you make determine the scope and strength of your inferences

Quick References to Key Concepts (LaTeX-formatted)

  • Probability sampling aims to generalize to the population with known selection probabilities:

    • Simple random sampling: each unit has equal probability on any single draw, P(\text{selection}) = \frac{1}{N} (overall inclusion probability \frac{n}{N} for a sample of size n)

    • Systematic sampling: sample every k-th unit after a random start

    • Stratified sampling: divide into strata and sample within strata; population relation: N = \sum_{h=1}^{H} N_h, \quad n = \sum_{h=1}^{H} n_h

  • Law of large numbers (conceptual formulation):


    • \lim_{n \to \infty} \bar{X}_n = \mu \text{ (with probability 1)}

  • Weighting for population representativeness (conceptual):

    • Group weights: w_g = \frac{N_g}{n_g}

    • Weighted estimate: \hat{p}_{\text{weighted}} = \frac{\sum_g w_g n_g \hat{p}_g}{\sum_g w_g n_g}
