Survey Sampling: Comprehensive Notes (Video Transcript)

Course context and goals

  • Instructor: Thomas Lumley (not Andrew Spall); will teach the next three weeks on sampling and the sampling aspects of surveys.
  • Connection to Andrew Spall: issues of how to get surveys that measure what you want and how you ask questions; next focus is how you choose people for surveys and how that affects analysis.
  • Data analysis plans: later weeks will cover data analysis with iNZight (survey analyses and simple experimental-design analyses); alternative tools include R and Stata.
  • Thomas's background: medical statistics researcher and professor; specializes in sampling, fitting models to data from complex samples, and using sampling to measure expensive metrics on smaller groups; wrote the survey software underneath iNZight.
  • Logistics: lecture uses a discussion system (Ed) instead of Piazza; questions encouraged; instructor will pause to allow questions.

Why surveys and what a survey looks like

  • Structured data vs homogeneous data: surveys (and experiments) are structured; not all observations are the same.
  • Purpose of surveys: learn about populations by sampling a smaller group cost-effectively.
  • Key questions addressed by surveys: proportion in a population, trends over time, and associations between variables (e.g., personality vs views on human rights).
  • Population concepts:
    • Target population: the population we want to make estimates about.
    • Survey population: the people we could actually survey.
    • Sampling frame: a list of members of the survey population from which we sample.
    • Sample: the people we actually select.
    • Respondents: the sample members who actually respond and belong to the target population.
    • Sample design: how we go about choosing the sample (and may include sampling people not in the target population).
  • Practical points:
    • Frame errors and nonresponse can bias results even if the sampling method is unbiased in principle.
    • Some data are collected by organizations and made available for analysis (government surveys in the US, New Zealand data, etc.).
  • Core tasks when analyzing survey data:
    • Define the survey structure to the computer so analyses reflect the design.
    • Use software that understands survey design (iNZight, R with the survey package, Stata).
    • Design details to report: stratification (if any), clustering (if any), and sampling weights.
  • Note on causality:
    • Surveys are usually observational; correlations do not imply causation.
    • Special cases exist (e.g., AB testing, or experiments embedded in surveys) where causal inferences are more reliable.

The NZ statistical geography and its relevance to sampling

  • NZ geography is hierarchical and designed to support sampling and data reporting:
    • Regions: 16 regions.
    • Territorial authorities: 67 total (53 districts).
    • Statistical geography units:
      • Mesh blocks: the smallest basic geographic units; designed to be stable over time (unlike dwellings, which change).
      • SA1: about 100–200 people.
      • SA2: a few thousand people.
    • Regions are subdivided into urban/rural strata for sampling; mesh blocks nest within SA1s, SA1s within SA2s, and all of these within territorial authorities, regions, etc.
  • Nested structure and reporting:
    • Every point in NZ is in one mesh block, one SA1, one SA2, one TA, and one region.
    • Data reporting uses SA1, SA2, and mesh blocks to balance confidentiality and geographic specificity.
  • Additional geographical constructs:
    • Electorates exist but are not perfectly nested with SA1/SA2; electoral apportionment can cut across these units.
  • Practical note: when planning surveys, one typically aims to cover all regions to ensure geographic representativeness; omitting a region (e.g., Waikato) could bias national estimates.
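The nesting described above can be illustrated with a toy lookup table; the mesh-block codes and place assignments below are invented for illustration, not real Stats NZ codes.

```python
# Toy illustration of the nested NZ statistical geography.
# All codes below are invented; real codes come from Stats NZ.
# Each mesh block belongs to exactly one SA1, one SA2, one
# territorial authority (TA), and one region.
MESH_BLOCKS = {
    "MB0001": {"sa1": "SA1-7001", "sa2": "SA2-301", "ta": "Auckland", "region": "Auckland"},
    "MB0002": {"sa1": "SA1-7002", "sa2": "SA2-301", "ta": "Auckland", "region": "Auckland"},
    "MB0003": {"sa1": "SA1-8101", "sa2": "SA2-512", "ta": "Hamilton City", "region": "Waikato"},
}

def region_of(mesh_block: str) -> str:
    """Resolve a mesh block up the hierarchy to its region."""
    return MESH_BLOCKS[mesh_block]["region"]
```

Because the units nest, resolving any mesh block upward yields a unique region, which is what makes reporting at different geographic levels mutually consistent.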

Defining populations and sampling frames in practice

  • Target population vs survey population: need precise definitions to avoid misalignment.
    • Examples of well-defined targets: New Zealand adults 18+; or a more refined group like a sub-population of interest.
    • Edge cases to consider: adults physically in NZ vs citizens; residents with certain visas; international students; tourists.
  • Sampling frame quality matters:
    • A good frame (e.g., electoral roll) makes sampling practical but may include ineligible individuals (non-citizens in NZ roll).
    • Frame errors (missing people or including ineligible people) lead to biases if not properly adjusted.
  • Sample, respondents and target population alignment:
    • The sample may include people outside the intended target population; analysts must screen these out and/or weight so that estimates reflect the population of interest.
  • Edge cases and practical issues:
    • In NZ, enrollment in the electoral roll is compulsory but enforcement varies; this affects coverage.
    • The concept of who counts as a New Zealand adult can affect the generalizability of results.
    • Nonresponse (people not answering calls, surveys, etc.) biases must be addressed via weighting or follow-ups.

Sampling frames and a concrete NZ example: Mission On (nutrition/physical activity intervention)

  • Mission On structure (2008–2009) overview:
    • Aim: evaluate a government initiative to improve nutrition and physical activity among youth.
    • Regions and stratification: 16 regions × urban vs rural (32 strata).
    • Sampling frame: mesh blocks within each stratum were sampled; mesh blocks with more dwellings had higher selection probability (PPS by dwellings).
    • Within each selected mesh block: households sampled with a systematic design (start at SW corner; select every nth house, e.g., third on the left and seventh on the right).
    • Within each selected household: one person selected (often the one with the most recent birthday).
  • Design and terminology in this example:
    • Population is subdivided into strata (regions × urban/rural).
    • Within strata, mesh blocks are the clusters (PSUs, the first-stage sampling units).
    • Not all mesh blocks are sampled; only a subset is selected; then within those blocks, a subset of dwellings is surveyed.
    • Within households, only one person is surveyed, creating another stage of sampling (multi-stage design).
  • Rationale and implications:
    • The approach reduces travel costs and makes fieldwork feasible for in-person interviews.
    • You must generalize from sampled mesh blocks to the rest of the country, which introduces design effects due to clustering.
    • Stratification increases precision and ensures geographic representation; clusters reduce cost but introduce intracluster correlation.
    • Sampling weights arise from unequal probabilities of selection at each stage and must be used in analysis for unbiased population estimates.
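The first stage of a Mission On-style design (PPS selection of mesh blocks by dwelling counts) can be sketched with systematic PPS, one common way to implement unequal-probability selection. The dwelling counts and sample size below are invented for illustration.

```python
import random

def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size via systematic PPS.

    Lay the units end to end on a line of length sum(sizes), then take n
    equally spaced points from a random start; a unit is selected when a
    point lands in its segment. If n * size <= sum(sizes) for every unit,
    unit j's inclusion probability is n * sizes[j] / sum(sizes).
    Returns the (sorted) indices of the selected units.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.uniform(0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cum, i = [], 0.0, 0
    for p in points:
        while cum + sizes[i] <= p:   # advance past units whose segment ends before p
            cum += sizes[i]
            i += 1
        selected.append(i)
    return selected

# Invented dwelling counts for 8 mesh blocks in one stratum.
dwellings = [120, 45, 80, 200, 60, 150, 30, 95]
chosen = pps_systematic(dwellings, n=3, rng=random.Random(1))
pi_first = 3 * dwellings[0] / sum(dwellings)   # inclusion probability of block 0
```

Blocks with more dwellings occupy longer segments and so are hit by the equally spaced points more often, which is exactly the "more dwellings, higher selection probability" behaviour described above.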

Key sampling design concepts: stratification, clustering, and multi-stage designs

  • Stratified sampling (strata):
    • The population is cut into non-overlapping strata (e.g., regions; urban/rural within regions).
    • Sampling is carried out separately within each stratum; common motivations:
      • Ensure representation of important subgroups.
      • Improve precision by reducing within-stratum variance.
      • Enable region- or subgroup-specific analyses.
    • Effects of ignoring stratification: standard errors are typically overstated (stratification usually improves precision), and estimates can be biased if sampling rates differ across strata that represent different populations (e.g., rural vs urban).
  • Clustering (clusters):
    • Units are grouped into clusters (e.g., mesh blocks, schools, classes).
    • Sampling occurs by selecting some clusters and then sampling within them.
    • Rationale:
      • Cost savings, reduced travel, and feasible data collection when individual-level frames are incomplete or hard to enumerate.
    • Trade-offs:
      • Intraclass correlation: units within a cluster resemble each other, so the effective sample size is smaller than the raw sample size.
      • Selection probabilities may differ across clusters; estimates are biased if this is not accounted for with weights.
  • Combined designs (stratified and clustered):
    • Common in practice (e.g., Mission On used stratification at the region level and multi-stage cluster sampling within regions).
    • The first-stage sampling unit is the PSU (primary sampling unit); later stages sample successively smaller units within each PSU.
  • Primary sampling units (PSUs):
    • The largest units actually sampled in a multi-stage design.
    • In Mission On, mesh blocks were the PSUs.
    • If you have no clustering, the PSU concept reduces to the individual unit.
  • Multi-stage designs and terminology:
    • Stage 1: choose PSUs (e.g., mesh blocks).
    • Stage 2: sample clusters within PSUs (e.g., households within blocks).
    • Stage 3+: sample individuals within clusters (e.g., one person per household).
    • In survey literature, stage numbers can be described differently depending on context (multilevel modeling uses a different orientation for levels).

Probabilities, weights, and analysis with complex samples

  • Inclusion probabilities and weights:
    • For a unit $i$, the overall inclusion probability is the product of the stage-level probabilities:
      $\pi_i = \prod_{t=1}^{T} \pi_{i,t}$.
    • The survey weight is the inverse of the inclusion probability:
      $w_i = 1/\pi_i$.
  • Example structure (Mission On-like):
    • Probability of selecting mesh block $j$ within a region is proportional to its number of dwellings:
      $\pi_j \propto D_j$, or $\pi_j = \frac{D_j}{\sum_{k \in \text{region}} D_k}$.
    • Probability of selecting a household within a chosen mesh block:
      $\pi_{hh|j} = \frac{n_{hh|j}}{H_j}$, where $n_{hh|j}$ is the number of households sampled in block $j$ and $H_j$ is the total number of households in block $j$.
    • Probability of selecting a person within a selected household:
      $\pi_{p|hh} = \frac{1}{N_{p|hh}}$, where $N_{p|hh}$ is the number of people in the household (one person sampled per household).
    • Overall inclusion probability for person $i$:
      $\pi_i = \pi_j \times \pi_{hh|j} \times \pi_{p|hh}$.
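These stage probabilities can be multiplied through numerically; the stage sizes below are illustrative, not actual Mission On figures.

```python
# Illustrative numbers, not actual Mission On figures.

D_j, D_total = 120, 780    # dwellings in mesh block j / in its stratum
pi_block = D_j / D_total   # PPS by dwellings (one block drawn, as in the formula above)

n_hh, H_j = 10, 120        # households sampled / total households in block j
pi_hh = n_hh / H_j

N_people = 4               # people in the selected household
pi_person = 1 / N_people   # one person sampled per household

pi_i = pi_block * pi_hh * pi_person   # overall inclusion probability
w_i = 1 / pi_i                        # survey weight: this person "stands for" w_i people
```

Here $\pi_i = \frac{120}{780}\cdot\frac{10}{120}\cdot\frac{1}{4} = \frac{1}{312}$, so the respondent carries a weight of 312 in population estimates.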
  • Practical impacts for analysis:
    • If clustering (PSUs) is ignored, standard errors are typically understated; if weights are ignored or mis-specified, point estimates can be biased.
    • Modern software (iNZight, the R survey package, Stata's svy commands) can handle these designs if you correctly specify strata, clusters, and weights.
    • The design effect (DEFF) quantifies the efficiency loss due to clustering; the effective sample size is correspondingly reduced: $n_{\text{eff}} = n / \text{DEFF}$.
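For equal-sized clusters of $m$ units with intraclass correlation $\rho$, a standard approximation is $\text{DEFF} \approx 1 + (m-1)\rho$; a minimal sketch with invented numbers:

```python
def design_effect(m, icc):
    """DEFF ~= 1 + (m - 1) * icc for equal-sized clusters of m units
    with intraclass correlation icc (a standard approximation)."""
    return 1 + (m - 1) * icc

n = 1000                              # nominal sample: 50 clusters of 20 (invented)
deff = design_effect(m=20, icc=0.05)  # 1 + 19 * 0.05 = 1.95
n_eff = n / deff                      # roughly half the nominal sample size
```

Even a modest intraclass correlation of 0.05 nearly halves the information in this design, which is why cluster size matters as much as total sample size.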

Simple random sampling vs complex sampling

  • Simple random sampling (SRS) without replacement:
    • Each unit has equal probability of being selected; all possible samples are equally likely.
    • Requires a complete list (sampling frame) of units; random digit dialing provides an implicit frame for telephone surveys.
    • Practical limits: becomes costly if sampling a large fraction; generally not used for large-scale surveys due to cost and travel when in-person.
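Given a complete frame, SRS without replacement is straightforward to sketch; the frame of 100 unit IDs below is invented.

```python
import random

def srs_without_replacement(frame, n, rng=None):
    """Draw a simple random sample of n units from a complete sampling frame.
    Every unit has equal inclusion probability n / len(frame), and every
    possible sample of size n is equally likely."""
    rng = rng or random.Random()
    return rng.sample(frame, n)

frame = list(range(1, 101))            # invented frame of 100 unit IDs
sample = srs_without_replacement(frame, 10, random.Random(42))
inclusion_prob = 10 / len(frame)       # 0.1 for every unit in the frame
```

With equal inclusion probabilities, all weights are identical, which is exactly why SRS needs no special analysis machinery; complex designs are where weights and design variables start to matter.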
  • Limitations of SRS in practice:
    • If you have extra information (e.g., want better representation of ethnic groups), SRS may be inefficient; oversampling may be used to ensure adequate representation within subgroups.
    • Frames may be incomplete or outdated; nonresponse can still bias results.

Why stratification and clustering decisions matter in practice

  • Stratified sampling advantages:
    • Protects against bad samples (e.g., missing a region like Auckland); ensures representation across key subpopulations.
    • Facilitates reporting summaries for each stratum (e.g., region-level estimates).
    • Increases precision by reducing variance within strata; allows different sampling methods per stratum.
    • Example: Scottish Household Survey uses dense-area strata for cities and whole towns in Highlands for sparsely populated areas.
  • Consequences of ignoring stratification:
    • Standard errors may be inflated or deflated inappropriately; estimates may be biased if strata differ meaningfully.
  • Clustering advantages and trade-offs:
    • Cost and practicality: easier to collect data in a few locations (PSUs) rather than many scattered sites.
    • Does not require a complete frame at the individual level; can work with a frame at a higher level (e.g., villages, schools).
    • Intra-cluster correlation reduces the information gained from each additional unit within the same cluster; effective sample size is smaller than the nominal sample size.
    • Risk of unequal sampling probabilities across clusters; can bias estimates if clusters differ systematically in key variables.
  • When to use both stratification and clustering:
    • Very common in practice and often the most cost-effective design; strata help ensure representativeness, clusters reduce fieldwork costs.
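A minimal sketch of stratified sampling with proportional allocation (the stratum sizes are invented); note that every stratum is guaranteed representation, which is the protection described above.

```python
import random

def stratified_sample(strata, n_total, rng=None):
    """Allocate n_total proportionally across strata, then draw an SRS in each.

    `strata` maps stratum name -> list of unit IDs. Rounding each stratum's
    allocation can make the realized total differ slightly from n_total;
    real designs use an exact allocation scheme.
    """
    rng = rng or random.Random()
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n_total * len(units) / N)   # proportional allocation
        sample[name] = rng.sample(units, n_h)
    return sample

# Invented population: 800 urban and 200 rural units.
strata = {
    "urban": [f"u{i}" for i in range(800)],
    "rural": [f"r{i}" for i in range(200)],
}
sample = stratified_sample(strata, n_total=100, rng=random.Random(7))
```

Unlike SRS, a "bad draw" that misses the rural stratum entirely is impossible here, and the per-stratum samples directly support stratum-level reporting.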

A few additional examples of survey designs

  • Cannabis referendum (hypothetical example): stratified by Māori descent (Māori vs non-Māori) and age (>50 vs ≤50); sample size of 1000 individuals per cell in the 2×2 design (Māori/non-Māori × age group).
  • School vaccination attitudes: two-stage cluster sampling; select 100 schools; survey all students in three randomly chosen classes per school.
  • School Health Survey: stratified by school type (co-educational, boys, girls); clusters are schools; within each school, sample three classes; survey all students in those classes.
  • Malawi Food Production and Security survey: strata are districts; the clusters (PSUs) are villages; within each village, sample 30 households; within households, survey all residents or selected members.
  • Practical point: when no usable individual-level frame exists, clustering may be necessary even where stratification alone would be preferable; with a good frame, stratification without clustering may suffice.

Why we need to use survey-aware software and proper design reporting

  • Modern software supports complex survey designs, but only if you provide accurate design information:
    • Strata variables (if any);
    • Clusters / PSUs (if any);
    • Sampling weights (from inclusion probabilities).
  • If you mis-specify the design (wrong strata, wrong clustering, wrong weights), you will get biased results.
  • Without proper design, analysis may resemble simple random sampling and ignore design effects, leading to misleading confidence intervals and p-values.

Practicalities and ethical considerations in survey sampling

  • Nonresponse and coverage bias are persistent concerns across survey methods; weighting and follow-ups are standard tools to mitigate these biases.
  • Privacy and defensible definitions of population are important: precise definitions determine what you can generalize to; mismatches can mislead policy decisions.
  • Data accessibility varies by country; large datasets (e.g., US government surveys) exist, but access often requires permissions and data use agreements.
  • Ethical implications include ensuring representation of subpopulations, avoiding stigmatization by over- or under-sampling certain groups, and preserving respondent confidentiality in small-area reporting (confidentiality of SA1/SA2 data).

Recap: core concepts to remember for exams and practice

  • Core definitions: target population, survey population, sampling frame, sample, respondent, and sample design.
  • The NZ geographic hierarchy: regions → territorial authorities → SA2 → SA1 → mesh blocks; nested and hierarchical, with PSUs typically being the coarsest units actually sampled.
  • Stratified sampling vs cluster sampling:
    • Stratification: sample from every stratum; increases representativeness and precision.
    • Clustering: sample a subset of clusters; reduces travel and logistical burden; introduces design effects and intra-cluster correlation.
  • Primary sampling units (PSUs) and stages: the largest units sampled first; subsequent stages sample within PSUs; multi-stage designs are common in surveys.
  • Inclusion probabilities and weights: overall probability is the product of stage probabilities, and weights are the inverse of that product; proper weighting is essential for unbiased population estimates.
  • Analyzing survey data requires software aware of the design; mis-specification leads to biased results.
  • Causality in surveys is limited; AB testing and embedded experiments can provide stronger causal evidence in some cases.
  • Real-world trade-offs: cost vs precision; larger clusters reduce fieldwork but can reduce precision; oversampling, frame quality, and nonresponse must be managed.