Survey Sampling: Comprehensive Notes (Video Transcript)

Course context and goals

  • Instructor: Thomas Lumley (not Andrew Spall); will teach the next three weeks on sampling and the sampling aspects of surveys.
  • Connection to Andrew Spall: issues of how to get surveys that measure what you want and how you ask questions; next focus is how you choose people for surveys and how that affects analysis.
  • Data analysis plans: later weeks will cover data analysis with iNZight (survey analyses and simple experimental-design analyses); alternative tools include R and Stata.
  • Thomas's background: medical statistics researcher and professor; specializes in sampling, fitting models to data from complex samples, and using sampling to measure expensive metrics on smaller groups; wrote the survey software underneath iNZight.
  • Logistics: lecture uses a discussion system (Ed) instead of Piazza; questions encouraged; instructor will pause to allow questions.

Why surveys and what a survey looks like

  • Structured data vs homogeneous data: surveys (and experiments) are structured; not all observations are the same.
  • Purpose of surveys: learn about populations by sampling a smaller group cost-effectively.
  • Key questions addressed by surveys: proportion in a population, trends over time, and associations between variables (e.g., personality vs views on human rights).
  • Population concepts:
    • Target population: the population we want to make estimates about.
    • Survey population: the people we could actually survey.
    • Sampling frame: a list of members of the survey population from which we sample.
    • Sample: the people we actually select.
    • Respondents: the sample members who actually respond and belong to the target population.
    • Sample design: how we go about choosing the sample (and may include sampling people not in the target population).
  • Practical points:
    • Frame errors and nonresponse can bias results even if the sampling method is unbiased in principle.
    • Some data are collected by organizations and made available for analysis (government surveys in the US, New Zealand data, etc.).
  • Core tasks when analyzing survey data:
    • Define the survey structure to the computer so analyses reflect the design.
    • Use software that understands survey design (iNZight, R with the survey package, Stata).
    • Design details to report: stratification (if any), clustering (if any), and sampling weights.
  • Note on causality:
    • Surveys are usually observational; correlations do not imply causation.
    • Special cases exist (e.g., AB testing, or experiments embedded in surveys) where causal inferences are more reliable.

The NZ statistical geography and its relevance to sampling

  • NZ geography is hierarchical and designed to support sampling and data reporting:
    • Regions: 16 regions.
    • Territorial authorities: 67 total (53 districts).
    • Statistical geography units:
      • Mesh blocks: the smallest basic geographic units; designed to be stable over time (unlike dwellings, which change).
      • SA1: about 100–200 people.
      • SA2: a few thousand people.
    • Regions are subdivided into urban/rural strata for sampling; mesh blocks nest within SA1s, SA1s within SA2s, and all of these within territorial authorities, regions, etc.
  • Nested structure and reporting:
    • Every point in NZ is in one mesh block, one SA1, one SA2, one TA, and one region.
    • Data reporting uses SA1, SA2, and mesh blocks to balance confidentiality and geographic specificity.
  • Additional geographical constructs:
    • Electorates exist but are not perfectly nested with SA1/SA2; electoral apportionment can cut across these units.
  • Practical note: when planning surveys, one typically aims to cover all regions to ensure geographic representativeness; omitting a region (e.g., Waikato) could bias national estimates.
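The nesting described above can be illustrated with a toy lookup table; the mesh-block codes and place assignments below are invented for illustration, not real Stats NZ codes.

```python
# Toy illustration of the nested NZ statistical geography.
# All codes below are invented; real codes come from Stats NZ.
# Each mesh block belongs to exactly one SA1, one SA2, one
# territorial authority (TA), and one region.
MESH_BLOCKS = {
    "MB0001": {"sa1": "SA1-7001", "sa2": "SA2-301", "ta": "Auckland", "region": "Auckland"},
    "MB0002": {"sa1": "SA1-7002", "sa2": "SA2-301", "ta": "Auckland", "region": "Auckland"},
    "MB0003": {"sa1": "SA1-8101", "sa2": "SA2-512", "ta": "Hamilton City", "region": "Waikato"},
}

def region_of(mesh_block: str) -> str:
    """Resolve a mesh block up the hierarchy to its region."""
    return MESH_BLOCKS[mesh_block]["region"]
```

Because the units nest, resolving any mesh block upward yields a unique region, which is what makes reporting at different geographic levels mutually consistent.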

Defining populations and sampling frames in practice

  • Target population vs survey population: need precise definitions to avoid misalignment.
    • Examples of well-defined targets: New Zealand adults 18+; or a more refined group like a sub-population of interest.
    • Edge cases to consider: adults physically in NZ vs citizens; residents with certain visas; international students; tourists.
  • Sampling frame quality matters:
    • A good frame (e.g., electoral roll) makes sampling practical but may include ineligible individuals (non-citizens in NZ roll).
    • Frame errors (missing people or including ineligible people) lead to biases if not properly adjusted.
  • Sample, respondents and target population alignment:
    • The sample may include people outside the intended target population; analysts must screen these out and/or weight so that estimates reflect the population of interest.
  • Edge cases and practical issues:
    • In NZ, enrollment in the electoral roll is compulsory but enforcement varies; this affects coverage.
    • The concept of who counts as a New Zealand adult can affect the generalizability of results.
    • Nonresponse (people not answering calls, surveys, etc.) biases must be addressed via weighting or follow-ups.

Sampling frames and a concrete NZ example: Mission On (nutrition/physical activity intervention)

  • Mission On structure (2008–2009) overview:
    • Aim: evaluate a government initiative to improve nutrition and physical activity among youth.
    • Regions and stratification: 16 regions × urban vs rural (32 strata).
    • Sampling frame: mesh blocks within each stratum were sampled; mesh blocks with more dwellings had higher selection probability (PPS by dwellings).
    • Within each selected mesh block: households sampled with a systematic design (start at SW corner; select every nth house, e.g., third on the left and seventh on the right).
    • Within each selected household: one person selected (often the one with the most recent birthday).
  • Design and terminology in this example:
    • Population is subdivided into strata (regions × urban/rural).
    • Within strata, mesh blocks are the clusters (PSUs, the first-stage sampling units).
    • Not all mesh blocks are sampled; only a subset is selected; then within those blocks, a subset of dwellings is surveyed.
    • Within households, only one person is surveyed, creating another stage of sampling (multi-stage design).
  • Rationale and implications:
    • The approach reduces travel costs and makes fieldwork feasible for in-person interviews.
    • You must generalize from sampled mesh blocks to the rest of the country, which introduces design effects due to clustering.
    • Stratification increases precision and ensures geographic representation; clusters reduce cost but introduce intracluster correlation.
    • Sampling weights arise from unequal probabilities of selection at each stage and must be used in analysis for unbiased population estimates.
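The first stage of a Mission On-style design (PPS selection of mesh blocks by dwelling counts) can be sketched with systematic PPS, one common way to implement unequal-probability selection. The dwelling counts and sample size below are invented for illustration.

```python
import random

def pps_systematic(sizes, n, rng):
    """Select n units with probability proportional to size via systematic PPS.

    Lay the units end to end on a line of length sum(sizes), then take n
    equally spaced points from a random start; a unit is selected when a
    point lands in its segment. If n * size <= sum(sizes) for every unit,
    unit j's inclusion probability is n * sizes[j] / sum(sizes).
    Returns the (sorted) indices of the selected units.
    """
    total = sum(sizes)
    interval = total / n
    start = rng.uniform(0, interval)
    points = [start + k * interval for k in range(n)]
    selected, cum, i = [], 0.0, 0
    for p in points:
        while cum + sizes[i] <= p:   # advance past units whose segment ends before p
            cum += sizes[i]
            i += 1
        selected.append(i)
    return selected

# Invented dwelling counts for 8 mesh blocks in one stratum.
dwellings = [120, 45, 80, 200, 60, 150, 30, 95]
chosen = pps_systematic(dwellings, n=3, rng=random.Random(1))
pi_first = 3 * dwellings[0] / sum(dwellings)   # inclusion probability of block 0
```

Blocks with more dwellings occupy longer segments and so are hit by the equally spaced points more often, which is exactly the "more dwellings, higher selection probability" behaviour described above.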

Key sampling design concepts: stratification, clustering, and multi-stage designs

  • Stratified sampling (strata):
    • The population is cut into non-overlapping strata (e.g., regions; urban/rural within regions).
    • Sampling is carried out separately within each stratum; common motivations:
      • Ensure representation of important subgroups.
      • Improve precision by reducing within-stratum variance.
      • Enable region- or subgroup-specific analyses.
    • Effects of ignoring stratification: standard errors are typically overstated (stratification usually improves precision), and estimates can be biased if sampling rates differ across strata that represent different populations (e.g., rural vs urban).
  • Clustering (clusters):
    • Units are grouped into clusters (e.g., mesh blocks, schools, classes).
    • Sampling occurs by selecting some clusters and then sampling within them.
    • Rationale:
      • Cost savings, reduced travel, and feasible data collection when individual-level frames are incomplete or hard to enumerate.
    • Trade-offs:
      • Intraclass correlation: units within a cluster resemble each other, so the effective sample size is smaller than the raw sample size.
      • Selection probabilities may differ across clusters; estimates are biased if this is not accounted for with weights.
  • Combined designs (stratified and clustered):
    • Common in practice (e.g., Mission On used stratification at the region level and multi-stage cluster sampling within regions).
    • The first-stage sampling unit is the PSU (primary sampling unit); later stages sample successively smaller units within each PSU.
  • Primary sampling units (PSUs):
    • The largest units actually sampled in a multi-stage design.
    • In Mission On, mesh blocks were the PSUs.
    • If you have no clustering, the PSU concept reduces to the individual unit.
  • Multi-stage designs and terminology:
    • Stage 1: choose PSUs (e.g., mesh blocks).
    • Stage 2: sample clusters within PSUs (e.g., households within blocks).
    • Stage 3+: sample individuals within clusters (e.g., one person per household).
    • In survey literature, stage numbers can be described differently depending on context (multilevel modeling uses a different orientation for levels).

Probabilities, weights, and analysis with complex samples

  • Inclusion probabilities and weights:
    • For a unit $i$, the overall inclusion probability is the product of the stage-level probabilities:
      $\pi_i = \prod_{t=1}^{T} \pi_{i,t}$.
    • The survey weight is the inverse of the inclusion probability:
      $w_i = 1/\pi_i$.
  • Example structure (Mission On-like):
    • Probability of selecting mesh block $j$ within a region is proportional to its number of dwellings:
      $\pi_j \propto D_j$, or $\pi_j = \frac{D_j}{\sum_{k \in \text{region}} D_k}$.
    • Probability of selecting a household within a chosen mesh block:
      $\pi_{hh|j} = \frac{n_{hh|j}}{H_j}$, where $n_{hh|j}$ is the number of households sampled in block $j$ and $H_j$ is the total number of households in block $j$.
    • Probability of selecting a person within a selected household:
      $\pi_{p|hh} = \frac{1}{N_{p|hh}}$, where $N_{p|hh}$ is the number of people in the household (one person sampled per household).
    • Overall inclusion probability for person $i$:
      $\pi_i = \pi_j \times \pi_{hh|j} \times \pi_{p|hh}$.
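These stage probabilities can be multiplied through numerically; the stage sizes below are illustrative, not actual Mission On figures.

```python
# Illustrative numbers, not actual Mission On figures.

D_j, D_total = 120, 780    # dwellings in mesh block j / in its stratum
pi_block = D_j / D_total   # PPS by dwellings (one block drawn, as in the formula above)

n_hh, H_j = 10, 120        # households sampled / total households in block j
pi_hh = n_hh / H_j

N_people = 4               # people in the selected household
pi_person = 1 / N_people   # one person sampled per household

pi_i = pi_block * pi_hh * pi_person   # overall inclusion probability
w_i = 1 / pi_i                        # survey weight: this person "stands for" w_i people
```

Here $\pi_i = \frac{120}{780}\cdot\frac{10}{120}\cdot\frac{1}{4} = \frac{1}{312}$, so the respondent carries a weight of 312 in population estimates.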
  • Practical impacts for analysis:
    • If clustering (PSUs) is ignored, standard errors are typically understated; if weights are ignored or mis-specified, point estimates can be biased.
    • Modern software (iNZight, the R survey package, Stata's svy commands) can handle these designs if you correctly specify strata, clusters, and weights.
    • The design effect (DEFF) quantifies the efficiency loss due to clustering; the effective sample size is correspondingly reduced: $n_{\text{eff}} = n / \text{DEFF}$.
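For equal-sized clusters of $m$ units with intraclass correlation $\rho$, a standard approximation is $\text{DEFF} \approx 1 + (m-1)\rho$; a minimal sketch with invented numbers:

```python
def design_effect(m, icc):
    """DEFF ~= 1 + (m - 1) * icc for equal-sized clusters of m units
    with intraclass correlation icc (a standard approximation)."""
    return 1 + (m - 1) * icc

n = 1000                              # nominal sample: 50 clusters of 20 (invented)
deff = design_effect(m=20, icc=0.05)  # 1 + 19 * 0.05 = 1.95
n_eff = n / deff                      # roughly half the nominal sample size
```

Even a modest intraclass correlation of 0.05 nearly halves the information in this design, which is why cluster size matters as much as total sample size.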

Simple random sampling vs complex sampling

  • Simple random sampling (SRS) without replacement:
    • Each unit has equal probability of being selected; all possible samples are equally likely.
    • Requires a complete list (sampling frame) of units; random digit dialing provides an implicit frame for telephone surveys.
    • Practical limits: becomes costly if sampling a large fraction; generally not used for large-scale surveys due to cost and travel when in-person.
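Given a complete frame, SRS without replacement is straightforward to sketch; the frame of 100 unit IDs below is invented.

```python
import random

def srs_without_replacement(frame, n, rng=None):
    """Draw a simple random sample of n units from a complete sampling frame.
    Every unit has equal inclusion probability n / len(frame), and every
    possible sample of size n is equally likely."""
    rng = rng or random.Random()
    return rng.sample(frame, n)

frame = list(range(1, 101))            # invented frame of 100 unit IDs
sample = srs_without_replacement(frame, 10, random.Random(42))
inclusion_prob = 10 / len(frame)       # 0.1 for every unit in the frame
```

With equal inclusion probabilities, all weights are identical, which is exactly why SRS needs no special analysis machinery; complex designs are where weights and design variables start to matter.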
  • Limitations of SRS in practice:
    • If you have extra information (e.g., want better representation of ethnic groups), SRS may be inefficient; oversampling may be used to ensure adequate representation within subgroups.
    • Frames may be incomplete or outdated; nonresponse can still bias results.

Why stratification and clustering decisions matter in practice

  • Stratified sampling advantages:
    • Protects against bad samples (e.g., missing a region like Auckland); ensures representation across key subpopulations.
    • Facilitates reporting summaries for each stratum (e.g., region-level estimates).
    • Increases precision by reducing variance within strata; allows different sampling methods per stratum.
    • Example: Scottish Household Survey uses dense-area strata for cities and whole towns in Highlands for sparsely populated areas.
  • Consequences of ignoring stratification:
    • Standard errors may be inflated or deflated inappropriately; estimates may be biased if strata differ meaningfully.
  • Clustering advantages and trade-offs:
    • Cost and practicality: easier to collect data in a few locations (PSUs) rather than many scattered sites.
    • Does not require a complete frame at the individual level; can work with a frame at a higher level (e.g., villages, schools).
    • Intra-cluster correlation reduces the information gained from each additional unit within the same cluster; effective sample size is smaller than the nominal sample size.
    • Risk of unequal sampling probabilities across clusters; can bias estimates if clusters differ systematically in key variables.
  • When to use both stratification and clustering:
    • Very common in practice and often the most cost-effective design; strata help ensure representativeness, clusters reduce fieldwork costs.
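A minimal sketch of stratified sampling with proportional allocation (the stratum sizes are invented); note that every stratum is guaranteed representation, which is the protection described above.

```python
import random

def stratified_sample(strata, n_total, rng=None):
    """Allocate n_total proportionally across strata, then draw an SRS in each.

    `strata` maps stratum name -> list of unit IDs. Rounding each stratum's
    allocation can make the realized total differ slightly from n_total;
    real designs use an exact allocation scheme.
    """
    rng = rng or random.Random()
    N = sum(len(units) for units in strata.values())
    sample = {}
    for name, units in strata.items():
        n_h = round(n_total * len(units) / N)   # proportional allocation
        sample[name] = rng.sample(units, n_h)
    return sample

# Invented population: 800 urban and 200 rural units.
strata = {
    "urban": [f"u{i}" for i in range(800)],
    "rural": [f"r{i}" for i in range(200)],
}
sample = stratified_sample(strata, n_total=100, rng=random.Random(7))
```

Unlike SRS, a "bad draw" that misses the rural stratum entirely is impossible here, and the per-stratum samples directly support stratum-level reporting.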

A few additional examples of survey designs

  • Cannabis referendum (hypothetical example): stratified by Māori descent (Māori vs non-Māori) and age (>50 vs ≤50); sample size of 1000 individuals per cell in the 2×2 design (Māori/non-Māori × age group).
  • School vaccination attitudes: two-stage cluster sampling; select 100 schools; survey all students in three randomly chosen classes per school.
  • School Health Survey: stratified by school type (co-educational, boys, girls); clusters are schools; within each school, sample three classes; survey all students in those classes.
  • Malawi Food Production and Security survey: strata are districts; the clusters (PSUs) are villages; within each village, sample 30 households; within households, survey all residents or selected members.
  • Practical point: when no usable individual-level frame exists, clustering may be necessary even where stratification alone would be preferable; with a good frame, stratification without clustering may suffice.

Why we need to use survey-aware software and proper design reporting

  • Modern software supports complex survey designs, but only if you provide accurate design information:
    • Strata variables (if any);
    • Clusters / PSUs (if any);
    • Sampling weights (from inclusion probabilities).
  • If you mis-specify the design (wrong strata, wrong clustering, wrong weights), you will get biased results.
  • Without proper design, analysis may resemble simple random sampling and ignore design effects, leading to misleading confidence intervals and p-values.

Practicalities and ethical considerations in survey sampling

  • Nonresponse and coverage bias are persistent concerns across survey methods; weighting and follow-ups are standard tools to mitigate these biases.
  • Privacy and defensible definitions of population are important: precise definitions determine what you can generalize to; mismatches can mislead policy decisions.
  • Data accessibility varies by country; large datasets (e.g., US government surveys) exist, but access often requires permissions and data use agreements.
  • Ethical implications include ensuring representation of subpopulations, avoiding stigmatization by over- or under-sampling certain groups, and preserving respondent confidentiality in small-area reporting (confidentiality of SA1/SA2 data).

Recap: core concepts to remember for exams and practice

  • Core definitions: target population, survey population, sampling frame, sample, respondent, and sample design.
  • The NZ geographic hierarchy: regions → territorial authorities → SA2 → SA1 → mesh blocks; nested and hierarchical, with PSUs typically being the coarsest units actually sampled.
  • Stratified sampling vs cluster sampling:
    • Stratification: sample from every stratum; increases representativeness and precision.
    • Clustering: sample a subset of clusters; reduces travel and logistical burden; introduces design effects and intra-cluster correlation.
  • Primary sampling units (PSUs) and stages: the largest units sampled first; subsequent stages sample within PSUs; multi-stage designs are common in surveys.
  • Inclusion probabilities and weights: overall probability is the product of stage probabilities, and weights are the inverse of that product; proper weighting is essential for unbiased population estimates.
  • Analyzing survey data requires software aware of the design; mis-specification leads to biased results.
  • Causality in surveys is limited; AB testing and embedded experiments can provide stronger causal evidence in some cases.
  • Real-world trade-offs: cost vs precision; larger clusters reduce fieldwork but can reduce precision; oversampling, frame quality, and nonresponse must be managed.