2. PPHS 501: Population Health & Epidemiology - Elements of Survey Design

Instructor Information

Instructor: Ananya Banerjee, PhD
Role: Assistant Professor, Equity, Diversity, Inclusion and Anti-Racism (EDIAR) Lead
Department: Epidemiology, Biostatistics and Occupational Health
Institution: School of Population and Global Health | McGill University

Exam Preparation Guidance

Primary Focus: Material from lecture slides will be the main content covered in the exam. Students are strongly encouraged to actively engage with the lecture material, perhaps by reviewing slides multiple times, to ensure a strong grasp of core concepts.
Readings: You do not need to memorize the readings. Instead, focus on concepts that overlap with lecture material to deepen understanding. While readings provide supplementary context, the exam questions will directly assess knowledge from the lectures.

The Purpose of Surveys

Fundamental Role: Surveys are indispensable tools and among the most frequent modes of observation and measurement in health research. They provide a systematic approach to gather data from a defined population to infer characteristics, attitudes, or behaviors, allowing for broad data collection that might be impractical with other methods.
Descriptive Epidemiology Application: Surveys support descriptive epidemiology by answering key questions:
- What are the health risks, diseases, or events? What is their frequency? Who is involved, where, and when? This involves understanding the distribution of health outcomes and determinants across various demographic, geographic, and temporal dimensions (e.g., identifying high-risk groups, geographical hotspots, or seasonal patterns).
- What is the rate of utilization for particular interventions? This could involve assessing how many people are using specific health programs, vaccines, or lifestyle interventions, and identifying barriers or facilitators to their uptake.
- What is the distribution and use of various health services? This includes understanding patterns of access to and engagement with primary care, specialized medical services, mental health support, and public health initiatives.

Why Surveys Are Useful (Examples)

**Estimating Incidence or Prevalence:**
- Diseases: e.g., tracking the rate of new cases (incidence) or total existing cases (prevalence) of influenza outbreaks or chronic conditions like diabetes in a population at a specific time.
- Risk factors: e.g., determining the proportion of people engaging in certain health behaviors like smoking, physical inactivity, or unsafe sexual practices.

**Screening Population Groups:** Identifying individuals for treatment of specific diseases, e.g., mass screenings for hypertension, diabetes, or certain cancers to facilitate early intervention.
**Collecting Health Information about Households:** Gathering data on family health histories, access to clean water, nutritional status, and immunization coverage within a household context.
**Finding out about Local Beliefs and Health Practices:** Understanding traditional healing methods, perceptions of illness, dietary customs, and attitudes towards modern medicine that influence health outcomes.
**Understanding the Uptake of New Interventions:** Assessing how widely new treatments or programs, like a new vaccine or a public health campaign, are adopted by the target population.
**Evaluating the Effectiveness of Health Services:** Measuring the impact and outcomes of healthcare provisions, such as patient satisfaction, reduction in disease burden, or improvements in quality of life attributable to specific services.
**Assessing Experiences of Discrimination:** Quantifying the prevalence and impact of discriminatory experiences based on race, gender, socioeconomic status, or disability on health and healthcare access.
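
The incidence and prevalence measures named in the first item above can be illustrated with a short calculation. This is a minimal sketch, not a formula taken from the lecture slides: the function names and all counts are hypothetical, and incidence is computed as cumulative incidence (new cases divided by the number of people at risk at the start of the period), which is one common definition.

```python
def point_prevalence(existing_cases: int, population: int) -> float:
    """Proportion of the population that has the condition at one point in time."""
    return existing_cases / population


def cumulative_incidence(new_cases: int, at_risk_at_start: int) -> float:
    """Proportion of initially disease-free people who become cases during the period."""
    return new_cases / at_risk_at_start


# Hypothetical survey of 2,000 adults (illustrative numbers only).
surveyed = 2000
with_diabetes_at_baseline = 160
new_diabetes_cases_in_year = 23

prevalence = point_prevalence(with_diabetes_at_baseline, surveyed)

# Only people free of the disease at baseline can become new cases.
at_risk = surveyed - with_diabetes_at_baseline
incidence = cumulative_incidence(new_diabetes_cases_in_year, at_risk)

print(f"Point prevalence: {prevalence:.1%}")
print(f"One-year cumulative incidence: {incidence:.2%}")
```

The key design point is the denominator: prevalence divides by everyone surveyed, while incidence divides only by those who could still develop the condition.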

Elements of Survey Design

Importance: Understanding these elements is crucial for accurate population health measurements.
Objective: To obtain a representative sample of the target population. A representative sample is one that accurately reflects the characteristics of the larger target population from which it is drawn. This ensures that findings from the sample can be generalized back to the population with confidence, which is crucial for valid inferences in population health research.

Types of Error

Errors in data collection can be categorized into two main types:
- Random Error: Imprecision.
- Systematic Error: Bias.

Relationship to Measurement Qualities:
- No random error implies precision. Precision refers to the consistency or reproducibility of a measurement (i.e., how close repeated measurements are to each other).
- No systematic error implies validity. Validity refers to the accuracy of a measurement (i.e., how close it is to the true value or whether it truly measures what it intends to measure). Both precision and validity are essential for reliable and meaningful research findings.

Random Error

Definition: Errors in data that occur by chance; they are unpredictable and lead to imprecision. These errors introduce variability around the true value, but without a consistent direction.
Ubiquity: All data inherently contain random errors because no measurement (human or instrument-based) is perfectly accurate.
Examples:
- Minor variations in height measurements.
- Slight fluctuations in serum cholesterol readings.
- In survey data, even for variables like self-reported race, some individuals might mistakenly check the wrong box.
- Questions like those in food frequency questionnaires often involve more random error due to reliance on memory and estimation.

Decreasing Random Error: Generally achieved by increasing the sample size or repeating measurements multiple times. Larger samples tend to have random errors cancel each other out due to the law of large numbers, leading to a more stable estimate of the true population parameter. Repeating measurements also helps by allowing for averaging, which reduces the impact of individual random fluctuations.
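
The effect of sample size on random error can be seen in a small simulation. The sketch below assumes a hypothetical true mean cholesterol value and normally distributed measurement noise; neither number comes from the lecture. It repeats a "survey" many times at each sample size and reports how much the estimated mean varies from one survey to the next.

```python
import random
import statistics

random.seed(501)  # fixed seed so the sketch is reproducible

TRUE_MEAN = 5.2   # assumed true mean serum cholesterol (mmol/L), hypothetical
NOISE_SD = 0.8    # assumed spread of random error around the true value


def sample_mean(n: int) -> float:
    """Mean of n measurements that contain only random (zero-centred) error."""
    readings = [random.gauss(TRUE_MEAN, NOISE_SD) for _ in range(n)]
    return statistics.fmean(readings)


def spread_of_estimates(n: int, repeats: int = 500) -> float:
    """Standard deviation of the sample mean across repeated surveys of size n."""
    means = [sample_mean(n) for _ in range(repeats)]
    return statistics.stdev(means)


spread_by_n = {n: spread_of_estimates(n) for n in (10, 100, 1000)}
for n, spread in spread_by_n.items():
    print(f"n = {n:>4}: spread of the estimated mean = {spread:.3f}")
```

Under these assumptions the spread shrinks roughly in proportion to one over the square root of the sample size, so a hundredfold increase in sample size yields about a tenfold gain in precision.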

Systematic Error (Bias)

Definition: Consistent, reproducible error that is not due to chance. This type of error introduces inaccuracy into measurements. This can lead to a consistent overestimation or underestimation of the true value, thereby distorting the research findings and potentially leading to incorrect conclusions.
Example: An improperly calibrated thermometer might give accurate readings within a normal temperature range but become consistently inaccurate at higher or lower temperatures.
Impact on Surveys: Systematic error (bias) in questionnaire data, of which sample bias is one form, reduces validity.
Validity Test: Systematic error critically compromises the ability to determine whether a measure is truly assessing the concept the researcher intends to measure, because the systematic deviation means the measurement is not tracking the true value accurately.
Avoiding Systematic Error: Requires careful survey design, unbiased instrument calibration, and rigorous methodology. This includes:
- Careful survey design: Ensuring questions are unambiguous, not leading, and culturally appropriate.
- Unbiased instrument calibration: Regular checks and adjustments of measurement tools.
- Rigorous methodology: Employing appropriate sampling techniques to minimize selection bias, using standardized protocols, and training interviewers to ensure consistent data collection.
- Minimizing response bias: Designing questions that do not encourage socially desirable answers.
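
Unlike random error, systematic error does not shrink as more data are collected. The following sketch illustrates this with a simplified version of the thermometer example: it assumes a constant calibration offset (the original example describes an offset that appears only at extreme temperatures), and the true temperature, noise level, and offset are all hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(501)  # fixed seed so the sketch is reproducible

TRUE_TEMP = 37.0  # assumed true body temperature (degrees C), hypothetical
NOISE_SD = 0.3    # assumed random error in each reading
OFFSET = 0.5      # assumed calibration error: the thermometer reads 0.5 degrees high


def mean_reading(n: int, offset: float) -> float:
    """Average of n readings that contain random noise plus a fixed offset."""
    readings = [TRUE_TEMP + offset + random.gauss(0, NOISE_SD) for _ in range(n)]
    return statistics.fmean(readings)


# Error of the averaged reading from the miscalibrated thermometer.
errors = {}
for n in (10, 1000, 100000):
    errors[n] = mean_reading(n, OFFSET) - TRUE_TEMP
    print(f"n = {n:>6}: error of the mean = {errors[n]:+.3f}")

# For comparison: a correctly calibrated thermometer with the same noise.
unbiased_error = mean_reading(100000, 0.0) - TRUE_TEMP
print(f"Calibrated thermometer, n = 100000: error = {unbiased_error:+.3f}")
```

With more readings the averaged error settles on the offset rather than on zero, which is why bias has to be prevented through design and calibration instead of larger samples.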

Example of an ambiguous question (open to inconsistent interpretation by respondents) versus a more precise one:
- Less precise: "Do you exercise regularly?" (Highly subjective definition of "regularly").
- More precise: "In a typical week, how many days do you engage in at least 30 minutes of moderate-intensity physical activity?" (Offers a more quantifiable and less ambiguous measure, though still subject to recall bias).

Types of Bias

Building upon systematic error, specific types of bias can significantly distort research findings if not adequately addressed through careful study design and execution.
Selection Bias: Occurs when there are systematic differences between participant characteristics in the study group and the comparison group, or between the study participants and the target population. This can lead to an unrepresentative sample.
- Examples:
  - Non-response bias: Individuals who refuse to participate in a survey may differ significantly from those who do participate (e.g., healthier or sicker individuals may be less likely to respond).
  - Volunteer bias: People who volunteer for a study may have different characteristics than the general population (e.g., more health-conscious).
  - Healthy worker bias: Often seen in occupational studies, where workers are generally healthier than the general population, potentially masking adverse effects of workplace exposures.
Information Bias (Measurement Bias): Arises from systematic differences in the way information is collected for different groups in a study, leading to misclassification of exposures or outcomes.
- Examples:
  - Recall bias: Participants’ memories of past events or exposures may be inaccurate, especially if they have a specific outcome (e.g., a patient with a disease might recall exposures differently than a healthy control).
  - Interviewer bias: Interviewers may unconsciously or consciously influence responses based on their expectations or knowledge of the participant's status (e.g., probing more deeply for certain exposures in diseased participants).
  - Observer bias: Occurs when observers or researchers' expectations or knowledge affect how they record or interpret data.
  - Reporting bias: Participants may selectively report information, particularly sensitive facts (e.g., underreporting stigmatized behaviors or overreporting socially desirable ones).
Confounding: While not strictly a bias in data collection, confounding is a mixing of effects where the effect of an exposure on an outcome is distorted because of an association with another factor (a confounder) that is related to both the exposure and the outcome, but not on the causal pathway. It can mimic or mask true associations.
- Example: The association between coffee drinking and heart disease might be confounded by smoking, as smokers tend to drink more coffee and are also at higher risk for heart disease.
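
The coffee, smoking, and heart disease example can be worked through numerically. This is a sketch under stated assumptions: every probability below is hypothetical, and the scenario is deliberately built so that coffee has no effect on heart disease at all. The code compares the crude risk ratio with the risk ratios within each smoking stratum.

```python
# Assumed (hypothetical) probabilities; none of these numbers come from the lecture.
P_SMOKING = {"smoker": 0.30, "non-smoker": 0.70}
P_COFFEE = {"smoker": 0.80, "non-smoker": 0.40}    # smokers drink more coffee
P_DISEASE = {"smoker": 0.20, "non-smoker": 0.05}   # risk depends on smoking only


def risk(coffee: bool, strata=("smoker", "non-smoker")) -> float:
    """Expected heart-disease risk among coffee drinkers (or non-drinkers)."""
    people = 0.0
    cases = 0.0
    for s in strata:
        p_coffee = P_COFFEE[s] if coffee else 1 - P_COFFEE[s]
        share = P_SMOKING[s] * p_coffee   # share of the population in this cell
        people += share
        cases += share * P_DISEASE[s]     # coffee itself has no effect
    return cases / people


# Crude comparison ignores smoking; stratified comparison holds it fixed.
crude_rr = risk(True) / risk(False)
stratified_rr = {s: risk(True, (s,)) / risk(False, (s,)) for s in P_SMOKING}

print(f"Crude risk ratio (coffee vs. no coffee): {crude_rr:.2f}")
for s, rr in stratified_rr.items():
    print(f"Risk ratio among {s}s: {rr:.2f}")
```

The crude ratio is well above 1 even though coffee is harmless in this scenario, while the ratio within each smoking stratum is 1. Stratifying on the suspected confounder is one simple way to check whether an association is explained by it.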

Census vs. Sample Survey

These represent two fundamental approaches to collecting data about a population, each with distinct advantages and use cases in health research.
Census:
- Definition: A complete enumeration of every individual or unit within a specified population.
- Characteristics: Aims to collect data from all members of the target population.
- Advantages: Covers the entire population, so estimates are free of sampling error (although non-sampling errors such as measurement error and non-response can still occur). Can be used to obtain detailed data for small geographical areas or specific subgroups.
- Disadvantages: Extremely costly, time-consuming, and often logistically impractical for large populations. Difficult to keep up-to-date.
- Application: National population counts, essential for resource allocation, policy planning, and determining electoral districts.
Sample Survey:
- Definition: The collection of data from a subset (sample) of individuals or units drawn from a larger population.
- Characteristics: Gathers data from a representative part of the population, from which inferences about the entire population are made.
- Advantages: More feasible, cost-effective, and quicker to conduct than a census, especially for large populations. Can allow for more detailed data collection from each participant due to reduced scale.
- Disadvantages: Subject to sampling error (the difference between the sample estimate and the true population parameter), which limits the precision of estimates. Requires careful sampling methods to ensure representativeness, since a poorly drawn sample undermines generalizability.
- Application: Most health research studies, public opinion polls, market research, and epidemiological investigations to estimate disease prevalence or risk factor distribution.

Types of Sampling

The method chosen for selecting participants directly impacts the generalizability and validity of survey results. Sampling methods are broadly categorized into probability and non-probability approaches.
1. Probability Sampling Methods:
- Every unit in the population has a known, non-zero chance of being selected. This allows for statistical inference about the population and calculation of sampling error.
- a. Simple Random Sampling (SRS):
  - Description: Each individual in the population has an equal chance of being selected. This is often done using random number generators.
  - Advantages: Unbiased, simple to implement for small, well-defined populations with a complete list of members.
  - Disadvantages: Can be impractical for large populations, may not guarantee sufficient representation of subgroups.
- b. Systematic Sampling:
  - Description: Selecting every k-th individual from a population list after a random start. The sampling interval k is calculated as k = population size / sample size.
  - Advantages: Simpler than SRS, ensures good coverage across the population list.
  - Disadvantages: If there's a pattern or periodicity in the list that aligns with the sampling interval, it can lead to bias.
- c. Stratified Sampling:
  - Description: Dividing the population into homogeneous subgroups (strata) based on shared characteristics (e.g., age groups, gender, geographical regions), then taking a simple random sample from each stratum.
  - Advantages: Ensures representation of key subgroups, and can increase precision (reduce sampling error) if members within each stratum are similar to one another and the strata differ from each other. Useful for comparing subgroups.
  - Disadvantages: Requires knowledge of relevant stratification variables and population sizes within each stratum.
- d. Cluster Sampling:
  - Description: Dividing the population into naturally occurring groups (clusters), such as schools, hospitals, or neighborhoods. A random sample of clusters is selected, and all individuals within the selected clusters are then surveyed (one-stage) or a sample is taken from within the selected clusters (two-stage).
  - Advantages: Efficient for geographically dispersed populations, reduces travel costs and logistical complexity.
  - Disadvantages: Typically higher sampling error than SRS for the same sample size, especially when individuals within a cluster resemble one another (internally homogeneous clusters). Requires careful definition of clusters.
2. Non-Probability Sampling Methods:
- Selection is not random, and the probability of any unit being selected is unknown. These methods are often easier and less costly but do not allow for statistical generalization to the larger population and are prone to selection bias.
- a. Convenience Sampling:
  - Description: Selecting participants who are readily available, accessible, or willing to volunteer, that is, those who are easiest to recruit.
  - Advantages: Quick, inexpensive, and convenient.
  - Disadvantages: Highly prone to selection bias; results are rarely representative of the population and cannot be generalized.
- b. Quota Sampling:
  - Description: Similar to stratified sampling, but non-random. The population is divided into subgroups, and a specific number (quota) of participants is recruited from each subgroup until the quota is met, often using convenience or purposive methods.
  - Advantages: Ensures representation of certain characteristics in the sample.
  - Disadvantages: Still subject to selection bias within quotas, as selection is not random.
- c. Purposive (Judgmental) Sampling:
  - Description: The researcher deliberately selects participants based on their expertise, knowledge, or specific characteristics relevant to the research question.
  - Advantages: Useful for qualitative research, identifying key informants, or studying rare populations.
  - Disadvantages: High potential for researcher bias, limits generalizability.
- d. Snowball Sampling:
  - Description: Initial participants are asked to identify and refer other potential participants who meet the study criteria. This method is often used for hard-to-reach or hidden populations.
  - Advantages: Effective for reaching populations that are difficult to access through conventional sampling frames.
  - Disadvantages: Highly prone to selection bias, limited generalizability, and potential for homophily (participants tend to refer others similar to themselves).
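
The four probability sampling methods described above can each be sketched in a few lines. This is a minimal illustration, not a production sampling routine: the sampling frame (1,000 people in 20 clinics across two regions), the field names, and the function names are all hypothetical, and the stratified version assumes proportional allocation, which is only one of several allocation schemes.

```python
import random

random.seed(501)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 1,000 people, 50 per clinic, 20 clinics.
# Clinics 0-4 are labelled rural (250 people), the rest urban (750 people).
population = [
    {"id": i, "clinic": i // 50, "region": "rural" if i // 50 < 5 else "urban"}
    for i in range(1000)
]
n = 100  # desired sample size


def simple_random_sample(frame, n):
    """Every unit has the same chance of selection."""
    return random.sample(frame, n)


def systematic_sample(frame, n):
    """Random start, then every k-th unit, with k = population size / sample size."""
    k = len(frame) // n
    start = random.randrange(k)
    return frame[start::k][:n]


def stratified_sample(frame, n, key):
    """Simple random sample within each stratum, allocated proportionally."""
    strata = {}
    for unit in frame:
        strata.setdefault(unit[key], []).append(unit)
    sample = []
    for members in strata.values():
        # Rounding can make the total differ slightly from n in general.
        share = round(n * len(members) / len(frame))
        sample.extend(random.sample(members, share))
    return sample


def one_stage_cluster_sample(frame, n_clusters, key):
    """Randomly pick whole clusters, then survey everyone inside them."""
    clusters = sorted({unit[key] for unit in frame})
    chosen = set(random.sample(clusters, n_clusters))
    return [unit for unit in frame if unit[key] in chosen]


srs = simple_random_sample(population, n)
systematic = systematic_sample(population, n)
stratified = stratified_sample(population, n, key="region")
clustered = one_stage_cluster_sample(population, n_clusters=2, key="clinic")

for name, sample in [("SRS", srs), ("Systematic", systematic),
                     ("Stratified", stratified), ("Cluster", clustered)]:
    rural = sum(unit["region"] == "rural" for unit in sample)
    print(f"{name:<10} size = {len(sample):>3}, rural = {rural}")
```

In this frame the stratified sample always contains exactly the population share of rural respondents, whereas the cluster sample can contain none or only rural respondents depending on which clinics are drawn. That contrast mirrors the precision trade-offs listed for each method.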