Comprehensive Study Notes: Symposium Sampling - Why and How
Symposium Sampling: Why and How of it?
Introduction to Sampling
Definition of a Sample: A sample is a subset of the population, specifically selected to be representative of the larger population. It is an alternative to studying the entire population, which is often not feasible.
Rationale for Sampling: Studying an entire population is rarely practical in a research study. Sampling allows researchers to investigate a problem using a smaller, manageable, and representative subset.
Benefits of a Representative Sample: Using a representative sample helps to reduce:
Costs incurred.
Time taken for the research.
Manpower needed to conduct the study.
Factors for Sample Representativeness: The representativeness of a sample depends on three critical factors:
Sampling methodology.
Sample size.
Response rate.
Importance of Systematic Methods: Sampling methods must be systematic and well-defined to enable the drawing of valid inferences from the sample data.
Classification of Sampling Methods
Sampling methods are broadly classified into two main categories:
Probability Sample
Non-probability Sample
Probability Samples as the Gold Standard: Probability sampling is considered the gold standard in sampling methodology because it allows study results to be generalised to the target population.
Core Principle of Probability Sampling: In probability sampling, every individual in the population has a known, non-zero chance of being selected for the study. The chance is equal for all individuals only in designs such as simple random sampling.
Probability Sampling Methods
Definition: In probability sampling, the investigator can generalise the findings of the sample to the target population. All methods in this category utilize a random process.
Crucial Role of Sampling Frame: A sampling frame is essential in probability sampling. If the frame is not appropriately drawn from the population of interest, random sampling from it cannot adequately address the research problem. Generalisations can only be made to the actual population defined by the sampling frame.
Types of Probability Sampling:
Simple random sampling
Systematic random sampling
Stratified random sampling
Cluster sampling
Multiphase sampling
Multistage sampling
1. Simple Random Sampling
Description: In this method, every individual in the population has an equal chance of being selected for the sample.
Selection Methods: Subjects can be selected using:
Random number tables.
Computer-generated lists of random numbers.
Lottery method.
Currency notes (for example, reading digits from their serial numbers).
Requirements: A sampling frame is required, where all individuals in the study population must be enumerated in either ascending or descending order.
Advantages:
Minimal knowledge of the population is required.
High internal and external validity.
Easy to analyse data.
Limitations:
High cost.
Requires a complete sampling frame.
Tendency for large sampling errors.
Less precision compared to stratified samples of the same size.[2]
Example: To select 50 participants from a conference of 200: The list of 200 participants serves as the sampling frame. The 50 participants are selected using a random number table or lottery. Once a participant's number is selected, it is typically 'struck off' (sampling without replacement). This process continues until 50 participants are chosen.
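Illustrative sketch (Python): The conference example above can be approximated with the standard library's random.sample, which draws without replacement. The function name simple_random_sample, the seed value, and the numbering of participants from 1 to 200 are assumptions made for illustration only.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n units from the sampling frame without replacement.

    Every unit in the frame has the same chance of being selected.
    """
    rng = random.Random(seed)  # a seed is used only to make the draw repeatable
    return rng.sample(frame, n)

# Hypothetical sampling frame: participants numbered 1..200
participants = list(range(1, 201))
chosen = simple_random_sample(participants, 50, seed=42)
print(sorted(chosen))
```

Because random.sample never returns the same element twice, it mirrors the 'struck off' step of the lottery method.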
2. Systematic Random Sampling
Description: The first subject is selected at random, and subsequent subjects are chosen at a fixed interval: every k-th item from the sampling frame is selected.
Determining k: The value of k (the sampling interval) is calculated by dividing the total number of items in the sampling frame (N) by the desired sample size (n): k = N/n.
Process: A starting point between 1 and k is selected at random, and then every k-th item on the list after it is chosen.
Advantages:
Moderate usage and cost.
High internal and external validity.
Simple to draw and easy to verify.
Disadvantage: Technically, only the first subject is selected by a pure probability method. Subsequent subjects lie at a predetermined, non-random interval from the initial choice, so once the starting point is fixed, every subject that does not fall on the interval has zero chance of selection.
Example: Given N=200 and n=50, k = 200/50 = 4. A random number between 1 and 4 is selected (e.g., 3). The subjects would then be participant numbers 3, 7, 11, 15, 19, 23, 27, 31, and so on, until 50 subjects are completed.
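Illustrative sketch (Python): The interval calculation and selection in the example above could be written as follows. The function name systematic_sample and the optional start argument are assumptions for illustration; the sketch also assumes that N is an exact multiple of n, as in the example.

```python
import random

def systematic_sample(N, n, start=None, seed=None):
    """Select n positions from a frame numbered 1..N at a fixed interval k."""
    k = N // n  # sampling interval (assumes N is a multiple of n)
    if start is None:
        # Only this first choice is random: a start between 1 and k
        start = random.Random(seed).randint(1, k)
    return [start + i * k for i in range(n)]

# Reproduce the worked example: N = 200, n = 50, random start = 3
selected = systematic_sample(200, 50, start=3)
print(selected[:8])
```

With a start of 3, the first selected positions are 3, 7, 11, 15, 19, 23, 27, 31, matching the example.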
3. Stratified Random Sampling
Description: The population data is divided into various sub-groups (strata) based on common characteristics such as age, sex, race, income, education, or ethnicity. A random sample is then drawn from within each stratum.
Advantages:
Assures representation of all relevant groups within the population.
Allows estimation of characteristics for each stratum and facilitates comparisons between strata.
Reduces variability compared to systematic sampling.
Limitations:
Requires accurate information about the proportions of each stratum within the population.
Stratified lists can be expensive to prepare.
Example: To study the prevalence of diabetes in an adult population, the population could be stratified by gender, ensuring an equal number of male and female subjects. This allows for sex-wise prevalence data. Alternatively, stratification by place of residence (urban, rural, peri-urban) could yield area-wise prevalence with equal representation from each group.
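Illustrative sketch (Python): One possible way to draw an equal-sized random sample from each stratum, as in the gender-stratified example above. The population of 1,000 adults, the function name stratified_sample, and the figure of 50 subjects per stratum are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, n_per_stratum, seed=None):
    """Group units into strata, then draw a simple random sample in each."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)
    # Equal allocation: the same number of subjects from every stratum
    return {name: rng.sample(strata[name], n_per_stratum)
            for name in sorted(strata)}

# Hypothetical frame of 1,000 adults as (id, sex) pairs
population = [(i, "female" if i % 2 == 0 else "male") for i in range(1, 1001)]
sample = stratified_sample(population, lambda person: person[1], 50, seed=1)
for name, members in sample.items():
    print(name, len(members))
```

Equal allocation is used here because the example asks for equal numbers per group; allocation proportional to stratum size is another common choice.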
4. Cluster Sampling
Description: This is a two-step process where the entire population is divided into clusters or groups, often based on geographic areas or districts (e.g., villages, schools, wards, blocks). Clusters are chosen randomly, and all individuals within the selected clusters are included in the sample.
Usage: It is more prevalent in epidemiological research than in clinical research and is particularly practical for large national surveys.
Requirements: Usually requires a larger sample size than other methods.
Usefulness: Highly useful when the population is widely scattered, making it impractical to sample individual elements across the entire population.[3]
Example: For a study measuring knowledge of Human Papilloma Virus (HPV) and cervical cancer among first-year college students in Delhi: all colleges in Delhi can be considered clusters. 20 colleges are selected randomly (e.g., via simple or systematic random sampling). Then, all students within these 20 selected colleges are interviewed, or a random selection of students is made within each chosen cluster.
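Illustrative sketch (Python): A minimal version of the college example above, in which clusters are chosen at random and everyone in a chosen cluster is included. The 60 colleges with 30 students each and the function name cluster_sample are invented for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}

# Hypothetical frame: 60 colleges, each listing 30 first-year students
colleges = {
    f"college_{i:02d}": [f"c{i:02d}_s{j:02d}" for j in range(1, 31)]
    for i in range(1, 61)
}
selected_colleges = cluster_sample(colleges, 20, seed=7)
total_students = sum(len(s) for s in selected_colleges.values())
print(len(selected_colleges), total_students)
```

Note that only a list of clusters is needed up front; student lists are required only for the colleges that are actually selected.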
5. Multiphase Sampling
Description: A complex form of cluster sampling where the population is organized into groups. Groups are randomly selected, and then members are randomly selected within these selected groups (typically an equal number per group). Information is collected in phases: a part from the whole initial sample, and further parts from sub-samples.
Purpose: Aims to increase precision, reduce costs, and minimize non-response.
Benefits: Survey by this method is less costly, less laborious, and more purposeful.
Example: In a tuberculosis survey:
Phase I: Mantoux test is conducted on all cases initially sampled.
Phase II: Chest X-rays are performed on all individuals who test positive in Phase I.
Phase III: Sputum examination is conducted on all individuals who show positive X-ray findings in Phase II.
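Illustrative sketch (Python): The three phases above can be viewed as successive filters on the same sampled individuals. The records below, including the field names mantoux_positive and xray_positive and the pattern of positive results, are fabricated purely to show the flow and do not represent real survey data.

```python
def multiphase_screen(people):
    """Return the groups examined in Phase I, Phase II, and Phase III."""
    phase1 = list(people)                                  # Mantoux test on all
    phase2 = [p for p in phase1 if p["mantoux_positive"]]  # chest X-ray
    phase3 = [p for p in phase2 if p["xray_positive"]]     # sputum examination
    return phase1, phase2, phase3

# Hypothetical initial sample of 100 individuals with made-up test results
people = [
    {"id": i, "mantoux_positive": i % 4 == 0, "xray_positive": i % 8 == 0}
    for i in range(1, 101)
]
phase1, phase2, phase3 = multiphase_screen(people)
print(len(phase1), len(phase2), len(phase3))
```

Each later phase examines fewer people than the one before, which is where the savings in cost and labour come from.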
6. Multistage Sampling
Description: Another complex form of cluster sampling involving two or more levels of units nested within each other. It repeats two basic steps: listing and sampling. At each stage, the clusters typically become smaller, culminating in the sampling of individual subjects (final or ultimate sampling units).
Terminology for Stages:
First stage: Primary Sampling Unit (PSU).
Second stage: Secondary Sampling Unit (SSU).
Third stage: Tertiary Sampling Unit (TSU).
And so on, until the Final or Ultimate Sampling Units are reached.[2]
Advantages:
While not as robust as true random sampling, it helps overcome limitations inherent in pure random sampling.
Extremely useful when a complete list of all population members does not exist or is impractical to obtain.
Reduces costs significantly compared to traditional cluster sampling because it involves multiple stages of randomisation.
Distinction from Multiphase Sampling: In multistage sampling, the sampling units for different stages are distinct (e.g., states → districts → villages). In contrast, multiphase sampling involves sampling the same sampling unit multiple times for different types of information.
Example: In a national survey, districts are randomly selected from all states (PSUs). Subsequently, talukas and then villages are randomly selected within these chosen districts (SSUs/TSUs). In the final stage, houses are selected in the chosen villages (final units), and all selected houses are surveyed.
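Illustrative sketch (Python): A simplified three-stage version of the survey example above (districts, then villages, then houses; the taluka stage is omitted for brevity). The nested frame, the unit names, and the numbers selected at each stage are all hypothetical.

```python
import random

def multistage_sample(frame, n_districts, n_villages, n_houses, seed=None):
    """Sample districts, then villages within them, then houses within those."""
    rng = random.Random(seed)
    selected = []
    for district in rng.sample(sorted(frame), n_districts):       # stage 1
        villages = frame[district]
        for village in rng.sample(sorted(villages), n_villages):  # stage 2
            for house in rng.sample(villages[village], n_houses): # stage 3
                selected.append((district, village, house))
    return selected

# Hypothetical frame: 10 districts x 8 villages x 40 houses
frame = {
    f"D{d}": {
        f"D{d}-V{v}": [f"D{d}-V{v}-H{h}" for h in range(1, 41)]
        for v in range(1, 9)
    }
    for d in range(1, 11)
}
houses = multistage_sample(frame, n_districts=3, n_villages=2, n_houses=5, seed=3)
print(len(houses))
```

In practice the listing step is repeated at each stage, so villages and houses need to be enumerated only inside the units already selected.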
Non-Probability Sampling Methods
Definition: In non-probability sampling, the probability that any given subject will be selected is unknown, which makes the sample prone to selection bias.
Limitations: Results from non-probability samples generally cannot be generalised beyond the specific sample studied.
Types of Non-Probability Sampling:
Convenience/purposive sampling
Quota sampling
Snowball sampling
1. Convenience/Purposive Sampling
Description: This is the most commonly used non-probability sampling method. The sample is chosen based on the investigator's convenience, often selecting respondents who are readily available at the right place and time.
Usage: Especially common in clinical research where patients who meet specific inclusion criteria are recruited as they present.
Advantages:
Widely used and simple to implement.
Less expensive than probability methods.
Does not require a comprehensive list of all population elements.
Limitations:
Variability and bias cannot be precisely measured or controlled.
Results cannot be generalised beyond the specific sample.
Example: Patients attending the out-patient department (OPD) of a hospital who meet certain inclusion criteria; school students; members of a specific social organisation.
2. Quota Sampling
Description: A sampling procedure designed to ensure that a certain characteristic of the population is represented in the sample to the exact extent desired by the investigator.
Advantages:
Moderate cost.
Widely used and understood.
No need for a list of all population elements.
Introduces some elements of stratification by ensuring specific group representation.
Limitations: Similar to convenience sampling, as it inherently suffers from selection bias.
Example: In a sample of 100 individuals, an investigator might aim for 40% men and 60% women. Recruitment of men ceases as soon as the 'quota' for men (e.g., 40 men) is filled, regardless of when that occurs, and recruitment of women continues until their quota (60 women) is also filled.
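Illustrative sketch (Python): Quota filling can be pictured as accepting subjects in the order they become available until each group's quota is full. The arrival sequence, the sex field, and the function name quota_sample are assumptions made for illustration.

```python
def quota_sample(arrivals, quotas):
    """Recruit subjects in arrival order until every quota is filled."""
    remaining = dict(quotas)
    recruited = []
    for person in arrivals:
        group = person["sex"]
        if remaining.get(group, 0) > 0:  # accept only if the quota is still open
            recruited.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):  # all quotas filled: stop recruiting
            break
    return recruited

# Hypothetical stream of 300 available subjects (every third one is male)
arrivals = [{"id": i, "sex": "male" if i % 3 == 0 else "female"}
            for i in range(1, 301)]
recruited = quota_sample(arrivals, {"male": 40, "female": 60})
print(len(recruited))
```

No random draw is involved at any point: who enters the sample depends entirely on who happens to be available first, which is the source of the selection bias.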
Distinction between Quota Sampling and Stratified Sampling:
| Feature | Stratified Sampling | Quota Sampling |
| --- | --- | --- |
| Selection | Subjects selected by simple random sampling within created categories. | Interviewer selects the first available subject who meets inclusion criteria (convenience based). |
| Call-backs | Used to obtain specific subjects. | Not typically used; selection is immediate. |
| Sampling frame | Required. | Not required. |
| Probability | Uses probability sampling. | Non-probability (convenience-based after quota set). |
| Error estimation | Permits estimation of sampling error. | Does not permit estimation of sampling error. |
3. Snowball Sampling
Description: In this procedure, initial respondents are chosen either by probability or non-probability methods. These initial respondents then provide information to obtain additional respondents, and the process continues.
Advantages:
Low cost.
Useful in specific circumstances, especially for locating rare or hidden populations.
Disadvantages:
Introduces bias because sampling units are not independent (relies on networks).
Projecting data beyond the sample is generally not justified.
Example: In a study involving individuals engaging in high-risk behaviour or substance abuse, an initial participant might name other individuals involved in similar practices. These new individuals are then recruited, and the process continues until the desired sample size is reached.
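Illustrative sketch (Python): The referral process can be modelled as a walk over a network of who names whom, stopping once the desired sample size is reached. The referral network, the participant labels, and the function name snowball_sample are invented for illustration.

```python
from collections import deque

def snowball_sample(referrals, seeds, target_size):
    """Recruit seeds first, then the people they name, wave by wave."""
    recruited = []
    seen = set()
    queue = deque(seeds)
    while queue and len(recruited) < target_size:
        person = queue.popleft()
        if person in seen:  # already recruited through another referral
            continue
        seen.add(person)
        recruited.append(person)
        queue.extend(referrals.get(person, []))  # names given by this person
    return recruited

# Hypothetical referral network: each participant names others
referrals = {
    "A": ["B", "C"], "B": ["D"], "C": ["D", "E"],
    "D": ["F"], "E": [], "F": ["A"],
}
wave_sample = snowball_sample(referrals, seeds=["A"], target_size=5)
print(wave_sample)
```

Everyone recruited is reachable from the initial seed, which illustrates why the sampling units are not independent of one another.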
Conclusion and Key Points
Importance of Methodology: To achieve valid research results, selecting a sound and scientific sampling methodology is paramount.
Preference for Probability Sampling: Ideally, probability sampling methods should be employed to ensure the representativeness of the sample and the generalisability of results to the target population.
Caution for Non-Probability Sampling: If non-probability methods are used, extreme caution must be exercised when interpreting study results due to inherent biases and lack of generalisability.
Summary Key Points:
The choice of sampling method should always align with the specific population of interest for the study.
Reliable results fundamentally depend on careful and thorough planning of the sampling process.
Probability samples are considered the 'gold standard' in sampling methodology due to their statistical properties.
Using probability sampling allows for the generalisation of findings to the entire population defined by the sampling frame.
Conversely, non-probability sampling limits generalisations strictly to the sample studied, and findings cannot be confidently projected beyond it.