Sampling and External Validity in Research

Foundational Concepts in Validity

Construct Validity: Concerns the variables in a study and how well they are measured or manipulated.
Internal Validity: Concerns the design of the study and the extent to which it can support causal claims by eliminating alternative explanations.
External Validity: Concerns the relationship between the sample used in the study and the population or real-world settings it represents.
Statistical Validity: Concerns the data being analyzed and the extent to which the conclusions derived are accurate, reasonable, and supported by the data.

Forms of External Validity

Population Validity (Representative Samples): Refers to the extent to which the results of a study can be generalized from the specific sample to the larger population of interest.
- Representative Sample: A sample that accurately reflects the important characteristics of the population of interest.
- Example: A study of voting behavior that recruits exclusively university students would have low population validity with respect to the general voting population.
Ecological Validity (Real-World Contexts): Concerns the extent to which study findings from a research setting can be successfully applied to real-world environments.
- Example: Evaluating a new teaching method within a highly controlled laboratory setting may yield findings with low ecological validity, as the setting does not account for the variables present in an actual classroom.
Experimental Realism (Engaging Studies): The degree to which an experimental situation is personally involving to the participants, eliciting genuine and spontaneous behavior rather than artificial reactions.
- Example: A simulated courtroom study where mock jurors feel a genuine emotional and cognitive investment in reaching a final verdict is said to have high experimental realism.
Replication (Repeat Studies with New Populations): The process of repeating a study to determine if the results are consistent.
- Direct Replication: Repeating the original study procedures as exactly as possible to see if the same results are obtained.
- Conceptual Replication: Testing the same hypothesis or theoretical association as the original study but using different methods, variables, or populations.
- Example: Investigating the effect of sleep on memory through multiple studies with varying population demographics.

The Sampling Process

Population of Interest: The entire group of individuals about whom the researcher wants to draw conclusions.
Sampling Frame: A list or set of procedures used to identify the actual population from which the researcher will draw the participants.
Sample: The specific group of individuals who participate in the research.
Two Broad Approaches to Sampling:
- Probability Sampling: The researcher employs a random or unbiased method to acquire participants, ensuring every member of the population has a known, nonzero chance of selection.
- Non-probability Sampling: Participants are selected based on subjective judgment or convenience rather than random selection.

Strategic Approaches to External Validity

Generalization Mode: Used when the specific goal of the research is to explicitly examine if findings generalize to a specific population or context. In this mode, external validity and probability sampling are of paramount importance.
Theory-Testing Mode: Used when the goal is to explicitly test theoretical associations or causal relationships. In this mode, external validity may be prioritized less than internal validity, and probability sampling may not be required.

Probability Sampling Methods

General Characteristics: This approach is the most representative of the population but is frequently time-consuming and cost-intensive.
Simple Random Sampling:
- Every member of the population has an equal chance of being selected.
- Requires acquiring a complete list of everyone in the population and using a random process (e.g., a random number generator) to select the sample.
- It is unbiased and simple in concept but often impractical for large or dispersed populations.
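A minimal sketch of simple random sampling, assuming a complete sampling frame is available. The population list, the sample size of 50, and the seed are hypothetical and serve only as illustration.

```python
import random

# Hypothetical sampling frame: a complete list of population member IDs.
population = [f"person_{i}" for i in range(1, 1001)]

rng = random.Random(42)  # fixed seed only so the sketch is reproducible

# random.sample draws without replacement; every member has an equal chance.
sample = rng.sample(population, k=50)
```

The impractical part in real research is usually the first line: obtaining the complete list, not running the random draw.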
Cluster Sampling:
- The population is divided into separate groups called "clusters."
- A random sample of these clusters is selected, and all individuals within those clusters are studied.
- Examples: Selecting specific schools within a city or households within a specific neighborhood.
- Note: It is more efficient than simple random sampling but potentially less representative if the members within each cluster are similar to one another (homogeneous) and differ from members of other clusters.
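The two steps of cluster sampling can be sketched as follows. The schools, their sizes, and the number of clusters drawn are hypothetical.

```python
import random

# Hypothetical population grouped by cluster (school -> student IDs).
schools = {
    f"school_{s}": [f"s{s}_student_{i}" for i in range(1, 31)]
    for s in range(1, 21)
}

rng = random.Random(7)  # fixed seed only for reproducibility

# Step 1: randomly select clusters (here 4 of the 20 schools).
chosen_schools = rng.sample(sorted(schools), k=4)

# Step 2: study every individual inside the chosen clusters.
cluster_sample = [
    student for school in chosen_schools for student in schools[school]
]
```

Only the clusters are chosen at random; no individual is sampled directly.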
Multi-Stage Cluster Sampling:
- A tiered approach where researchers randomly select larger clusters, then randomly select smaller units within those clusters.
- Process Example:
  - Stage 1: Randomly select Campuses.
  - Stage 2: Randomly select Divisions within those campuses.
  - Stage 3: Randomly select Departments within those divisions.
  - Stage 4: Randomly select specific classes within the departments using a random number generator.
  - Stage 5: Randomly sample participants from within the selected classrooms.
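A simplified sketch of the tiered process, assuming a nested sampling frame. To keep it short, the hypothetical frame below has four levels (campus, department, class, student) rather than the five stages listed above, and all counts are made up.

```python
import random

rng = random.Random(3)  # fixed seed only for reproducibility

# Hypothetical nested frame: campus -> department -> class -> student IDs.
frame = {
    f"campus_{c}": {
        f"dept_{c}_{d}": {
            f"class_{c}_{d}_{k}": [f"stu_{c}_{d}_{k}_{i}" for i in range(20)]
            for k in range(5)
        }
        for d in range(4)
    }
    for c in range(6)
}

multi_stage_sample = []
for campus in rng.sample(sorted(frame), k=2):            # stage: campuses
    departments = frame[campus]
    for dept in rng.sample(sorted(departments), k=2):    # stage: departments
        classes = departments[dept]
        for cls in rng.sample(sorted(classes), k=2):     # stage: classes
            # Final stage: randomly sample individuals within each class.
            multi_stage_sample.extend(rng.sample(classes[cls], k=5))
```

With 2 campuses, 2 departments each, 2 classes each, and 5 students per class, the sketch yields 40 participants while only ever listing students for the 8 selected classes.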
Stratified Sampling:
- Participants are recruited based on specific demographic characteristics, known as "strata."
- Proportionate Stratified Sampling: Researchers randomly sample from within each demographic stratum to ensure the sample proportions match the population proportions.
- Disproportionate Stratified Sampling: Researchers "oversample" from specific strata to ensure equal representation of groups that might otherwise be too small to analyze effectively.
- Note: This is precise but complex and time-consuming.
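The difference between the proportionate and disproportionate variants can be sketched as below. The two strata, their sizes, and the helper name `stratified_sample` are hypothetical.

```python
import random

rng = random.Random(11)  # fixed seed only for reproducibility

# Hypothetical population: 800 members in stratum "A", 200 in stratum "B".
population = [("A", i) for i in range(800)] + [("B", i) for i in range(200)]

def stratified_sample(population, sizes, rng):
    """Randomly sample sizes[stratum] members from within each stratum."""
    sample = []
    for stratum, n in sizes.items():
        members = [p for p in population if p[0] == stratum]
        sample.extend(rng.sample(members, k=n))
    return sample

# Proportionate: sample shares match population shares (80% / 20%).
proportionate = stratified_sample(population, {"A": 80, "B": 20}, rng)

# Disproportionate: oversample the smaller stratum so both groups are equal.
disproportionate = stratified_sample(population, {"A": 50, "B": 50}, rng)
```

When a stratum is oversampled, estimates for the whole population are typically computed with weights so that the oversampled group does not count for more than its population share.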
Cluster vs. Stratified Sampling Distinction:
- Cluster Sampling: Acquiring participants based on common locations.
- Stratified Sampling: Recruiting participants because they meet specific personal characteristics.
Combined Methods: Researchers may use a mix of strategies, such as dividing a population into strata, then identifying clusters within those strata, and finally randomly sampling from those clusters.

Non-probability Sampling Methods

General Characteristics: These methods save time and money but typically yield less representative samples than probability sampling.
Convenience Sampling: The researcher samples individuals who are easily available or who volunteer. Common examples include undergraduate students in a psychology pool, online experiment volunteers, or respondents to public advertisements.
Purposive Sampling: Participants are selected based on specific characteristics or criteria that align directly with the objectives of the study. While it ensures specific groups are included, it carries a high potential for researcher bias.
Snowball Sampling: Existing participants are asked to recruit future participants from among their acquaintances. This is common in developmental psychology and when studying hard-to-reach subgroups, though it carries selection bias.
Quota Sampling: Researchers recruit volunteers from various subsets of the population until specific targets (quotas) are met. This is essentially the non-probability equivalent of stratified sampling.

Threats to External Validity

Sampling Bias: Occurs when the sample selected is not representative of the population of interest.
Non-response Bias: A specific type of sampling bias in which the people who choose to respond to a survey or participate in a study differ systematically from those who do not respond.
Situational Factors: Arise when factors unique to the study's setting affect the results, making them inapplicable to other settings.
Temporal Validity: Concerns the extent to which the study's results can be generalized to other time periods.

Case Study: Kelly et al. (2018) on Social Media and Depression

Study Objective: To assess whether social media use is associated with depressive symptoms in adolescents.
Population Data: Utilized the UK Millennium Cohort Study involving $10,904$ 14-year-olds.
Sampling Details:
- Families receiving Child Benefit ( $98\%$ of the UK population at the time) were selected from a random sample of electoral wards across England, Scotland, Wales, and Northern Ireland. This component of the design constitutes Cluster Sampling.
- Certain subgroups were intentionally over-sampled, including those in disadvantaged circumstances, children from minority ethnic backgrounds in England, and youth in Scotland, Wales, and Northern Ireland. This over-sampling represents Disproportionate Stratified Sampling.
Analysis of Validity: Since only $61\%$ of the original cohort participated in the age 14 interview, a primary threat to the population validity of this study is Non-response Bias.

Sampling vs. Assignment

Random Sampling:
- Used to acquire a sample from a population.
- Enhances External Validity by making it likely that the sample represents the population.
Random Assignment:
- Used to assign participants to different groups (e.g., Treatment group vs. Control group) within a study.
- Enhances Internal Validity by making comparison groups equivalent on average at the start of the study, thereby helping to isolate the effect of the independent variable.
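The two procedures answer different questions and can appear in the same study. A minimal sketch, with a hypothetical population of 500 and a sample of 40:

```python
import random

rng = random.Random(5)  # fixed seed only for reproducibility

# Hypothetical population of member IDs.
population = list(range(1, 501))

# Random sampling: decides WHO takes part in the study
# (supports external validity).
participants = rng.sample(population, k=40)

# Random assignment: decides WHICH GROUP each participant joins
# (supports internal validity).
shuffled = participants[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:20], shuffled[20:]
```

A study can use random assignment without random sampling (for example, a convenience sample split randomly into groups), which is common in theory-testing mode.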

Describing Data: Distributions and Central Tendency

Frequency Table/Histogram: Tools used to organize and summarize data. A histogram provides a visual overview of variability, symmetry, and central tendency.
Measures of Central Tendency:
- Mode: The most frequently occurring score in a distribution. It is meaningful only when there is a clear "peak."
- Median: The literal center point of the data when arranged in order. It is the most stable measure but is not sensitive to the value of every score.
- Mean ( $M$ ): The mathematical average of all scores. It is the most sensitive measure but can be significantly influenced by extreme scores (outliers).
Types of Distribution Symmetry:
- Normal Distribution: Symmetrical distribution with one peak in the middle. In a perfect normal distribution, $\text{Mean} = \text{Median} = \text{Mode}$ .
- Skewed Distribution: Asymmetrical distribution. The mean is "pulled" in the direction of the skew (the tail of the distribution).
  - If $\text{Mean} > \text{Median}$ , it is a positive skew.
  - If $\text{Mean} < \text{Median}$ , it is a negative skew.
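A small worked example of how one extreme score pulls the mean but not the median or mode. The scores are hypothetical.

```python
import statistics

# Hypothetical scores with one extreme high value (a long right tail).
scores = [2, 3, 3, 4, 5, 6, 40]

mean = statistics.mean(scores)      # 63 / 7 = 9
median = statistics.median(scores)  # middle of the sorted scores = 4
mode = statistics.mode(scores)      # most frequent score = 3

# Mean > Median indicates a positive skew.
skew_direction = "positive" if mean > median else "negative or none"
```

Removing the score of 40 drops the mean from 9 to about 3.8, while the median moves only from 4 to 3.5.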

Measuring Variability

Range: Calculated as $\text{Maximum Score} - \text{Minimum Score}$ .
Standard Deviation ( $S$ ): A statistic that describes the average amount that each individual score deviates from the mean of the distribution.
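A short worked example of both measures on hypothetical scores. It assumes the sample standard deviation (dividing by $N - 1$); the notes do not say which version is intended.

```python
import statistics

scores = [4, 8, 6, 5, 7]  # hypothetical scores

score_range = max(scores) - min(scores)  # 8 - 4 = 4
mean = statistics.mean(scores)           # 30 / 5 = 6

# Squared deviations from the mean: 4, 4, 0, 1, 1 (sum = 10).
# Sample standard deviation: sqrt(10 / (5 - 1)) = sqrt(2.5), about 1.58.
sd = statistics.stdev(scores)
```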

Introduction to Inferential Statistics

Descriptive Statistics: Help organize and summarize data collected from a sample.
Inferential Statistics: Use sample data to make generalizations (inferences) about populations.
Point Estimate: The specific estimate of an effect or value calculated from the sample (e.g., a sample mean).
Population Parameter: The "true" value of the variable in the entire population that the researcher is trying to estimate.
Key Principles:
- Samples differ from one another; while most point estimates will be close to the "true" population parameter, some will be significantly different.
- Data from samples provide only a rough point estimate of the true population parameter.
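The first principle can be illustrated with a small simulation. The population, the sample size of 50, and the number of repeated samples are all hypothetical.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed only for reproducibility

# Hypothetical population with a known "true" parameter.
population = [i % 100 for i in range(10_000)]
true_mean = statistics.mean(population)  # 49.5

# Draw many independent samples and record each point estimate.
estimates = [
    statistics.mean(rng.sample(population, k=50)) for _ in range(1000)
]

# The point estimates closest to and farthest from the true parameter.
closest = min(estimates, key=lambda m: abs(m - true_mean))
farthest = max(estimates, key=lambda m: abs(m - true_mean))
```

Plotting `estimates` as a histogram would show most sample means clustered near 49.5, with a few noticeably farther away.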

Margin of Error and Confidence Intervals

Margin of Error (MoE): The range of values above and below a point estimate in which the true population parameter is likely to fall.
Confidence Interval (CI): Calculated as $\text{Point Estimate} \pm \text{Margin of Error}$ .
Sample Size Impact: The larger the sample size ( $N$ ), the smaller the margin of error.
- Example 1 ( $N = 1000$ ): $\text{CI} = 59\% \pm 3\% \rightarrow [56\%, 62\%]$
- Example 2 ( $N = 10$ ): $\text{CI} = 59\% \pm 30\% \rightarrow [29\%, 89\%]$
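The two margins above can be approximately reproduced with the usual normal-approximation formula for a proportion, $z \sqrt{p(1-p)/N}$ with $z = 1.96$. Using this formula is an assumption, since the notes do not state how the margins were computed, and the approximation is rough for a sample as small as $N = 10$. The function name is hypothetical.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion."""
    return z * math.sqrt(p * (1 - p) / n)

moe_large = margin_of_error(0.59, 1000)  # about 0.03 (3 percentage points)
moe_small = margin_of_error(0.59, 10)    # about 0.30 (30 percentage points)
```

Because $N$ sits under a square root, cutting the margin of error by a factor of 10 takes 100 times as many participants.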
Confidence Levels:
- 95% Confidence Level: The margin of error is calculated so that in $95\%$ of all possible samples, the calculated confidence interval will contain the true population parameter.
- 99% Confidence Level: Reduces the risk that the interval misses the true population parameter from $5\%$ to $1\%$ by widening the confidence interval.
The Trade-off:
- To achieve greater confidence, a researcher must widen the margin of error (reducing precision).
- To achieve greater precision (a narrower interval), a researcher must accept a higher risk of error.
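The trade-off can be made concrete by comparing the critical values behind each confidence level. This sketch assumes a normal sampling distribution; the standard error of 1.5 is a hypothetical value.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value of the standard normal distribution."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

z95 = z_critical(0.95)  # about 1.96
z99 = z_critical(0.99)  # about 2.58

standard_error = 1.5    # hypothetical standard error of a point estimate
moe95 = z95 * standard_error
moe99 = z99 * standard_error  # wider margin: more confidence, less precision
```

For the same data, moving from 95% to 99% confidence widens the margin of error by roughly 31%.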

Case Study: Mehl et al. (2007) - Word Counts by Gender

Research Question: Do women talk more than men?
Method: Used the Electronically Activated Recorder (EAR), an unobtrusive electronic recording device, to sample conversations from university students across six North American universities.
Sample Data:
- Women: $N = 210$ , $M = 16,215$ , $SD = 7,301$
- Men: $N = 186$ , $M = 15,669$ , $SD = 8,633$
Initial Point Estimate: Women spoke, on average, $546$ words more per day than men ( $16,215 - 15,669 = 546$ ).
Application of Margin of Error (95% MoE):
- Women: $16,215 \pm 987$ words.
- Men: $15,669 \pm 1,240$ words.
Inference Results: The two 95% confidence intervals (women: roughly $15,228$ to $17,202$ words; men: roughly $14,429$ to $16,909$ words) overlap substantially, so the difference of $546$ words is not statistically significant and does not support the conclusion that women talk more than men in the broader population.
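The margins of error in this case study can be reproduced, to within about one word, from the reported $N$ and $SD$ values. The sketch assumes the formula $1.96 \times SD / \sqrt{N}$, which the notes do not state explicitly; the function name is hypothetical. The final lines go one step beyond the notes and compute a margin of error for the difference itself, assuming the two groups are independent.

```python
import math

def mean_moe(sd, n, z=1.96):
    """Approximate 95% margin of error for a sample mean."""
    return z * sd / math.sqrt(n)

women_mean, men_mean = 16215, 15669
women_moe = mean_moe(7301, 210)  # about 987
men_moe = mean_moe(8633, 186)    # about 1,241 (the notes list 1,240)

women_ci = (women_mean - women_moe, women_mean + women_moe)
men_ci = (men_mean - men_moe, men_mean + men_moe)

# The intervals overlap if each one starts before the other one ends.
intervals_overlap = women_ci[0] <= men_ci[1] and men_ci[0] <= women_ci[1]

# Margin of error for the difference between two independent means.
difference = women_mean - men_mean
diff_moe = 1.96 * math.sqrt(7301**2 / 210 + 8633**2 / 186)
diff_ci_includes_zero = difference - diff_moe <= 0 <= difference + diff_moe
```

Under these assumptions the margin of error for the difference is much larger than the 546-word difference, so the interval for the difference includes zero, consistent with the inference drawn from the overlapping intervals.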