Notes on Sampling, Validity, and Threats to Internal/External Validity

Populations and Samples

Target population (the population): the entire set of individuals who have the characteristics required by the researcher.
Accessible population: a portion of the target population consisting of individuals who are accessible to be recruited as participants in the study.
Sample: the individuals who are selected to participate in the research study.
Key idea: research starts from a broad population concept and moves to a study sample that is then used to infer properties about the population.
Relationship to study results: results from the sample are generalized to the population; the sample is the actual group studied, drawn from the population.

The Relationship between a Population and a Sample

Process: Research begins with a general question about a population (the population is all individuals of interest).
The sample is the actual subset used in the research study.
The results from the sample are generalized to the population.
The sample is selected from the population; the actual research study is conducted with the sample.

Selection vs. Assignment

From population to sample: selection of a sample is also called sampling.
From sample to conditions: assignment of the sampled individuals to study conditions (e.g., treatment groups).
Key distinction: sampling concerns who is in the study; assignment concerns what treatment or condition each participant receives.

Population vs. Sample (Examples)

Population:
- All people in the U.S. with depression
- All college students
- All students in this classroom
Accessible population (examples):
- Patients with depression at a downtown clinic
- Students passing by the hallway at LMU
Sample (examples):
- 3 students in the classroom

Representativeness

Representativeness: extent to which the selected sample accurately reflects the population of interest.
It ranges from unbiased (representative) to biased (unrepresentative).

Probability Sampling (Have list of all population members)

Simple Random Sampling
- From the population list, each person has an equal chance of being selected.
- Steps (example):
  1) define population of interest (all subscribers to a magazine) $\text{Population} = \text{all subscribers}$
  2) list all people in the population (e.g., names from the magazine’s mailing list) $\text{List} = \text{subscriber names}$
  3) randomly select people (e.g., randomly select $100$ subscribers on the list) $n=100$
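
The three steps above can be sketched in a few lines of Python. This is only an illustration: the `subscribers` list is a made-up stand-in for the magazine's mailing list, and the function name `simple_random_sample` is hypothetical.

```python
import random

# Step 1-2: hypothetical stand-in for the magazine's mailing list.
subscribers = [f"subscriber_{i:04d}" for i in range(1, 2501)]

def simple_random_sample(population_list, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)  # seed only makes the illustration repeatable
    return rng.sample(population_list, n)

# Step 3: randomly select 100 subscribers from the list.
sample = simple_random_sample(subscribers, n=100, seed=42)
print(len(sample), sample[:3])
```

Sampling without replacement is assumed here, so no subscriber can be chosen twice.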
Systematic Sampling
- From population list, select every nth person.
- Steps (example):
  1) define population of interest (LMU freshmen) $\text{Population} = \text{LMU freshmen}$
  2) list all people in population (registrar's list) $\text{List} = \text{freshman names}$
  3) select every nth person (e.g., every 10th freshman) $k = 10$
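
A minimal sketch of systematic sampling, assuming a made-up list of 1,000 freshmen in place of the registrar's list. The random starting point follows the description in the summary of sampling methods; the names `freshmen` and `systematic_sample` are hypothetical.

```python
import random

# Hypothetical stand-in for the registrar's list of freshmen.
freshmen = [f"freshman_{i:04d}" for i in range(1, 1001)]

def systematic_sample(population_list, k, seed=None):
    """Pick a random start among the first k entries, then take every k-th person."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random start: index 0 .. k-1
    return population_list[start::k]

sample = systematic_sample(freshmen, k=10, seed=1)
print(len(sample), sample[:3])
```

Once the start is chosen, every other selection is fixed, which is why the selections are not independent of one another.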

Nonprobability Sampling (Do not have list of all population members)

Convenience Sampling
- Selecting people who are easy to get.
- Among the most commonly used sampling methods.
- Examples: hallway samples, subject pools, friends/family, local preschool, newspaper ads.
Quota Sampling
- A form of convenience sampling that ensures subgroups are adequately represented.
- Continue selecting until quota for every subgroup is met.
Example (illustrative): US population ($N = 300{,}000{,}000$)
- Disorder: 10% vs No disorder: 90%
- Sample ($n = 100$): Disorder: $n=7$ so far (continue until $n=10$); No disorder: $n=90$ (quota met, stop)
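
The quota procedure can be sketched as a loop that accepts whoever comes along until every subgroup quota is filled. The stream of passers-by below is simulated and entirely hypothetical; only the 10%/90% split and the quotas of 10 and 90 come from the example above.

```python
import random

def quota_sample(stream, quotas):
    """Accept people in arrival order until every subgroup quota is filled."""
    counts = {group: 0 for group in quotas}
    selected = []
    for person_id, group in stream:
        if counts[group] < quotas[group]:  # this subgroup still needs people
            selected.append((person_id, group))
            counts[group] += 1
        if counts == quotas:  # every quota met: stop recruiting
            break
    return selected, counts

rng = random.Random(0)
# Hypothetical convenience stream: each person has a 10% chance of the disorder.
stream = (
    (i, "disorder" if rng.random() < 0.10 else "no_disorder")
    for i in range(100_000)
)
quotas = {"disorder": 10, "no_disorder": 90}
selected, counts = quota_sample(stream, quotas)
print(counts)
```

The quotas control the composition of the sample, but the people within each subgroup are still whoever was easiest to reach, so the sample is probably biased.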

Stratified Random Sample (SRS)

Process: The population is divided into subgroups (strata); equal numbers are randomly selected from each of the subgroups.
Purpose: Guarantees that each subgroup will have adequate representation.
Limitation: Overall sample is usually not representative of the population.

Stratified Random Sampling (Illustration)

Illustration categories:
- Families with Incomes Above $\$250{,}000$
- Families with Incomes Above $\$100{,}000$ but Below $\$250{,}000$
- Families with Incomes Above $\$40{,}000$ but Below $\$100{,}000$
- Families with Incomes Below $\$40{,}000$

Stratified Random Sample Example

Example setup: Population with 400 Type A people and 100 Type B people.
Sampling: compare a stratified random sample of $n = 20$ with a proportionate stratified sample of $n = 20$.
Note: A stratified random sample takes equal numbers from each stratum, so $n = 20$ gives 10 Type A and 10 Type B. A proportionate stratified sample follows the population proportions (Type A: 80%, Type B: 20%), so $n = 20$ gives 16 Type A and 4 Type B.
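
The two allocations for this example can be worked out with a short sketch. The function names and the member labels (`A0`, `B0`, ...) are hypothetical; the stratum sizes (400 and 100) and $n = 20$ come from the example.

```python
import random

def equal_allocation(strata_sizes, n):
    """Stratified random sampling: the same number from every stratum."""
    per_stratum = n // len(strata_sizes)
    return {name: per_stratum for name in strata_sizes}

def proportionate_allocation(strata_sizes, n):
    """Proportionate stratified sampling: match the population proportions.
    Note: rounding may not sum exactly to n for less tidy proportions."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}

def stratified_sample(strata, allocation, seed=None):
    """Randomly select the allocated number of members within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, allocation[name])
            for name, members in strata.items()}

strata = {
    "A": [f"A{i}" for i in range(400)],  # 400 Type A people
    "B": [f"B{i}" for i in range(100)],  # 100 Type B people
}
sizes = {name: len(members) for name, members in strata.items()}

equal = equal_allocation(sizes, 20)
proportionate = proportionate_allocation(sizes, 20)
print(equal, proportionate)

sample = stratified_sample(strata, proportionate, seed=5)
```

With equal allocation, Type B makes up half of the sample but only 20% of the population, which is why the overall sample is usually not representative.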

Summary of Sampling Methods: Probability Sampling

Simple random
- Description: A sample is obtained using a random process from a list containing the total population.
- Strengths/Weaknesses: Ensures each individual has an equal and independent chance; fair and unbiased; however, there is no guarantee the sample is representative.
Systematic
- Description: A sample is obtained by selecting every $n^{th}$ participant from a list after a random start.
- Strengths/Weaknesses: Easy method for obtaining an essentially random sample; selections are not really random or independent.
Stratified random
- Description: The population is divided into subgroups (strata), and equal numbers are randomly selected from each subgroup.
- Strengths/Weaknesses: Guarantees each subgroup has adequate representation; overall sample is usually not representative of the population.
Proportionate stratified
- Description: Subdividing population into strata and then randomly selecting from each stratum so that the proportions in the sample match the population proportions.
- Strengths/Weaknesses: Guarantees composition of the sample matches the population; but some strata may have limited representation in the sample.
Cluster
- Description: A sample is obtained by randomly selecting clusters (preexisting groups) from a list of all clusters in the population.
- Strengths/Weaknesses: Easy method for obtaining a large, relatively random sample; however, selections are not really random or independent.
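
A rough sketch of cluster sampling, assuming classrooms as the preexisting groups and assuming that every member of a selected cluster takes part (a one-stage design). The classroom data and the name `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly select whole clusters; everyone in a chosen cluster participates."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # sample from the list of clusters
    participants = [person for name in chosen for person in clusters[name]]
    return chosen, participants

# Hypothetical population: 40 classrooms of 25 students each.
clusters = {
    f"classroom_{c:02d}": [f"c{c:02d}_student_{s:02d}" for s in range(25)]
    for c in range(40)
}

chosen, participants = cluster_sample(clusters, n_clusters=4, seed=7)
print(chosen, len(participants))
```

Only the clusters are selected at random. Students in the same classroom enter the sample together, so individual selections are not independent.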

Summary of Sampling Methods: Nonprobability Sampling

Convenience
- Description: A sample obtained by selecting participants who are easy to get.
- Strengths/Weaknesses: Easy method; the sample is probably biased.
Quota
- Description: Identify subgroups to be included, then establish quotas for individuals to be selected through convenience from each subgroup.
- Strengths/Weaknesses: Allows control of the sample composition; but the sample is probably biased.

Discussion

What challenges can you see with probability sampling?
If you can’t use probability sampling, does random assignment still help?

Three Families of Validity

Measurement Validity
- Question: To what extent does the measure assess what it claims to measure? (e.g., construct validity)
External Validity (Generalizability)
- Question: To what extent could results be generalized to other people, methods, etc.?
- Qualifier: Low = not generalizable; High = generalizable
Internal Validity (Causality)
- Question: To what extent is the IV responsible for causing the DV?
- Indicators: Low = confound; High = no confound; IV causes DV.
- Notation: $\text{IV} \rightarrow \text{DV}$; possibility of ± confounds.

Threats to External Validity

Any characteristic that limits generalizability of results.
Two threats:
- Difficulty in generalizing across participants
  - Example: results limited to college students, volunteers, or a single species may not apply to other participants.
- Difficulty in generalizing across features of a study
  - Example: results limited to the particular experimenter, method, or measure used may not apply to other procedural variations.

Threats to Internal Validity

A threat to internal validity is a confound: something other than the independent variable (IV) that could be responsible for changes in the dependent variable (DV).
The seven confounds are summarized with MR SMITH:
- Mortality: Participant attrition
- Regression toward the mean: Extreme scores on first testing tend to be less extreme on subsequent testing
- Selection: Assignment bias; participants in different conditions are different at onset; e.g., self-selection into conditions
- Maturation: Physical or psychological changes in participants over time
- Instrumentation: Problems with measurement instruments (e.g., a faulty heart rate monitor in one condition)
- Testing: Practice effects or familiarity with measures due to repeated testing
- History: External events affecting conditions differently (e.g., earthquake, noise, political changes)
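
Regression toward the mean is the least intuitive item in the list, and a small simulation can make it concrete. The sketch below assumes each observed score is a stable "true" score plus independent random noise on each testing occasion; all numbers (means, standard deviations, the top-10% cutoff) are made up for illustration.

```python
import random
from statistics import mean

rng = random.Random(123)
n = 10_000

# Assumed model: observed score = stable true score + occasion-specific noise.
true_score = [rng.gauss(100, 10) for _ in range(n)]
test1 = [t + rng.gauss(0, 10) for t in true_score]
test2 = [t + rng.gauss(0, 10) for t in true_score]

# Select the people with extreme (top 10%) scores on the first testing.
cutoff = sorted(test1)[int(0.9 * n)]
extreme = [i for i in range(n) if test1[i] >= cutoff]

first_mean = mean(test1[i] for i in extreme)
retest_mean = mean(test2[i] for i in extreme)
overall_mean = mean(test1)

print(round(overall_mean, 1), round(first_mean, 1), round(retest_mean, 1))
```

No treatment was applied between the two testings, yet the extreme group scores closer to the overall mean on retest. In a real study, that shift could be mistaken for an effect of the IV.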

Challenges with probability sampling often stem from the practicalities of research. For instance, obtaining a complete list of all population members – a prerequisite for true probability sampling – can be extremely difficult or even impossible in many real-world scenarios. Even when a list is available, specific methods can have their own issues:

Simple Random Sampling: While fair and unbiased, it doesn't guarantee the sample will be truly representative of the population.
Systematic Sampling: Selections are not truly random or independent once the starting point is chosen.
Stratified Random Sampling: While guaranteeing representation for subgroups, the overall sample might not be representative of the population's true proportions.
Proportionate Stratified Sampling: Despite matching population proportions, some smaller strata might have very limited representation in the sample.
Cluster Sampling: Selections within clusters are not truly random or independent, and the pre-existing groups might not perfectly reflect the larger population.

Regarding random assignment: even if you cannot use probability sampling to select your sample (meaning your sample might not be representative of the larger population, which limits external validity), random assignment still helps. Random assignment distributes the participants you do have (your sample) into the different treatment or control conditions. Its primary role is to reduce the impact of confounding variables by making the groups equivalent, on average, at the outset of the study. This increases the study's internal validity, which is the extent to which you can confidently conclude that the independent variable (IV) caused changes in the dependent variable (DV), rather than some other factor (a confound). So probability sampling bears on generalizability (external validity), while random assignment bears on causal inference (internal validity).
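
Random assignment can be sketched independently of how the sample was obtained. The example below assumes a convenience sample of 20 participants and two conditions; the participant labels, condition names, and the function `random_assignment` are hypothetical.

```python
import random

def random_assignment(participants, conditions, seed=None):
    """Shuffle the sample, then deal participants out to conditions in turn."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the original order is untouched
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        groups[conditions[i % len(conditions)]].append(person)
    return groups

# Hypothetical convenience sample (e.g., recruited in a hallway).
sample = [f"participant_{i:02d}" for i in range(20)]
groups = random_assignment(sample, ["treatment", "control"], seed=3)
print({name: len(members) for name, members in groups.items()})
```

Dealing out the shuffled list in turn keeps the group sizes equal, which is a design choice of this sketch rather than a requirement of random assignment.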