Notes on Sampling and Generalizability

Population and Sample

Population: All units (people or things) possessing the attributes and characteristics of interest
- Examples: the American electorate, all US married couples, all college students
Sampling frame: Subset of units that have a chance to become part of the sample
- Examples: list of registered voters in a US state, marriage licenses at a local courthouse
Sample: Subset of a population; the people who were contacted and provided data
Theoretical population vs. study population vs. sampling frame vs. sample (visual logic):
- The Theoretical Population → The Study Population → The Sampling Frame → The Sample
- Each arrow is a narrowing step; generalizing findings back up the chain depends on how accessible each level is and how well it represents the level before it

Real-world prompts illustrated by examples

Costco example (sampling/measurement intuition):
- Why free samples? Reciprocity and loyalty considerations; also raises the question of whether a product tastes the same at home as it does in the store
- Differences between Costco and home experiences (e.g., microwave vs. traditional heating, prior eating context, the fact you paid for what you’re eating)

Population and Sample: Example in practice

Binge drinking on college campuses (illustrative sampling constructs):
- Population: All college students in the US
- Sampling frame: UA students obtained from the registrar
- Sample: 500 students chosen at random from the registrar’s list
- Issue: Whether this process yields a sample representative of the target population; evidence for representativeness depends on the sampling design
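
The last step of this example (500 students chosen at random from the registrar's list) can be sketched as below. This is only an illustration: the frame is a made-up list of ID strings, its size of 30,000 is an assumption, and the seed is fixed only so the draw is repeatable.

```python
import random

# Hypothetical sampling frame: 30,000 student IDs standing in for the registrar's list.
frame = [f"UA{i:05d}" for i in range(30_000)]

rng = random.Random(2015)          # fixed seed so the draw can be reproduced
sample = rng.sample(frame, k=500)  # simple random sample, drawn without replacement

print(len(sample), len(set(sample)))  # number drawn, number of distinct students
print(sample[:3])
```

Every student on the list has the same chance of selection, but students at other campuses have none, so a draw like this supports generalizing to the frame rather than automatically to all US college students.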

Important Sampling Concepts

Variable: a set of mutually exclusive attributes at various levels of analysis (e.g., communication apprehension, self-disclosure, attitude change)
Parameter: an aggregate summary value for a population for a given variable
Statistic: an aggregate summary value from a sample for a given variable
Sampling Error: the difference between a statistic and the corresponding population parameter
Confidence Levels and Confidence Intervals: ranges in which a population parameter likely falls based on sample statistics
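
A small simulation can make parameter, statistic, and sampling error concrete. The population below is invented (10,000 normally distributed scores with mean 50 and SD 10 are assumptions, not data from the notes); in real research the parameter is unknown, which is why it has to be estimated.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 scores on some interval-level variable.
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.mean(population)   # population mean (the parameter)

sample = rng.sample(population, k=100)
statistic = statistics.mean(sample)       # sample mean (the statistic)

sampling_error = statistic - parameter    # difference between statistic and parameter
print(f"parameter={parameter:.2f}  statistic={statistic:.2f}  error={sampling_error:+.2f}")
```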

Sampling Error: illustrative examples

Polls after the 9/17/2015 Republican debates (CNN poll vs others):
- Fiorina gained support while Trump lost ground; before trusting the numbers, ask who conducted the poll, who the respondents were, how large the sample was, how the questions were framed, etc.
- These are estimates with error; averaging polls could be a strategy to mitigate some error
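
Why averaging can help is easier to see in a simulation. The sketch below assumes every poll is an independent random sample of the same population; the true support of 25%, the poll size of 400, and the choice of 5 polls per average are all hypothetical values, not figures from the 2015 polls.

```python
import random
import statistics

rng = random.Random(7)
TRUE_SUPPORT = 0.25      # hypothetical population parameter, not a real polling figure
POLL_SIZE = 400          # hypothetical respondents per poll
POLLS_PER_AVERAGE = 5

def run_poll():
    """Return the sample proportion from one simulated random-sample poll."""
    hits = sum(rng.random() < TRUE_SUPPORT for _ in range(POLL_SIZE))
    return hits / POLL_SIZE

single_errors, average_errors = [], []
for _ in range(1000):
    polls = [run_poll() for _ in range(POLLS_PER_AVERAGE)]
    single_errors.append(abs(polls[0] - TRUE_SUPPORT))
    average_errors.append(abs(statistics.mean(polls) - TRUE_SUPPORT))

mean_single = statistics.mean(single_errors)
mean_average = statistics.mean(average_errors)
print(f"typical error, one poll:       {mean_single:.4f}")
print(f"typical error, 5-poll average: {mean_average:.4f}")
```

Averaging only shrinks random sampling error. If the polls share a bias (same question wording, same kind of respondents), the average inherits it.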

Confidence Intervals for Proportions

Computation depends on level of measurement; for proportions (nominal, two outcomes) use the following framework:
- Let $\hat{p} = \frac{x}{N}$ be the sample proportion with $x$ successes out of $N$ observations
- The confidence interval is typically $\hat{p} \pm E$, where $E = z_{1-\frac{1-\mathrm{CL}}{2}} \sqrt{ \frac{\hat{p}(1-\hat{p})}{N} }$ and CL is the confidence level written as a proportion (large-sample z-approximation)
- For the common 95% confidence level, $z_{0.975} = 1.96$
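
The framework above can be sketched as a small function. The function name `proportion_ci` and the 520-out-of-1,000 input are hypothetical, chosen only for illustration; the critical value comes from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation interval for a proportion: p_hat ± z * sqrt(p_hat(1-p_hat)/n)."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # the 0.975 quantile when confidence=0.95
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

z_95 = NormalDist().inv_cdf(0.975)
print(round(z_95, 2))  # 1.96

lower, upper = proportion_ci(520, 1000)  # hypothetical poll: 520 of 1,000 in favor
print(f"{lower:.3f} to {upper:.3f}")
```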

Confidence Intervals for Proportions: Example

Example: poll of 100 people, candidate X support = 60%
- Proportion: $\hat{p} = 0.60$
- Binomial exact (from calculator): lower bound = 0.4972, upper bound = 0.6967
- Normal approximation (with z = 1.96):
- Standard error = $\text{SE} = \sqrt{\frac{\hat{p}(1-\hat{p})}{N}} = \sqrt{\frac{0.60\times(1-0.60)}{100}} \approx 0.0490$
- Lower bound = $\hat{p} - z\cdot \text{SE} \approx 0.5040$
- Upper bound = $\hat{p} + z\cdot \text{SE} \approx 0.6960$
- Note: the two methods give slightly different bounds; the exact binomial interval is a little wider and is not symmetric around $\hat{p}$
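
Both sets of bounds can be checked with the standard library alone. The sketch below assumes the "binomial exact" calculator uses the Clopper-Pearson method (the notes do not name it) and finds those bounds by bisection on the binomial CDF; the helper names `binom_cdf` and `solve` are made up for this example.

```python
from math import comb, sqrt

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

def solve(f, target, lo=0.0, hi=1.0):
    """Bisection for f(p) = target, where f is decreasing in p on [lo, hi]."""
    for _ in range(60):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x, n, alpha = 60, 100, 0.05

# Exact bounds: each tail probability equals alpha / 2.
# Lower bound: P(X >= x) = alpha/2, i.e. P(X <= x - 1) = 1 - alpha/2.
exact_lower = solve(lambda p: binom_cdf(x - 1, n, p), 1 - alpha / 2)
# Upper bound: P(X <= x) = alpha/2.
exact_upper = solve(lambda p: binom_cdf(x, n, p), alpha / 2)

# Normal approximation with z = 1.96.
p_hat = x / n
se = sqrt(p_hat * (1 - p_hat) / n)
approx_lower, approx_upper = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"exact:  {exact_lower:.4f} to {exact_upper:.4f}")
print(f"approx: {approx_lower:.4f} to {approx_upper:.4f}")
```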

Confidence Intervals: Means

For interval/ratio level data, compute the mean and then the CI for that mean:
- Given: $N = 100, \bar{X} = 23, s = 8$
- At the 95% confidence level, the population mean is estimated to lie between approximately 21.43 and 24.57 (this interval uses z = 1.96: $23 \pm 1.96 \cdot \frac{8}{\sqrt{100}}$)
- CI formula depends on whether you use z or t:
- Use $\bar{X} \pm t_{1-\frac{1-\mathrm{CL}}{2}, \; N-1} \cdot \frac{s}{\sqrt{N}}$, where $t$ is the t-critical value with $N-1$ degrees of freedom
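
As a check on the arithmetic, the sketch below recomputes the interval for $N = 100, \bar{X} = 23, s = 8$ using the z-critical value, which is what reproduces the 21.43 to 24.57 figures. The helper name `mean_ci` is hypothetical; passing a t-critical value instead of z would give a slightly wider interval.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(mean, sd, n, critical):
    """Interval mean ± critical * sd / sqrt(n)."""
    margin = critical * sd / sqrt(n)
    return mean - margin, mean + margin

z = NormalDist().inv_cdf(0.975)        # about 1.96
lower, upper = mean_ci(23, 8, 100, z)
print(f"{lower:.2f} to {upper:.2f}")   # 21.43 to 24.57
```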

Confidence Intervals: t vs z and notes

Contrast between estimates: using the t-critical value instead of 1.96 (z) changes the CI slightly; for $N = 100, \bar{X} = 23, s = 8$, $t \approx 1.984$ gives roughly 21.41 to 24.59 rather than 21.43 to 24.57
Conceptual takeaway: for smaller samples or unknown population variance, t-critical values are more appropriate than z values
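
The size of the gap between t and z depends on sample size. In the sketch below the t-critical values are copied from a standard t table (two-sided 95%) rather than computed, because the Python standard library has no t distribution; the sample sizes are arbitrary choices, and $s = 8$ is reused from the means example.

```python
from math import sqrt

Z_95 = 1.960
# Two-sided 95% t-critical values from a standard t table, keyed by degrees of freedom.
T_95 = {4: 2.776, 9: 2.262, 29: 2.045, 99: 1.984}

s = 8  # sample standard deviation reused from the means example
margins = {}
for df, t_crit in T_95.items():
    n = df + 1
    se = s / sqrt(n)
    margins[n] = (Z_95 * se, t_crit * se)
    print(f"N={n:3d}  z margin={Z_95 * se:.2f}  t margin={t_crit * se:.2f}")
```

With only 5 observations the t-based margin is much wider than the z-based one; by $N = 100$ the two are nearly the same.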

Nonprobability Sampling

Definition: Does not rely on random selection; commonly weaker in terms of generalizability to the population
Use cases: when other sampling techniques fail to produce an adequate or appropriate sample; when researchers need participants with special experiences or abilities

Nonprobability Sampling Techniques

Convenience sample
Volunteer sample
Inclusion/Exclusion sample
Snowball or network sample
Purposive sample
Quota sample

Samples and Populations: practical reflection

Researchers study (take measurements from) the sample to make generalizations about the population
Classroom reflection: consider our class as a sample and assess trust in different potential population estimates, including:
- Left-handedness
- Computer operating system preference (Mac vs. Windows)
- Political preference
- Communication strategies related to dating
- Communication strategies related to dealing with parents and/or grandparents

Generalizability and Representativeness (summary)

Generalizability is the extent to which conclusions from a sample can be extended to its population
A sample is representative to the degree that all units (individuals, advertisements, groups, etc.) had the same chance of being selected
Representativeness can only be assured through random sampling

Probability Sampling Theory (core ideas)

Random selection is key to reducing bias (conscious or unconscious)
Probability sampling enables prediction of population parameters and estimates of error
The standard error indicates the dispersion of sample statistics around the population parameter
The standard error decreases as the sample size increases, making larger samples more precise and less prone to large errors
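
A quick calculation shows how the standard error of a proportion shrinks with sample size. The sample sizes are arbitrary, and $p = 0.5$ is assumed because it gives the largest standard error for any given $N$.

```python
from math import sqrt

p = 0.5  # assumed proportion; 0.5 gives the largest standard error
standard_errors = {n: sqrt(p * (1 - p) / n) for n in (100, 400, 1600)}

for n, se in standard_errors.items():
    print(f"N={n:5d}  SE={se:.4f}")
```

Quadrupling the sample size only halves the standard error, so each extra bit of precision costs more than the last.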

Probability Sampling Theory Continued

The confidence interval (CI) indicates the range in which the population parameter is estimated to lie
Technically, the confidence level describes the percentage of intervals, each computed from a different randomly drawn sample, that would contain the population parameter
Common practice: use 95% confidence level (i.e., 95% CI)
Corollary: about 5% of such intervals would fail to contain the true population parameter
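
The "95% of intervals" reading can be illustrated by simulation. The sketch below assumes a population proportion of 0.60 and samples of 100 (both hypothetical), builds a normal-approximation interval from each of 2,000 random samples, and counts how many contain the true value.

```python
import random
from math import sqrt

rng = random.Random(1)
TRUE_P = 0.60   # hypothetical population proportion
N = 100         # respondents per sample
REPS = 2000     # number of repeated samples

covered = 0
for _ in range(REPS):
    hits = sum(rng.random() < TRUE_P for _ in range(N))
    p_hat = hits / N
    margin = 1.96 * sqrt(p_hat * (1 - p_hat) / N)
    if p_hat - margin <= TRUE_P <= p_hat + margin:
        covered += 1

coverage = covered / REPS
print(f"{coverage:.1%} of intervals contain the true proportion")
```

The share should land near 95% but not exactly on it, partly because of simulation noise and partly because the normal approximation is only approximate at this sample size.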

Types of Probability Sampling

Simple random sampling
Systematic sampling (requires a randomly ordered frame to be truly random)
Stratified random sampling (random sampling within subgroups; e.g., GSS and/or Gallup; oversampling can occur)
Cluster sampling (random selection of intact clusters, e.g., schools, followed by measuring all or a random subset of the units inside the selected clusters)
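
The four designs can be compared side by side on one made-up frame. Everything about the frame is an assumption for illustration: 1,000 students, a randomly assigned class year used as the stratum, and 10 schools of 100 students used as clusters. Cluster sampling is shown in its one-stage form, where whole clusters are drawn at random and every unit inside them is kept.

```python
import random

rng = random.Random(3)

# Hypothetical frame: 1,000 students, each tagged with a class year and a school.
YEARS = ["first-year", "sophomore", "junior", "senior"]
frame = [{"id": i, "year": rng.choice(YEARS), "school": i // 100} for i in range(1000)]

# Simple random sampling: every unit has the same chance of selection.
simple = rng.sample(frame, k=100)

# Systematic sampling: random start, then every k-th unit on the list.
step = len(frame) // 100
start = rng.randrange(step)
systematic = frame[start::step]

# Stratified random sampling: a separate random draw inside each stratum (class year).
stratified = []
for year in YEARS:
    stratum = [unit for unit in frame if unit["year"] == year]
    stratified.extend(rng.sample(stratum, k=25))

# Cluster sampling: randomly pick whole clusters (schools), keep every unit in them.
chosen_schools = set(rng.sample(range(10), k=2))
cluster = [unit for unit in frame if unit["school"] in chosen_schools]

for name, drawn in [("simple", simple), ("systematic", systematic),
                    ("stratified", stratified), ("cluster", cluster)]:
    print(name, len(drawn))
```

Drawing 25 from every class year regardless of its size is a form of oversampling for the smaller years; a proportionate design would instead draw from each stratum in proportion to its share of the frame.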

Sample Size

Definition: number of people/units from whom you need to collect data
Determination: ideally prior to selecting the sample
Practical considerations and statistical considerations: larger samples provide greater power (precision) for estimates
Often, the final sample is smaller than the number invited to participate due to nonresponse or other factors (e.g., Gallup-style response rates)
Rule of thumb: the larger the sample, the smaller the sampling error; sample size alone does not remove bias that comes from how units were selected
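
One common way to fix the sample size in advance is to rearrange the margin-of-error formula for proportions, $E = z\sqrt{\hat{p}(1-\hat{p})/N}$, into $N = z^2 \hat{p}(1-\hat{p}) / E^2$. The sketch below does this under stated assumptions: a simple random sample, a worst-case $\hat{p} = 0.5$, and no adjustment for nonresponse. The function name `required_n` is hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_n(margin, p=0.5, confidence=0.95):
    """Smallest n whose normal-approximation margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

print(required_n(0.03))  # 1068
print(required_n(0.05))  # 385
```

Because some invited participants will not respond, the number invited usually has to be larger than the figure this returns.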

Key takeaways for exam-ready understanding

Always define Population, Sampling Frame, and Sample clearly to assess generalizability
Distinguish between population parameters and sample statistics; recognize sampling error as the distance between them
Use probability sampling to maximize representativeness; understand the role of standard error and confidence intervals in quantifying uncertainty
Differentiate between interval estimates for means vs. proportions; know when to use z versus t critical values
Be able to interpret and critique nonprobability samples and their limitations
Apply these concepts to real-world examples (e.g., polling, classroom surveys, campus studies) to evaluate credibility and potential biases