Notes on Sampling and Generalizability
Population and Sample
Population: All units (people or things) possessing the attributes and characteristics of interest
Examples: the American electorate, all US married couples, all college students
Sampling frame: Subset of units that have a chance to become part of the sample
Examples: list of registered voters in a US state, marriage licenses at a local courthouse
Sample: Subset of a population; the people who were contacted and provided data
Theoretical population vs. study population vs. sampling frame vs. sample (visual logic):
The Theoretical Population → The Study Population → The Sampling Frame → The Sample
The arrows indicate access and representativeness considerations for generalizing findings
Real-world prompts illustrated by examples
Costco example (sampling/measurement intuition):
Why offer free samples? Reciprocity and loyalty considerations; the sample may also taste different in the store than the same product does at home
Differences between Costco and home experiences (e.g., microwave vs. traditional heating, prior eating context, the fact you paid for what you’re eating)
Population and Sample: Example in practice
Binge drinking on college campuses (illustrative sampling constructs):
Population: All college students in the US
Sampling frame: UA students obtained from the registrar
Sample: 500 students chosen at random from the registrar’s list
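Issue: Whether this process yields a sample representative of the target population. Random selection from the registrar's list supports generalizing to UA students (the sampling frame), but not necessarily to all college students in the US; evidence for representativeness depends on the sampling design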
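A minimal sketch of the draw described above, assuming a simple random sample without replacement. The frame size of 30,000 and the ID format are hypothetical stand-ins, not figures from the notes.

```python
import random

# Hypothetical sampling frame: stand-in IDs for students on the registrar's list.
# The frame size (30,000) is an assumption for illustration only.
sampling_frame = [f"student_{i:05d}" for i in range(30_000)]

rng = random.Random(2015)  # fixed seed so the draw is reproducible

# Simple random sample without replacement: every listed student
# has the same chance of being chosen.
sample = rng.sample(sampling_frame, k=500)

selection_probability = len(sample) / len(sampling_frame)
print(len(sample), round(selection_probability, 4))
```

Every unit in the frame has the same selection probability here, which is what supports generalizing to the frame. Nothing in the code speaks to students who are not on the list.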
Important Sampling Concepts
Variable: a set of mutually exclusive attributes at various levels of analysis (e.g., communication apprehension, self-disclosure, attitude change)
Parameter: an aggregate summary value for a population for a given variable
Statistic: an aggregate summary value from a sample for a given variable
Sampling Error: the difference between a statistic and the corresponding population parameter
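Confidence Levels and Confidence Intervals: a confidence interval is a range, computed from sample statistics, in which a population parameter likely falls; the confidence level (e.g., 95%) states how often intervals built this way capture the parameter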
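The relationship among parameter, statistic, and sampling error can be sketched with a simulated population. The population below (10,000 normally distributed scores) is hypothetical; in real research the parameter is usually unknown, so the sampling error can only be estimated.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 10,000 scores on some interval-level variable.
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = statistics.mean(population)   # population mean (the parameter)

# One simple random sample of 100 units from that population.
sample = rng.sample(population, k=100)
statistic = statistics.mean(sample)       # sample mean (the statistic)

# Sampling error: the difference between the statistic and the parameter.
sampling_error = statistic - parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:+.2f}")
```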
Sampling Error: illustrative examples
Polls after the 9/17/2015 Republican debates (CNN poll vs others):
Fiorina’s support gain vs. Trump’s ground loss; questions about who conducted the poll, subjects, sample size, question framing, etc.
These are estimates with error; averaging polls could be a strategy to mitigate some error
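A rough simulation of why averaging can help, assuming each poll is an independent simple random sample from the same population. The true support level, sample size, and number of polls are all hypothetical, not real polling figures. Averaging only reduces random sampling error; it does nothing about error the polls share, such as question framing.

```python
import random

rng = random.Random(7)
TRUE_SUPPORT = 0.25   # hypothetical population parameter
N_PER_POLL = 500      # hypothetical sample size per poll
N_POLLS = 8           # hypothetical number of polls


def run_poll(rng, p, n):
    """Return the sample proportion from one simulated simple random sample."""
    return sum(rng.random() < p for _ in range(n)) / n


polls = [run_poll(rng, TRUE_SUPPORT, N_PER_POLL) for _ in range(N_POLLS)]
poll_average = sum(polls) / len(polls)

# Typical error of a single poll vs. error of the average of all polls.
mean_single_error = sum(abs(p - TRUE_SUPPORT) for p in polls) / len(polls)
average_error = abs(poll_average - TRUE_SUPPORT)

print(f"typical single-poll error: {mean_single_error:.4f}")
print(f"error of the poll average: {average_error:.4f}")
```

The error of the average can never exceed the typical single-poll error, because over- and underestimates partly cancel.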
Confidence Intervals for Proportions
Computation depends on level of measurement; for proportions (nominal, two outcomes) use the following framework:
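Let $\hat{p} = x/n$ be the sample proportion with $x$ successes out of $n$ observations
The confidence interval is typically $\hat{p} \pm z^{*}\sqrt{\hat{p}(1-\hat{p})/n}$, where $z^{*}$ is the critical value for the chosen confidence level (for large samples, z-approximation)
For the common 95% confidence level, $z^{*} = 1.96$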
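A minimal sketch of the large-sample (normal approximation) interval for a proportion: the sample proportion plus or minus the critical z value times the standard error sqrt(p(1 - p)/n). The function name `proportion_ci` is an illustrative choice, not something from the course materials.

```python
import math
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * standard_error
    return p_hat - margin, p_hat + margin


z_95 = NormalDist().inv_cdf(0.975)
print(round(z_95, 2))

lower, upper = proportion_ci(successes=60, n=100)
print(round(lower, 3), round(upper, 3))
```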
Confidence Intervals for Proportions: Example
Example: poll of 100 people, candidate X support = 60%
Proportion: $\hat{p} = 60/100 = 0.60$
Binomial exact (from calculator):
Lower bound = 0.4972, Upper bound = 0.6967
Normal approximation (with z = 1.96):
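Standard error (approx) = $\sqrt{\hat{p}(1-\hat{p})/n} = \sqrt{0.60 \times 0.40 / 100} \approx 0.049$
Lower bound = $0.60 - 1.96 \times 0.049 \approx 0.5040$ (rounded in source: 0.5040)
Upper bound = $0.60 + 1.96 \times 0.049 \approx 0.6960$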
Note: The bounds differ slightly depending on the method used (binomial exact vs. normal approximation); the exact interval is not symmetric around the sample proportion
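The notes do not say which calculator produced the exact bounds. The sketch below assumes the Clopper-Pearson method, a common definition of the "exact" binomial interval, and solves for each bound by bisection using only the standard library. The helper names are illustrative.

```python
import math


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, lo, hi, iters=60):
    """Locate a sign change of f on [lo, hi]; f(lo) and f(hi) must differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_binomial_ci(x, n, confidence=0.95):
    """Clopper-Pearson interval for x successes in n trials."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2, 0.0, 1.0)
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2, 0.0, 1.0)
    return lower, upper


lower, upper = exact_binomial_ci(60, 100)
print(round(lower, 4), round(upper, 4))
```

For 60 successes out of 100 this should land on the exact bounds quoted in the example, up to rounding.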
Confidence Intervals: Means
For interval/ratio level data, compute the mean and then the CI for that mean:
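Given: the sample mean $\bar{x}$, the sample standard deviation $s$, and the sample size $n$
Population mean could lie between approximately 21.43 and 24.57 (from the example in the transcript), an interval centered on 23.00 with a margin of error of about 1.57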
CI formula depends on whether you use z or t:
Use $\bar{x} \pm t^{*} \cdot s/\sqrt{n}$, where $t^{*}$ is the appropriate t-critical value (with $n - 1$ degrees of freedom)
Confidence Intervals: t vs z and notes
The note on the CI website contrasts the two estimates: using the t-critical value instead of 1.96 (z) changes the interval slightly
Conceptual takeaway: for smaller samples or unknown population variance, t-critical values are more appropriate than z values
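A small sketch comparing the two critical values on the same data. The ten scores are invented for illustration and are not the transcript's example. The t-critical value 2.262 (two-sided 95%, 9 degrees of freedom) is taken from a standard t table, because the Python standard library has no t distribution.

```python
import math
import statistics
from statistics import NormalDist


def mean_ci(data, critical_value):
    """Confidence interval for a mean: mean +/- critical value * s / sqrt(n)."""
    n = len(data)
    mean = statistics.mean(data)
    standard_error = statistics.stdev(data) / math.sqrt(n)  # stdev uses n - 1
    margin = critical_value * standard_error
    return mean - margin, mean + margin


# Hypothetical data: 10 scores invented for illustration.
scores = [21, 25, 19, 24, 23, 26, 20, 22, 27, 23]

z_crit = NormalDist().inv_cdf(0.975)  # about 1.96
t_crit = 2.262                        # t table value for df = 9, 95% two-sided

z_interval = mean_ci(scores, z_crit)
t_interval = mean_ci(scores, t_crit)

print("z:", tuple(round(v, 2) for v in z_interval))
print("t:", tuple(round(v, 2) for v in t_interval))
```

Both intervals share the same center; the t interval is wider because the t-critical value exceeds 1.96 at small sample sizes.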
Nonprobability Sampling
Definition: Does not rely on random selection; commonly weaker in terms of generalizability to the population
Use cases: when other sampling techniques fail to produce an adequate or appropriate sample; when researchers need participants with special experiences or abilities
Nonprobability Sampling Techniques
Convenience sample
Volunteer sample
Inclusion/Exclusion sample
Snowball or network sample
Purposive sample
Quota sample
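As one illustration of the techniques listed above, a quota sample can be sketched as taking whoever arrives first until each category's quota is filled. The volunteers, class years, and quotas below are invented. Note that no random selection occurs, which is why generalizability is weaker.

```python
def quota_sample(arrivals, quotas, key):
    """Take people in arrival order until every category's quota is filled."""
    remaining = dict(quotas)
    chosen = []
    for person in arrivals:
        category = key(person)
        if remaining.get(category, 0) > 0:
            chosen.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):
            break  # all quotas filled
    return chosen


# Hypothetical walk-up volunteers (names and class years invented).
arrivals = [
    ("Ana", "first-year"), ("Ben", "senior"), ("Cy", "first-year"),
    ("Dee", "first-year"), ("Eli", "senior"), ("Fay", "senior"),
]

sample = quota_sample(
    arrivals,
    quotas={"first-year": 2, "senior": 2},
    key=lambda person: person[1],
)
print([name for name, _ in sample])
```

Who ends up in the sample depends entirely on arrival order, so units do not have equal chances of selection.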
Samples and Populations: practical reflection
Researchers study (take measurements from) the sample to make generalizations about the population
Classroom reflection: consider our class as a sample and assess trust in different potential population estimates, including:
Left-handedness
Computer operating system preference (Mac vs. Windows)
Political preference
Communication strategies related to dating
Communication strategies related to dealing with parents and/or grandparents
Generalizability and Representativeness (summary)
Generalizability is the extent to which conclusions from a sample can be extended to its population
A sample is representative to the degree that all units (individuals, advertisements, groups, etc.) had the same chance of being selected
Representativeness can only be assured through random sampling
Probability Sampling Theory (core ideas)
Random selection is key to reducing bias (conscious or unconscious)
Probability sampling enables prediction of population parameters and estimates of error
The standard error indicates the dispersion of sample statistics around the population parameter
The standard error decreases as the sample size increases, making larger samples more precise and less prone to large errors
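To make the last point concrete, here is a short sketch using the standard error of a proportion, sqrt(p(1 - p)/n), with p fixed at 0.5 as an illustrative worst case.

```python
import math


def standard_error_of_proportion(p, n):
    """Standard error of a sample proportion under simple random sampling."""
    return math.sqrt(p * (1 - p) / n)


for n in (100, 400, 1600):
    print(n, round(standard_error_of_proportion(0.5, n), 4))
```

Quadrupling the sample size halves the standard error, so precision improves with sample size but with diminishing returns.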
Probability Sampling Theory Continued
The confidence interval (CI) indicates the range in which the population parameter is estimated to lie
Technically, the confidence level describes the percentage of intervals, each computed from a different randomly drawn sample, that would contain the population parameter
Common practice: use 95% confidence level (i.e., 95% CI)
Corollary: about 5% of such intervals would fail to contain the true population parameter
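The repeated-sampling interpretation can be checked with a small simulation. The population proportion (0.60), sample size, and number of repetitions are hypothetical, and the interval used is the normal approximation with z = 1.96, so the observed coverage will be close to, not exactly, 95%.

```python
import math
import random

rng = random.Random(123)
TRUE_P = 0.60   # hypothetical population proportion
N = 100         # sample size per simulated study
REPS = 2000     # number of simulated random samples
Z = 1.96        # critical value for a 95% interval

hits = 0
for _ in range(REPS):
    successes = sum(rng.random() < TRUE_P for _ in range(N))
    p_hat = successes / N
    margin = Z * math.sqrt(p_hat * (1 - p_hat) / N)
    # Does this sample's interval contain the true parameter?
    if p_hat - margin <= TRUE_P <= p_hat + margin:
        hits += 1

coverage = hits / REPS
print(f"share of intervals containing the parameter: {coverage:.3f}")
```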
Types of Probability Sampling
Simple random sampling
Systematic sampling (requires a randomly ordered frame to be truly random)
Stratified random sampling (random sampling within subgroups; e.g., GSS and/or Gallup; oversampling can occur)
Cluster sampling (randomly selecting whole clusters, e.g., schools, then studying all units or a random subset of units within the selected clusters)
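The four designs can be sketched side by side. The frame of 1,000 students with a class year and a school is hypothetical, and the function names are illustrative. The stratified version draws an equal number per stratum, and the cluster version is the one-stage variant that randomly selects whole clusters and keeps every unit in them.

```python
import random


def simple_random_sample(frame, k, rng):
    """Every unit has the same chance of selection."""
    return rng.sample(frame, k)


def systematic_sample(frame, k, rng):
    """Random start, then every step-th unit down the list."""
    step = len(frame) // k
    start = rng.randrange(step)
    return [frame[start + i * step] for i in range(k)]


def stratified_sample(frame, stratum_of, k_per_stratum, rng):
    """Random sample of equal size within each subgroup (stratum)."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    return [u for members in strata.values() for u in rng.sample(members, k_per_stratum)]


def cluster_sample(frame, cluster_of, n_clusters, rng):
    """Randomly choose whole clusters and keep every unit in them."""
    clusters = {}
    for unit in frame:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [u for c in chosen for u in clusters[c]]


# Hypothetical frame: 1,000 students tagged with a class year and a school.
rng = random.Random(1)
frame = [{"id": i, "year": i % 4, "school": i % 10} for i in range(1000)]

srs = simple_random_sample(frame, 50, rng)
systematic = systematic_sample(frame, 50, rng)
stratified = stratified_sample(frame, lambda u: u["year"], 10, rng)
clustered = cluster_sample(frame, lambda u: u["school"], 2, rng)
print(len(srs), len(systematic), len(stratified), len(clustered))
```

Oversampling a small subgroup would correspond to passing a larger per-stratum count for that stratum, which this equal-allocation sketch does not do.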
Sample Size
Definition: number of people/units from whom you need to collect data
Determination: ideally prior to selecting the sample
Practical considerations and statistical considerations: larger samples provide greater power (precision) for estimates
Often, the final sample is smaller than the number invited to participate due to nonresponse or other factors (e.g., Gallup-style response rates)
Rule of thumb: the larger the sample, the smaller the sampling error; a larger sample does not by itself remove bias that comes from a flawed sampling frame or selection procedure
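One common way to set a sample size in advance, sketched here for a proportion under simple random sampling: solve the margin-of-error formula for n, then inflate the number of invitations for expected nonresponse. The 3-point margin and the 25% response rate are hypothetical inputs, and p = 0.5 is the conservative default.

```python
import math
from statistics import NormalDist


def required_sample_size(margin_of_error, confidence=0.95, p=0.5):
    """Smallest n whose normal-approximation margin of error is at most the target."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z**2 * p * (1 - p) / margin_of_error**2)


def invitations_needed(target_n, expected_response_rate):
    """Number of people to invite so that about target_n respond."""
    return math.ceil(target_n / expected_response_rate)


n = required_sample_size(margin_of_error=0.03)
invites = invitations_needed(n, expected_response_rate=0.25)
print(n, invites)
```

The calculation addresses precision only; a low response rate can still introduce bias if respondents differ from nonrespondents.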
Key takeaways for exam-ready understanding
Always define Population, Sampling Frame, and Sample clearly to assess generalizability
Distinguish between population parameters and sample statistics; recognize sampling error as the distance between them
Use probability sampling to maximize representativeness; understand the role of standard error and confidence intervals in quantifying uncertainty
Differentiate between interval estimates for means vs. proportions; know when to use z versus t critical values
Be able to interpret and critique nonprobability samples and their limitations
Apply these concepts to real-world examples (e.g., polling, classroom surveys, campus studies) to evaluate credibility and potential biases