Notes on Sampling and Generalizability

Population and Sample

  • Population: All units (people or things) possessing the attributes and characteristics of interest

    • Examples: the American electorate, all US married couples, all college students

  • Sampling frame: Subset of units that have a chance to become part of the sample

    • Examples: list of registered voters in a US state, marriage licenses at a local courthouse

  • Sample: Subset of a population; the people who were contacted and provided data

  • Theoretical population vs. study population vs. sampling frame vs. sample (visual logic):

    • The Theoretical Population → The Study Population → The Sampling Frame → The Sample

    • The arrows indicate access and representativeness considerations for generalizing findings

Real-world prompts illustrated by examples

  • Costco example (sampling/measurement intuition):

    • Why free samples? Reciprocity and loyalty considerations; the example also raises the question of whether the product tastes the same at home as it does in the store

    • Differences between Costco and home experiences (e.g., microwave vs. traditional heating, prior eating context, the fact you paid for what you’re eating)

Population and Sample: Example in practice

  • Binge drinking on college campuses (illustrative sampling constructs):

    • Population: All college students in the US

    • Sampling frame: UA students obtained from the registrar

    • Sample: 500 students chosen at random from the registrar’s list

    • Issue: Whether this process yields a sample representative of the target population; evidence for representativeness depends on the sampling design
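
The frame-to-sample step in this example can be sketched in a few lines of Python. This is only an illustration: the frame size of 30,000 and the ID format are assumptions, not figures from the notes.

```python
import random

# Hypothetical sampling frame: student IDs as they might appear on a registrar's list.
# The frame size (30,000) is an assumption made for illustration only.
sampling_frame = [f"student_{i:05d}" for i in range(30_000)]

rng = random.Random(42)                      # fixed seed so the draw is reproducible
sample = rng.sample(sampling_frame, k=500)   # simple random sample, without replacement

print(len(sample), len(set(sample)))         # sample size and number of distinct students
```

Every student on the list has the same chance of selection, but students who are not on the list (everyone outside UA) have no chance at all, which is why generalizing to all US college students is the open issue.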

Important Sampling Concepts

  • Variable: a set of mutually exclusive attributes at various levels of analysis (e.g., communication apprehension, self-disclosure, attitude change)

  • Parameter: an aggregate summary value for a population for a given variable

  • Statistic: an aggregate summary value from a sample for a given variable

  • Sampling Error: the difference between a statistic and the corresponding population parameter

  • Confidence Levels and Confidence Intervals: ranges in which a population parameter likely falls based on sample statistics
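
A small simulation may help keep parameter, statistic, and sampling error apart. The population below is invented (10,000 scores with mean near 50 and standard deviation near 10); in real research the parameter is usually unknown, which is the whole reason for sampling.

```python
import random
from statistics import mean

rng = random.Random(7)

# Hypothetical population of 10,000 scores (e.g., on a communication apprehension scale).
population = [rng.gauss(50, 10) for _ in range(10_000)]
parameter = mean(population)             # population mean (normally unknown)

sample = rng.sample(population, k=100)   # simple random sample of 100 units
statistic = mean(sample)                 # sample mean

sampling_error = statistic - parameter   # difference between statistic and parameter
print(f"parameter={parameter:.2f} statistic={statistic:.2f} error={sampling_error:+.2f}")
```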

Sampling Error: illustrative examples

  • Polls after the 9/17/2015 Republican debates (CNN poll vs others):

    • Fiorina’s support gain vs. Trump’s ground loss; questions about who conducted the poll, subjects, sample size, question framing, etc.

    • These are estimates with error; averaging polls could be a strategy to mitigate some error
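
The averaging idea can be checked with a rough simulation. Everything here is assumed for illustration (a true support level of 25%, polls of 400 respondents, five polls per average); it is not a model of the actual 2015 polls. It also only covers random sampling error: if all polls share the same bias (for example, the same question wording problem), averaging will not remove it.

```python
import random
from statistics import mean

rng = random.Random(2015)
TRUE_SUPPORT = 0.25       # assumed "true" population proportion (hypothetical)
POLL_SIZE = 400           # respondents per poll (hypothetical)
POLLS_PER_AVERAGE = 5
REPETITIONS = 1000

def run_poll():
    """Return the sample proportion from one simulated simple random poll."""
    hits = sum(rng.random() < TRUE_SUPPORT for _ in range(POLL_SIZE))
    return hits / POLL_SIZE

single_errors, average_errors = [], []
for _ in range(REPETITIONS):
    polls = [run_poll() for _ in range(POLLS_PER_AVERAGE)]
    single_errors.append(abs(polls[0] - TRUE_SUPPORT))       # error of one poll
    average_errors.append(abs(mean(polls) - TRUE_SUPPORT))   # error of the average

print(f"typical error, one poll:       {mean(single_errors):.4f}")
print(f"typical error, 5-poll average: {mean(average_errors):.4f}")
```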

Confidence Intervals for Proportions

  • Computation depends on level of measurement; for proportions (nominal, two outcomes) use the following framework:

    • Let $\hat{p} = \frac{x}{N}$ be the sample proportion, with $x$ successes out of $N$ observations

    • The confidence interval is typically $\hat{p} \pm E$, where $E = z_{\frac{1+\mathrm{CL}}{2}} \sqrt{\frac{\hat{p}(1-\hat{p})}{N}}$ (for large samples, z-approximation)

    • For the common 95% confidence level ($\mathrm{CL} = 0.95$), $z_{0.975} = 1.96$
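
As a minimal sketch, the framework above translates into a short Python function. The name `proportion_ci` and the input of 520 successes out of 1,000 are made up for illustration; `NormalDist().inv_cdf` from the standard library supplies the z-critical value.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation confidence interval for a proportion."""
    p_hat = successes / n
    # Two-sided critical value; about 1.96 for a 95% confidence level.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Hypothetical poll: 520 of 1,000 respondents support a candidate.
low, high = proportion_ci(successes=520, n=1000)
print(f"{low:.4f} to {high:.4f}")
```

The approximation is meant for large samples; with small samples or proportions near 0 or 1, an exact binomial method is usually the safer choice.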

Confidence Intervals for Proportions: Example

  • Example: poll of 100 people, candidate X support = 60%

    • Proportion: $\hat{p} = 0.60$

    • Binomial exact (from calculator):

    • Lower bound = 0.4972, Upper bound = 0.6967

    • Normal approximation (with z = 1.96):

    • Standard error (approx): $\mathrm{SEM} = \sqrt{\frac{\hat{p}(1-\hat{p})}{N}} = \sqrt{\frac{0.60 \times (1-0.60)}{100}} \approx 0.0490$

    • Lower bound = $\hat{p} - z \cdot \mathrm{SEM} \approx 0.60 - 1.96 \times 0.0490 \approx 0.5040$

    • Upper bound = $\hat{p} + z \cdot \mathrm{SEM} \approx 0.60 + 1.96 \times 0.0490 \approx 0.6960$

    • Note: The two methods give slightly different bounds; the exact binomial interval (0.4972 to 0.6967) is a little wider than the normal approximation and is not symmetric around $\hat{p}$
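
For readers who want to see where the "binomial exact" bounds come from, here is one way to reproduce them with only the standard library. It assumes the calculator used the Clopper-Pearson method, which the notes do not state; the helper names (`binom_cdf`, `bisect_root`, `exact_ci`) are invented for this sketch.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def bisect_root(f, lo=0.0, hi=1.0, iterations=60):
    """Find a root of f on [lo, hi] by bisection, assuming f(lo) > 0 > f(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def exact_ci(x, n, confidence=0.95):
    """Clopper-Pearson interval: invert the two one-sided binomial tests."""
    alpha = 1 - confidence
    # Lower bound: the p at which P(X >= x) equals alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: alpha / 2 - (1 - binom_cdf(x - 1, n, p)))
    # Upper bound: the p at which P(X <= x) equals alpha / 2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: binom_cdf(x, n, p) - alpha / 2)
    return lower, upper

lower, upper = exact_ci(60, 100)
print(f"exact: {lower:.4f} to {upper:.4f}")
```

For 60 successes out of 100 this should agree with the calculator bounds quoted above to about four decimal places.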

Confidence Intervals: Means

  • For interval/ratio level data, compute the mean and then the CI for that mean:

    • Given: $N = 100$, $\bar{X} = 23$, $s = 8$

    • Population mean could lie between approximately 21.43 and 24.57 (from the example in the transcript; this matches $23 \pm 1.96 \times 8/\sqrt{100}$)

    • CI formula depends on whether you use z or t:

    • Use $\bar{X} \pm t_{\frac{1+\mathrm{CL}}{2},\, N-1} \cdot \frac{s}{\sqrt{N}}$, where $t$ is the appropriate t-critical value with $N-1$ degrees of freedom

Confidence Intervals: t vs z and notes

  • The CI website note (contrast between estimates) highlights a slight difference when using the t-critical value versus 1.96 (z) for CI calculations

  • Conceptual takeaway: for smaller samples or unknown population variance, t-critical values are more appropriate than z values
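
The size of the z versus t difference can be seen by running the means example (N = 100, mean 23, s = 8) both ways. Python's standard library has no t distribution, so the t-critical value below (1.984 for 99 degrees of freedom, two-sided 95%) is copied from a t table and should be treated as an assumption of this sketch.

```python
from math import sqrt
from statistics import NormalDist

n, sample_mean, sd = 100, 23.0, 8.0
standard_error = sd / sqrt(n)               # 8 / 10 = 0.8

z_crit = NormalDist().inv_cdf(0.975)        # about 1.96
T_CRIT_DF99 = 1.984                         # from a t table, df = 99 (assumed value)

z_interval = (sample_mean - z_crit * standard_error,
              sample_mean + z_crit * standard_error)
t_interval = (sample_mean - T_CRIT_DF99 * standard_error,
              sample_mean + T_CRIT_DF99 * standard_error)

print(f"z: {z_interval[0]:.2f} to {z_interval[1]:.2f}")
print(f"t: {t_interval[0]:.2f} to {t_interval[1]:.2f}")
```

With 100 observations the two intervals differ only in the second decimal place; the gap grows as the sample gets smaller because t-critical values rise well above 1.96.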

Nonprobability Sampling

  • Definition: Does not rely on random selection; commonly weaker in terms of generalizability to the population

  • Use cases: when other sampling techniques fail to produce an adequate or appropriate sample; when researchers need participants with special experiences or abilities

Nonprobability Sampling Techniques

  • Convenience sample

  • Volunteer sample

  • Inclusion/Exclusion sample

  • Snowball or network sample

  • Purposive sample

  • Quota sample

Samples and Populations: practical reflection

  • Researchers study (take measurements from) the sample to make generalizations about the population

  • Classroom reflection: consider our class as a sample and assess trust in different potential population estimates, including:

    • Left-handedness

    • Computer operating system preference (Mac vs. Windows)

    • Political preference

    • Communication strategies related to dating

    • Communication strategies related to dealing with parents and/or grandparents

Generalizability and Representativeness (summary)

  • Generalizability is the extent to which conclusions from a sample can be extended to its population

  • A sample is representative to the degree that all units (individuals, advertisements, groups, etc.) had the same chance of being selected

  • Representativeness can only be assured through random sampling

Probability Sampling Theory (core ideas)

  • Random selection is key to reducing bias (conscious or unconscious)

  • Probability sampling enables prediction of population parameters and estimates of error

  • The standard error indicates the dispersion of sample statistics around the population parameter

  • The standard error decreases as the sample size increases, making larger samples more precise and less prone to large errors
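
A quick calculation shows how the standard error of a proportion shrinks with sample size. The proportion of 0.5 and the three sample sizes are chosen only for illustration.

```python
from math import sqrt

p = 0.5  # assumed proportion; 0.5 gives the largest possible standard error

standard_errors = {}
for n in (100, 400, 1600):
    standard_errors[n] = sqrt(p * (1 - p) / n)
    print(n, round(standard_errors[n], 4))
```

Because the sample size sits under a square root, quadrupling the sample only halves the standard error, so precision gets progressively more expensive.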

Probability Sampling Theory Continued

  • The confidence interval (CI) indicates the range in which the population parameter is estimated to lie

  • Technically, the confidence level describes the percentage of intervals, built the same way from repeated random samples, that would contain the population parameter

  • Common practice: use 95% confidence level (i.e., 95% CI)

  • Corollary: about 5% of such intervals would fail to contain the true population parameter
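
The "about 5% miss" idea can be illustrated by drawing many samples from a population whose mean is known and counting how often the 95% interval captures it. The population values (mean 50, standard deviation 10) and the number of repetitions are assumptions for this sketch.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(1)
TRUE_MEAN, TRUE_SD = 50.0, 10.0      # hypothetical population values
N, REPETITIONS, Z = 100, 2000, 1.96

covered = 0
for _ in range(REPETITIONS):
    sample = [rng.gauss(TRUE_MEAN, TRUE_SD) for _ in range(N)]
    center = mean(sample)
    margin = Z * stdev(sample) / sqrt(N)
    # Count the interval as a success if it contains the true mean.
    if center - margin <= TRUE_MEAN <= center + margin:
        covered += 1

coverage = covered / REPETITIONS
print(f"{coverage:.1%} of intervals contain the true mean")
```

The share should land close to 95%, though not exactly on it, since the simulation itself is subject to sampling error.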

Types of Probability Sampling

  • Simple random sampling

  • Systematic sampling (requires a randomly ordered frame to be truly random)

  • Stratified random sampling (random sampling within subgroups; e.g., GSS and/or Gallup; oversampling can occur)

  • Cluster sampling (random selection of whole, naturally occurring clusters, e.g., schools; all units, or a random subset, within the chosen clusters are then studied)
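
The four designs can be compared side by side on one made-up frame. The frame below (1,000 students tagged with a class year and a dorm) is hypothetical, and the cluster step shows one common variant: pick whole clusters at random and keep every unit in them.

```python
import random

rng = random.Random(3)

# Hypothetical frame: 1,000 students, each tagged with a class year and a dorm.
years = ["first-year", "sophomore", "junior", "senior"]
frame = [{"id": i, "year": years[i % 4], "dorm": f"dorm_{i % 20}"} for i in range(1000)]
rng.shuffle(frame)  # random list order, so systematic sampling hits no hidden pattern

# Simple random sampling: every unit has the same chance of selection.
simple = rng.sample(frame, k=100)

# Systematic sampling: random start, then every k-th unit on the list.
step = len(frame) // 100
start = rng.randrange(step)
systematic = frame[start::step]

# Stratified random sampling: a separate random draw within each subgroup.
stratified = []
for year in years:
    stratum = [unit for unit in frame if unit["year"] == year]
    stratified.extend(rng.sample(stratum, k=25))

# Cluster sampling: randomly choose whole clusters, then take every unit in them.
dorms = sorted({unit["dorm"] for unit in frame})
chosen_dorms = set(rng.sample(dorms, k=2))
cluster = [unit for unit in frame if unit["dorm"] in chosen_dorms]

print(len(simple), len(systematic), len(stratified), len(cluster))
```

Stratifying guarantees each class year its 25 slots, whereas the simple random draw only matches those shares on average.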

Sample Size

  • Definition: number of people/units from whom you need to collect data

  • Determination: ideally prior to selecting the sample

  • Practical considerations and statistical considerations: larger samples provide greater power (precision) for estimates

  • Often, the final sample is smaller than the number invited to participate due to nonresponse or other factors (e.g., Gallup-style response rates)

  • Rule of thumb: the larger the sample relative to the population, the smaller the sampling error; sample size alone does not correct bias that comes from how units were selected
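
Deciding on a sample size before sampling can be sketched by rearranging the margin-of-error formula for proportions, E = z * sqrt(p(1 - p) / N), to solve for N. The function names, the 3-point margin, and the 25% response rate are illustrative assumptions, not recommendations from the notes.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(margin, confidence=0.95, p=0.5):
    """Smallest N whose normal-approximation margin of error is at most `margin`."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return ceil(z**2 * p * (1 - p) / margin**2)

def invitations_needed(completes, response_rate):
    """Number of people to invite if only `response_rate` of them respond."""
    return ceil(completes / response_rate)

n = required_sample_size(margin=0.03)              # plus or minus 3 points at 95%
invites = invitations_needed(n, response_rate=0.25)
print(n, invites)
```

Using p = 0.5 is the conservative choice because it produces the largest required N; nonresponse then inflates the number of invitations well beyond the target sample.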

Key takeaways for exam-ready understanding

  • Always define Population, Sampling Frame, and Sample clearly to assess generalizability

  • Distinguish between population parameters and sample statistics; recognize sampling error as the distance between them

  • Use probability sampling to maximize representativeness; understand the role of standard error and confidence intervals in quantifying uncertainty

  • Differentiate between interval estimates for means vs. proportions; know when to use z versus t critical values

  • Be able to interpret and critique nonprobability samples and their limitations

  • Apply these concepts to real-world examples (e.g., polling, classroom surveys, campus studies) to evaluate credibility and potential biases