Notes on Population, Sample, and Interpreting Proportions
Population and Sample: Key Terms
Population: the entire group of interest for a study (all individuals or units that could be studied).
Sample: a subset selected from the population to observe and analyze.
Parameter: a numeric characteristic of the population (unknown in most cases).
Statistic: a numeric characteristic calculated from the sample (observed data).
Relationship: a statistic is used to estimate or infer the corresponding parameter of the population.
Notation examples:
Population proportion: p = P(X=1). Other population parameters include the mean \mu and the variance \sigma^2.
Sample proportion: \hat{p} = \frac{k}{n} where k is the number of successes in the sample of size n.
Sample mean: \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
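A minimal sketch of the two sample statistics above, computed in Python. The data are hypothetical and serve only to illustrate the notation: `responses` is an assumed list of 0/1 answers and `weights_kg` an assumed list of numeric measurements.

```python
from statistics import mean

# Hypothetical 0/1 responses: 1 = success (e.g., "frequently recycles"), 0 = otherwise.
responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]

n = len(responses)   # sample size
k = sum(responses)   # number of successes in the sample
p_hat = k / n        # sample proportion: p_hat = k / n

# Hypothetical numeric measurements, used for the sample mean.
weights_kg = [2.5, 3.0, 1.5, 4.0, 2.0]
x_bar = mean(weights_kg)  # sample mean: (1/n) * sum of x_i

print(f"k = {k}, n = {n}, p_hat = {p_hat:.2f}")
print(f"x_bar = {x_bar:.2f}")
```

Both values are statistics: they describe only the listed observations, not the population they were drawn from.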
Parameter vs Statistic: core idea
The parameter describes the entire population (often unknown): \text{parameter} = p\, (\text{or } \mu, \sigma, \ldots)
The statistic describes the sample (observed value): \text{statistic} = \hat{p}, \bar{x}, s, \ldots
Inference goal: use statistics to estimate parameters and quantify uncertainty.
Common sampling theme: ensure the sample is representative of the population to justify generalizations.
Population, Sample, and their typical questions
When you read a study, you should identify:
What is the population of interest? (Who or what could vary in the entire group?)
What is the sample actually studied? (Who or what was measured?)
What statistic is reported from the sample? (e.g., proportion, mean, median, etc.)
What parameter would this statistic estimate in the population?
Language cues to watch for:
“X% of people…” may refer to a sample proportion unless the population is explicitly defined.
“Compared to Y, X%…” requires clarity on the group being referred to and whether a sample or population is involved.
Interpreting Proportions: the 9/10 example and its issues
Example statement discussed: "Nine out of 10 people frequently recycle waste." and related lines about the UK.
Key idea: Distinguish between statements about a sample vs statements about the population.
Correct interpretation if based on a sample:
If a sample of size n yields k successes (e.g., people who frequently recycle), then the sample proportion is \hat{p} = \frac{k}{n} (e.g., \hat{p}=0.90 if k=90 and n=100).
This does not automatically imply that the entire population has proportion p=0.90 unless the sample is representative and we account for sampling error.
A proper phrasing when referring to a sample: "In this sample, \hat{p}=0.90 of respondents frequently recycle."
If attempting to compare to the UK population, you would need data for the UK population or a clearly defined population corresponding to the UK, with appropriate confidence statements.
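The gap between a sample proportion and the population proportion can be illustrated with a small simulation. This is a sketch under stated assumptions: the population proportion p = 0.90 is assumed known (which it would not be in practice), sampling is simple random sampling, and `simulate_sample_proportions` is a hypothetical helper name.

```python
import random

def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n from a population with
    true proportion p, and return the sample proportion of each one."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each respondent is a "success" with probability p.
        k = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(k / n)
    return proportions

# Assumed population value, for illustration only.
p_hats = simulate_sample_proportions(p=0.90, n=100, num_samples=1000)

print(f"smallest p_hat: {min(p_hats):.2f}")
print(f"largest  p_hat: {max(p_hats):.2f}")
print(f"average  p_hat: {sum(p_hats) / len(p_hats):.3f}")
```

Even with a representative sample, individual values of \hat{p} scatter around p. That scatter is the sampling error that a confidence statement has to account for.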
Common misinterpretations to avoid:
Treating a sample proportion as if it were a population proportion without caveats.
Confusing the sample (the units actually observed) with the population (the units the conclusion is meant to cover).
Overgeneralizing from a non-representative sample.
The calculation example: if the sample shows k successes out of n trials, then the point estimate is \hat{p}=\frac{k}{n} and a standard error for a proportion is often approximated by SE\approx\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} for large enough n.
Confidence interval for a population proportion (as a typical next step):
CI = \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} where z_{\alpha/2} is the critical value from the standard normal distribution (e.g., z_{0.025}=1.96 for 95% CI).
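The point estimate, standard error, and normal-approximation interval can be sketched in Python as follows. `proportion_ci` is a hypothetical helper name, and the inputs k = 90, n = 100 are the illustrative values from the 9/10 example, not data from a real study.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(k, n, confidence=0.95):
    """Return (p_hat, se, (lower, upper)) using the normal approximation."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    p_hat = k / n                                       # point estimate
    se = sqrt(p_hat * (1 - p_hat) / n)                  # approximate standard error
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    return p_hat, se, (p_hat - z * se, p_hat + z * se)

p_hat, se, (lower, upper) = proportion_ci(k=90, n=100)
print(f"p_hat = {p_hat:.2f}, SE = {se:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

This interval is not clamped to [0, 1] and becomes unreliable when n is small or \hat{p} is close to 0 or 1, which is why the formula is stated for large enough n.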
The three statements exercise: what’s wrong (as seen in the transcript)
Statement 1 (roughly): "We compared, like, nine out of 10 people."
Issue: The subject of the claim is unclear; it mixes a comparison with a proportion without specifying the population, sample, or time frame.
Improvement: Specify the population and whether the statement refers to a sample statistic or to the population proportion. Example reformulation: "In this sample of size n, \hat{p}=0.90 of respondents frequently recycle."
Statement 2 (roughly): "When compared to the UK, we're nine out of 10 people. Nine out of 10 people frequently recycle waste."
Issue: Grammar and scope; unclear what is meant by "we're nine out of 10 people"; also conflates a UK comparison with a local statement about recycling.
Improvement: Break into clear pieces:
Define the comparison group clearly: population or sample from the UK (or another specified population).
State the statistic for the group of interest: e.g., "In our sample, \hat{p}=0.90 of respondents frequently recycle," and, if possible, compare to a population value from UK data with appropriate caveats.
Statement 3 (roughly): "the subjects are the 3,900"
Issue: The population is not identified, and the count 3,900 most likely refers to the sample size rather than to the entire population.
Improvement: Distinguish population vs sample size: If the study has 3,900 participants, report: "The sample consists of 3,900 participants; the population is [defined group]." Then proceed with the statistic and its inference to the population.
Takeaway about the errors:
Do not define the population by a sample size; specify the population clearly.
Do not generalize a sample proportion to the population without appropriate sampling methodology and uncertainty quantification.
Use clear, consistent phrasing when describing comparisons (which group is being described, and whether you refer to a population or a sample).
Corrected and clarified phrasing: examples you can use
Example A: Population vs sample
Population: all adults in country X.
Sample: a randomly selected group of 1,000 adults from country X.
Reported statistic: \hat{p} = 0.62 (95% CI: approximately 0.59 to 0.65, using the normal approximation with n=1000) for frequently engaging in activity Y.
Proper interpretation: "In this sample of 1,000 adults, 62% reported engaging in activity Y. This provides an estimate of the population proportion p with a confidence interval (e.g., 95% CI)."
Example B: Comparing to another population
If you want to compare to the UK population proportion, ensure you have data for that population and report both estimates with CIs, or state clearly if the comparison is qualitative.
Correct phrasing: "The sample proportion in our study is \hat{p}=0.60\pm SE. The UK population estimate is p_{UK}=0.65 (with its own CI). Direct comparison should consider sampling error and context."
Example C: Misinterpreted statement corrected
Incorrect: "9 out of 10 people in the UK recycle." (unverified universal claim)
Correct: "In the sample, 9 out of 10 respondents (\hat{p}=0.90) frequently recycle." To generalize to the population, you need proper sampling and a reported confidence interval.
Practical guidelines for reading and writing these statements
Always identify:
Population: who is the target group?
Sample: who was actually observed?
Statistic: what numeric value is reported from the sample?
Parameter: what population value is the target of estimation?
Use consistent language: write clearly whether you are describing a sample statistic or a population parameter.
Report uncertainty: include standard error or confidence intervals when inferring from a sample to a population.
Be careful with comparisons: when using phrases like "compared to X," specify the comparator's population and ensure the comparison is valid given sampling design.
Quick practice prompts
Given a study with sample size n=500 in which 220 respondents report frequent recycling, compute the sample proportion: \hat{p} = \frac{220}{500} = 0.44.
Compute the 95% confidence interval for the population proportion using the normal approximation:
CI = \hat{p} \pm z_{0.025} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} with \hat{p}=0.44 and n=500.
Interpret the result in terms of population inference, noting any assumptions and potential limitations.
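One way to check an answer to these prompts is the short Python sketch below. It assumes z = 1.96 for a 95% interval and uses a common rule of thumb (at least 10 successes and 10 failures) as the "large enough n" check. That threshold is an assumption here, not something fixed by these notes.

```python
from math import sqrt

n = 500   # sample size from the prompt
k = 220   # respondents reporting frequent recycling

p_hat = k / n  # sample proportion

# Rule-of-thumb check for the normal approximation (assumed threshold of 10).
large_enough = n * p_hat >= 10 and n * (1 - p_hat) >= 10

se = sqrt(p_hat * (1 - p_hat) / n)  # approximate standard error
z = 1.96                            # critical value for a 95% interval
lower, upper = p_hat - z * se, p_hat + z * se

print(f"p_hat = {p_hat:.2f}")
print(f"normal approximation reasonable: {large_enough}")
print(f"SE = {se:.4f}")
print(f"95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval comes out at roughly 0.40 to 0.48. Read as a statement about the population, this means: if the 500 respondents are a random sample from a clearly defined population, the data are consistent with a population proportion in that range. It says nothing reliable about the population if the sample was not representative.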
Summary: key points to remember
Population vs Sample definitions are essential for correct interpretation.
Parameters describe population; statistics describe samples.
A proportion like "9 out of 10" is a fraction that must be tied to a defined sample or population and must be reported with context and uncertainty.
A reported count (e.g., 3,900) does not by itself define the population; clarify whether that number is a sample size or a population size.
Always phrase results with clear reference to whether you are discussing a sample statistic or a population parameter, and report uncertainty where appropriate.