Notes on Population, Sample, and Interpreting Proportions

Population and Sample: Key Terms

Population: the entire group of interest for a study (all individuals or units that could be studied).
Sample: a subset selected from the population to observe and analyze.
Parameter: a numeric characteristic of the population (unknown in most cases).
Statistic: a numeric characteristic calculated from the sample (observed data).
Relationship: a statistic is used to estimate or infer the corresponding parameter of the population.
Notation examples:
- Population proportion: p = P(X=1); more generally, other population parameters include the mean \mu and the variance \sigma^2.
- Sample proportion: \hat{p} = \frac{k}{n} where k is the number of successes in the sample of size n.
- Sample mean: \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
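The two sample formulas above can be sketched in Python as follows. This is a minimal illustration: the function names and the ten 0/1 responses are hypothetical, chosen only to show the arithmetic.

```python
# Minimal sketch: computing the sample proportion and the sample mean
# from raw data. The responses below are made-up illustration data.

def sample_proportion(successes: int, n: int) -> float:
    """Return p-hat = k / n, the share of successes in a sample of size n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    return successes / n


def sample_mean(values: list[float]) -> float:
    """Return x-bar = (1/n) * sum of x_i."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


# Hypothetical yes/no answers coded as 1 (success) and 0 (failure).
responses = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]
k = sum(responses)               # number of successes
p_hat = sample_proportion(k, len(responses))
print(p_hat)                     # 0.8
print(sample_mean(responses))    # 0.8: for 0/1 data, x-bar equals p-hat
```

Coding each response as 0 or 1 shows that a sample proportion is just the sample mean of 0/1 data.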

Parameter vs Statistic: core idea

The parameter describes the entire population (often unknown): \text{parameter} = p\, (\text{or } \mu, \sigma, \ldots)
The statistic describes the sample (observed value): \text{statistic} = \hat{p}, \bar{x}, s, \ldots
Inference goal: use statistics to estimate parameters and quantify uncertainty.
Common sampling theme: ensure the sample is representative of the population to justify generalizations.
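To make the parameter/statistic distinction concrete, the sketch below builds a synthetic population in which the parameter p is known (in real studies it is not) and draws one simple random sample from it. The population size and the 72% recycling rate are invented for illustration.

```python
# Sketch: a synthetic population where the parameter p is known,
# and one simple random sample drawn from it.
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

# Hypothetical population of 100,000 people; 1 = frequently recycles.
POP_SIZE = 100_000
population = [1] * 72_000 + [0] * (POP_SIZE - 72_000)
p = sum(population) / POP_SIZE          # parameter: 0.72 by construction

# Simple random sample of n = 1,000 drawn without replacement.
sample = rng.sample(population, 1_000)
p_hat = sum(sample) / len(sample)       # statistic: varies from sample to sample

print(f"parameter p = {p:.3f}, statistic p_hat = {p_hat:.3f}")
```

Changing the seed gives a different p_hat but the same p, which is exactly the difference between a statistic and a parameter.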

Population, Sample, and the questions to ask

When you read a study, you should identify:
- What is the population of interest? (Which whole group are the conclusions meant to describe?)
- What is the sample actually studied? (Who or what was measured?)
- What statistic is reported from the sample? (e.g., proportion, mean, median, etc.)
- What parameter would this statistic estimate in the population?
Language cues to watch for:
- “X% of people…” may refer to a sample proportion unless the population is explicitly defined.
- “Compared to Y, X%…” requires clarity on the group being referred to and whether a sample or population is involved.

Interpreting Proportions: the 9/10 example and its issues

Example statement discussed: "Nine out of 10 people frequently recycle waste." and related lines about the UK.
Key idea: Distinguish between statements about a sample vs statements about the population.
Correct interpretation if based on a sample:
- If a sample of size n yields k successes (e.g., people who frequently recycle), then the sample proportion is \hat{p} = \frac{k}{n} (e.g., \hat{p}=0.90 if k=90 and n=100).
- This does not automatically imply that the entire population has proportion p=0.90 unless the sample is representative and we account for sampling error.
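Sampling error can be seen directly by simulation. The sketch below assumes a large population with p = 0.90 (an illustrative value) and models each sampled person as independently a success with probability p; it draws many samples of size 100 and looks at how much the resulting \hat{p} values spread out.

```python
# Sketch: why one sample's p_hat need not equal p. Repeated samples from a
# population with an assumed p = 0.90 give a spread of p_hat values.
import random
import statistics

def simulate_p_hats(p: float, n: int, reps: int, seed: int = 0) -> list[float]:
    """Draw `reps` samples of size n (each person independently a success
    with probability p) and return the sample proportion from each."""
    rng = random.Random(seed)
    return [sum(rng.random() < p for _ in range(n)) / n for _ in range(reps)]

p_hats = simulate_p_hats(p=0.90, n=100, reps=2_000)
print(min(p_hats), max(p_hats))       # individual samples miss 0.90 in both directions
print(statistics.stdev(p_hats))       # close to sqrt(0.9 * 0.1 / 100) = 0.03
```

The spread of the simulated \hat{p} values is what the standard error formula approximates without simulation.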
A proper phrasing when referring to a sample: "In this sample, \hat{p}=0.90 of respondents frequently recycle."
If attempting to compare to the UK population, you would need data for the UK population or a clearly defined population corresponding to the UK, with appropriate confidence statements.
Common misinterpretations to avoid:
- Treating a sample proportion as if it were a population proportion without caveats.
- Confusing the group that was actually measured (the sample) with the group the conclusion is about (the population).
- Overgeneralizing from a non-representative sample.
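The last point can be illustrated with a small simulation. The population mix and recycling rates below are invented: a hypothetical population in which urban residents recycle more often than rural ones. A random sample lands near the true p, while a sample drawn only from urban residents does not, no matter how large it is.

```python
# Sketch: how a non-representative sample misleads. The population mix
# and recycling rates below are invented purely for illustration.
import random

rng = random.Random(1)

# Hypothetical population: 30,000 urban residents (95% recycle) and
# 70,000 rural residents (70% recycle).
urban = [1] * 28_500 + [0] * 1_500
rural = [1] * 49_000 + [0] * 21_000
population = urban + rural
p = sum(population) / len(population)      # true parameter: 0.775

# Random sample from the whole population vs a sample drawn only from urban residents.
random_sample = rng.sample(population, 1_000)
urban_only_sample = rng.sample(urban, 1_000)

print(f"p = {p:.3f}")
print(f"random sample p_hat     = {sum(random_sample) / 1_000:.3f}")      # near p
print(f"urban-only sample p_hat = {sum(urban_only_sample) / 1_000:.3f}")  # near 0.95, biased
```

A larger urban-only sample would shrink the standard error but not remove the bias, which is why representativeness matters more than size alone.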
The calculation example: if the sample shows k successes out of n trials, then the point estimate is \hat{p}=\frac{k}{n} and a standard error for a proportion is often approximated by SE\approx\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} for large enough n.
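A minimal sketch of the standard error formula follows, together with one common rule of thumb for "large enough n": at least 10 successes and 10 failures. The threshold of 10 is a convention (some texts use 5), and the function names are hypothetical. The check works on counts rather than n\hat{p} to avoid floating-point rounding at the boundary.

```python
# Sketch of the standard error for a proportion, plus a rule-of-thumb
# check that the normal approximation is reasonable.
import math

def proportion_se(p_hat: float, n: int) -> float:
    """Approximate standard error sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)

def large_enough(successes: int, n: int, threshold: int = 10) -> bool:
    """Rule of thumb: at least `threshold` successes and `threshold` failures."""
    return successes >= threshold and (n - successes) >= threshold

print(round(proportion_se(0.90, 100), 3))   # 0.03
print(large_enough(90, 100))                # True: 90 successes, 10 failures
```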
Confidence interval for a population proportion (as a typical next step):
- CI = \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}, where z_{\alpha/2} is the critical value from the standard normal distribution (e.g., z_{0.025}=1.96 for a 95% CI).
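The interval formula above can be sketched as a small function. This is the normal-approximation (often called Wald) interval; it uses `statistics.NormalDist` from the Python standard library (3.8+) for the critical value instead of hard-coding 1.96. The function name `wald_ci` is hypothetical.

```python
# Sketch of the normal-approximation (Wald) confidence interval for a proportion.
import math
from statistics import NormalDist

def wald_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Return (lower, upper) = p_hat -/+ z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for 95%
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

low, high = wald_ci(90, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")   # 95% CI: 0.841 to 0.959
```

This interval is known to behave poorly when n is small or \hat{p} is close to 0 or 1; alternatives such as the Wilson interval are often preferred in those cases.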

The three statements exercise: what’s wrong (as seen in the transcript)

Statement 1 (roughly): "We compared, like, nine out of 10 people."
- Issue: The subject of the claim is unclear; it mixes a comparison with a proportion without specifying the population, sample, or time frame.
- Improvement: Specify the population and whether the statement refers to a sample statistic or to the population proportion. Example reformulation: "In this sample of size n, \hat{p}=0.90 of respondents frequently recycle."
Statement 2 (roughly): "When compared to the UK, we're nine out of 10 people. Nine out of 10 people frequently recycle waste."
- Issue: Grammar and scope; unclear what is meant by "we're nine out of 10 people"; also conflates a UK comparison with a local statement about recycling.
- Improvement: Break into clear pieces:
  - Define the comparison group clearly: population or sample from the UK (or another specified population).
  - State the statistic for the group of interest: e.g., "In our sample, \hat{p}=0.90 of respondents frequently recycle," and, if possible, compare to a population value from UK data with appropriate caveats.
Statement 3 (roughly): "the subjects are the 3,900"
- Issue: The statement appears to treat 3,900 as the population, but that number is more likely the sample size; the population itself is never identified.
- Improvement: Distinguish population vs sample size: If the study has 3,900 participants, report: "The sample consists of 3,900 participants; the population is [defined group]." Then proceed with the statistic and its inference to the population.
Takeaway about the errors:
- Do not define the population by a sample size; specify the population clearly.
- Do not generalize a sample proportion to the population without appropriate sampling methodology and uncertainty quantification.
- Use clear, consistent phrasing when describing comparisons (which group is being described, and whether you refer to a population or a sample).

Corrected and clarified phrasing: examples you can use

Example A: Population vs sample
- Population: all adults in country X.
- Sample: a randomly selected group of 1,000 adults from country X.
- Reported statistic: \hat{p} = 0.62 (95% CI roughly 0.59 to 0.65) for frequently engaging in activity Y.
- Proper interpretation: "In this sample of 1,000 adults, 62% reported engaging in activity Y. This provides an estimate of the population proportion p with a confidence interval (e.g., 95% CI)."
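The interval in Example A can be checked with a few lines of arithmetic, assuming a simple random sample and the normal-approximation interval with z = 1.96.

```python
# Sketch: checking Example A's interval (n = 1,000, p_hat = 0.62).
import math

n, p_hat, z = 1_000, 0.62, 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - z * se, p_hat + z * se
print(f"SE = {se:.4f}; 95% CI = {low:.3f} to {high:.3f}")
# SE = 0.0153; 95% CI = 0.590 to 0.650
```

The margin of error is about 3 percentage points, so the interval runs from roughly 59% to 65%.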
Example B: Comparing to another population
- If you want to compare to the UK population proportion, ensure you have data for that population and report both estimates with CIs, or state clearly if the comparison is qualitative.
- Correct phrasing: "The sample proportion in our study is \hat{p}=0.60 (with standard error SE). The UK estimate is \hat{p}_{UK}=0.65 (with its own CI). Direct comparison should consider the sampling error in both estimates and differences in context."
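One way to "consider sampling error" in such a comparison is to put an interval around the difference of the two estimates. The sketch below assumes the two estimates come from independent random samples, in which case their variances add. The sample sizes (500 and 2,000) are hypothetical, since the example does not give them.

```python
# Sketch: comparing two estimated proportions while accounting for sampling
# error. Sample sizes are hypothetical; estimates assumed independent.
import math

def proportion_se(p_hat: float, n: int) -> float:
    return math.sqrt(p_hat * (1 - p_hat) / n)

p_study, n_study = 0.60, 500      # our study (n assumed)
p_uk, n_uk = 0.65, 2_000          # UK survey estimate (n assumed)

diff = p_study - p_uk
# For independent estimates, variances add: SE(diff) = sqrt(SE1^2 + SE2^2).
se_diff = math.sqrt(proportion_se(p_study, n_study) ** 2 + proportion_se(p_uk, n_uk) ** 2)
low, high = diff - 1.96 * se_diff, diff + 1.96 * se_diff
print(f"difference = {diff:.3f}, approx. 95% CI {low:.3f} to {high:.3f}")
```

If the interval includes 0, the data are consistent with no real difference; changing the assumed sample sizes changes the width of the interval and therefore the conclusion.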
Example C: Misinterpreted statement corrected
- Incorrect: "9 out of 10 people in the UK recycle." (unverified universal claim)
- Correct: "In the sample, 9 out of 10 respondents (\hat{p}=0.90) frequently recycle." To generalize to the population, you need proper sampling and a reported confidence interval.
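"9 out of 10" says nothing about how many people were asked, and that matters. The sketch below computes the normal-approximation margin of error for the same \hat{p}=0.90 at three illustrative sample sizes.

```python
# Sketch: the same "9 out of 10" carries very different uncertainty
# depending on sample size. Sample sizes are illustrative.
import math

def margin_of_error(successes: int, n: int, z: float = 1.96) -> float:
    """Normal-approximation margin z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

for k, n in [(9, 10), (90, 100), (900, 1_000)]:
    print(f"{k}/{n}: p_hat = {k / n:.2f} +/- {margin_of_error(k, n):.3f}")
# 9/10: p_hat = 0.90 +/- 0.186
# 90/100: p_hat = 0.90 +/- 0.059
# 900/1000: p_hat = 0.90 +/- 0.019
```

The n = 10 line should not be trusted even as an approximation: with only one "failure" the large-sample conditions fail, so the true uncertainty is poorly described by this formula.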

Practical guidelines for reading and writing these statements

Always identify:
- Population: who is the target group?
- Sample: who was actually observed?
- Statistic: what numeric value is reported from the sample?
- Parameter: what population value is the target of estimation?
Use consistent language: write clearly whether you are describing a sample statistic or a population parameter.
Report uncertainty: include standard error or confidence intervals when inferring from a sample to a population.
Be careful with comparisons: when using phrases like "compared to X," specify the comparator's population and ensure the comparison is valid given sampling design.

Quick practice prompts

Given a study with a sample size n=500 and 220 report frequent recycling, compute the sample proportion: \hat{p}= \frac{220}{500} = 0.44.
Compute the 95% confidence interval for the population proportion using the normal approximation:
- CI = \hat{p} \pm z_{0.025} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} with \hat{p}=0.44 and n=500.
Interpret the result in terms of population inference, noting any assumptions and potential limitations.
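One possible worked answer to the practice prompt, using the normal-approximation interval with z = 1.96. Both counts (220 successes, 280 failures) are well above 10, so the approximation is reasonable here.

```python
# Sketch: working the practice prompt (n = 500, 220 successes).
import math

n, k, z = 500, 220, 1.96
p_hat = k / n                                  # 0.44
se = math.sqrt(p_hat * (1 - p_hat) / n)        # about 0.0222
low, high = p_hat - z * se, p_hat + z * se
print(f"p_hat = {p_hat:.2f}, SE = {se:.4f}, 95% CI = {low:.3f} to {high:.3f}")
# p_hat = 0.44, SE = 0.0222, 95% CI = 0.396 to 0.484
```

If the 500 people were a simple random sample from the population, this suggests that roughly 39.6% to 48.4% of the population frequently recycle; the statement rests on random sampling, independent responses, and honest reporting, and says nothing about groups outside the sampled population.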

Summary: key points to remember

Population vs Sample definitions are essential for correct interpretation.
Parameters describe population; statistics describe samples.
A proportion like "9 out of 10" is a fraction that must be tied to a defined sample or population and must be reported with context and uncertainty.
A count such as 3,900 does not by itself define the population; clarify whether that number is a sample size or a population size.
Always phrase results with clear reference to whether you are discussing a sample statistic or a population parameter, and report uncertainty where appropriate.