Notes on Population, Sample, and Interpreting Proportions

Population and Sample: Key Terms

  • Population: the entire group of interest for a study (all individuals or units that could be studied).

  • Sample: a subset selected from the population to observe and analyze.

  • Parameter: a numeric characteristic of the population (unknown in most cases).

  • Statistic: a numeric characteristic calculated from the sample (observed data).

  • Relationship: a statistic is used to estimate or infer the corresponding parameter of the population.

  • Notation examples:

    • Population proportion: p = P(X=1). Other population parameters include the mean \mu and the variance \sigma^2.

    • Sample proportion: \hat{p} = \frac{k}{n} where k is the number of successes in the sample of size n.

    • Sample mean: \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i.
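
The two sample formulas above translate directly into code. The following is a minimal sketch, assuming outcomes are coded as 1 for a success and 0 otherwise. The lists `outcomes` and `values` are made-up illustrations, not data from any study.

```python
def sample_proportion(outcomes):
    """Return p-hat = k / n for a list of 0/1 outcomes."""
    n = len(outcomes)
    k = sum(outcomes)  # number of successes (entries equal to 1)
    return k / n


def sample_mean(values):
    """Return x-bar = (1/n) * sum of the observed values."""
    return sum(values) / len(values)


# Hypothetical sample data, for illustration only.
outcomes = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]  # 8 successes out of 10
values = [2.0, 4.0, 6.0, 8.0]

p_hat = sample_proportion(outcomes)  # 8 / 10 = 0.8
x_bar = sample_mean(values)          # 20 / 4 = 5.0
print(p_hat, x_bar)
```

Both results are statistics: they describe only the listed observations, not the population they were drawn from.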

Parameter vs Statistic: core idea

  • The parameter describes the entire population (often unknown): \text{parameter} = p\, (\text{or } \mu, \sigma, \ldots)

  • The statistic describes the sample (observed value): \text{statistic} = \hat{p}, \bar{x}, s, \ldots

  • Inference goal: use statistics to estimate parameters and quantify uncertainty.

  • Common sampling theme: ensure the sample is representative of the population to justify generalizations.
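
A small simulation can make the parameter/statistic distinction concrete. The sketch below builds a hypothetical population whose proportion p is known only because we constructed it, then draws repeated simple random samples. The population size, the value p = 0.6, the sample size of 100, and the seed are all assumptions chosen for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 10,000 units, 6,000 of them successes (coded 1).
population = [1] * 6000 + [0] * 4000
p = sum(population) / len(population)  # the parameter (0.6 by construction)


def draw_sample_proportion(pop, n, rng):
    """Draw a simple random sample of size n and return its p-hat."""
    sample = rng.sample(pop, n)  # sampling without replacement
    return sum(sample) / n


# Each p-hat is a statistic; it changes from sample to sample.
p_hats = [draw_sample_proportion(population, 100, rng) for _ in range(200)]
average_p_hat = sum(p_hats) / len(p_hats)

print("parameter p:", p)
print("smallest and largest p-hat:", min(p_hats), max(p_hats))
print("average of the p-hats:", round(average_p_hat, 3))
```

Individual sample proportions scatter around p, which is why a single \hat{p} should be reported with a measure of uncertainty. Random sampling is what keeps the scatter centred on p; a non-representative sample carries no such guarantee.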

Population, Sample, and their typical questions

  • When you read a study, you should identify:

    • What is the population of interest? (Who or what could vary in the entire group?)

    • What is the sample actually studied? (Who or what was measured?)

    • What statistic is reported from the sample? (e.g., proportion, mean, median, etc.)

    • What parameter would this statistic estimate in the population?

  • Language cues to watch for:

    • “X% of people…” may refer to a sample proportion unless the population is explicitly defined.

    • “Compared to Y, X%…” requires clarity on the group being referred to and whether a sample or population is involved.

Interpreting Proportions: the 9/10 example and its issues

  • Example statement discussed: "Nine out of 10 people frequently recycle waste." and related lines about the UK.

  • Key idea: Distinguish between statements about a sample vs statements about the population.

  • Correct interpretation if based on a sample:

    • If a sample of size n yields k successes (e.g., people who frequently recycle), then the sample proportion is \hat{p} = \frac{k}{n} (e.g., \hat{p}=0.90 if k=90 and n=100).

    • This does not automatically imply that the entire population has proportion p=0.90 unless the sample is representative and we account for sampling error.

  • A proper phrasing when referring to a sample: "In this sample, \hat{p}=0.90 of respondents frequently recycle."

  • If attempting to compare to the UK population, you would need data for the UK population or a clearly defined population corresponding to the UK, with appropriate confidence statements.

  • Common misinterpretations to avoid:

    • Treating a sample proportion as if it were a population proportion without caveats.

    • Confusing the group actually observed (the sample) with the group the claim is about (the population).

    • Overgeneralizing from a non-representative sample.

  • The calculation example: if the sample shows k successes out of n trials, then the point estimate is \hat{p}=\frac{k}{n} and a standard error for a proportion is often approximated by SE\approx\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} for large enough n.

  • Confidence interval for a population proportion (as a typical next step):

    • CI = \hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} where z_{\alpha/2} is the critical value from the standard normal distribution (e.g., z_{0.025}=1.96 for 95% CI).
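
The standard error and confidence interval formulas above can be sketched in a few lines. This is an illustration using the k=90, n=100 figures from the example, not a complete statistical routine. The function name `proportion_ci` is hypothetical, and the default z=1.96 assumes a 95% interval under the normal approximation.

```python
import math


def proportion_ci(k, n, z=1.96):
    """Normal-approximation CI for a proportion.

    Returns (p_hat, standard_error, (lower, upper)).
    """
    p_hat = k / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat, se, (p_hat - margin, p_hat + margin)


p_hat, se, (lower, upper) = proportion_ci(90, 100)
print(f"p_hat={p_hat:.2f}, SE={se:.3f}, 95% CI=({lower:.3f}, {upper:.3f})")
# p_hat=0.90, SE=0.030, 95% CI=(0.841, 0.959)
```

With \hat{p} this close to 1 and only 10 non-successes, the normal approximation is borderline, so the interval should be read as a rough guide.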

The three statements exercise: what’s wrong (as seen in the transcript)

  • Statement 1 (roughly): "We compared, like, nine out of 10 people."

    • Issue: The subject of the claim is unclear; it mixes a comparison with a proportion without specifying the population, sample, or time frame.

    • Improvement: Specify the population and whether the statement refers to a sample statistic or to the population proportion. Example reformulation: "In this sample of size n, \hat{p}=0.90 of respondents frequently recycle."

  • Statement 2 (roughly): "When compared to the UK, we're nine out of 10 people. Nine out of 10 people frequently recycle waste."

    • Issue: Grammar and scope; unclear what is meant by "we're nine out of 10 people"; also conflates a UK comparison with a local statement about recycling.

    • Improvement: Break into clear pieces:

      • Define the comparison group clearly: population or sample from the UK (or another specified population).

      • State the statistic for the group of interest: e.g., "In our sample, \hat{p}=0.90 of respondents frequently recycle," and, if possible, compare to a population value from UK data with appropriate caveats.

  • Statement 3 (roughly): "the subjects are the 3,900"

    • Issue: The number 3,900 is most likely the sample size, yet the statement treats it as if it identified the population. The population itself is never defined.

    • Improvement: Distinguish population vs sample size: If the study has 3,900 participants, report: "The sample consists of 3,900 participants; the population is [defined group]." Then proceed with the statistic and its inference to the population.

  • Takeaway about the errors:

    • Do not define the population by a sample size; specify the population clearly.

    • Do not generalize a sample proportion to the population without appropriate sampling methodology and uncertainty quantification.

    • Use clear, consistent phrasing when describing comparisons (which group is being described, and whether you refer to a population or a sample).

Corrected and clarified phrasing: examples you can use

  • Example A: Population vs sample

    • Population: all adults in country X.

    • Sample: a randomly selected group of 1,000 adults from country X.

    • Reported statistic: \hat{p} = 0.62 (95% CI: 59–65%, using the normal approximation with n = 1,000) for frequently engaging in activity Y.

    • Proper interpretation: "In this sample of 1,000 adults, 62% reported engaging in activity Y. This provides an estimate of the population proportion p with a confidence interval (e.g., 95% CI)."

  • Example B: Comparing to another population

    • If you want to compare to the UK population proportion, ensure you have data for that population and report both estimates with CIs, or state clearly if the comparison is qualitative.

    • Correct phrasing: "The sample proportion in our study is \hat{p}=0.60\pm SE. The UK population estimate is p_{UK}=0.65 (with its own CI). Direct comparison should consider sampling error and context."

  • Example C: Misinterpreted statement corrected

    • Incorrect: "9 out of 10 people in the UK recycle." (unverified universal claim)

    • Correct: "In the sample, 9 out of 10 respondents (\hat{p}=0.90) frequently recycle." To generalize to the population, you need proper sampling and a reported confidence interval.

Practical guidelines for reading and writing these statements

  • Always identify:

    • Population: who is the target group?

    • Sample: who was actually observed?

    • Statistic: what numeric value is reported from the sample?

    • Parameter: what population value is the target of estimation?

  • Use consistent language: write clearly whether you are describing a sample statistic or a population parameter.

  • Report uncertainty: include standard error or confidence intervals when inferring from a sample to a population.

  • Be careful with comparisons: when using phrases like "compared to X," specify the comparator's population and ensure the comparison is valid given sampling design.
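
One way to make a comparison between two groups explicit is to report the difference in sample proportions with its own interval. The sketch below assumes both figures come from independent random samples. The sample sizes (500 and 2,000) are hypothetical, and the proportions 0.60 and 0.65 simply reuse the illustrative values from Example B. If the comparator is a true population value rather than an estimate, its standard error term would be dropped.

```python
import math


def diff_of_proportions_ci(p1, n1, p2, n2, z=1.96):
    """Normal-approximation CI for p1 - p2 from two independent samples."""
    diff = p1 - p2
    # Variances of independent estimates add.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * se
    return diff, se, (diff - margin, diff + margin)


# Hypothetical sample sizes; proportions taken from Example B.
diff, se, (lower, upper) = diff_of_proportions_ci(0.60, 500, 0.65, 2000)
print(f"difference={diff:.3f}, SE={se:.4f}, 95% CI=({lower:.4f}, {upper:.4f})")
```

An interval that sits close to zero, or includes it, signals that the apparent gap between the groups may be little more than sampling error.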

Quick practice prompts

  • Given a study with a sample size n=500 and 220 report frequent recycling, compute the sample proportion: \hat{p}= \frac{220}{500} = 0.44.

  • Compute the 95% confidence interval for the population proportion using the normal approximation:

    • CI = \hat{p} \pm z_{0.025} \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} with \hat{p}=0.44 and n=500.

  • Interpret the result in terms of population inference, noting any assumptions and potential limitations.
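
The practice prompts can be checked with a short script. This sketch takes the critical value from Python's standard library (`statistics.NormalDist`) instead of hard-coding 1.96. The "at least 10 successes and 10 failures" check is a common rule of thumb for the normal approximation, added here as an assumption and not a requirement stated in these notes.

```python
import math
from statistics import NormalDist

n, k = 500, 220      # figures from the practice prompt
confidence = 0.95

p_hat = k / n
# Critical value z_{alpha/2}: the 97.5th percentile of the standard normal.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
se = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - z * se, p_hat + z * se

# Rule-of-thumb check for the normal approximation (an assumption).
approximation_ok = k >= 10 and (n - k) >= 10

print(f"p_hat={p_hat:.2f}, z={z:.3f}, SE={se:.4f}")
print(f"95% CI=({lower:.3f}, {upper:.3f}), approximation_ok={approximation_ok}")
# p_hat=0.44, z=1.960, SE=0.0222
# 95% CI=(0.396, 0.484), approximation_ok=True
```

A possible interpretation: if the 500 respondents were a random sample, the data are consistent with a population proportion of frequent recyclers between roughly 40% and 48%. That reading does not hold for a convenience sample or a self-selected one.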

Summary: key points to remember

  • Population vs Sample definitions are essential for correct interpretation.

  • Parameters describe population; statistics describe samples.

  • A proportion like "9 out of 10" is a fraction that must be tied to a defined sample or population and must be reported with context and uncertainty.

  • A count such as 3,900 does not by itself define the population; clarify whether that number is a sample size or a population size.

  • Always phrase results with clear reference to whether you are discussing a sample statistic or a population parameter, and report uncertainty where appropriate.