Statistics Notes: Population, Data, and Levels of Measurement

Population, Sample, Parameter, and Statistic

  • Statistics is built on data. Data = information obtained from observations, counts, measurements, or responses (for example, to surveys or other requests for information).
    • Data should come from real, reliable sources and be reviewed by more than one person.
    • Examples mentioned:
    • Survey claim: "more than seven out of 10 Americans say nursing is a prestigious occupation".
    • Social media finding (dated 2019): average age of prosocial content consumption by kids.
  • The four-part process of statistics:
    • Collect data
    • Organize data
    • Analyze data
    • Interpret data and make informed decisions
    • Formally, the process is often summarized as:
      \text{Collect} \rightarrow \text{Organize} \rightarrow \text{Analyze} \rightarrow \text{Interpret}
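
The four steps can be sketched on a tiny dataset. This is a minimal illustration only: the study-hours values and the variable names are hypothetical and do not come from the notes.

```python
# Hypothetical data: weekly study hours reported by five students.
raw_hours = [12, 7, 9, 15, 7]

# 1. Collect: gather the raw observations.
data = list(raw_hours)

# 2. Organize: put the observations in order.
organized = sorted(data)

# 3. Analyze: compute a numerical summary (here, the mean).
mean_hours = sum(organized) / len(organized)

# 4. Interpret: state what the number means in plain language.
summary = f"Students in this group study {mean_hours:.1f} hours per week on average."

print(organized)
print(summary)
```

Each step feeds the next, so a mistake in collection or organization carries through to the interpretation.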

Population vs. Sample

  • Population
    • The collection of all possible outcomes/responses/measurements/counts of interest in a study.
    • "All" possible outcomes in the group you care about.
  • Sample
    • A smaller part or subset of the population.
    • Used because the full population is often too large to study.
  • Illustrative example: a pictured group of 30 people represents a population; selecting 5–6 of them represents a sample.
  • Exercise example: In a recent survey, 834 US employees were asked if their jobs were highly stressful. Of the 834 respondents, 517 said yes.
    • Population: all employees in the US.
    • Sample: the 834 employees surveyed.
    • The dataset for the sample: 517 Yes, 317 No (since 834 − 517 = 317).
    • The green box in the illustration represents all possible responses; the survey responses are a subset of all responses.
    • Non-respondents are not observed; only the respondents form the dataset.
  • Key rule: the sample is always a subset of the population’s responses or outcomes.
  • Quick terminology recap:
    • Population is the overall group you want to understand.
    • Sample is the observed subset drawn from that population.
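
The stress-survey exercise can be checked with a short tally. The counts (834 surveyed, 517 yes) are from the notes; the variable names are illustrative assumptions.

```python
from collections import Counter

# Figures taken from the survey exercise in the notes.
n_sampled = 834   # employees surveyed (the sample)
n_yes = 517       # respondents who said their job is highly stressful

# Rebuild the sample's dataset as a list of individual responses.
responses = ["yes"] * n_yes + ["no"] * (n_sampled - n_yes)

# Tally the responses and compute the share answering "yes".
tally = Counter(responses)
share_yes = tally["yes"] / n_sampled

print(tally)
print(f"{share_yes:.1%} of the sample said yes")
```

The tally describes only the 834 respondents. Nothing here says anything yet about all US employees.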

Population Parameters vs. Sample Statistics

  • Parameter (population parameter)
    • A numerical description of a characteristic of the population.
    • Examples:
    • The population mean \mu (e.g., the average age of people in the US).
  • Statistic (sample statistic)
    • A numerical description of a characteristic of the sample.
    • Examples:
    • The sample mean of a subset (e.g., average age in a sample of three states) \bar{x}.
  • Quick distinctions:
    • Parameter describes the population.
    • Statistic describes the sample.
  • Example exercise: Determine whether statements describe a population parameter or a sample statistic:
    • "Surveys of student-athletes in the US found that their average time spent on athletics is about 50 hours per week."
    • This is a statistic if based on a sample (e.g., several hundred collegiate athletes). If it is stated as the entire population (all US student-athletes), it would be a parameter.
    • "The freshman class at a university has an average SAT math score of 514."
    • If this refers to the entire freshman class, it is a population parameter. If it is based on a subset, it is a statistic.
    • "A random sample of several hundred retail stores found that 34% were not storing fish at the proper temperature."
    • This is a sample statistic (34%) based on the sampled stores, not all stores.
  • Takeaway: correctly identify population vs. sample, and parameter vs. statistic, by checking whether the value describes the whole group or just a subset.
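
To make the distinction concrete, the sketch below computes a parameter and a statistic from the same hypothetical population. The population (one person at each age from 18 to 67), the sample size of 5, and the fixed seed are all assumptions made for illustration.

```python
import random
from statistics import mean

# Hypothetical population: one person at each age from 18 through 67.
population_ages = list(range(18, 68))

# Parameter: computed from every member of the population.
mu = mean(population_ages)

# Statistic: computed from a sample only, so it varies from sample to sample.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample_ages = rng.sample(population_ages, 5)
x_bar = mean(sample_ages)

print(f"population mean (parameter) mu = {mu}")
print(f"sample mean (statistic) x_bar = {x_bar}")
```

Changing the seed changes x_bar but never mu, which is the practical difference between a statistic and a parameter.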

Descriptive vs. Inferential Statistics

  • Descriptive statistics
    • Purpose: organize, summarize, and display data.
    • Process: collect data, summarize with tables/graphs, present to an audience.
    • Focus: describing what the data show for the observed sample.
  • Inferential statistics
    • Purpose: use sample data to draw conclusions about a population.
    • Process: make inferences about the population based on the sample results; assess uncertainty and generalizability.
    • The flow:
    • Descriptive: population → sample → numerical descriptors → conclusions about the sample
    • Inferential: sample → general conclusions about the population
  • Time allocation (instructional estimate):
    • Descriptive statistics: roughly 25% to 40% of course time.
    • Inferential statistics: the remainder (roughly 60% to 75%).
  • Instructional goal: given study statements, identify (1) population, (2) sample, (3) descriptive component, (4) potential inferential conclusion.

Worked Examples: Identifying Population, Sample, and Descriptive vs. Inferential

  • Example 1: Study of 2,560 US adults found that 23% were from households earning less than $30,000 annually and not using the Internet.
    • Population: all US adults.
    • Sample: the 2,560 adults surveyed.
    • Descriptive statistic: 23% (from the sample) describes the sample’s characteristic.
    • Inferential conclusion (potential): higher likelihood of not using the Internet is associated with lower income; broader inference would discuss internet access and affordability, given the population context.
  • Example 2: Study of 300 Wall Street analysts found that 44% incorrectly forecast high-tech earnings in the recent year.
    • Population: all Wall Street analysts.
    • Sample: the 300 analysts surveyed.
    • Descriptive statistic: 44% of the sample incorrectly forecast earnings.
    • Inferential conclusion (potential): even professionals have forecasting errors; forecasting the stock market is difficult, suggesting caution about relying on analyst forecasts.
  • Takeaway: practice separating population, sample, descriptive results, and possible inferences.
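
One way to see the descriptive/inferential split in Example 2 is sketched below. The sample size (300) and proportion (44%) are from the notes. The interval calculation is an assumption added for illustration: it uses the common normal-approximation 95% interval for a proportion and presumes the analysts were a random sample, neither of which the notes state.

```python
import math

n = 300        # analysts in the sample (from Example 2)
p_hat = 0.44   # sample proportion with an incorrect forecast

# Descriptive: report what the sample itself shows.
count_incorrect = round(p_hat * n)

# Inferential (assumed method): approximate 95% interval for the
# population proportion, using the normal approximation.
z = 1.96
margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - margin, p_hat + margin

print(f"Descriptive: {count_incorrect} of {n} analysts ({p_hat:.0%}) were incorrect.")
print(f"Inferential: population share plausibly between {low:.1%} and {high:.1%}.")
```

The descriptive line is a fact about the 300 analysts. The inferential line is a hedged claim about all Wall Street analysts and carries uncertainty.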

Data Collection: Qualitative vs. Quantitative Data

  • Qualitative (categorical) data
    • Attributes, labels, or non-numeric descriptions.
    • Examples: hair color, eye color, major, birth country.
  • Quantitative (numerical) data
    • Numerical values (measurements, counts).
    • Examples: age, height, weight, temperature, counts like number of visits.
  • Example table (sports injuries in US ERs):
    • Qualitative data: types of sports (basketball, baseball, football, etc.).
    • Quantitative data: counts of injuries (numbers per sport).
  • Data types summary:
    • Qualitative vs. Quantitative
    • Qualitative can be nominal or ordinal; quantitative can be discrete or continuous.

Levels of Measurement

  • Nominal level
    • Data are names or labels with no inherent order.
    • No mathematical computations are meaningful.
    • Examples: types of sports, genres of movies (labels).
  • Ordinal level
    • Data can be arranged in order (ranked), but differences between ranks may be meaningless.
    • Can include qualitative or quantitative data.
    • Examples: ranking of occupations by job growth (the ranks 1, 2, 3, ... convey order, but the gaps between ranks are not meaningful). Movie genres on their own are nominal; they become ordinal only when ranked, for example by popularity.
  • Interval level
    • Data are numerical and can be ordered; differences are meaningful.
    • Zero is just a position on the scale; it does not mean "none."
    • Example: temperature in °C or °F. A reading of 0° does not mean "no temperature," so differences are meaningful (20° is 10° warmer than 10°) but ratios are not (20° is not "twice as hot" as 10°). Note that monthly rainfall is not an interval-level example: zero rainfall really does mean none, so rainfall is ratio level.
  • Ratio level
    • Data are numerical with an inherent zero that means 'none.' Ratios are meaningful.
    • Examples: counts of items, temperatures on a Kelvin scale, home run totals, salaries.
    • Key property: you can form meaningful ratios, e.g., 40 is twice as much as 20 (a ratio of 2:1); a zero indicates none.
  • Quick diagnostic rules from the lecture:
    • Nominal: categories with no order; no arithmetic.
    • Ordinal: categories with order; differences not necessarily meaningful.
    • Interval: numerical; differences meaningful; zero is a position.
    • Ratio: numerical; differences and ratios meaningful; zero is inherent.
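
The diagnostic rules can be written as a small lookup table. This is only a sketch: the operation names ("categorize", "order", "difference", "ratio") and the helper is_meaningful are hypothetical labels chosen for illustration.

```python
# Operations treated as meaningful at each level, following the rules above.
LEVEL_OPERATIONS = {
    "nominal": {"categorize"},
    "ordinal": {"categorize", "order"},
    "interval": {"categorize", "order", "difference"},
    "ratio": {"categorize", "order", "difference", "ratio"},
}

def is_meaningful(level, operation):
    """Return True if the operation is meaningful at the given level."""
    return operation in LEVEL_OPERATIONS[level]

print(is_meaningful("ordinal", "difference"))   # False
print(is_meaningful("interval", "difference"))  # True
print(is_meaningful("interval", "ratio"))       # False
print(is_meaningful("ratio", "ratio"))          # True
```

Each level keeps everything the previous level allows and adds one more operation, which is why the levels are usually presented in this order.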

Practical Examples: Nominal, Ordinal, Interval, and Ratio

  • Dataset 1: US occupations with the most job growth (ranked order) vs. movie genres (labels)
    • Occupations (ranked): Ordinal (ordered ranks).
    • Movie genres: Nominal (labels without intrinsic order).
  • Dataset 2: New York Yankees World Series victories vs. 2016 AL home run totals by team
    • Yankees World Series victories: Interval. The data are the calendar years of the victories, not counts. Years can be ordered and differences between them are meaningful, but year 0 does not mean "no time," so ratios of years are not meaningful.
    • 2016 AL home run totals by team: Ratio (zero is possible, ratios meaningful; you can say one team hit twice as many as another).
  • Key reasoning: for interval, you can compute differences (e.g., year-to-year changes) but not meaningful ratios if zero is not inherent; for ratio, you can compute both differences and meaningful ratios with an absolute zero.
  • Quick recap of the data-type relationship:
    • Qualitative data align with nominal or ordinal levels.
    • Quantitative data align with interval or ratio levels.
    • Discrete vs. Continuous: often discussed in later sections; discrete data are countable (e.g., number of students), continuous data are measurable (e.g., height).
  • Recap diagram-style takeaway:
    • Nominal vs. ordinal -> qualitative data
    • Interval vs. ratio -> quantitative data
    • Nominal/ordinal data are categories (unordered or ordered); interval/ratio data are numeric and support more mathematical operations.
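
The interval-versus-ratio reasoning can be checked numerically. The sketch below uses temperature, with Celsius as an assumed interval-scale example alongside the Kelvin scale listed under ratio level; the specific readings (10 and 20 degrees) are made up.

```python
def celsius_to_kelvin(c):
    """Convert a Celsius reading to Kelvin."""
    return c + 273.15

# Two hypothetical temperature readings in degrees Celsius.
a_c, b_c = 10.0, 20.0
a_k, b_k = celsius_to_kelvin(a_c), celsius_to_kelvin(b_c)

# Differences agree on both scales, so differences are meaningful.
diff_c = b_c - a_c
diff_k = b_k - a_k

# Ratios disagree: the Celsius ratio depends on where zero was placed.
ratio_c = b_c / a_c   # 2.0 on the Celsius scale
ratio_k = b_k / a_k   # about 1.035 on the Kelvin scale

print(diff_c, round(diff_k, 2))
print(ratio_c, round(ratio_k, 3))
```

Because the Celsius zero is an arbitrary position, "twice as hot" is an artifact of the scale. Only the Kelvin ratio, which is measured from an inherent zero, carries physical meaning.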

Quick References and Formulas

  • Descriptive statistics path: \text{Population} \rightarrow \text{Sample} \rightarrow \text{Numerical descriptors} \rightarrow \text{Display}
  • Inferential statistics path: \text{Sample} \rightarrow \text{Population conclusions}
  • Population parameter example: population mean \mu
  • Sample statistic example: sample mean \bar{x}
  • Example of a data use case:
    • Guardrails for reporting: include source when quoting statistics.
    • Use of real data from sources to inform decisions in business, environment, or public health settings.

Takeaways for Exam Preparation

  • Be able to identify:
    • Population and Sample from a study description.
    • Whether a reported value is a Parameter (population) or a Statistic (sample).
    • Whether a reported value is Descriptive (part of descriptive statistics) or Inferential (leading to population-level conclusions).
  • Distinguish data types:
    • Qualitative vs. Quantitative
    • Within Qualitative: nominal vs. ordinal
    • Within Quantitative: interval vs. ratio (and the concepts of discrete vs. continuous, zero origin, and meaningful ratios).
  • Practice classifying example statements and data sets into the four levels of measurement and determining the appropriate type of analysis.

End-of-Section Summary

  • Statistics is a science of collecting, organizing, analyzing, and interpreting data to make informed decisions.
  • Population vs. Sample; Parameter vs. Statistic.
  • Descriptive vs. Inferential statistics, with a general rule of direction (descriptive: population -> sample -> summary of the sample; inferential: sample -> conclusions about the population).
  • Data types and measurement levels determine what kinds of summaries and comparisons are meaningful.
  • Real-world examples help solidify whether a value is descriptive versus inferential and which measurement level applies to the data.