Statistics Notes: Population, Data, and Levels of Measurement

Population, Sample, Parameter, and Statistic

Statistics is built on data. Data = information from observations, counts, measurements, or responses.
- Data should come from real, credible sources and be reviewed by more than one person.
- Examples mentioned:
  - Survey claim: "more than seven out of 10 Americans say nursing is a prestigious occupation".
  - Social media finding (dated 2019): average age of prosocial content consumption by kids.
The four-part process of statistics:
- Collect data
- Organize data
- Analyze data
- Interpret data and make informed decisions
Formally, the process is often summarized as:
  \text{Collect} \rightarrow \text{Organize} \rightarrow \text{Analyze} \rightarrow \text{Interpret}

Population vs. Sample

Population
- The collection of all possible outcomes/responses/measurements/counts of interest in a study.
- "All" possible outcomes in the group you care about.
Sample
- A smaller part or subset of the population.
- Used because the full population is often too large to study.
Illustrative example: a pictured group of 30 people represents a population; selecting 5–6 of them represents a sample.
Exercise example: In a recent survey, 834 US employees were asked if their jobs were highly stressful. Of the 834 respondents, 517 said yes.
- Population: all employees in the US.
- Sample: the 834 employees surveyed.
- The dataset for the sample: 517 Yes, 317 No (since 834 − 517 = 317).
- The green box in the illustration represents all possible responses; the survey responses are a subset of all responses.
- Non-respondents are not observed; only the respondents form the dataset.
Key rule: the sample is always a subset of the population’s responses or outcomes.
Quick terminology recap:
- Population: the overall group you want to understand.
- Sample is the observed subset drawn from that population.
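
The stress-survey exercise above can be reproduced in a few lines. This is a minimal sketch: the list `responses` is rebuilt from the reported counts (517 Yes, 317 No), since the individual answers are not given, and the variable names are illustrative rather than part of the exercise.

```python
from collections import Counter

# Rebuild the sample dataset from the counts reported in the exercise.
# (Only the totals are known, so the order of responses is arbitrary.)
responses = ["Yes"] * 517 + ["No"] * 317

n = len(responses)             # sample size
counts = Counter(responses)    # tally of each response
p_hat = counts["Yes"] / n      # proportion of the sample answering "Yes"

print(f"Sample size: {n}")
print(f"Counts: {dict(counts)}")
print(f"Proportion answering Yes: {p_hat:.3f}")
```

The proportion describes only the 834 respondents. The rest of the population (all US employees) remains unobserved.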

Population Parameters vs. Sample Statistics

Parameter (population parameter)
- A numerical description of a characteristic of the population.
- Examples:
  - The population mean \mu (e.g., the average age of people in the US).
Statistic (sample statistic)
- A numerical description of a characteristic of the sample.
- Examples:
  - The sample mean \bar{x} (e.g., the average age of people from a sample of three states).
Quick distinctions:
- Parameter describes the population.
- Statistic describes the sample.
Example exercise: Determine whether each statement describes a population parameter or a sample statistic:
- "Surveys of student-athletes in the US found that their average time spent on athletics is about 50 hours per week."
  - This is a statistic if it is based on a sample (e.g., several hundred collegiate athletes). If it described the entire population (all US student-athletes), it would be a parameter.
- "The freshman class at a university has an average SAT math score of 514."
  - If this refers to the entire freshman class, it is a population parameter. If it is based on a subset, it is a statistic.
- "A random sample of several hundred retail stores found that 34% were not storing fish at the proper temperature."
  - This is a sample statistic (34%) based on the sampled stores, not all stores.
Takeaway: correctly identify population vs. sample, and parameter vs. statistic, by checking whether the value describes the whole group or just a subset.
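
The parameter/statistic distinction can be sketched in code. This is an illustration under stated assumptions: the population below is 10,000 made-up ages generated at random (not real data), and the names `mu` and `x_bar` simply mirror \mu and \bar{x}.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 made-up ages (not real data).
population = [random.randint(18, 90) for _ in range(10_000)]

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Statistic: computed from a random sample of 100 members.
sample = random.sample(population, 100)
x_bar = statistics.mean(sample)

print(f"Population mean (parameter, mu): {mu:.2f}")
print(f"Sample mean (statistic, x-bar):  {x_bar:.2f}")
```

The two values are usually close but not identical. A different sample would give a different \bar{x}, while \mu stays fixed for a given population.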

Descriptive vs. Inferential Statistics

Descriptive statistics
- Purpose: organize, summarize, and display data.
- Process: collect data, summarize with tables/graphs, present to an audience.
- Focus: describing what the data show for the observed sample.
Inferential statistics
- Purpose: use sample data to draw conclusions about a population.
- Process: make inferences about the population based on the sample results; assess uncertainty and generalizability.
- The flow:
- Descriptive: population → sample → numerical descriptors → conclusions about the sample
- Inferential: sample → general conclusions about the population
Time allocation (instructional estimate):
- Descriptive statistics: about 25% to 40% of course time.
- Inferential statistics: the remaining 60% to 75%.
Instructional goal: given study statements, identify (1) population, (2) sample, (3) descriptive component, (4) potential inferential conclusion.

Worked Examples: Identifying Population, Sample, and Descriptive vs. Inferential

Example 1: Study of 2,560 US adults found that 23% were from households earning less than $30,000 annually and not using the Internet.
- Population: all US adults.
- Sample: the 2,560 adults surveyed.
- Descriptive statistic: 23% (from the sample) describes the sample’s characteristic.
- Inferential conclusion (potential): lower household income is associated with a higher likelihood of not using the Internet; a broader inference would concern Internet access and affordability among all US adults.
Example 2: Study of 300 Wall Street analysts found that 44% incorrectly forecast high-tech earnings in the recent year.
- Population: all Wall Street analysts.
- Sample: the 300 analysts surveyed.
- Descriptive statistic: 44% of the sample incorrectly forecast earnings.
- Inferential conclusion (potential): even professionals have forecasting errors; forecasting the stock market is difficult, suggesting caution about relying on analyst forecasts.
Takeaway: practice separating population, sample, descriptive results, and possible inferences.
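
To make the descriptive/inferential split concrete, here is a sketch based on Example 2. The descriptive part restates the sample result. The inferential part uses a normal-approximation 95% confidence interval, which is an assumption of this sketch (the notes do not specify a method) and presumes the 300 analysts were a simple random sample.

```python
import math

n = 300        # sample size from Example 2
p_hat = 0.44   # descriptive statistic: sample proportion

# Inferential step (an assumption of this sketch, not part of the notes):
# normal-approximation 95% confidence interval for the population proportion.
z = 1.96
standard_error = math.sqrt(p_hat * (1 - p_hat) / n)
margin = z * standard_error
low, high = p_hat - margin, p_hat + margin

print(f"Descriptive: {p_hat:.0%} of the {n} sampled analysts")
print(f"Inferential: roughly {low:.1%} to {high:.1%} of all analysts")
```

The first line is a statement about the sample only. The second is a statement about the population, and it carries uncertainty that the descriptive figure does not.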

Data Collection: Qualitative vs. Quantitative Data

Qualitative (categorical) data
- Attributes, labels, or non-numeric descriptions.
- Examples: hair color, eye color, major, birth country.
Quantitative (numerical) data
- Numerical values (measurements, counts).
- Examples: age, height, weight, temperature, counts like number of visits.
Example table (sports injuries in US ERs):
- Qualitative data: types of sports (basketball, baseball, football, etc.).
- Quantitative data: counts of injuries (numbers per sport).
Data types summary:
- Qualitative vs. Quantitative
- Qualitative can be nominal or ordinal; quantitative can be discrete or continuous.
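
A rough first check for qualitative versus quantitative data is whether the values are numbers. The sketch below uses a hypothetical helper, `data_type`, and made-up values (not the ER table from the lecture). It is only a heuristic: numeric codes such as jersey numbers or ZIP codes are stored as numbers but are still qualitative labels.

```python
from numbers import Number

def data_type(values):
    """Return 'quantitative' if every value is a number, else 'qualitative'."""
    # bool counts as a Number in Python, so exclude it explicitly.
    if all(isinstance(v, Number) and not isinstance(v, bool) for v in values):
        return "quantitative"
    return "qualitative"

# Made-up values for illustration only.
sports = ["basketball", "baseball", "football"]
injury_counts = [120, 45, 98]

print(data_type(sports))         # qualitative
print(data_type(injury_counts))  # quantitative
```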

Levels of Measurement

Nominal level
- Data are names or labels with no inherent order.
- No mathematical computations are meaningful.
- Examples: types of sports, genres of movies (labels).
Ordinal level
- Data can be arranged in order (ranked), but differences between ranks may be meaningless.
- Can include qualitative or quantitative data.
- Examples: ranking of occupations by job growth. The rank numbers convey order only; the gap between rank 1 and rank 2 need not equal the gap between rank 2 and rank 3.
Interval level
- Data are numerical and can be ordered; differences are meaningful.
- Zero is a position on the scale, not an inherent zero.
- Example: temperature in °C or °F, or calendar years. 0° is just a point on the scale and does not mean "no temperature," so 40° is not "twice as hot" as 20°, but the 20-degree difference between them is meaningful. (Average monthly rainfall, by contrast, has a true zero that means no rainfall, so it is ratio-level data.)
Ratio level
- Data are numerical with an inherent zero that means 'none.' Ratios are meaningful.
- Examples: counts of items, temperatures on a Kelvin scale, home run totals, salaries.
- Key property: you can form meaningful ratios: e.g., 20 vs 40 has a ratio of 2:1; zeros indicate none.
Quick diagnostic rules from the lecture:
- Nominal: categories with no order; no arithmetic.
- Ordinal: categories with order; differences not necessarily meaningful.
- Interval: numerical; differences meaningful; zero is a position.
- Ratio: numerical; differences and ratios meaningful; zero is inherent.
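
The diagnostic rules above can be written as a lookup table. This is a sketch only: the operation names (`categorize`, `order`, `subtract`, `divide`) and the helper `is_meaningful` are labels chosen for illustration, not standard terminology.

```python
# Operations treated as meaningful at each level, following the rules above.
MEANINGFUL = {
    "nominal":  {"categorize"},
    "ordinal":  {"categorize", "order"},
    "interval": {"categorize", "order", "subtract"},
    "ratio":    {"categorize", "order", "subtract", "divide"},
}

def is_meaningful(level, operation):
    """Return True if `operation` is meaningful for data at `level`."""
    return operation in MEANINGFUL[level]

print(is_meaningful("ordinal", "order"))      # True
print(is_meaningful("interval", "divide"))    # False
print(is_meaningful("ratio", "divide"))       # True
```

Each level allows everything the level before it allows, plus one more operation, which is why the four levels are usually listed in this order.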

Practical Examples: Nominal, Ordinal, Interval, and Ratio

Dataset 1: US occupations with the most job growth (ranked order) vs. movie genres (labels)
- Occupations (ranked): Ordinal (ordered ranks).
- Movie genres: Nominal (labels without intrinsic order).
Dataset 2: New York Yankees World Series victories vs. 2016 AL home run totals by team
- Yankees World Series victories (the years in which they won): Interval (calendar years can be ordered and subtracted, but year 0 is a position on the scale, not an inherent zero, so ratios of years are not meaningful).
- 2016 AL home run totals by team: Ratio (zero is possible, ratios meaningful; you can say one team hit twice as many as another).
Key reasoning: for interval, you can compute differences (e.g., year-to-year changes) but not meaningful ratios if zero is not inherent; for ratio, you can compute both differences and meaningful ratios with an absolute zero.
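
One way to see why ratios fail at the interval level is to change units. In the sketch below (values chosen for illustration), a Celsius ratio of 2 does not survive conversion to Fahrenheit, while a ratio of dollar amounts is unchanged when converted to cents.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

# Interval scale: temperature in Celsius (0 is a position, not "no temperature").
a_c, b_c = 20.0, 40.0
a_f, b_f = c_to_f(a_c), c_to_f(b_c)

diff_c = b_c - a_c     # gap in Celsius degrees
diff_f = b_f - a_f     # same physical gap, expressed in Fahrenheit degrees
ratio_c = b_c / a_c    # 2.0 in Celsius
ratio_f = b_f / a_f    # a different number in Fahrenheit

# Ratio scale: amounts of money (0 means none).
a_usd, b_usd = 20, 40
a_cents, b_cents = a_usd * 100, b_usd * 100
ratio_usd = b_usd / a_usd
ratio_cents = b_cents / a_cents

print(f"Temperature ratio: {ratio_c:.2f} (C) vs {ratio_f:.2f} (F)")
print(f"Money ratio: {ratio_usd:.2f} (dollars) vs {ratio_cents:.2f} (cents)")
```

The temperature difference is the same gap in either unit (20 Celsius degrees equal 36 Fahrenheit degrees), but "twice as hot" depends on the unit chosen, so it is not a meaningful claim.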
Quick recap of the data-type relationship:
- Qualitative data align with nominal or ordinal levels.
- Quantitative data usually align with interval or ratio levels (ranked numeric data can also be ordinal).
- Discrete vs. Continuous: often discussed in later sections; discrete data are countable (e.g., number of students), continuous data are measurable (e.g., height).
Recap diagram-style takeaway:
- Nominal vs. ordinal -> qualitative data
- Interval vs. ratio -> quantitative data
- Nominal/Ordinal data are categories (unordered or ordered); Interval/Ratio data are numeric, with more mathematical operations available.

Quick References and Formulas

Descriptive statistics path: \text{Population} \rightarrow \text{Sample} \rightarrow \text{Numerical descriptors} \rightarrow \text{Display}
Inferential statistics path: \text{Sample} \rightarrow \text{Population conclusions}
Population parameter example: population mean \mu
Sample statistic example: sample mean \bar{x}
Reporting and using data:
- Include the source when quoting statistics.
- Use real data from credible sources to inform decisions in business, environmental, or public health settings.

Takeaways for Exam Preparation

Be able to identify:
- Population and Sample from a study description.
- Whether a reported value is a Parameter (population) or a Statistic (sample).
- Whether a reported value is Descriptive (part of descriptive statistics) or Inferential (leading to population-level conclusions).
Distinguish data types:
- Qualitative vs. Quantitative
- Within Qualitative: nominal vs. ordinal
- Within Quantitative: interval vs. ratio (and the concepts of discrete vs. continuous, zero origin, and meaningful ratios).
Practice classifying example statements and data sets into the four levels of measurement and determining the appropriate type of analysis.

End-of-Section Summary

Statistics is the science of collecting, organizing, analyzing, and interpreting data to make informed decisions.
Population vs. Sample; Parameter vs. Statistic.
Descriptive vs. Inferential statistics, with a general rule of direction (descriptive: population -> sample -> summaries of the sample; inferential: sample -> conclusions about the population).
Data types and measurement levels determine what kinds of summaries and comparisons are meaningful.
Real-world examples help solidify whether a value is descriptive versus inferential and which measurement level applies to the data.