Statistics Notes: Population, Data, and Levels of Measurement
Population, Sample, Parameter, and Statistic
- Statistics is built on data. Data = information from observations, counts, measurements, or responses (for example, survey answers).
- Data should come from real, verifiable sources and be reviewed by more than one person.
- Examples mentioned:
- Survey claim: "more than seven out of 10 Americans say nursing is a prestigious occupation".
- Social media finding (dated 2019): average age of prosocial content consumption by kids.
- The four-part process of statistics:
- Collect data
- Organize data
- Analyze data
- Interpret data and make informed decisions
- Formally, the process is often summarized as:
\text{Collect} \rightarrow \text{Organize} \rightarrow \text{Analyze} \rightarrow \text{Interpret}
Population vs. Sample
- Population
- The collection of all possible outcomes/responses/measurements/counts of interest in a study.
- "All" possible outcomes in the group you care about.
- Sample
- A smaller part or subset of the population.
- Used because the full population is often too large to study.
- Illustrative example: a pictured group of 30 people represents a population; selecting 5–6 of them represents a sample.
- Exercise example: In a recent survey, 834 US employees were asked if their jobs were highly stressful. Of the 834 respondents, 517 said yes.
- Population: all employees in the US.
- Sample: the 834 employees surveyed.
- The dataset for the sample: 517 Yes, 317 No (since 834 − 517 = 317).
- The green box in the illustration represents all possible responses; the survey responses are a subset of all responses.
- Non-respondents are not observed; only the respondents form the dataset.
- Key rule: the sample is always a subset of the population’s responses or outcomes.
- Quick terminology recap:
- Population is the overall group you want to understand.
- Sample is the observed subset drawn from that population.
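The stress-survey exercise above can be tabulated in a few lines. This is a minimal Python sketch that uses only the counts given in the exercise; the variable names are illustrative and not part of the lecture.

```python
# Counts reported in the survey exercise above.
sample_size = 834
yes_count = 517

# Every respondent answered yes or no, so the "no" count is the remainder.
no_count = sample_size - yes_count

# Sample proportion of respondents who called their job highly stressful.
yes_proportion = yes_count / sample_size

print(f"Yes: {yes_count}, No: {no_count}")
print(f"Sample proportion answering yes: {yes_proportion:.3f}")
```

The proportion (about 62%) describes only the 834 respondents; it says nothing by itself about all US employees.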
Population Parameters vs. Sample Statistics
- Parameter (population parameter)
- A numerical description of a characteristic of the population.
- Examples:
- The population mean \mu (e.g., the average age of people in the US).
- Statistic (sample statistic)
- A numerical description of a characteristic of the sample.
- Examples:
- The sample mean \bar{x} (e.g., the average age of people in a sample drawn from three states).
- Quick distinctions:
- Parameter describes the population.
- Statistic describes the sample.
- Example exercise: Determine whether statements describe a population parameter or a sample statistic:
- "Surveys of student-athletes in the US found that their average time spent on athletics is about 50 hours per week."
- This is a statistic if based on a sample (e.g., several hundred collegiate athletes). If it is stated as the entire population (all US student-athletes), it would be a parameter.
- "The freshman class at a university has an average SAT math score of 514."
- If this refers to the entire freshman class, it is a population parameter. If it is based on a subset, it is a statistic.
- "A random sample of several hundred retail stores found that 34% were not storing fish at the proper temperature."
- This is a sample statistic (34%) based on the sampled stores, not all stores.
- Takeaway: correctly identify population vs. sample, and parameter vs. statistic, by checking whether the value describes the whole group or just a subset.
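The parameter/statistic distinction can be made concrete with a small simulation. The sketch below is illustrative only: the population of 1,000 SAT math scores is randomly generated (not real data), and the sample size of 50 and the seed are arbitrary assumptions.

```python
import random
import statistics

# Hypothetical population: SAT math scores for an entire freshman class.
# These values are simulated for illustration, not real data.
rng = random.Random(42)
population = [rng.randint(300, 800) for _ in range(1000)]

# Parameter: computed from every member of the population.
mu = statistics.mean(population)

# Statistic: computed from a random sample of 50 students.
sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"Population mean (parameter): {mu:.1f}")
print(f"Sample mean (statistic):     {x_bar:.1f}")
```

The two means will usually be close but not equal. The parameter is fixed for a given population, while the statistic changes each time a different sample is drawn.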
Descriptive vs. Inferential Statistics
- Descriptive statistics
- Purpose: organize, summarize, and display data.
- Process: collect data, summarize with tables/graphs, present to an audience.
- Focus: describing what the data show for the observed sample.
- Inferential statistics
- Purpose: use sample data to draw conclusions about a population.
- Process: make inferences about the population based on the sample results; assess uncertainty and generalizability.
- The flow:
- Descriptive: population → sample → numerical descriptors → conclusions about the sample
- Inferential: sample → general conclusions about the population
- Time allocation (instructional estimate):
- Descriptive statistics: roughly 25% to 40% of course time.
- Inferential statistics: the remaining 60% to 75%.
- Instructional goal: given study statements, identify (1) population, (2) sample, (3) descriptive component, (4) potential inferential conclusion.
Worked Examples: Identifying Population, Sample, and Descriptive vs. Inferential
- Example 1: Study of 2,560 US adults found that 23% were from households earning less than $30,000 annually and not using the Internet.
- Population: all US adults.
- Sample: the 2,560 adults surveyed.
- Descriptive statistic: 23% (from the sample) describes the sample’s characteristic.
- Inferential conclusion (potential): among all US adults, lower household income may be associated with not using the Internet, which raises questions about Internet access and affordability.
- Example 2: Study of 300 Wall Street analysts found that 44% incorrectly forecast high-tech earnings in the recent year.
- Population: all Wall Street analysts.
- Sample: the 300 analysts surveyed.
- Descriptive statistic: 44% of the sample incorrectly forecast earnings.
- Inferential conclusion (potential): even professionals have forecasting errors; forecasting the stock market is difficult, suggesting caution about relying on analyst forecasts.
- Takeaway: practice separating population, sample, descriptive results, and possible inferential conclusions.
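One way to see the descriptive and inferential steps side by side is to compute both for Example 2. The sketch below is an illustration under stated assumptions, not a method from the lecture: it assumes the 300 analysts were a random sample, uses the normal-approximation interval for a proportion, and takes z = 1.96 for roughly 95% confidence. The function name `describe_and_infer` is hypothetical.

```python
import math

def describe_and_infer(sample_size, sample_proportion, z=1.96):
    """Return the sample count and an approximate interval for the population proportion."""
    # Descriptive step: how many members of the sample had the characteristic.
    count = round(sample_size * sample_proportion)

    # Inferential step (assumed normal approximation): quantify the uncertainty
    # in using the sample proportion to estimate the population proportion.
    standard_error = math.sqrt(sample_proportion * (1 - sample_proportion) / sample_size)
    margin = z * standard_error
    return count, (sample_proportion - margin, sample_proportion + margin)

count, (low, high) = describe_and_infer(300, 0.44)
print(f"Descriptive: {count} of 300 analysts forecast incorrectly")
print(f"Inferential: population proportion plausibly between {low:.3f} and {high:.3f}")
```

The count is a fact about the sample. The interval is a statement about all Wall Street analysts and is only as good as the random-sampling assumption behind it.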
Data Collection: Qualitative vs. Quantitative Data
- Qualitative (categorical) data
- Attributes, labels, or non-numeric descriptions.
- Examples: hair color, eye color, major, birth country.
- Quantitative (numerical) data
- Numerical values (measurements, counts).
- Examples: age, height, weight, temperature, counts like number of visits.
- Example table (sports injuries in US ERs):
- Qualitative data: types of sports (basketball, baseball, football, etc.).
- Quantitative data: counts of injuries (numbers per sport).
- Data types summary:
- Qualitative vs. Quantitative
- Qualitative can be nominal or ordinal; quantitative can be discrete or continuous.
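A rough first-pass check for the qualitative/quantitative split is whether every value in a column is a number. The sketch below assumes that simple rule; the helper name `data_type` and the sample values are hypothetical and are not the counts from the lecture table.

```python
from numbers import Number

def data_type(values):
    """Label a column 'quantitative' if every value is a number, else 'qualitative'."""
    is_numeric = all(
        isinstance(value, Number) and not isinstance(value, bool)
        for value in values
    )
    return "quantitative" if is_numeric else "qualitative"

# Hypothetical values, invented for illustration.
sports = ["basketball", "baseball", "football"]
injury_counts = [120, 85, 97]

print(data_type(sports))         # labels
print(data_type(injury_counts))  # counts
```

This check is only a starting point: numeric codes such as ZIP codes or jersey numbers pass the test but are still qualitative, because arithmetic on them is meaningless.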
Levels of Measurement
- Nominal level
- Data are names or labels with no inherent order.
- No mathematical computations are meaningful.
- Examples: types of sports, genres of movies (labels).
- Ordinal level
- Data can be arranged in order (ranked), but differences between ranks may be meaningless.
- Can include qualitative or quantitative data.
- Example: occupations ranked by job growth. The ranks carry order, but the numeric difference between two ranks has no fixed meaning. (Movie genres, by contrast, are unordered labels and belong at the nominal level.)
- Interval level
- Data are numerical and can be ordered; differences are meaningful.
- Zero is just a position on the scale, not an inherent zero that means 'none.'
- Examples: temperature in degrees Celsius or Fahrenheit, and calendar years. A reading of 0 degrees does not mean "no temperature," yet differences between values are meaningful. (Rainfall totals, by contrast, have a true zero that means no rain, so they are ratio-level.)
- Ratio level
- Data are numerical with an inherent zero that means 'none.' Ratios are meaningful.
- Examples: counts of items, temperatures on a Kelvin scale, home run totals, salaries.
- Key property: you can form meaningful ratios: e.g., 20 vs 40 has a ratio of 2:1; zeros indicate none.
- Quick diagnostic rules from the lecture:
- Nominal: categories with no order; no arithmetic.
- Ordinal: categories with order; differences not necessarily meaningful.
- Interval: numerical; differences meaningful; zero is a position.
- Ratio: numerical; differences and ratios meaningful; zero is inherent.
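The diagnostic rules above are cumulative: each level allows everything the previous level allows, plus one more kind of comparison. A minimal Python sketch of that idea follows; the names `LEVELS`, `CAPABILITIES`, and `meaningful_operations` are illustrative, not from the lecture.

```python
# Levels of measurement in increasing order of what they support.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# The capability each level adds, in the same order as LEVELS.
CAPABILITIES = ["categorize", "order", "subtract", "divide"]

def meaningful_operations(level):
    """Return the operations that are meaningful at the given level."""
    index = LEVELS.index(level)
    return CAPABILITIES[: index + 1]

for level in LEVELS:
    print(f"{level:>8}: {', '.join(meaningful_operations(level))}")
```

Reading the output top to bottom gives a quick test for any data set: find the highest operation that still makes sense, and that is the level of measurement.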
Practical Examples: Nominal, Ordinal, Interval, and Ratio
- Dataset 1: US occupations with the most job growth (ranked order) vs. movie genres (labels)
- Occupations (ranked): Ordinal (ordered ranks).
- Movie genres: Nominal (labels without intrinsic order).
- Dataset 2: New York Yankees World Series victories (listed by year) vs. 2016 AL home run totals by team
- Yankees World Series victories (years): Interval (the data are calendar years, which can be ordered and subtracted, but year 0 is not an inherent zero, so ratios of years are not meaningful).
- 2016 AL home run totals by team: Ratio (zero is possible, ratios meaningful; you can say one team hit twice as many as another).
- Key reasoning: for interval, you can compute differences (e.g., year-to-year changes) but not meaningful ratios if zero is not inherent; for ratio, you can compute both differences and meaningful ratios with an absolute zero.
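The difference between interval and ratio scales shows up clearly with temperature. Celsius is used here as a standard interval-level example (an assumption added for illustration; the notes list Kelvin as ratio-level). The sketch converts two Celsius readings to Kelvin and compares differences and ratios on each scale.

```python
def celsius_to_kelvin(celsius):
    """Convert a Celsius reading to Kelvin, which has an absolute zero."""
    return celsius + 273.15

low_c, high_c = 20.0, 40.0
low_k, high_k = celsius_to_kelvin(low_c), celsius_to_kelvin(high_c)

# Differences agree on both scales, so they are meaningful at the interval level.
diff_c = high_c - low_c
diff_k = high_k - low_k

# Ratios disagree: the Celsius ratio depends on where zero was placed.
ratio_c = high_c / low_c
ratio_k = high_k / low_k

print(f"Difference: {diff_c:.2f} C vs {diff_k:.2f} K")
print(f"Ratio: {ratio_c:.3f} (Celsius) vs {ratio_k:.3f} (Kelvin)")
```

The Celsius ratio of 2 does not mean "twice as hot"; on the Kelvin scale the same two readings differ by a factor of only about 1.07. Counts such as home run totals behave like Kelvin: zero means none, so "twice as many" is a valid statement.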
- Quick recap of the data-type relationship:
- Qualitative data align with nominal or ordinal levels.
- Quantitative data align with interval or ratio levels.
- Discrete vs. Continuous: often discussed in later sections; discrete data are countable (e.g., number of students), continuous data are measurable (e.g., height).
- Recap diagram-style takeaway:
- Nominal vs. ordinal -> qualitative data
- Interval vs. ratio -> quantitative data
- Nominal/Ordinal data are categories (unordered or ordered); Interval/Ratio data are numeric and support more mathematical operations.
- Descriptive statistics path: \text{Population} \rightarrow \text{Sample} \rightarrow \text{Numerical descriptors} \rightarrow \text{Display}
- Inferential statistics path: \text{Sample} \rightarrow \text{Population conclusions}
- Population parameter example: population mean \mu
- Sample statistic example: sample mean \bar{x}
- Using data in practice:
- Guardrail for reporting: cite the source whenever you quote a statistic.
- Use real data from identifiable sources to inform decisions in business, environmental, or public health settings.
Takeaways for Exam Preparation
- Be able to identify:
- Population and Sample from a study description.
- Whether a reported value is a Parameter (population) or a Statistic (sample).
- Whether a reported value is Descriptive (part of descriptive statistics) or Inferential (leading to population-level conclusions).
- Distinguish data types:
- Qualitative vs. Quantitative
- Within Qualitative: nominal vs. ordinal
- Within Quantitative: interval vs. ratio (and the concepts of discrete vs. continuous, zero origin, and meaningful ratios).
- Practice classifying example statements and data sets into the four levels of measurement and determining the appropriate type of analysis.
End-of-Section Summary
- Statistics is a science of collecting, organizing, analyzing, and interpreting data to make informed decisions.
- Population vs. Sample; Parameter vs. Statistic.
- Descriptive vs. Inferential statistics, which differ in direction (descriptive: population -> sample -> summary of the sample; inferential: sample -> conclusions about the population).
- Data types and measurement levels determine what kinds of summaries and comparisons are meaningful.
- Real-world examples help solidify whether a value is descriptive versus inferential and which measurement level applies to the data.