Notes: Introduction to the Practice of Statistics (1.1)
Objective 1: Define statistics and statistical thinking
- Data are observations collected from individuals or objects (measurements, genders, survey responses). Data vary across individuals; variability is common (e.g., heights, hair color, sleep hours, daily calories).
- Statistics is a collection of methods for planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions from the data. It also involves providing a measure of confidence in conclusions.
- Statistical thinking involves understanding variability, planning studies to reduce bias, and using data to support conclusions with quantifiable uncertainty.
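To make the idea of variability concrete, the sketch below summarizes a small set of sleep-hour observations. The six values are made up for illustration and do not come from the notes; only Python's standard `statistics` module is used.

```python
import statistics

# Hypothetical nightly sleep hours for six individuals (illustrative values only).
sleep_hours = [6.5, 7.0, 8.0, 5.5, 7.5, 6.0]

# A typical value plus two measures of how much individuals differ.
mean_sleep = statistics.mean(sleep_hours)
value_range = max(sleep_hours) - min(sleep_hours)
spread = statistics.stdev(sleep_hours)  # sample standard deviation

print(f"mean = {mean_sleep:.2f} h, range = {value_range:.1f} h, stdev = {spread:.2f} h")
```

A nonzero range and standard deviation show that the observations differ from one individual to the next, which is the variability that statistical thinking has to account for.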
Objective 2: Explain the process of statistics
- Population: the complete collection of all elements to be studied (scores, people, measurements, etc.). The population is the entire group of interest.
- Individual: a member of the population being studied.
- Sample: a subcollection of individuals selected from the population.
- Census: a list of all individuals in a population along with characteristics of each individual.
- Descriptive statistics: organizing and summarizing data using numerical summaries, tables, and graphs.
- Inferential statistics: methods that take results from a sample and generalize them to the population, including measures of reliability (uncertainty).
- Parameter vs. statistic:
- A parameter is a numerical summary of a population. Often denoted by θ (theta).
- A statistic is a numerical summary based on a sample. Often denoted by θ̂ (theta hat).
- Example relationships:
- In New York City there are 3250 walk buttons at intersections, and 77% of them do not work.
- Parameter vs. statistic: the 77% is a population parameter if it describes every button in the city, and a sample statistic if it describes only a sample of buttons.
- Example 1 (illustrative):
- Based on a population: of all 3250 walk buttons, 77% do not work (parameter).
- Based on a sample: of 877 surveyed executives, 45% would not hire someone with a typographic error (statistic).
- Suppose the campus job percentage is 84.9% (population parameter). If a sample of 250 students yields 86.4% with jobs, that sample result is a statistic.
- Practical note: Distinguish when a figure refers to the full population (parameter) versus a sample (statistic).
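The campus job example can be worked through numerically. In the sketch below, the count of 216 students is not stated in the notes; it is derived from 86.4% of 250, and the variable names are illustrative.

```python
# Population parameter from the notes: 84.9% of all students have a job.
population_proportion = 0.849

# Sample from the notes: 250 students, 86.4% of them with a job.
sample_size = 250
sample_with_jobs = 216  # derived assumption: 0.864 * 250 = 216

# The statistic is computed from the sample only.
sample_proportion = sample_with_jobs / sample_size

# The gap between statistic and parameter reflects sampling variability.
difference = sample_proportion - population_proportion

print(f"parameter  = {population_proportion:.3f}")
print(f"statistic  = {sample_proportion:.3f}")
print(f"difference = {difference:+.3f}")
```

The statistic would change if a different group of 250 students were sampled, while the parameter stays fixed for the population.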
Objective 3: Distinguish between qualitative and quantitative variables
- Variables are characteristics of individuals; they vary across individuals.
- Quantitative variables provide numerical measures; their values can be added or subtracted to yield meaningful results (e.g., temperature, volume).
- Qualitative (categorical, or attribute) variables classify individuals based on attributes or characteristics (e.g., gender, ZIP code).
- Example from study (Elisabeth Kvaavik and colleagues): classify the following variables as qualitative or quantitative:
- Nationality: qualitative
- Number of children: quantitative (discrete)
- Household income in the previous year: quantitative
- Level of education: qualitative (ordered categories)
- Daily intake of whole grains (grams/day): quantitative
- Distinguishing between qualitative and quantitative variables (Example 2): in the order listed, the classifications are qualitative, quantitative, quantitative, qualitative, quantitative.
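One practical check for Objective 3 is whether arithmetic on the values means anything. The sketch below contrasts ZIP codes (qualitative, even though they are written with digits) with temperatures (quantitative). All observations are hypothetical.

```python
from collections import Counter
import statistics

# Hypothetical observations for two variables.
zip_codes = ["10001", "10001", "60614", "94110"]  # qualitative: labels made of digits
temperatures_f = [68.0, 71.5, 70.0, 66.5]          # quantitative: numerical measures

# For a qualitative variable, counting how often each category occurs is meaningful.
zip_counts = Counter(zip_codes)
most_common_zip, most_common_count = zip_counts.most_common(1)[0]

# For a quantitative variable, sums and averages are meaningful.
mean_temperature = statistics.mean(temperatures_f)

print(f"most common ZIP: {most_common_zip} ({most_common_count} times)")
print(f"mean temperature: {mean_temperature:.1f} F")
```

Storing the ZIP codes as strings is a deliberate choice here: it keeps them from being averaged by accident, since an "average ZIP code" has no interpretation.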
Objective 4: Distinguish between discrete and continuous variables
- Discrete variable: a quantitative variable with a finite or countable number of possible values (e.g., counts like 0, 1, 2, 3, …).
- Continuous variable: a quantitative variable with an infinite number of possible values that can be measured to any desired level of accuracy (e.g., height, weight, time).
- Application to the Elisabeth Kvaavik study (Example 4): classify the following quantitative variables as discrete or continuous:
- Number of children — discrete
- Household income in the previous year — continuous
- Daily intake of whole grains (grams/day) — continuous
- Data vs observations:
- The list of observations a variable assumes is called data.
- Gender is a variable; the observed values (male, female) are data.
- Data types:
- Qualitative data correspond to qualitative variables.
- Quantitative data correspond to quantitative variables.
- Discrete data correspond to discrete variables.
- Continuous data correspond to continuous variables.
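The two-step classification (qualitative or quantitative, then discrete or continuous) can be written down as a small data structure. This is a minimal sketch: the `Variable` class and its field names are hypothetical, the discrete/continuous labels follow Example 4, and nationality and level of education are treated as qualitative, which is the usual classification.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variable:
    name: str
    quantitative: bool
    discrete: Optional[bool] = None  # only meaningful for quantitative variables

    def describe(self) -> str:
        """Return the data type implied by the variable's classification."""
        if not self.quantitative:
            return "qualitative"
        return "quantitative, discrete" if self.discrete else "quantitative, continuous"


# Variables from the Kvaavik study example.
study_variables = [
    Variable("Nationality", quantitative=False),
    Variable("Number of children", quantitative=True, discrete=True),
    Variable("Household income in the previous year", quantitative=True, discrete=False),
    Variable("Level of education", quantitative=False),
    Variable("Daily intake of whole grains (grams/day)", quantitative=True, discrete=False),
]

for variable in study_variables:
    print(f"{variable.name}: {variable.describe()}")
```

The discrete/continuous question is only asked once a variable is known to be quantitative, which is why `discrete` is left unset for the qualitative entries.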
Objective 5: Determine the level of measurement of a variable
- Nominal level (N): values name or categorize; no inherent order. Examples: yes/no/undecided; colors; jersey numbers; city names; last names. No meaningful ranking.
- Ordinal level (O): has nominal properties plus an inherent order or ranking among values. Examples: course grades; small/medium/large; ranks.
- Interval level (I): retains ordinal properties, and differences between values are meaningful; zero does not indicate absence of quantity. Arithmetic operations such as addition and subtraction are meaningful. Examples: temperature in degrees Celsius or Fahrenheit, calendar years.
- Ratio level (R): has interval properties and meaningful ratios; zero indicates absence of quantity. Arithmetic operations such as multiplication and division are meaningful. Examples: height, weight, prices.
- Example 6 (School eating patterns study): Determine the level of measurement for the following variables from a study of vending machines and school policies in 20 U.S. high schools (1088 students):
- Number of snack and soft drink vending machines in the school — ratio
- Whether or not the school has a closed campus policy during lunch — nominal
- Class rank (Freshman, Sophomore, Junior, Senior) — ordinal
- Number of days per week a student eats school lunch — ratio
- Homework reference: for practice, try p. 11, exercises 8-44 (even) and 48.
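Because each level of measurement keeps the properties of the levels below it, the four levels can be modeled as an ordered scale. The sketch below is one possible encoding, not a standard API: the `Level` enum, the operation names, and the shortened variable labels for Example 6 are all illustrative.

```python
from enum import IntEnum
from typing import List


class Level(IntEnum):
    """Levels of measurement; each level keeps the properties of those below it."""
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


# Each operation is paired with the lowest level at which it becomes meaningful.
CAPABILITIES = [
    (Level.NOMINAL, "categorize"),
    (Level.ORDINAL, "rank"),
    (Level.INTERVAL, "add/subtract"),
    (Level.RATIO, "multiply/divide"),
]


def meaningful_operations(level: Level) -> List[str]:
    """Return the operations that are meaningful at the given level."""
    return [name for minimum, name in CAPABILITIES if level >= minimum]


# Example 6 classifications as given in the notes.
example_6 = {
    "Number of vending machines in the school": Level.RATIO,
    "Closed campus policy during lunch": Level.NOMINAL,
    "Class rank": Level.ORDINAL,
    "Days per week a student eats school lunch": Level.RATIO,
}

for variable, level in example_6.items():
    operations = ", ".join(meaningful_operations(level))
    print(f"{variable}: {level.name.lower()} -> {operations}")
```

Using an `IntEnum` makes the cumulative structure explicit: a comparison such as `level >= Level.INTERVAL` answers whether differences between values can be interpreted.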
Connections and implications
- Foundational principles: variability in data, population vs sample, and the distinction between descriptive and inferential statistics underpin all data analysis workflows.
- Real-world relevance: understanding level of measurement and data types guides appropriate statistical methods, visualizations, and interpretations.
- Practical considerations: when designing studies, researchers must choose sampling methods, plan data collection to minimize bias, and select summaries that reflect the nature of the data (nominal, ordinal, interval, ratio).
- Ethical implications: data collection and interpretation should consider reliability, validity, and the potential for misleading conclusions if inappropriate methods or misclassifications are used.
Quick reference definitions
- Population: complete set of all elements under study.
- Sample: subset of the population.
- Census: data collection from every member of the population.
- Parameter: numerical summary of a population.
- Statistic: numerical summary of a sample.
- Descriptive statistics: summarize data (numerical summaries, tables, graphs).
- Inferential statistics: generalize from sample to population and assess reliability.
- Qualitative vs quantitative data; discrete vs continuous data.
- Levels of measurement: nominal, ordinal, interval, ratio.
Notation reminders (typical in practice)
- Population parameter: \theta (theta)
- Sample statistic: \hat{\theta} (hat theta)
- When a sample statistic estimates a population parameter, the statistic is an estimator of the parameter.
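The role of a statistic as an estimator can be illustrated by drawing several samples from one population and watching the estimates move around the fixed parameter. The simulation below is a sketch under stated assumptions: the population size of 10,000, the seed, and the function name `estimate` are invented for illustration; only the 84.9% parameter and the sample size of 250 come from the campus job example.

```python
import random

# Hypothetical population of 10,000 students, 84.9% of whom have a job
# (1 = has a job, 0 = does not). The population size is an assumption.
population = [1] * 8490 + [0] * 1510
theta = sum(population) / len(population)  # population parameter

rng = random.Random(2024)  # fixed seed so the run is repeatable


def estimate(sample_size: int) -> float:
    """Draw a simple random sample without replacement and return theta-hat."""
    sample = rng.sample(population, sample_size)
    return sum(sample) / sample_size


# Several independent samples of 250 students generally give different estimates.
estimates = [estimate(250) for _ in range(5)]

print(f"theta = {theta:.3f}")
for i, theta_hat in enumerate(estimates, start=1):
    print(f"sample {i}: theta-hat = {theta_hat:.3f}")
```

The parameter is computed once from the whole population, while each call to `estimate` produces a new value of the statistic from a new sample.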
Important numerical references from the transcript
- NYC walk buttons: 3250 total; 77% do not work.
- Sample of executives: n = 877; 45% would not hire someone with a typographic error.
- Campus job example: population percentage 84.9%; sample with 250 students yields 86.4% with a job.
- Example 6 labeling: ratio, nominal, ordinal, ratio.
- Example 4 labeling: discrete, continuous, continuous.
- Levels of measurement labels: N, O, I, R.
Summary tips for exam preparation
- Be able to define and distinguish: population vs sample, census, parameter vs statistic, descriptive vs inferential statistics.
- Be able to classify variables as qualitative vs quantitative, and further as discrete vs continuous when quantitative.
- Be able to determine the level of measurement for a given variable (nominal, ordinal, interval, ratio) and justify the classification.
- Remember the example contexts (city data, employment data, school data) as templates for thinking about real-world datasets.
- Practice identifying how the level of measurement affects permissible operations (e.g., addition/subtraction for interval/ratio; no meaningful order for nominal).
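A short worked example may help with the interval versus ratio distinction. The sketch below uses two hypothetical temperatures and the kelvin scale, which is not mentioned in the notes, to show that differences survive a change of units while ratios on an interval scale do not.

```python
def celsius_to_kelvin(celsius: float) -> float:
    """Convert degrees Celsius to kelvins (a scale with a true zero)."""
    return celsius + 273.15


# Hypothetical temperatures in degrees Celsius (an interval scale).
cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on an interval scale and survive the unit change.
difference_c = warm_c - cool_c
difference_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# Ratios are not: 20 / 10 = 2 in Celsius, but the same two temperatures in
# kelvins give a ratio close to 1, so "twice as hot" is not a valid conclusion.
ratio_c = warm_c / cool_c
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"difference: {difference_c:.1f} C vs {difference_k:.1f} K")
print(f"ratio: {ratio_c:.2f} (Celsius) vs {ratio_k:.3f} (kelvin)")
```

The same check applies to any interval-level variable: if the zero point is arbitrary, division does not produce an interpretable result.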