Notes: Introduction to the Practice of Statistics (1.1)

Objective 1: Define statistics and statistical thinking

Data are observations collected from individuals or objects (measurements, genders, survey responses). Data vary across individuals; variability is common (e.g., heights, hair color, sleep hours, daily calories).
Statistics is a collection of methods for planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions from the data. It also involves providing a measure of confidence in conclusions.
Statistical thinking involves understanding variability, planning studies to reduce bias, and using data to support conclusions with quantifiable uncertainty.

Objective 2: Explain the process of statistics

Population: the complete collection of all elements to be studied (scores, people, measurements, etc.). The population is the entire group of interest.
Individual: a member of the population being studied.
Sample: a subcollection of individuals selected from the population.
Census: a list of all individuals in a population along with characteristics of each individual.
Descriptive statistics: organizing and summarizing data using numerical summaries, tables, and graphs.
Inferential statistics: methods that take results from a sample and generalize them to the population, including measures of reliability (uncertainty).
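As a minimal illustration of descriptive statistics, the Python sketch below summarizes made-up data for seven hypothetical individuals. The values and variable names are assumptions, not from the notes. It computes a few numerical summaries for a quantitative variable and a frequency table for a qualitative one. It only describes these seven values; generalizing to a larger population would be inferential statistics.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical data for seven individuals (made up for illustration).
sleep_hours = [7.5, 6.0, 8.0, 5.5, 7.0, 9.0, 6.5]  # quantitative variable
hair_color = ["brown", "black", "brown", "blond", "red", "brown", "black"]  # qualitative variable

# Numerical summaries describe the quantitative variable.
summary = {
    "mean": mean(sleep_hours),
    "median": median(sleep_hours),
    "range": max(sleep_hours) - min(sleep_hours),
}

# A frequency table (count per category) describes the qualitative variable.
hair_table = Counter(hair_color)

for name, value in summary.items():
    print(f"{name}: {value:.2f} hours")
for color, count in hair_table.most_common():
    print(f"{color}: {count}")
```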
Parameter vs. statistic:
- A parameter is a numerical summary of a population. Often denoted by θ (theta).
- A statistic is a numerical summary based on a sample. Often denoted by θ̂ (theta-hat).
- Example 1:
  - In New York City, there are 3250 walk buttons at intersections, and 77% of them do not work. Because the 77% describes all 3250 buttons (the entire population), it is a parameter.
  - From a sample of 877 surveyed executives, 45% would not hire someone with a typographic error. Because the 45% comes from a sample rather than from all executives, it is a statistic.
  - Suppose 84.9% of all students on campus have a job; this population value is a parameter. If a sample of 250 students yields 86.4% with jobs, that sample result is a statistic.
Practical note: Distinguish when a figure refers to the full population (parameter) versus a sample (statistic).
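The campus job example can be simulated as a Python sketch. The population size of 1000 students is an assumption chosen so that exactly 84.9% have jobs. Because the sample is simulated, its statistic will generally not equal the 86.4% from the notes; it only typically lands near the parameter.

```python
import random

# Hypothetical campus population: 849 of 1000 students have a job (84.9%).
# The population size of 1000 is an illustrative assumption.
POPULATION_SIZE = 1000
population = [True] * 849 + [False] * (POPULATION_SIZE - 849)

# Parameter: a numerical summary computed from every individual.
parameter = sum(population) / len(population)

# Statistic: the same summary computed from a random sample of 250 students.
rng = random.Random(2024)              # fixed seed for a reproducible run
sample = rng.sample(population, 250)   # sampling without replacement
statistic = sum(sample) / len(sample)

print(f"parameter (all {POPULATION_SIZE} students): {parameter:.1%}")
print(f"statistic (sample of {len(sample)} students): {statistic:.1%}")
```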

Objective 3: Distinguish between qualitative and quantitative variables

Variables are characteristics of individuals; they vary across individuals.
Quantitative variables provide numerical measures; their values can be added or subtracted to yield meaningful results (e.g., temperature, volume).
Qualitative (categorical, or attribute) variables classify individuals based on attributes or characteristics (e.g., gender, ZIP code).
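One way to apply the "meaningful arithmetic" test is to try it in code. In this hypothetical Python sketch (all values are made up), averaging and subtracting daily calories gives answers that describe the individuals. Averaging ZIP codes runs without error but produces a number that describes nothing. The digits are labels, so ZIP code is qualitative even though it looks numeric.

```python
from statistics import mean

# Hypothetical values for three individuals (made up for illustration).
daily_calories = [2100, 1850, 2400]   # quantitative: an amount
zip_codes = [10001, 60614, 94105]     # qualitative: labels written with digits

# Arithmetic on a quantitative variable has a real-world meaning.
avg_calories = mean(daily_calories)                  # average intake per person
difference = daily_calories[2] - daily_calories[1]   # third person eats 550 kcal more than the second

# The computer will average ZIP codes too, but the result is not a location
# and says nothing about these people: the arithmetic runs, the meaning does not.
avg_zip = mean(zip_codes)

print(f"average calories: {avg_calories:.1f}")
print(f"difference: {difference} kcal")
print(f"'average' ZIP code: {avg_zip:.1f} (meaningless)")
```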
Example from a study by Elisabeth Kvaavik and colleagues: classify the following variables as qualitative or quantitative:
1. Nationality: qualitative
2. Number of children: quantitative (a count; the transcript listed this as qualitative)
3. Household income in the previous year: quantitative
4. Level of education: qualitative (ordered categories; the transcript listed this as quantitative)
5. Daily intake of whole grains (grams/day): quantitative
Distinguishing between qualitative and quantitative variables (Example 2): the answer sequence is qualitative, quantitative, quantitative, qualitative, quantitative. The transcript's sequence (qualitative, qualitative, quantitative, quantitative, quantitative) swaps items 2 and 4; the corrected labels agree with Objective 4, which treats number of children as a discrete quantitative variable.

Objective 4: Distinguish between discrete and continuous variables

Discrete variable: a quantitative variable with a finite or countable number of possible values (e.g., counts like 0, 1, 2, 3, …).
Continuous variable: a quantitative variable with an uncountably infinite number of possible values (any value in an interval); its values can be measured to any desired level of accuracy (e.g., height, weight, time).
Application to the Elisabeth Kvaavik study (Example 4): classify the following quantitative variables as discrete or continuous:
1. Number of children — discrete
2. Household income in the previous year — continuous
3. Daily intake of whole grains (grams/day) — continuous
Variables vs. data:
- The list of observed values a variable assumes is called data.
- Gender is a variable; the observed values (male, female) are data.
Data types:
- Qualitative data correspond to qualitative variables.
- Quantitative data correspond to quantitative variables.
- Discrete data correspond to discrete variables.
- Continuous data correspond to continuous variables.

Objective 5: Determine the level of measurement of a variable

Nominal level: values name or categorize; no inherent order. Examples: yes/no/undecided; colors; jersey numbers; city names; last names. No meaningful ranking.
- Four Levels of Measurement: N (Nominal)
Ordinal level: has nominal properties plus an inherent order or ranking among values. Examples: course grades; small/medium/large; ranks.
- Four Levels of Measurement: O (Ordinal)
Interval level: retains ordinal properties, and differences between values are meaningful; zero does not indicate absence of quantity. Arithmetic operations such as addition and subtraction are meaningful.
- Four Levels of Measurement: I (Interval)
- Examples: temperature in °F or °C, calendar years.
Ratio level: has interval properties and meaningful ratios; zero indicates absence of quantity. Arithmetic operations such as multiplication and division are meaningful.
- Four Levels of Measurement: R (Ratio)
- Examples: height, weight, prices.
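The difference between the interval and ratio levels shows up when units change. This Python sketch uses the standard conversions K = (°F - 32) × 5/9 + 273.15 and 1 inch = 2.54 cm; the specific readings (80 °F, 40 °F, 180 cm, 90 cm) are hypothetical. Fahrenheit temperatures keep their differences under conversion but not their ratios, because 0 °F is not an absence of temperature. Heights keep their ratios because 0 cm is a true zero.

```python
import math

def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins, a scale whose zero is absolute zero."""
    return (f - 32) * 5 / 9 + 273.15

def cm_to_inches(cm):
    """Convert centimeters to inches (1 inch = 2.54 cm)."""
    return cm / 2.54

# Interval level: 80 °F is not "twice as hot" as 40 °F.
ratio_f = 80 / 40                                               # 2.0 on the Fahrenheit scale
ratio_k = fahrenheit_to_kelvin(80) / fahrenheit_to_kelvin(40)   # about 1.08 in kelvins

# Differences are still meaningful: a 40 °F gap is the same size wherever it starts.
gap_high = fahrenheit_to_kelvin(80) - fahrenheit_to_kelvin(40)
gap_low = fahrenheit_to_kelvin(50) - fahrenheit_to_kelvin(10)
same_gap = math.isclose(gap_high, gap_low)

# Ratio level: 0 cm means no height, so "twice as tall" survives a change of units.
ratio_cm = 180 / 90
ratio_in = cm_to_inches(180) / cm_to_inches(90)

print(f"temperature ratio: {ratio_f:.2f} (°F) vs {ratio_k:.2f} (K)")
print(f"40 °F gaps in kelvins: {gap_high:.2f} and {gap_low:.2f} (equal: {same_gap})")
print(f"height ratio: {ratio_cm:.2f} (cm) vs {ratio_in:.2f} (in)")
```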
Example 6 (School eating patterns study): Determine the level of measurement for the following variables from a study of vending machines and school policies in 20 U.S. high schools (1088 students):
1. Number of snack and soft drink vending machines in the school — ratio
2. Whether or not the school has a closed campus policy during lunch — nominal
3. Class rank (Freshman, Sophomore, Junior, Senior) — ordinal
4. Number of days per week a student eats school lunch — ratio
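The cumulative structure of the four levels (each level keeps every property of the one before it) can be encoded as a small lookup. The names `NEW_OPERATIONS` and `meaningful_operations` and the operation labels are hypothetical helpers for this sketch; the labels simply restate the level definitions in these notes, and Example 6 is included as data.

```python
# Levels in order; each one inherits the operations of all levels before it.
LEVELS = ["nominal", "ordinal", "interval", "ratio"]

# Operations that first become meaningful at each level (hypothetical labels).
NEW_OPERATIONS = {
    "nominal": {"categorize"},         # values only name or label
    "ordinal": {"order"},              # values can be ranked
    "interval": {"add", "subtract"},   # differences are meaningful
    "ratio": {"multiply", "divide"},   # ratios are meaningful (true zero)
}

def meaningful_operations(level):
    """Return every operation that is meaningful at the given level."""
    allowed = set()
    for lvl in LEVELS[: LEVELS.index(level) + 1]:
        allowed |= NEW_OPERATIONS[lvl]
    return allowed

# Example 6, encoded as variable -> level of measurement.
example_6 = {
    "number of vending machines": "ratio",
    "closed campus policy during lunch": "nominal",
    "class rank": "ordinal",
    "days per week eating school lunch": "ratio",
}

for variable, level in example_6.items():
    can_divide = "divide" in meaningful_operations(level)
    print(f"{variable}: {level}; ratios meaningful? {can_divide}")
```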
Homework reference: Try these examples (Pg. 11/8-44 evens, 48) for practice.

Connections and implications

Foundational principles: variability in data, population vs sample, and the distinction between descriptive and inferential statistics underpin all data analysis workflows.
Real-world relevance: understanding level of measurement and data types guides appropriate statistical methods, visualizations, and interpretations.
Practical considerations: when designing studies, researchers must choose sampling methods, plan data collection to minimize bias, and select summaries that reflect the nature of the data (nominal, ordinal, interval, ratio).
Ethical implications: data collection and interpretation should consider reliability, validity, and the potential for misleading conclusions if inappropriate methods or misclassifications are used.

Quick reference definitions

Population: complete set of all elements under study.
Sample: subset of the population.
Census: data collection from every member of the population.
Parameter: numerical summary of a population.
Statistic: numerical summary of a sample.
Descriptive statistics: summarize data (numerical summaries, tables, graphs).
Inferential statistics: generalize from sample to population and assess reliability.
Qualitative vs quantitative data; discrete vs continuous data.
Levels of measurement: nominal, ordinal, interval, ratio.

Notation reminders (typical in practice)

Population parameter: θ (theta)
Sample statistic: θ̂ (theta-hat)
When a sample statistic estimates a population parameter, the statistic is an estimator of the parameter.
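To see why a statistic is called an estimator, a Python sketch can draw several samples from one hypothetical population (1000 students, 849 with jobs; both counts are assumptions for illustration). The parameter θ is computed once and stays fixed. The estimate θ̂ is recomputed for each random sample of 250 and typically changes from sample to sample; this variability is why inferential conclusions come with a measure of reliability.

```python
import random

# Hypothetical population: 849 of 1000 students have a job.
population = [True] * 849 + [False] * 151

# theta: the parameter, a single fixed number for the whole population.
theta = sum(population) / len(population)

# theta_hat: the statistic, recomputed for each new random sample of 250.
rng = random.Random(7)  # fixed seed for a reproducible run
theta_hats = []
for _ in range(5):
    sample = rng.sample(population, 250)
    theta_hats.append(sum(sample) / len(sample))

print(f"theta = {theta:.3f}")
print("theta_hat by sample:", [f"{p:.3f}" for p in theta_hats])
```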

Important numerical references from the transcript

NYC walk buttons: 3250 total; 77% do not work.
Sample of executives: n = 877; 45% would not hire someone with a typographic error.
Campus job example: population percentage 84.9%; sample with 250 students yields 86.4% with a job.
Example 6 labeling: ratio, nominal, ordinal, ratio.
Example 4 labeling: discrete, continuous, continuous.
Levels of measurement labels: N, O, I, R.

Summary tips for exam preparation

Be able to define and distinguish: population vs sample, census, parameter vs statistic, descriptive vs inferential statistics.
Be able to classify variables as qualitative vs quantitative, and further as discrete vs continuous when quantitative.
Be able to determine the level of measurement for a given variable (nominal, ordinal, interval, ratio) and justify the classification.
Remember the example contexts (city data, employment data, school data) as templates for thinking about real-world datasets.
Practice identifying how the level of measurement affects permissible operations (e.g., no meaningful order for nominal; ranking for ordinal; addition/subtraction for interval and ratio; multiplication/division only for ratio).