Notes: Introduction to the Practice of Statistics (1.1)

Objective 1: Define statistics and statistical thinking

  • Data are observations collected from individuals or objects (measurements, genders, survey responses). Data vary across individuals; variability is common (e.g., heights, hair color, sleep hours, daily calories).
  • Statistics is a collection of methods for planning studies and experiments, obtaining data, and then organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions from the data. It also involves providing a measure of confidence in conclusions.
  • Statistical thinking involves understanding variability, planning studies to reduce bias, and using data to support conclusions with quantifiable uncertainty.

Objective 2: Explain the process of statistics

  • Population: the complete collection of all elements to be studied (scores, people, measurements, etc.). The population is the entire group of interest.
  • Individual: a member of the population being studied.
  • Sample: a subcollection of individuals selected from the population.
  • Census: a list of all individuals in a population along with characteristics of each individual.
  • Descriptive statistics: organizing and summarizing data using numerical summaries, tables, and graphs.
  • Inferential statistics: methods that take results from a sample and generalize them to the population, including measures of reliability (uncertainty).
  • Parameter vs. statistic:
    • A parameter is a numerical summary of a population. Often denoted by θ (theta).
    • A statistic is a numerical summary based on a sample. Often denoted by θ̂ (theta hat).
    • Example relationships:
      • In NYC, there are 3250 walk buttons at intersections; 77% do not work. The 77% is a parameter if it describes all 3250 buttons in the city, and a statistic if it was computed from a sample of them.
    • Example 1 (illustrative):
      • Based on a population: of the 3250 NYC walk buttons, 77% do not work (parameter).
      • Based on a sample: of 877 surveyed executives, 45% would not hire someone with a typographic error (statistic).
      • Suppose the percentage of all students on campus who have a job is 84.9% (parameter). If a sample of 250 students yields 86.4% with jobs, that sample result is a statistic.
  • Practical note: Distinguish when a figure refers to the full population (parameter) versus a sample (statistic).
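The campus job example can be worked through in a few lines of Python. This is only a sketch: the function name `sample_proportion` is made up for illustration, and the count of 216 students is an assumption obtained from 86.4% of 250.

```python
def sample_proportion(count_with_trait: int, sample_size: int) -> float:
    """Return the proportion of a sample that has the trait (a statistic)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    return count_with_trait / sample_size


# Parameter from the notes: 84.9% of all students on campus have a job.
population_proportion = 0.849

# Assumption: 86.4% of a sample of 250 corresponds to 216 students with jobs.
p_hat = sample_proportion(216, 250)

print(f"parameter (population): {population_proportion:.3f}")
print(f"statistic (sample):     {p_hat:.3f}")
print(f"difference:             {p_hat - population_proportion:+.3f}")
```

The statistic and the parameter differ here because the sample is only part of the population; a different sample of 250 students would likely give a different statistic.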

Objective 3: Distinguish between qualitative and quantitative variables

  • Variables are characteristics of individuals; they vary across individuals.
  • Quantitative variables provide numerical measures; their values can be added or subtracted to yield meaningful results (e.g., temperature, volume).
  • Qualitative (categorical, or attribute) variables classify individuals based on attributes or characteristics (e.g., gender, ZIP code).
  • Example from study (Elisabeth Kvaavik and colleagues): classify the following variables as qualitative or quantitative:
    1. Nationality — qualitative
    2. Number of children — quantitative (a count, so adding or subtracting values is meaningful)
    3. Household income in the previous year — quantitative
    4. Level of education — qualitative (the categories can be ordered, but they are not numerical measures)
    5. Daily intake of whole grains (grams/day) — quantitative
  • Distinguishing between qualitative and quantitative variables (Example 2): the answers, in order, are qualitative, quantitative, quantitative, qualitative, quantitative.

Objective 4: Distinguish between discrete and continuous variables

  • Discrete variable: a quantitative variable with a finite or countable number of possible values (e.g., counts like 0, 1, 2, 3, …).
  • Continuous variable: a quantitative variable with an infinite number of possible values that are not countable; it can be measured to any desired level of accuracy (e.g., height, weight, time).
  • Application to the Elisabeth Kvaavik study (Example 4): classify the following quantitative variables as discrete or continuous:
    1. Number of children — discrete
    2. Household income in the previous year — continuous
    3. Daily intake of whole grains (grams/day) — continuous
  • Data vs observations:
    • The list of observations a variable assumes is called data.
    • For example, gender is a variable; the observed values (male, female) are data.
  • Data types:
    • Qualitative data correspond to qualitative variables.
    • Quantitative data correspond to quantitative variables.
    • Discrete data correspond to discrete variables.
    • Continuous data correspond to continuous variables.
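One way to keep the two classification steps straight (qualitative vs. quantitative, then discrete vs. continuous) is to record them in a small lookup table. The sketch below uses the variables from the Kvaavik study with their standard textbook classifications; the names `Kind` and `STUDY_VARIABLES` are hypothetical and chosen only for this illustration.

```python
from enum import Enum


class Kind(Enum):
    QUALITATIVE = "qualitative"
    DISCRETE = "quantitative, discrete"
    CONTINUOUS = "quantitative, continuous"


# Standard classifications for the study variables listed in the notes.
STUDY_VARIABLES = {
    "nationality": Kind.QUALITATIVE,
    "number of children": Kind.DISCRETE,
    "household income in the previous year": Kind.CONTINUOUS,
    "level of education": Kind.QUALITATIVE,
    "daily intake of whole grains (grams/day)": Kind.CONTINUOUS,
}


def is_quantitative(kind: Kind) -> bool:
    """Only quantitative variables are further split into discrete/continuous."""
    return kind is not Kind.QUALITATIVE


quantitative_names = [
    name for name, kind in STUDY_VARIABLES.items() if is_quantitative(kind)
]

for name, kind in STUDY_VARIABLES.items():
    print(f"{name}: {kind.value}")
```

Note that the discrete/continuous question is asked only for the three quantitative variables, which matches the three items classified in Example 4.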

Objective 5: Determine the level of measurement of a variable

  • Nominal level: values name or categorize; no inherent order. Examples: yes/no/undecided; colors; jersey numbers; city names; last names. No meaningful ranking.
    • Four Levels of Measurement: N (Nominal)
  • Ordinal level: has nominal properties plus an inherent order or ranking among values. Examples: course grades; small/medium/large; ranks.
    • Four Levels of Measurement: O (Ordinal)
  • Interval level: retains ordinal properties, and differences between values are meaningful; zero does not indicate absence of quantity. Arithmetic operations such as addition and subtraction are meaningful.
    • Four Levels of Measurement: I (Interval)
    • Examples: temperature in °C or °F, calendar years.
  • Ratio level: has interval properties and meaningful ratios; zero indicates absence of quantity. Arithmetic operations such as multiplication and division are meaningful.
    • Four Levels of Measurement: R (Ratio)
    • Examples: height, weight, prices.
  • Example 6 (School eating patterns study): Determine the level of measurement for the following variables from a study of vending machines and school policies in 20 U.S. high schools (1088 students):
    1. Number of snack and soft drink vending machines in the school — ratio
    2. Whether or not the school has a closed campus policy during lunch — nominal
    3. Class rank (Freshman, Sophomore, Junior, Senior) — ordinal
    4. Number of days per week a student eats school lunch — ratio
  • Homework reference: Try these examples (Pg. 11/8-44 evens, 48) for practice.
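Because each level of measurement keeps the properties of the level below it and adds one more, the four levels can be modeled as an ordered scale. The following is a minimal sketch under that reading of the definitions above; `Level` and `permissible_operations` are illustrative names, not part of any library.

```python
from enum import IntEnum


class Level(IntEnum):
    NOMINAL = 1
    ORDINAL = 2
    INTERVAL = 3
    RATIO = 4


def permissible_operations(level: Level) -> list[str]:
    """List what is meaningful at a level; each level adds to the one below."""
    ops = ["name / categorize"]
    if level >= Level.ORDINAL:
        ops.append("rank / order")
    if level >= Level.INTERVAL:
        ops.append("add / subtract")
    if level >= Level.RATIO:
        ops.append("multiply / divide")
    return ops


# Variables and answers from Example 6 in the notes.
EXAMPLE_6 = {
    "number of vending machines in the school": Level.RATIO,
    "closed campus policy during lunch (yes/no)": Level.NOMINAL,
    "class rank (Freshman to Senior)": Level.ORDINAL,
    "days per week a student eats school lunch": Level.RATIO,
}

for variable, level in EXAMPLE_6.items():
    print(f"{variable}: {level.name.lower()} -> {permissible_operations(level)}")
```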

Connections and implications

  • Foundational principles: variability in data, population vs sample, and the distinction between descriptive and inferential statistics underpin all data analysis workflows.
  • Real-world relevance: understanding level of measurement and data types guides appropriate statistical methods, visualizations, and interpretations.
  • Practical considerations: when designing studies, researchers must choose sampling methods, plan data collection to minimize bias, and select summaries that reflect the nature of the data (nominal, ordinal, interval, ratio).
  • Ethical implications: data collection and interpretation should consider reliability, validity, and the potential for misleading conclusions if inappropriate methods or misclassifications are used.

Quick reference definitions

  • Population: complete set of all elements under study.
  • Sample: subset of the population.
  • Census: data collection from every member of the population.
  • Parameter: numerical summary of a population.
  • Statistic: numerical summary of a sample.
  • Descriptive statistics: summarize data (numerical summaries, tables, graphs).
  • Inferential statistics: generalize from sample to population and assess reliability.
  • Qualitative vs quantitative data; discrete vs continuous data.
  • Levels of measurement: nominal, ordinal, interval, ratio.

Notation reminders (typical in practice)

  • Population parameter: θ (theta)
  • Sample statistic: θ̂ (theta hat)
  • When a sample statistic estimates a population parameter, the statistic is an estimator of the parameter.
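The idea of a statistic estimating a parameter can be illustrated with a small simulation. This is a sketch under stated assumptions: the population size of 10,000 is hypothetical (the notes give only the 84.9% figure), and the seed is fixed only so that reruns give the same draws.

```python
import random

POPULATION_SIZE = 10_000   # hypothetical; not given in the notes
TRUE_PROPORTION = 0.849    # parameter from the campus job example
SAMPLE_SIZE = 250

# Build a population in which 84.9% of students have a job (True).
n_with_job = round(POPULATION_SIZE * TRUE_PROPORTION)
population = [True] * n_with_job + [False] * (POPULATION_SIZE - n_with_job)


def estimate_proportion(pop: list[bool], n: int, rng: random.Random) -> float:
    """Draw a simple random sample without replacement; return its proportion."""
    sample = rng.sample(pop, n)
    return sum(sample) / n


rng = random.Random(0)  # fixed seed for reproducibility
theta = sum(population) / len(population)
estimates = [estimate_proportion(population, SAMPLE_SIZE, rng) for _ in range(5)]

print(f"parameter: {theta:.3f}")
for i, estimate in enumerate(estimates, start=1):
    print(f"sample {i} statistic: {estimate:.3f}")
```

Each sample gives a slightly different statistic, while the parameter stays fixed; that sample-to-sample variation is why inferential statistics reports a measure of reliability alongside an estimate.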

Important numerical references from the transcript

  • NYC walk buttons: 3250 total; 77% do not work.
  • Sample of executives: n = 877; 45% would not hire someone with a typographic error.
  • Campus job example: population percentage 84.9%; sample with 250 students yields 86.4% with a job.
  • Example 6 labeling: ratio, nominal, ordinal, ratio.
  • Example 4 labeling: discrete, continuous, continuous.
  • Levels of measurement labels: N, O, I, R.

Summary tips for exam preparation

  • Be able to define and distinguish: population vs sample, census, parameter vs statistic, descriptive vs inferential statistics.
  • Be able to classify variables as qualitative vs quantitative, and further as discrete vs continuous when quantitative.
  • Be able to determine the level of measurement for a given variable (nominal, ordinal, interval, ratio) and justify the classification.
  • Remember the example contexts (city data, employment data, school data) as templates for thinking about real-world datasets.
  • Practice identifying how the level of measurement affects permissible operations (e.g., addition/subtraction for interval/ratio; no meaningful order for nominal).
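A short calculation shows why differences are meaningful at the interval level but ratios are not. The sketch below uses the standard Celsius-to-Fahrenheit conversion; the temperatures 10 °C and 20 °C are arbitrary values chosen for illustration.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Standard conversion: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


cold_c, warm_c = 10.0, 20.0
cold_f, warm_f = celsius_to_fahrenheit(cold_c), celsius_to_fahrenheit(warm_c)

# Differences describe the same gap, just in different units (10 C = 18 F).
diff_c = warm_c - cold_c
diff_f = warm_f - cold_f

# Ratios disagree between scales, so "twice as hot" has no fixed meaning.
ratio_c = warm_c / cold_c
ratio_f = warm_f / cold_f

print(f"difference: {diff_c} C = {diff_f} F")
print(f"ratio in Celsius:    {ratio_c:.2f}")
print(f"ratio in Fahrenheit: {ratio_f:.2f}")
```

The ratio changes with the choice of scale because zero on these scales does not mean an absence of temperature. For a ratio-level variable such as height, converting units leaves the ratio unchanged.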