Quick Reference: Stats Basics — Variables, Data Types, Notation

Descriptive vs Inferential Statistics

  • Descriptive: describes and summarizes data; examples include graphs such as histograms, scatter plots, and pie charts; purpose is to organize and communicate findings.
  • Inferential: generalizes from a sample to a population using probability to assess uncertainty; moves from the specific to the general.

Population vs Sample; Parameters vs Statistics

  • Population: all possible observations; a parameter is a population characteristic (constant).
  • Sample: subset of the population; a statistic is a sample-based estimate of a population parameter.
  • Notation:
    • Population: mean μ, standard deviation σ, etc. (Greek letters)
    • Sample: mean x̄, standard deviation s, sample size n (Roman letters)
  • Key idea: parameters are fixed constants (in theory) for the population; statistics are estimates from the sample.
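
A minimal sketch of the parameter/statistic distinction, assuming a small made-up population (the integers 1 to 100) and a hypothetical sample size of 10. The population mean is computed once and never changes; the sample mean depends on which observations happen to be drawn.

```python
import random
import statistics

# Hypothetical population: every value is known, so the parameter is exact.
population = list(range(1, 101))
mu = statistics.mean(population)  # population mean (parameter), fixed

# Draw a random sample of size n without replacement.
n = 10
sample = random.sample(population, n)
x_bar = statistics.mean(sample)  # sample mean (statistic), varies by sample

print(f"mu = {mu}, x_bar = {x_bar}")
```

Re-running the script gives a different x_bar each time, while mu stays the same.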

Notation and Core Concepts

  • The average of the population: μ
  • The average of the sample: x̄
  • Population standard deviation: σ; Sample standard deviation: s
  • Sample size: n (population size often N)
  • These distinctions matter for calculations and interpretations.
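
One place the distinction shows up in calculations is the standard deviation: the usual population formula divides by N, while the usual sample formula divides by n - 1. A minimal sketch using Python's standard statistics module; the data values are made up for illustration.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up values

# Treating the values as an entire population of size N: divide by N.
sigma = statistics.pstdev(data)

# Treating the same values as a sample of size n: divide by n - 1.
s = statistics.stdev(data)

# The same two results written out by hand.
n = len(data)
mean = sum(data) / n
sum_sq = sum((x - mean) ** 2 for x in data)  # sum of squared deviations
sigma_by_hand = math.sqrt(sum_sq / n)
s_by_hand = math.sqrt(sum_sq / (n - 1))

print(f"sigma = {sigma:.3f}, s = {s:.3f}")
```

The sample version is slightly larger because it divides by a smaller number.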

Data Types and Variables

  • Variables: the characteristics recorded for each observation; their values can vary from one observation to the next.
  • Independent variable vs dependent variable: independent is the condition you manipulate; dependent is the outcome you measure in response.
  • Discrete vs Continuous:
    • Discrete: countable values, typically whole numbers (no decimals). Example: number of students.
    • Continuous: any value within a range, decimals allowed. Examples: time, blood pressure.
  • Implication: variable type guides which graphs and analyses are appropriate.
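
As a rough illustration of how variable type shapes the summary, the sketch below tabulates a discrete variable value by value (the basis of a bar chart) and groups a continuous variable into intervals (the basis of a histogram). The data and the bin width of 0.1 seconds are assumptions chosen for the example.

```python
from collections import Counter
import math

# Made-up data for illustration.
students_per_class = [28, 30, 28, 31, 30, 28]            # discrete counts
reaction_times = [0.42, 0.51, 0.47, 0.63, 0.58, 0.49]    # continuous, seconds

# Discrete: count how often each exact value occurs.
discrete_counts = Counter(students_per_class)

# Continuous: exact values rarely repeat, so group them into bins first.
bin_width = 0.1  # assumed bin width
binned_counts = Counter(math.floor(t / bin_width) for t in reaction_times)

for value, count in sorted(discrete_counts.items()):
    print(f"{value} students: {count} classes")
for index, count in sorted(binned_counts.items()):
    low = index * bin_width
    print(f"{low:.1f} to {low + bin_width:.1f} s: {count} observations")
```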

Qualitative / Categorical Data

  • Qualitative (categorical) data fall into categories, not inherently numeric.
  • Nominal: categories with no natural order (e.g., car brands: Honda, Toyota, Ford).
  • Ordinal: categories with a natural order but not necessarily equal intervals (e.g., Freshman, Sophomore, Junior, Senior; first/second/third place).
  • Note: Averages of nominal/ordinal categories are generally not meaningful; use counts, proportions, or the mode instead (the median is also usable for ordinal data).
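
A minimal sketch of summaries that do suit categorical data: counts and the mode for a nominal variable, and a median based on rank order for an ordinal one. The observations and the variable names are made up for illustration.

```python
from collections import Counter
import statistics

# Nominal: no order, so summarize with counts and the most frequent category.
brands = ["Honda", "Toyota", "Ford", "Toyota", "Honda", "Toyota"]
brand_counts = Counter(brands)
mode_brand = brand_counts.most_common(1)[0][0]

# Ordinal: order matters, so map each category to its rank first.
levels = ["Freshman", "Sophomore", "Junior", "Senior"]
rank = {level: i for i, level in enumerate(levels)}
years = ["Senior", "Freshman", "Junior", "Sophomore", "Junior"]

# median_low always returns a value present in the data, so it maps back
# to a real category instead of landing between two of them.
median_rank = statistics.median_low(rank[y] for y in years)
median_year = levels[median_rank]

print(brand_counts)
print(f"mode: {mode_brand}, median class year: {median_year}")
```

The ranks are used only for ordering; the differences between them carry no meaning, which is why a mean of the ranks is avoided here.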

The Research Process (high-level)

  • Steps: background literature, design experiment, collect data, analyze data, interpret and present results.
  • Emphasize clear communication of results; this is as important as the analysis itself.

Historical Context Highlights (brief)

  • Statistics have been used to inform wartime decisions; probability informs cost-benefit analyses.
  • John Snow’s cholera map was an early example of linking data to causal interpretation.
  • Early population data (e.g., Domesday Book) illustrate long-standing use of counting for decisions.