Quick Reference: Stats Basics — Variables, Data Types, Notation
Descriptive vs Inferential Statistics
- Descriptive: summarizes the data at hand using numerical summaries (e.g., mean, standard deviation) and graphs (histograms, scatter plots, pie charts); the purpose is to organize and communicate findings.
- Inferential: generalizes from a sample to a population using probability to assess uncertainty; moves from the specific to the general.
Population vs Sample; Parameters vs Statistics
- Population: all possible observations; a parameter is a population characteristic (constant).
- Sample: subset of the population; a statistic is a sample-based estimate of a population parameter.
- Notation:
  - Population: mean μ, standard deviation σ, etc. (Greek letters)
  - Sample: mean x̄, standard deviation s, sample size n (Roman letters)
- Key idea: a parameter is a fixed (and usually unknown) constant for the population; a statistic is computed from a sample, so its value changes from one sample to the next.
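A minimal sketch of the parameter vs statistic idea, using only the Python standard library. The population here is simulated (hypothetical heights in cm), and the names `population`, `mu`, and `sample_means` are illustrative assumptions, not part of the original notes. It shows one fixed population mean next to several sample means that differ from draw to draw.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated heights in cm (illustrative data only).
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: a single fixed value for this population.
mu = statistics.fmean(population)

# Statistic: computed from a sample, so it changes with each new sample.
n = 50
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(5)
]

print(f"population mean (mu): {mu:.2f}")
for i, x_bar in enumerate(sample_means, start=1):
    print(f"sample {i} mean (x-bar): {x_bar:.2f}")
```

Each sample mean lands near μ but not exactly on it, which is why inferential statistics uses probability to describe the uncertainty of an estimate.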
Notation and Core Concepts
- Population mean: μ
- Sample mean: x̄
- Population standard deviation: σ; Sample standard deviation: s
- Sample size: n (population size often N)
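- These distinctions matter for calculations and interpretation: for example, the population standard deviation σ divides by N, while the sample standard deviation s divides by n − 1.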
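To make the notation concrete, here is a small sketch using Python's standard `statistics` module, where `pstdev` applies the population formula (divide by N) and `stdev` applies the sample formula (divide by n − 1). The data set is made up for illustration.

```python
import statistics

# Small made-up data set (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.fmean(data)    # 40 / 8 = 5.0
sigma = statistics.pstdev(data)  # population formula: sqrt(32 / 8) = 2.0
s = statistics.stdev(data)       # sample formula: sqrt(32 / 7), about 2.138

print(f"mean = {mean:.3f}")
print(f"sigma (treat data as the whole population) = {sigma:.3f}")
print(f"s (treat data as a sample) = {s:.3f}")
```

The same eight numbers give two different standard deviations, so it helps to decide up front whether the data are the whole population or a sample from it.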
Data Types and Variables
- Variables: characteristics recorded for each observation that can take different values; they may be numeric or categorical.
- Independent variable vs dependent variable: the independent variable is the condition the researcher manipulates; the dependent variable is the outcome measured in response.
- Discrete vs Continuous:
  - Discrete: takes separate, countable values, typically whole-number counts. Example: number of students.
  - Continuous: can take any value within a range, limited only by measurement precision. Examples: time, blood pressure.
- Implication: variable type guides which graphs and analyses are appropriate.
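One way to see the implication above, as a rough sketch with made-up data: discrete values can be tallied exactly (which suits a bar chart), while continuous values rarely repeat and are usually grouped into bins first (which suits a histogram). The helper `bin_label` and the 0.1 second bin width are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Made-up example data.
students_per_class = [28, 30, 28, 31, 30, 28]            # discrete: counts
reaction_times_s = [0.42, 0.51, 0.38, 0.47, 0.55, 0.44]  # continuous: measurements


def bin_label(value, width=0.1):
    """Return a label such as '0.4-0.5' for the bin that contains value."""
    lower = math.floor(value / width) * width
    return f"{lower:.1f}-{lower + width:.1f}"


# Discrete: tally how often each exact value occurs.
value_counts = Counter(students_per_class)

# Continuous: group nearby values into bins before counting.
bin_counts = Counter(bin_label(t) for t in reaction_times_s)

print(sorted(value_counts.items()))
print(sorted(bin_counts.items()))
```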
Qualitative / Categorical Data
- Qualitative (categorical) data fall into categories and are not inherently numeric.
- Nominal: categories with no natural order (e.g., car brands: Honda, Toyota, Ford).
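- Ordinal: categories with a natural order, but the spacing between categories is not defined or not necessarily equal (e.g., Freshman, Sophomore, Junior, Senior; first/second/third place).
- Note: Averages of nominal/ordinal categories are generally not meaningful; summarize with counts, proportions, or the mode, and for ordinal data the median category.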
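A short sketch of the summaries that do make sense for categorical data, using made-up observations. The ordering in `levels` is an assumption supplied by the analyst; `median_low` is used so that the result is always an actual category rather than an average of two ranks.

```python
import statistics
from collections import Counter

# Nominal: no order, so report counts, proportions, and the mode.
brands = ["Honda", "Toyota", "Ford", "Toyota", "Honda", "Toyota"]
brand_counts = Counter(brands)
brand_mode = brand_counts.most_common(1)[0][0]
brand_share = {b: c / len(brands) for b, c in brand_counts.items()}

# Ordinal: the order is meaningful, so a median category is also defensible.
levels = ["Freshman", "Sophomore", "Junior", "Senior"]  # assumed ordering
rank = {name: i for i, name in enumerate(levels)}
class_years = ["Sophomore", "Freshman", "Senior", "Sophomore", "Junior"]
median_rank = statistics.median_low(rank[y] for y in class_years)
median_year = levels[median_rank]

print(brand_counts)
print("mode:", brand_mode)
print("median class year:", median_year)
```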
The Research Process (high-level)
- Steps: review background literature, design the experiment, collect data, analyze data, interpret and present results.
- Emphasize clear communication of results; this is as important as the analysis itself.
Historical Context Highlights (brief)
- Statistics have been used to inform wartime decisions; probability informs cost-benefit analyses.
- John Snow’s cholera map was an early example of linking data to causal interpretation.
- Early population data (e.g., Domesday Book) illustrate long-standing use of counting for decisions.