Key Concepts: Descriptive Statistics, Population/Sample, and Variables
Data Summary and Inference
Descriptive vs. inferential statistics
- Descriptive statistics: summarize data (graphs, numerical summaries) from a sample to describe the data you have.
- Inferential statistics: use sample data to make inferences about a population.
Population vs. sample
- Population: the whole group you want to learn about.
- Sample: a subgroup drawn from the population.
- Parameter: a numerical value describing the population (e.g., population mean \mu).
- Statistic: a numerical value describing the sample (e.g., sample mean \bar{x}).
Population and sample relationship
- We often study a sample to infer about the population.
- Never confuse the two: a statistic characterizes the sample; a parameter characterizes the population.
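A minimal Python sketch of the parameter/statistic distinction. The numbers are made up for illustration (hours of sleep for a hypothetical class of 10 students), and the sample is simply every other student; neither comes from the notes.

```python
from statistics import mean

# Hypothetical population: hours of sleep for all 10 students in a class.
population = [6, 7, 8, 5, 7, 9, 6, 8, 7, 7]

# Hypothetical sample: every other student (5 of the 10).
sample = population[::2]

mu = mean(population)   # parameter: describes the whole population
x_bar = mean(sample)    # statistic: describes only the sample
n = len(sample)         # sample size

print(f"mu = {mu}, x_bar = {x_bar}, n = {n}")
```

Here \bar{x} = 6.8 while \mu = 7: the statistic estimates the parameter but need not equal it, and a different sample would generally give a different \bar{x}.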
Variables and data collection
- Variable: a property measured on each unit (e.g., hours of sleep, number of classes).
- Data types:
- Qualitative (categorical): categories (e.g., gender, lunch place).
- Quantitative (numerical): numbers (e.g., hours slept, number of classes).
- Discrete: countable values, typically whole-number counts (e.g., number of classes).
- Continuous: any value on a continuum (e.g., height, weight).
- Example variable names: grade, hours of sleep, number of classes, GPA.
Notation to know
- n = sample size.
- \bar{x} = sample mean.
- \mu = population mean (parameter).
- A “parameter” describes the population; a “statistic” describes the sample.
- Relative frequency = \text{frequency} / n; equivalently, a proportion.
- Sum of all relative frequencies equals 1:
\sum \text{relative frequency} = 1.
Data analysis process (high level)
- Step 1: identify the research question.
- Steps 2–3: decide what to measure and how; define the variable(s).
- Step 4: data summarization and preliminary analysis (descriptive statistics).
- Steps 5–6: inference via hypothesis testing/estimation and communicating results.
Descriptive statistics tools
- Frequency distribution (table): counts of categories (qualitative data).
- Relative frequency (and proportion): frequency / n; used to compare groups.
- Examples:
- Qualitative dataset with two categories: freshman (3) and senior (1) ⇒ n = 4.
- Relative frequencies: \frac{3}{4}=0.75 for freshmen, \frac{1}{4}=0.25 for seniors.
- Sum: 0.75+0.25=1.
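The freshman/senior example above can be reproduced in a few lines of Python. This is only a sketch: the raw list `data` is reconstructed from the stated counts (3 freshmen, 1 senior), and the variable names are illustrative.

```python
from collections import Counter

# Raw observations reconstructed from the counts in the example.
data = ["freshman", "freshman", "freshman", "senior"]

freq = Counter(data)        # frequency distribution: category -> count
n = sum(freq.values())      # sample size
rel_freq = {cat: count / n for cat, count in freq.items()}

for cat in freq:
    print(f"{cat:10s} frequency={freq[cat]}  relative frequency={rel_freq[cat]:.2f}")
print("sum of relative frequencies:", sum(rel_freq.values()))
```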
Graphs for qualitative data
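- Bar chart and pie chart both display the distribution of a qualitative variable; a bar chart makes it easier to compare categories, while a pie chart emphasizes each category's share of the whole.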
- Comparative bar chart: compare the distribution of one qualitative variable across the categories of another (e.g., lunch location by gender).
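The numbers behind a comparative bar chart can be sketched as a table of relative frequencies computed within each group. The records below are hypothetical (they are not data from the notes); the groups are deliberately of different sizes to show why relative frequencies, rather than raw counts, are the fair basis for comparison.

```python
from collections import Counter

# Hypothetical (gender, lunch location) records: 4 female, 6 male.
records = [
    ("female", "JPL"), ("female", "JPL"),
    ("female", "off campus"), ("female", "brought own"),
    ("male", "JPL"), ("male", "JPL"), ("male", "JPL"),
    ("male", "off campus"), ("male", "off campus"), ("male", "brought own"),
]

pair_counts = Counter(records)                           # count per (gender, place)
group_sizes = Counter(gender for gender, _ in records)   # n within each gender

places = ["JPL", "off campus", "brought own"]
# Relative frequency of each lunch location within each gender group.
table = {
    gender: {p: pair_counts[(gender, p)] / group_sizes[gender] for p in places}
    for gender in group_sizes
}

for gender, row in table.items():
    print(gender, {p: round(v, 2) for p, v in row.items()})
```

In this made-up data, more males than females chose JPL by count (3 vs. 2), yet the relative frequency is 0.5 in both groups, which is what the bar heights in a comparative chart would show.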
Graph interpretation and data reading
- A graph conveys the same information as the frequency distribution but in visual form.
- From a single-variable graph, you can deduce the sample size by summing frequencies (e.g., 3 + 2 + 1 = 6).
Practical reminders
- In assignments, you’ll work with the sample data to answer questions about the population.
- Both the textbook and in-class graphs are sources of examples; don't rely on class material alone for every type of graph.
Quick example recap
- Dataset: qualitative variable with three categories (lunch location: JPL, off campus, brought own).
- Frequencies: JPL = 3, Off campus = 2, Brought own = 1; n = 6; relative frequencies: \frac{3}{6}=0.50, \frac{2}{6}=0.\overline{3}\approx 0.33, \frac{1}{6}=0.1\overline{6}\approx 0.17.
- Pie chart/bar chart can display these; proportions equal relative frequencies.
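As a check on the recap, the sketch below starts from the frequency table (as one would read it off a bar chart), recovers n by summing the frequencies, and uses exact fractions so the relative frequencies sum to exactly 1 with no rounding error. Using `fractions.Fraction` is a choice made for this illustration, not something the notes prescribe.

```python
from fractions import Fraction

# Frequency table from the recap.
freq = {"JPL": 3, "Off campus": 2, "Brought own": 1}

n = sum(freq.values())   # sample size recovered from the frequencies
rel_freq = {cat: Fraction(count, n) for cat, count in freq.items()}

for cat, rf in rel_freq.items():
    print(f"{cat:12s} {rf}  ~ {float(rf):.3f}")

total = sum(rel_freq.values())
print("n =", n, "| sum of relative frequencies =", total)
```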