Key Concepts: Descriptive Statistics, Population/Sample, and Variables

Data Summary and Inference

Descriptive vs. inferential statistics
- Descriptive statistics: summarize data (graphs, numerical summaries) from a sample to describe the data you have.
- Inferential statistics: use sample data to make inferences about a population.
Population vs. sample
- Population: the whole group you want to learn about.
- Sample: a subgroup drawn from the population.
- Parameter: a numerical value describing the population (e.g., population mean $\mu$).
- Statistic: a numerical value describing the sample (e.g., sample mean $\bar{x}$).
Population and sample relationship
- We often study a sample to infer about the population.
- Never confuse the two: a statistic characterizes the sample; a parameter characterizes the population.
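The parameter/statistic distinction can be sketched in a few lines of Python. The population below is made-up data (hypothetical hours of sleep for 1,000 students), and the names `population`, `sample`, `mu`, and `x_bar` are chosen for illustration only. In practice the population is rarely fully observed, which is why the sample mean is used to learn about $\mu$.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: hours of sleep for 1,000 students (made-up values).
population = [round(random.uniform(4, 10), 1) for _ in range(1000)]

# Parameter: computed from every unit in the population.
mu = mean(population)

# Statistic: computed from a sample of n units drawn without replacement.
n = 25
sample = random.sample(population, n)
x_bar = mean(sample)

print(f"population mean (parameter) mu    = {mu:.2f}")
print(f"sample mean     (statistic) x_bar = {x_bar:.2f}")
```

The two printed values will generally be close but not equal; a different sample would give a different $\bar{x}$, while $\mu$ stays fixed.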
Variables and data collection
- Variable: a property measured on each unit (e.g., hours of sleep, number of classes).
- Data types:
  - Qualitative (categorical): values are categories (e.g., gender, lunch place).
  - Quantitative (numerical): values are numbers (e.g., hours slept, number of classes).
    - Discrete: isolated, countable values, typically whole-number counts (e.g., number of classes).
    - Continuous: any value in an interval (e.g., height, weight).
- Example variable names: grade, hours of sleep, number of classes, GPA.
Notation to know
- $n$ = sample size.
- $\bar{x}$ = sample mean.
- $\mu$ = population mean (parameter).
- A “parameter” describes the population; a “statistic” describes the sample.
- Relative frequency = $\text{frequency} / n$ ; equivalently, a proportion.
- Sum of all relative frequencies equals $1$ :
  $\sum \text{relative frequency} = 1$ .
Data analysis process (high level)
- Step 1: identify the research question.
- Steps 2–3: decide what to measure and how; define the variable(s).
- Step 4: data summarization and preliminary analysis (descriptive statistics).
- Steps 5–6: inference via hypothesis testing/estimation and communicating results.
Descriptive statistics tools
- Frequency distribution (table): counts of categories (qualitative data).
- Relative frequency (and proportion): frequency / $n$ ; used to compare groups.
- Example: a qualitative dataset with two categories, freshman (3) and senior (1), so $n = 4$.
  - Relative frequencies: $\frac{3}{4}=0.75$ for freshmen, $\frac{1}{4}=0.25$ for seniors.
  - Sum: $0.75+0.25=1$.
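The freshman/senior example can be reproduced with a short Python sketch. The helper name `relative_frequencies` and the list `class_year` are illustrative choices, not part of the notes; the data are the four observations from the example above.

```python
from collections import Counter

def relative_frequencies(values):
    """Return {category: frequency / n} for a list of categorical values."""
    counts = Counter(values)   # frequency distribution
    n = len(values)            # sample size
    return {category: count / n for category, count in counts.items()}

class_year = ["freshman", "freshman", "freshman", "senior"]
rel_freq = relative_frequencies(class_year)

print(rel_freq)                # {'freshman': 0.75, 'senior': 0.25}
print(sum(rel_freq.values()))  # 1.0
```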
Graphs for qualitative data
- Bar chart and pie chart both display qualitative distributions; choice depends on preference.
- Comparative bar chart: compares the distribution of one qualitative variable across the groups defined by another (e.g., gender by lunch location).
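A minimal sketch of the numbers behind a comparative bar chart, assuming made-up responses (the `responses` list below is hypothetical, not class data). It uses relative frequencies within each group rather than raw counts, which is the usual choice when the groups differ in size, and prints a rough text version of the bars.

```python
from collections import Counter

# Hypothetical (group, lunch location) pairs; made up for illustration.
responses = [
    ("female", "JPL"), ("female", "JPL"), ("female", "off campus"),
    ("male", "JPL"), ("male", "brought own"), ("male", "off campus"),
    ("male", "off campus"),
]

group_sizes = Counter(group for group, _ in responses)  # n for each group
pair_counts = Counter(responses)                        # two-way frequencies

# Relative frequency of each lunch location within each group.
within_group = {
    (group, place): count / group_sizes[group]
    for (group, place), count in pair_counts.items()
}

# Text "bars": one '#' per 5 percentage points.
for (group, place), rf in sorted(within_group.items()):
    print(f"{group:<7} {place:<12} {rf:.2f} " + "#" * round(rf * 20))
```

Within each group the relative frequencies sum to 1, so bars for different groups are directly comparable even though the groups have 3 and 4 members.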
Graph interpretation and data reading
- A graph conveys the same information as the frequency distribution but in visual form.
- From a single-variable graph, you can deduce the sample size by summing frequencies (e.g., 3 + 2 + 1 = 6).
Practical reminders
- In assignments, you’ll work with the sample data to answer questions about the population.
- Both the textbook and in-class graphs are sources of examples; do not rely on in-class material alone for every graph.
Quick example recap
- Dataset: qualitative variable with three categories (lunch location: JPL, off campus, brought own).
- Frequencies: JPL = 3, Off campus = 2, Brought own = 1; $n = 3 + 2 + 1 = 6$; relative frequencies: $\frac{3}{6}=0.50$, $\frac{2}{6}=0.\overline{3}$, $\frac{1}{6}=0.1\overline{6}$.
- Pie chart/bar chart can display these; proportions equal relative frequencies.
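The recap can be checked with exact fractions, which avoids the repeating decimals. This is a sketch using Python's standard `fractions` module; the dictionary name `frequencies` is an illustrative choice, and the counts are the ones listed above.

```python
from fractions import Fraction

frequencies = {"JPL": 3, "Off campus": 2, "Brought own": 1}
n = sum(frequencies.values())  # sample size recovered by summing frequencies

# Exact relative frequencies (proportions), reduced to lowest terms.
exact = {place: Fraction(count, n) for place, count in frequencies.items()}

# Two-decimal versions, as they might be labeled on a pie or bar chart.
rounded = {place: round(float(p), 2) for place, p in exact.items()}

print(n)                    # 6
print(exact["Off campus"])  # 1/3
print(sum(exact.values()))  # 1
print(rounded)              # {'JPL': 0.5, 'Off campus': 0.33, 'Brought own': 0.17}
```

The exact fractions always sum to 1; rounded values can be off by a small amount in other datasets, so a reported total such as 0.99 or 1.01 is usually a rounding effect rather than an error.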