Key Concepts: Descriptive Statistics, Population/Sample, and Variables

Data Summary and Inference

  • Descriptive vs. inferential statistics

    • Descriptive statistics: summarize data (graphs, numerical summaries) from a sample to describe the data you have.
    • Inferential statistics: use sample data to make inferences about a population.
  • Population vs. sample

    • Population: the whole group you want to learn about.
    • Sample: a subgroup drawn from the population.
    • Parameter: a numerical value describing the population (e.g., population mean \mu).
    • Statistic: a numerical value describing the sample (e.g., sample mean \bar{x}).
  • Population and sample relationship

    • We often study a sample to infer about the population.
    • Never confuse the two: a statistic characterizes the sample; a parameter characterizes the population.
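
A minimal sketch of the statistic/parameter distinction. It assumes a small made-up population of sleep hours (illustrative values, not course data). The names `population`, `sample`, `mu`, and `x_bar` are hypothetical and chosen only for this example. The parameter is computed from every unit; the statistic is computed from the sampled units only and will usually differ somewhat from the parameter.

```python
import random
import statistics

# Hypothetical population: hours of sleep for 10 students (made-up values).
population = [6, 7, 8, 5, 7, 9, 6, 8, 7, 7]

# Parameter: describes the whole population.
mu = statistics.mean(population)

# Draw a sample of n = 4 units without replacement (seeded so reruns match).
rng = random.Random(0)
n = 4
sample = rng.sample(population, n)

# Statistic: describes only the sample.
x_bar = statistics.mean(sample)

print(f"mu = {mu}, n = {n}, sample = {sample}, x_bar = {x_bar}")
```

In practice the population is rarely fully observed, which is why \bar{x} is used to estimate \mu.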
  • Variables and data collection

    • Variable: a property measured on each unit (e.g., hours of sleep, number of classes).
    • Data types:
      • Qualitative (categorical): categories (e.g., gender, lunch place).
      • Quantitative (numerical): numbers (e.g., hours slept, number of classes).
        • Discrete: countable values, typically whole-number counts (e.g., number of classes).
        • Continuous: any value on a continuum (e.g., height, weight).
    • Example variable names: grade, hours of sleep, number of classes, GPA.
  • Notation to know

    • n = sample size.
    • \bar{x} = sample mean.
    • \mu = population mean (parameter).
    • A “parameter” describes the population; a “statistic” describes the sample.
    • Relative frequency = \text{frequency} / n; equivalently, a proportion.
    • Sum of all relative frequencies equals 1:
      \sum \text{relative frequency} = 1.
  • Data analysis process (high level)

    • Step 1: identify the research question.
    • Steps 2–3: decide what to measure and how; define the variable(s).
    • Step 4: data summarization and preliminary analysis (descriptive statistics).
    • Steps 5–6: inference via hypothesis testing/estimation and communicating results.
  • Descriptive statistics tools

    • Frequency distribution (table): counts of categories (qualitative data).
    • Relative frequency (and proportion): frequency / n; useful for comparing groups of different sizes.
    • Example:
      • Qualitative dataset with two categories: freshman (3) and senior (1) ⇒ n = 4.
      • Relative frequencies: \frac{3}{4}=0.75 for freshmen, \frac{1}{4}=0.25 for seniors.
      • Sum: 0.75+0.25=1.
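
The freshman/senior example can be reproduced with a short sketch. The raw list `data` is an assumption: it is one possible set of observations consistent with the stated counts (3 freshmen, 1 senior). The variable names are illustrative.

```python
from collections import Counter

# Observations consistent with the example: three freshmen and one senior.
data = ["freshman", "freshman", "freshman", "senior"]

freq = Counter(data)        # frequency distribution: category -> count
n = sum(freq.values())      # sample size
rel_freq = {category: count / n for category, count in freq.items()}

for category, count in freq.items():
    print(f"{category}: frequency = {count}, relative frequency = {rel_freq[category]:.2f}")
print("sum of relative frequencies:", sum(rel_freq.values()))
```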
  • Graphs for qualitative data

    • Bar chart and pie chart both display the distribution of a single qualitative variable; which one to use is largely a matter of preference.
    • Comparative bar chart: compares the distribution of one qualitative variable across the categories of another (e.g., gender by lunch location).
  • Graph interpretation and data reading

    • A graph conveys the same information as the frequency distribution but in visual form.
    • From a single-variable graph that shows frequencies (counts), you can deduce the sample size by summing the frequencies (e.g., 3 + 2 + 1 = 6).
  • Practical reminders

    • In assignments, you’ll work with the sample data to answer questions about the population.
    • Both the textbook and in-class graphs serve as examples; don’t rely on class material alone for every type of graph.
  • Quick example recap

    • Dataset: qualitative variable with three categories (lunch location: JPL, off campus, brought own).
    • Frequencies: JPL = 3, Off campus = 2, Brought own = 1; n = 6; relative frequencies: \frac{3}{6}=0.50, \frac{2}{6}=0.\overline{3}\approx 0.33, \frac{1}{6}=0.1\overline{6}\approx 0.17.
    • Pie chart/bar chart can display these; proportions equal relative frequencies.
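
A hedged sketch of the lunch-location recap, assuming the frequencies listed above. It uses exact fractions so that the relative frequencies sum to exactly 1 even though two of them are repeating decimals. The dictionary name `lunch_freq` and the two-decimal rounding are illustrative choices.

```python
from fractions import Fraction

# Frequencies from the recap (category -> count).
lunch_freq = {"JPL": 3, "Off campus": 2, "Brought own": 1}

# Sample size is the sum of the frequencies.
n = sum(lunch_freq.values())

# Exact relative frequencies (proportions) as fractions.
exact = {place: Fraction(count, n) for place, count in lunch_freq.items()}

# Rounded decimals, as they would typically appear on a chart label.
rounded = {place: round(float(p), 2) for place, p in exact.items()}

for place in lunch_freq:
    print(f"{place}: {lunch_freq[place]}/{n} = {exact[place]} (about {rounded[place]})")
print("exact sum of relative frequencies:", sum(exact.values()))
```

Rounded values can sum to slightly more or less than 1 in other datasets; the exact proportions always sum to 1.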