Data Types & Data Collection (Quick Review)

Population and Proportion

  • Population of interest: adults; parameter is the true proportion p.
  • Sample statistic for proportion: \hat{p} = \dfrac{x}{n}, where x is the number of sampled individuals with the characteristic and n is the sample size.
  • Distinguish: the population parameter (p, fixed but unknown) vs the sample statistic (\hat{p}, computed from the sample). Here the parameter of interest is a proportion, not a mean.
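
The sample proportion above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original notes: the function name sample_proportion and the survey counts (120 of 400) are hypothetical.

```python
def sample_proportion(x: int, n: int) -> float:
    """Return p-hat = x / n for x individuals with the trait in a sample of size n."""
    if n <= 0:
        raise ValueError("sample size n must be positive")
    if not 0 <= x <= n:
        raise ValueError("x must be between 0 and n")
    return x / n


# Hypothetical sample: 120 of 400 surveyed adults have the characteristic.
p_hat = sample_proportion(120, 400)
print(p_hat)
```

The value p_hat is a statistic: a different sample would give a different value, while the population parameter p stays fixed.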

Data Collection Methods

  • Data collection types: surveys, experiments, observational studies.
  • Survey = observational study (responses are recorded without intervening).
  • Experiment = active manipulation (controls or intervention) to study effects.
  • Example concept: prospective (forward-looking) observational study (e.g., tracking teacher retention over the next five years).

Data Types Overview

  • Data categories: Categorical vs Numerical.
  • Numerical data splits: Discrete vs Continuous.
  • Categorical data splits: Nominal vs Ordinal.
  • Numerical data also splits by measurement scale: Interval vs Ratio.
  • Key distinction:
    • Interval: differences meaningful, no true zero.
    • Ratio: differences meaningful, true zero exists.
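
One way to see why the true zero matters is to check which comparisons survive a change of units. The sketch below is illustrative only: the helper names and the specific readings are assumptions, and the meters-to-feet factor is the usual approximation 3.28084.

```python
import math


def celsius_to_fahrenheit(c: float) -> float:
    """Convert an interval-scale temperature from Celsius to Fahrenheit."""
    return c * 9 / 5 + 32


def meters_to_feet(m: float) -> float:
    """Convert a ratio-scale length from meters to feet (approximate factor)."""
    return m * 3.28084


# Interval scale: the ratio of two readings depends on the unit chosen,
# so "20 degrees is twice as hot as 10 degrees" is not a meaningful claim.
ratio_c = 20 / 10
ratio_f = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Interval scale: comparisons of differences do survive the unit change.
diff_ratio_c = (30 - 20) / (20 - 10)
diff_ratio_f = (celsius_to_fahrenheit(30) - celsius_to_fahrenheit(20)) / (
    celsius_to_fahrenheit(20) - celsius_to_fahrenheit(10)
)

# Ratio scale: "twice as long" holds in any unit because zero means none.
ratio_m = 20 / 10
ratio_ft = meters_to_feet(20) / meters_to_feet(10)

print(ratio_c, ratio_f)            # the two ratios disagree
print(diff_ratio_c, diff_ratio_f)  # the difference comparisons agree
print(math.isclose(ratio_m, ratio_ft))
```

The temperature ratios disagree across units while the length ratios agree, which is the practical meaning of "no true zero" versus "true zero".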

Examples by Type

  • Discrete data (countable):
    • Number of students in a class; number of marbles in a jar.
  • Continuous data (measured; can take any value within a range):
    • Height, weight, length (can take decimals).
  • Nominal (categorical, no order):
    • Yelp/Amazon/Netflix category labels (non-ordered).
  • Ordinal (categorical, ordered):
    • Course grades (A, B, C, D); class standing (Freshman, Sophomore, Junior, Senior).
  • Interval data (numerical, no true zero):
    • SAT scores, temperature in Celsius/Fahrenheit (differences meaningful, zero is not absolute).
  • Ratio data (numerical, true zero):
    • Height, weight, length; number of children; class duration (when zero means none).
  • Note on zeros:
    • Interval: zero is arbitrary (no true zero).
    • Ratio: zero is meaningful (represents none).
  • Class duration can be treated as discrete (e.g., recorded in whole minutes) or continuous (measured to any precision), depending on how it is measured. Either way it is ratio data, because a duration of zero means no time at all.

Quick Takeaways

  • Always classify data first: categorical vs numerical; then refine (nominal/ordinal or interval/ratio).
  • For numerical data, identify whether a true zero exists to choose interval vs ratio.
  • Surveys are observational; experiments involve active intervention; both produce data of varying types.
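
The classification steps in these takeaways can be written as a small decision function. This is only a sketch: the name classify_variable and its boolean inputs are hypothetical, and the answers to "is it ordered?" and "does it have a true zero?" still come from the analyst's judgment.

```python
from typing import Optional


def classify_variable(
    numerical: bool,
    ordered: Optional[bool] = None,
    true_zero: Optional[bool] = None,
) -> str:
    """Return the measurement level following the two-step rule in the notes."""
    if not numerical:
        # Categorical: refine by whether the categories have a natural order.
        return "ordinal" if ordered else "nominal"
    # Numerical: refine by whether zero means "none of the quantity".
    return "ratio" if true_zero else "interval"


# Examples taken from the lists above.
print(classify_variable(numerical=False, ordered=False))   # category labels
print(classify_variable(numerical=False, ordered=True))    # course grades
print(classify_variable(numerical=True, true_zero=False))  # temperature in Celsius
print(classify_variable(numerical=True, true_zero=True))   # height
```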