Data Types & Data Collection (Quick Review)
Population and Proportion
- Population of interest: adults; the parameter is the true population proportion p.
- Sample statistic for the proportion: the sample proportion p̂ (p-hat).
- Distinguish the population parameter from the sample statistic; the quantity of interest in this context is a proportion, not a mean.
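As a minimal sketch of the parameter vs statistic distinction, the snippet below computes a sample proportion (often written p-hat) from yes/no responses. The function name `sample_proportion` and the response data are hypothetical and only for illustration; the population proportion itself stays unknown and is merely estimated by this value.

```python
def sample_proportion(responses):
    """Return the fraction of True responses (the sample statistic)."""
    if not responses:
        raise ValueError("need at least one response")
    # True counts as 1 and False as 0, so sum() counts the successes.
    return sum(responses) / len(responses)


# Hypothetical sample of 8 adults: True = has the characteristic of interest.
sample = [True, False, True, True, False, False, True, False]
p_hat = sample_proportion(sample)
print(p_hat)  # 4 successes out of 8 responses -> 0.5
```

A different sample of adults would generally give a different p-hat, which is why it is a statistic and not the parameter.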
Data Collection Methods
- Data collection types: surveys, experiments, observational studies.
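- A survey is a type of observational study: responses are recorded without intervening.
- An experiment involves active manipulation (a treatment or intervention, usually compared against a control) to study effects.
- Example concept: a prospective (future-oriented) observational study, e.g., tracking teacher retention over the next five years.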
Data Types Overview
- Data categories: Categorical vs Numerical.
- Numerical data by kind of value: Discrete vs Continuous.
- Categorical data by ordering: Nominal vs Ordinal.
- Numerical data by measurement scale: Interval vs Ratio.
- Key distinction:
  - Interval: differences are meaningful, but there is no true zero.
  - Ratio: differences are meaningful and a true zero exists, so ratios are meaningful too.
Examples by Type
- Discrete data (countable values):
  - Number of students in a class; number of marbles in a jar.
- Continuous data (any value within a range):
  - Height, weight, length (can take decimal values).
- Nominal (categorical, no order):
  - Category labels on Yelp, Amazon, or Netflix (unordered).
- Ordinal (categorical, ordered):
  - Course grades (A, B, C, D); class standing (Freshman, Sophomore, Junior, Senior).
- Interval data (numerical, no true zero):
  - SAT scores; temperature in Celsius or Fahrenheit (differences are meaningful, but zero is not absolute).
- Ratio data (numerical, true zero):
  - Height, weight, length; number of children; class duration (zero means none).
- Note on zeros:
  - Interval: zero is arbitrary (no true zero).
  - Ratio: zero is meaningful (it represents none of the quantity).
- Class duration can be discrete or continuous depending on how it is measured (whole minutes vs exact time); either way it can be analyzed as ratio data, because a duration of zero means no class time.
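One way to see why the true zero matters is to check whether a ratio survives a change of units. The sketch below uses standard unit conversions; the helper names `c_to_f` and `cm_to_in` and the specific values are chosen only for illustration.

```python
import math


def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def cm_to_in(cm):
    """Convert centimeters to inches."""
    return cm / 2.54


# Interval scale (temperature): the ratio depends on the unit,
# so "20 degrees is twice as hot as 10 degrees" is not a meaningful claim.
ratio_c = 20 / 10                    # 2.0 in Celsius
ratio_f = c_to_f(20) / c_to_f(10)    # 68 / 50 = 1.36 in Fahrenheit

# Differences on an interval scale are still meaningful:
# equal Celsius gaps map to equal Fahrenheit gaps.
gap_low = c_to_f(20) - c_to_f(10)    # 18.0
gap_high = c_to_f(30) - c_to_f(20)   # 18.0

# Ratio scale (length): zero means "no length" in every unit,
# so the ratio is the same whichever unit is used.
ratio_cm = 180 / 90
ratio_in = cm_to_in(180) / cm_to_in(90)

print(ratio_c, ratio_f)
print(math.isclose(ratio_cm, ratio_in))  # True
```

The temperature ratio changes with the unit because the zero point of each scale is arbitrary, while the length ratio does not.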
Quick Takeaways
- Always classify data first: categorical vs numerical; then refine (nominal/ordinal for categorical; discrete/continuous and interval/ratio for numerical).
- For numerical data, identify whether a true zero exists to choose interval vs ratio.
- Surveys are observational; experiments involve active intervention; both produce data of varying types.
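The classification steps in these takeaways can be sketched as a small decision function. This is only an illustration of the order of the questions; the function `classify` and its parameter names are hypothetical, and answering each question for a real variable still takes judgment.

```python
def classify(is_numerical, is_ordered=False, has_true_zero=False):
    """Return the measurement level for a variable.

    Step 1: categorical vs numerical.
    Step 2: categorical -> nominal or ordinal (is there an order?);
            numerical   -> interval or ratio (is there a true zero?).
    """
    if not is_numerical:
        return "ordinal" if is_ordered else "nominal"
    return "ratio" if has_true_zero else "interval"


# Examples taken from the notes above.
print(classify(is_numerical=False))                      # category labels -> nominal
print(classify(is_numerical=False, is_ordered=True))     # course grades -> ordinal
print(classify(is_numerical=True, has_true_zero=False))  # temperature in Celsius -> interval
print(classify(is_numerical=True, has_true_zero=True))   # height -> ratio
```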