Statistical Thinking – Data Types & Descriptive Statistics (Quick Review)
Introduction
- Focus: Gathering & understanding data; descriptive statistics for quick comparison.
Data Types
- Two broad classes:
- Numerical / Quantitative
- Categorical / Qualitative
Numerical Data
- Quantity that can be counted or measured.
- Discrete
- Countable, separate values (e.g., number of weekly posts 0,1,2,…)
- Continuous
- Any value within a range (e.g., 1.5 hours spent on social media)
Categorical Data
- Quality or category labels.
- Nominal
- No inherent order (Instagram / Facebook / Snapchat)
- Ordinal
- Ordered categories without exact spacing (UI rating: Poor < Fair < Good < Excellent)
Quick Recap Table
- Nominal | Ordinal | Discrete | Continuous
Descriptive Statistics — The 3 M’s
- Mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Median: middle value after sorting (the average of the two middle values when n is even).
- Mode: most frequent value/category.
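As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The sample values and the variable names (`weekly_posts`, `favorite_app`) are made up for illustration and do not come from any real data set.

```python
import statistics

# Hypothetical discrete data: number of weekly posts for five users.
weekly_posts = [2, 3, 3, 5, 7]

mean_value = statistics.mean(weekly_posts)      # (2 + 3 + 3 + 5 + 7) / 5 = 4
median_value = statistics.median(weekly_posts)  # middle of the sorted values = 3
mode_value = statistics.mode(weekly_posts)      # most frequent value = 3

# Hypothetical nominal data: only the mode is meaningful here.
favorite_app = ["Instagram", "Snapchat", "Instagram", "Facebook"]
app_mode = statistics.mode(favorite_app)        # "Instagram"

print(mean_value, median_value, mode_value, app_mode)
```

Note that the mode works on the category labels directly, while the mean and median require numeric (or at least ordered) values.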
Properties of the Mean
- Add constant c to every data point ⇒ $\bar{x}_{\text{new}} = \bar{x} + c$
- Multiply every data point by k ⇒ $\bar{x}_{\text{new}} = k\bar{x}$
- The mean is sensitive to outliers (e.g., 2, 3, 4, 5, 6, 20 ⇒ $\bar{x} \approx 6.67$, which exceeds five of the six values and so is not a typical value).
- Median preferred when extreme values exist.
- Mode preferred when:
- Data are categorical / non-numeric.
- Interest is in most common choice.
- Distribution is highly skewed and the most typical value is of interest (e.g., income).
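The shift and scale properties, and the effect of an outlier on the mean versus the median, can be checked numerically. This is a small sketch using the outlier example from the list above; the constants `c = 10` and `k = 3` are arbitrary choices for illustration.

```python
import statistics

data = [2, 3, 4, 5, 6]
c, k = 10, 3  # arbitrary illustrative constants

base_mean = statistics.mean(data)                      # 4
shifted_mean = statistics.mean([x + c for x in data])  # 4 + 10 = 14
scaled_mean = statistics.mean([k * x for x in data])   # 3 * 4 = 12

# Adding one extreme value pulls the mean up much more than the median.
with_outlier = data + [20]
outlier_mean = statistics.mean(with_outlier)      # 40 / 6, about 6.67
outlier_median = statistics.median(with_outlier)  # (4 + 5) / 2 = 4.5

print(shifted_mean, scaled_mean, round(outlier_mean, 2), outlier_median)
```

The single value 20 moves the mean from 4 to about 6.67, while the median only moves from 4 to 4.5.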
Moving (Window) Average
- Compute mean over a sliding window (e.g., 3-point moving average) to smooth time-series fluctuations.
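A sliding-window mean could be sketched as follows. The function name `moving_average` and the sample series `daily_minutes` are hypothetical, and this version simply drops the positions where a full window is not available; other edge conventions (padding, centered windows) are equally valid.

```python
def moving_average(values, window=3):
    """Return the means of each consecutive run of `window` values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    # One output per full window; an empty list if the series is too short.
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

# Hypothetical time series: minutes on social media over five days.
daily_minutes = [30, 60, 45, 90, 75]
smoothed = moving_average(daily_minutes, window=3)
print(smoothed)  # [45.0, 65.0, 70.0]
```

The smoothed series is shorter than the input by `window - 1` points and varies less from step to step than the raw values.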
Choosing the Right Measure
- Symmetric, outlier-free numeric data → Mean.
- Skewed or outlier-heavy numeric data → Median.
- Categorical data, or when the most common value is of interest → Mode.
Key Takeaways
- Distinguish data type first; it guides valid statistics.
- Understand how transformations affect mean.
- Be aware of outliers before selecting a measure of center.
- Moving averages help reveal trends in sequential data.