Statistical Thinking – Data Types & Descriptive Statistics (Quick Review)

Introduction

Focus: Gathering & understanding data; descriptive statistics for quick comparison.

Data Types

Two broad classes:
- Numerical / Quantitative
- Categorical / Qualitative

Numerical Data

Quantity that can be counted or measured.
- Discrete
  - Countable, separate values (e.g., number of weekly posts $\{0,1,2,\dots\}$)
- Continuous
  - Any value in a range (e.g., time on social media, $1.5\,\text{hrs}$)

Categorical Data

Quality or category labels.
- Nominal
  - No inherent order (Instagram / Facebook / Snapchat)
- Ordinal
  - Ordered categories without exact spacing (UI rating: Poor < Fair < Good < Excellent)
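
A minimal sketch of what "ordered without exact spacing" allows in practice: ordinal labels can be sorted and a middle rating picked, but averaging them is not meaningful because the gaps between levels are undefined. The responses and the names `RATING_ORDER`, `ratings`, `ranked`, and `median_rating` are hypothetical, chosen only for illustration.

```python
# Ordered scale from the notes: Poor < Fair < Good < Excellent.
RATING_ORDER = ["Poor", "Fair", "Good", "Excellent"]

# Hypothetical survey responses (assumption, not data from the notes).
ratings = ["Good", "Poor", "Excellent", "Fair", "Good"]

# Sort by position on the scale, not alphabetically.
ranked = sorted(ratings, key=RATING_ORDER.index)

# Middle element of an odd-length sorted list.
median_rating = ranked[len(ranked) // 2]

print(ranked)
print(median_rating)
```

For nominal labels no such ordering exists, so only counts and the mode apply.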

Quick Recap Table

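 | Nominal | Ordinal | Discrete | Continuous
Class | Categorical | Categorical | Numerical | Numerical
Key trait | No inherent order | Ordered, no exact spacing | Countable, separate values | Any value in a range
Example | Instagram / Facebook / Snapchat | Poor < Fair < Good < Excellent | Number of weekly posts | Time on social media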

Descriptive Statistics — The 3 M’s

Mean: $\bar{x}=\frac{\sum_{i=1}^{n}x_i}{n}$
Median: Middle value after sorting (average of the two middle values when $n$ is even).
Mode: Most frequent value/category.
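
One way to compute the three measures is with Python's standard `statistics` module. This is a sketch on a hypothetical sample; the list `weekly_posts` and the variable names are assumptions for illustration.

```python
import statistics

# Hypothetical sample: weekly post counts for seven users.
weekly_posts = [3, 1, 4, 1, 5, 9, 2]

mean_posts = statistics.mean(weekly_posts)      # sum of values divided by n
median_posts = statistics.median(weekly_posts)  # middle value after sorting
mode_posts = statistics.mode(weekly_posts)      # most frequent value

print(mean_posts, median_posts, mode_posts)
```

Sorted, the sample is 1, 1, 2, 3, 4, 5, 9, so the median is 3 and the mode is 1, while the mean is 25/7.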

Properties of the Mean

Add constant $c$ to every data point ⇒ $\bar{x}_{\text{new}} = \bar{x}+c$
Multiply every data point by $k$ ⇒ $\bar{x}_{\text{new}} = k\,\bar{x}$
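
Both properties can be checked numerically. The sketch below uses hypothetical data and hypothetical constants `c` and `k`; it only illustrates the two rules and is not a proof.

```python
import statistics

data = [2, 3, 4, 5, 6]  # hypothetical data points
c, k = 10, 3            # hypothetical shift and scale factors

base = statistics.mean(data)
shifted = statistics.mean([x + c for x in data])  # add c to every point
scaled = statistics.mean([k * x for x in data])   # multiply every point by k

# The mean shifts by c and scales by k, matching the two rules.
print(base, shifted, scaled)
```

The rules follow from the definition: the sum gains $nc$ when $c$ is added to each of the $n$ points, and is multiplied by $k$ when every point is.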

Mean vs. Median vs. Mode

Mean is sensitive to outliers (e.g., $\{2,3,4,5,6,20\}$ ⇒ $\bar{x}\approx 6.67$, which is larger than five of the six values).
Median is preferred when extreme values exist.
Mode is preferred when:
- Data are categorical / non-numeric.
- Interest is in the most common choice.
- Distribution is highly skewed and the most common value is the quantity of interest (income example).
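
The outlier example can be reproduced with the standard `statistics` module. The numeric data set is the one from the notes; the `platforms` list is a hypothetical set of survey responses added to show the mode on non-numeric data.

```python
import statistics

values = [2, 3, 4, 5, 6, 20]  # data set from the notes; 20 is the outlier

mean_value = statistics.mean(values)
# Even number of points: the median averages the two middle values, 4 and 5.
median_value = statistics.median(values)

print(round(mean_value, 2))  # 6.67
print(median_value)          # 4.5

# Hypothetical categorical responses: mean and median are undefined here.
platforms = ["Instagram", "Snapchat", "Instagram", "Facebook", "Instagram"]
top_platform = statistics.mode(platforms)
print(top_platform)          # Instagram
```

The single value 20 pulls the mean up to about 6.67, while the median stays at 4.5, inside the bulk of the data.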

Moving (Window) Average

Compute mean over a sliding window (e.g., 3-point moving average) to smooth time-series fluctuations.
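
A minimal sketch of a simple (unweighted) moving average in plain Python. The function name `moving_average`, its `window` parameter, and the sample series `daily_minutes` are assumptions for illustration; windows that would run past either end of the series are skipped, which is only one of several possible conventions.

```python
def moving_average(series, window=3):
    """Return the mean of each consecutive `window`-length slice of `series`."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical daily minutes on social media over one week.
daily_minutes = [30, 45, 60, 20, 90, 75, 40]
smoothed = moving_average(daily_minutes)  # 3-point moving average

print(smoothed)
```

With 7 points and a window of 3, the result has 5 values; the first is (30 + 45 + 60) / 3 = 45.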

Choosing the Right Measure

Symmetric, outlier-free numeric data → Mean.
Skewed or outlier-heavy numeric data → Median.
Categorical data, or the most common value is wanted → Mode.

Key Takeaways

Distinguish data type first; it guides valid statistics.
Understand how transformations (adding a constant, multiplying by a factor) affect the mean.
Be aware of outliers before selecting a measure of center.
Moving averages help reveal trends in sequential data.