Histograms, Skewness, and Data Interpretation

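A histogram shows how often values occur in a dataset: each bar's height is the count of values that fall within a number range (called a bin or class).
A bin is a number range for grouping data (e.g., 5-10, 10-15). A value that sits on a shared boundary, such as 10, is counted in only one bin, usually the one that starts at that value.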
Bin width is the size of each range, usually the same for all bins.
Dot plots are better for small datasets or discrete (counted or categorical) data. Histograms are for continuous data (like height or temperature), where values are grouped into ranges and counted.
Bar plots are for categories (words, places) and have gaps between bars. Histograms are for numerical data and have touching bars.
Histograms help reveal the data's distribution: its shape, center, spread, and outliers.
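The counting behind a histogram can be sketched in a few lines of Python. This is a minimal illustration, not a standard routine: the function name `histogram_counts`, the bin edges, and the height values are all made up for the example, and it assumes each bin includes its lower edge but not its upper edge.

```python
from bisect import bisect_right

def histogram_counts(values, edges):
    """Count how many values fall into each bin defined by sorted `edges`.

    Bin i covers [edges[i], edges[i+1]); values outside the edges are skipped.
    """
    counts = [0] * (len(edges) - 1)
    for x in values:
        i = bisect_right(edges, x) - 1  # index of the last edge <= x
        if 0 <= i < len(counts):
            counts[i] += 1
    return counts

# Made-up heights in cm, for illustration only.
heights_cm = [152.0, 158.5, 161.2, 163.0, 164.8, 166.1, 167.3, 169.9, 171.4, 178.6]
edges = [150, 155, 160, 165, 170, 175, 180]  # six bins of width 5
counts = histogram_counts(heights_cm, edges)

# Text histogram: one '#' per observation in the bin.
for lo, hi, c in zip(edges, edges[1:], counts):
    print(f"{lo}-{hi}: {'#' * c}")
```

Even in this text form the shape is readable: the counts rise toward the middle bins and fall off on both sides.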

Histograms make the data's shape clear, unlike raw numbers.
Symmetry vs. skewness:
- Symmetric distributions look similar on both sides of the center (like a mirror).
- Skewness means the data has a long "tail" on one side:
  - Left-skewed (tail to the left): More data on the right, but a long, thin spread to the left.
  - Right-skewed (tail to the right): More data on the left, but a long, thin spread to the right.
- A normal distribution is a perfectly symmetric, bell-shaped curve.
Multimodal distributions have more than one peak, often from combining different groups (e.g., heights of men and women).
You usually judge symmetry and skewness visually; it's a skill developed over time.
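A numerical cross-check can support a visual judgment. The sketch below computes the moment coefficient of skewness (third central moment divided by the 1.5 power of the second central moment). This coefficient is an addition for illustration and is not part of these notes; the function name `sample_skewness` and the data are made up. A positive value suggests a right tail, a negative value a left tail, and a value near zero rough symmetry.

```python
def sample_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Made-up data for illustration.
right_tailed = [1, 2, 2, 3, 3, 3, 4, 10]     # one large value stretches the right tail
left_tailed = [-x for x in right_tailed]     # mirror image: tail on the left
symmetric = [1, 2, 3, 4, 5]

for name, data in [("right", right_tailed), ("left", left_tailed), ("symmetric", symmetric)]:
    print(name, round(sample_skewness(data), 3))
```

The number does not replace looking at the histogram: it can be dominated by a single outlier and says nothing about how many peaks there are.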

Retirement age: Often described as left-skewed (most people retire within a fairly narrow band of later ages, while a smaller number retire much earlier, giving a tail to the left).
Income distribution: Usually right-skewed (a few people earn very high incomes, pulling the tail to the right; most earn lower incomes).

There are no strict numerical rules for what makes a distribution "good" or "bad"; judgments are visual and depend on the context.
Focus on discussing symmetry, skewness, and modality (one peak vs. many).

$n$ denotes sample size (the number of data points).
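With class width $w$ and first lower bound $L_0$, the $i$-th bin (counting from $i = 0$) is typically taken to be the half-open interval $[L_0 + i w, \, L_0 + (i+1) w)$, so a value on a boundary belongs to the bin that starts there. You usually report specific ranges like 25–30.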
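The bin formula can be turned around to find which bin a given value falls in: the index is the floor of $(x - L_0)/w$. A minimal sketch, assuming the half-open convention (lower bound included, upper bound excluded); the function names and the values $L_0 = 20$, $w = 5$ are chosen only for illustration.

```python
import math

def bin_index(x, lower, width):
    """Index i of the bin [lower + i*width, lower + (i+1)*width) containing x."""
    return math.floor((x - lower) / width)

def bin_interval(i, lower, width):
    """Lower and upper bound of bin i."""
    return (lower + i * width, lower + (i + 1) * width)

L0, w = 20, 5  # illustrative first lower bound and class width

i = bin_index(27, L0, w)
print(i, bin_interval(i, L0, w))  # 27 lies in the 25-30 bin

j = bin_index(30, L0, w)
print(j, bin_interval(j, L0, w))  # a boundary value goes to the bin that starts there
```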

Histogram bars touch (continuous data); bar plot bars have gaps (category data).
Read the distribution's shape:
- Center (where most data is)
- Spread (how wide the data is)
- Skewness (where the tail is)
- Modality (how many peaks)
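Of the four features in the list above, modality is the easiest to sketch in code: count the bins that stand taller than both of their neighbors. This is a deliberately naive illustration with a made-up function name and made-up bin counts. It misses flat-topped peaks and will report spurious peaks in noisy data, so treat it as an aid to reading the plot and not as a rule.

```python
def count_peaks(counts):
    """Count bins strictly taller than both neighbours (edges compare against 0)."""
    padded = [0] + list(counts) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )

# Made-up bin counts for illustration.
unimodal = [1, 3, 6, 3, 1]          # one clear peak in the middle
bimodal = [1, 4, 2, 1, 3, 5, 2]     # two groups combined: two separate peaks

print(count_peaks(unimodal))
print(count_peaks(bimodal))
```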

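A histogram is the graphical form of a grouped frequency distribution: a set of intervals with the count of values in each.
Skewness indicates where the mean (average) is likely to sit relative to the median (middle value). The mean is pulled toward the tail, so it is typically above the median in right-skewed data and below it in left-skewed data; in symmetric data the two are close.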
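The link between skewness, mean, and median can be checked on small examples with Python's standard `statistics` module. The numbers below are invented purely to echo the income and retirement-age cases; they are not real data.

```python
from statistics import mean, median

# Invented values for illustration only.
incomes_k = [28, 31, 33, 35, 38, 40, 45, 250]        # right-skewed: one very high income
retirement_ages = [48, 60, 62, 64, 65, 65, 66, 67]   # left-skewed: one early retiree

for name, data in [("incomes", incomes_k), ("retirement ages", retirement_ages)]:
    print(f"{name}: mean={mean(data)}, median={median(data)}")
```

In the right-skewed list the single large value drags the mean well above the median; in the left-skewed list the early retiree pulls the mean below it. The median barely moves in either case, which is why it is often preferred as the measure of center for skewed data.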

Gaussian (Normal) distribution density: $f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
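The density can be evaluated directly to confirm its symmetry about $\mu$. A minimal sketch using only the standard library; the function name `normal_pdf` and the example values $\mu = 170$, $\sigma = 10$ are assumptions for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

mu, sigma = 170.0, 10.0  # illustrative mean and standard deviation

# Equal distances either side of mu give equal density: the curve is symmetric.
print(normal_pdf(mu - 15, mu, sigma), normal_pdf(mu + 15, mu, sigma))

# The density is highest at the mean and falls away on both sides.
print(normal_pdf(mu, mu, sigma) > normal_pdf(mu + 5, mu, sigma))
```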
Skewness direction:
- Left-skewed: Tail to the left; bulk of data on the right.
- Right-skewed: Tail to the right; bulk of data on the left.

Histograms visualize continuous data distributions using fixed-width bins and touching bars.
Dot plots work for smaller/discrete data; bar plots for categories.
Skewness and symmetry are core descriptive concepts: left-skewed, right-skewed, and symmetric.
These concepts guide data interpretation and future analyses.