Histogram Explained

Visualizing Data Distribution: Histograms

Introduction

  • The goal is to visualize the distribution of ages among the people in a restaurant, to understand its demographic makeup (young children, teenagers, middle-aged adults, seniors).
  • A raw list of ages does not give a clear sense of the distribution.

Buckets/Bins

  • One way to organize the data is to group the ages into ranges, called buckets or bins.
  • Count the number of people in each bucket.
Example: Age Buckets
  • Buckets are defined in ten-year ranges:
    • 0-9
    • 10-19
    • 20-29
    • 30-39
    • 40-49
    • 50-59
    • 60-69
Counting People in Each Bucket
  • 0-9: 6 people
  • 10-19: 3 people
  • 20-29: 5 people
  • 30-39: 1 person
  • 40-49: 2 people
  • 50-59: 2 people
  • 60-69: 1 person
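
The bucketing and counting steps above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original lesson: the raw ages are not listed in these notes, so the `ages` list is hypothetical, invented only so that it reproduces the counts above. The helper names `bucket_label` and `count_buckets` are likewise made up for this sketch.

```python
from collections import Counter

# Hypothetical raw ages, invented to match the bucket counts in the notes.
ages = [1, 3, 4, 5, 7, 8, 12, 15, 18, 21, 23, 25, 27, 29, 34, 42, 48, 51, 57, 65]

BUCKET_WIDTH = 10  # ten-year ranges, as in the notes


def bucket_label(age, width=BUCKET_WIDTH):
    """Return the label of the bucket an age falls into, e.g. 34 -> '30-39'."""
    low = (age // width) * width  # integer division finds the bucket start
    return f"{low}-{low + width - 1}"


def count_buckets(values, width=BUCKET_WIDTH):
    """Count how many values fall into each bucket, ordered by bucket start."""
    starts = Counter((v // width) * width for v in values)
    return {f"{low}-{low + width - 1}": starts[low] for low in sorted(starts)}


counts = count_buckets(ages)
for label, n in counts.items():
    print(f"{label}: {n}")
```

Buckets that contain no values are simply left out of the result here; a fuller version might list them with a count of zero.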

Histograms

  • A histogram is a visualization that groups the data into categories (buckets) and plots how many people fall into each one.
Creating the Histogram
  • X-axis: Buckets (age ranges)
  • Y-axis: Number of people (frequency)
Plotting the Data
  • 0-9: Bar extends to 6 on the y-axis (number of people).
  • 10-19: Bar extends to 3.
  • 20-29: Bar extends to 5.
  • 30-39: Bar extends to 1.
  • 40-49: Bar extends to 2.
  • 50-59: Bar extends to 2.
  • 60-69: Bar extends to 1.
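
As a rough sketch of the plotting step, the bucket counts listed above can be drawn as a text histogram. Note one difference from the description: to keep the example free of plotting libraries, the chart is turned sideways, with buckets down the left edge and bars growing to the right. The function name `render_histogram` and the `#` bar character are choices made for this illustration only.

```python
# Bucket counts taken from the notes above.
counts = {
    "0-9": 6,
    "10-19": 3,
    "20-29": 5,
    "30-39": 1,
    "40-49": 2,
    "50-59": 2,
    "60-69": 1,
}


def render_histogram(bucket_counts, mark="#"):
    """Return one text line per bucket; bar length equals the bucket's count."""
    label_width = max(len(label) for label in bucket_counts)
    return [
        f"{label:>{label_width}} | {mark * n} {n}"
        for label, n in bucket_counts.items()
    ]


lines = render_histogram(counts)
print("\n".join(lines))
```
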
Interpreting the Histogram
  • The histogram provides a visual sense of the age distribution in the restaurant.
  • Example: A restaurant that gives away toys might have more younger people.
  • It can reveal patterns, such as many young adults with kids, or grandparents bringing children.
  • In this instance, the restaurant appears to have a lot of kids but few senior citizens.

General Applicability

  • Histograms can be applied to various types of data, not just ages, to visualize distributions.
Comparison with Dot Plots
  • Unlike dot plots, which plot individual data points, histograms group data into buckets.
  • A dot plot of this data would not convey much information, since it shows each age separately rather than the overall shape.
  • Histograms are useful when individual data points don't provide much information on their own.
  • Histograms provide a summary of how many values fall within a given range.
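
The contrast with a dot plot can be illustrated by counting the same data two ways: once per distinct age (what a dot plot stacks) and once per ten-year bucket (what a histogram draws). This is only a sketch under a stated assumption: the `ages` list is hypothetical, invented to match the bucket counts in these notes, so the per-age numbers in particular depend on the made-up values.

```python
from collections import Counter

# Hypothetical raw ages, invented to match the bucket counts in the notes.
ages = [2, 4, 4, 5, 7, 8, 12, 15, 18, 21, 23, 25, 27, 29, 34, 42, 48, 51, 57, 65]

# Dot plot view: one stack of dots per distinct age.
dots_per_age = Counter(ages)

# Histogram view: one bar per ten-year bucket, keyed by the bucket's start.
people_per_bucket = Counter((age // 10) * 10 for age in ages)

print("Distinct ages (dot stacks):", len(dots_per_age))
print("Tallest dot stack:", max(dots_per_age.values()))
print("Buckets (bars):", len(people_per_bucket))
print("Tallest bar:", max(people_per_bucket.values()))
```

With these assumed ages, the dot plot spreads 20 people over many short stacks, while the histogram collapses them into a handful of bars whose heights differ enough to show where the ages cluster.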