3_Overlaid_Graphs.ipynb - Colab

Introduction to Overlaid Graphs

  • Overview: This chapter focuses on the visualization of data using graphs, particularly through the technique of overlaying plots to compare two datasets.

  • Purpose of Overlaid Graphs: The primary advantage of overlaying graphs is to enable a direct visual comparison between two sets of data that share the same variables and measurement units.

Creating Overlaid Graphs

  • Methodology:

    • Graphs can be created using methods such as scatter, plot, and barh. The essential requirement is that one column serves as the common horizontal axis for the graphs being plotted.

    • General structure: A call that draws an overlaid graph generally takes this form:

      name_of_table.method(column_label_of_common_axis, array_of_labels_of_variables_to_plot)
    • More commonly, first select only the columns that are needed, then call the method with just the label of the common axis; every remaining column is plotted against it:

      name_of_table.method(column_label_of_common_axis)

Overlaid Scatter Plots

  • Dataset Used: The sons_heights table, which contains the heights of fathers, mothers, and their sons, is used to illustrate overlaid scatter plots.

  • Implementation Example:

    from datascience import *
    import numpy as np
    %matplotlib inline
    import matplotlib.pyplot as plots
    plots.style.use('fivethirtyeight')
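    # Plot all remaining columns against 'son' (assumes the table holds only the fathers', mothers', and sons' heights)
    sons_heights.scatter('son')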
  • Description of the Scatter Plot:

    • In the resultant graph:

      • The horizontal axis represents the sons' heights.

      • Points in blue indicate fathers' heights while those in gold show mothers' heights.

    • Notable Trends: Both sets of points slope upward, indicating a positive association between parents' heights and sons' heights. The blue points generally sit above the gold points because fathers tend to be taller than mothers.

Overlaid Line Plots

  • Census Data Table: The next example uses Census estimates of the child population to draw line plots of the age distribution in 2014 and 2019.

  • Dataset Preparation:

    • Load the Census data and select relevant columns to analyze:

    from google.colab import files
    uploaded = files.upload()
    full_census_table = Table.read_table('population_estimates.csv')
    partial_census_table = full_census_table.select('AGE', 'POPESTIMATE2014', 'POPESTIMATE2019')
  • Visualizing Population Dynamics:

    • Two line plots can be drawn with:

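    children = partial_census_table.where('AGE', are.below(19))  # assumed definition: rows for ages 0 through 18
    children.plot('AGE')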
    • The horizontal axis shows age. Tick labels may appear at half-integer values, but the data contain only whole-number ages from 0 to 18.

    • Most of the 12-year-olds counted in 2019 were the 7-year-olds counted in 2014.

  • Interpreting the Graphs:

    • Key Observations: There were more 6-year-olds in 2014 than in 2019, while the pattern reverses for 12-year-olds, with more in 2019 than in 2014. Some age groups therefore grew over the five years, indicating an influx of children at those ages.

    • Following a group of children forward five years, the overall trend is a slight increase in their numbers: immigration more than offsets the minor losses from mortality at these ages.
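
The comparison of the same children five years apart can be sketched in plain Python. The counts below are hypothetical placeholders in thousands, not real Census figures, and the helper `cohort_change` is an illustrative name rather than part of any library. The idea is to pair each age in 2014 with the age five years older in 2019 and take the difference.

```python
# Hypothetical counts in thousands, NOT real Census figures.
pop_2014 = {5: 400, 6: 410, 7: 405}
pop_2019 = {10: 402, 11: 413, 12: 409}


def cohort_change(start_counts, end_counts, years):
    """Return {starting age: later count minus starting count} per cohort."""
    changes = {}
    for age, start in start_counts.items():
        later_age = age + years
        # Only compare cohorts that appear in both years.
        if later_age in end_counts:
            changes[age] = end_counts[later_age] - start
    return changes


changes = cohort_change(pop_2014, pop_2019, years=5)
for age, delta in sorted(changes.items()):
    print(f"age {age} in 2014 -> age {age + 5} in 2019: {delta:+d} thousand")
```

A positive difference means the group grew, which at these ages would point to net immigration outweighing mortality.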

Conclusion

  • Summary of Findings: Overlaid scatter plots and line plots allow a direct visual comparison of datasets that share variables and units. In the examples above, they show how parents' heights relate to sons' heights and how the age distribution of children shifted between 2014 and 2019.
