Chapter 1 Notes on Picturing Distributions with Graphs

Understanding Statistics

  • Statistics: The science of data.

    • The first step in working with data is to organize your thinking about the data.

Key Terminology

  • Individual: An object described by a set of data.

  • Variable: A characteristic of the individual.

Planning a Study

  • Key Questions:

    • Who?: What individuals does the data describe?

    • What?: Definitions and measurements of the variables.

    • Where?: The context of data collection matters.

    • When?: Timing of the data collection.

    • Why?: Purpose of data collection. Is the goal to describe only these individuals, or to draw conclusions about a larger group?

Types of Variables

  • Categorical Variable: Assigns individuals to groups or categories.

  • Quantitative Variable: Takes numerical values where arithmetic operations make sense, often measured in specific units.
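The difference between the two types can be sketched in a few lines of Python. The records below are made up for illustration, and the names `students`, `variable_types`, and `summarize` are hypothetical. Note that `zip_code` is stored as a number but is treated as categorical, because averaging zip codes would not make sense.

```python
from statistics import mean

# Each dict is one individual; each key is a variable. All values are made up.
students = [
    {"name": "A", "major": "Business", "zip_code": 10001, "height_cm": 170},
    {"name": "B", "major": "Biology", "zip_code": 94305, "height_cm": 160},
    {"name": "C", "major": "Business", "zip_code": 60614, "height_cm": 180},
]

# Types are declared by hand: numeric storage alone does not make a
# variable quantitative.
variable_types = {
    "major": "categorical",
    "zip_code": "categorical",
    "height_cm": "quantitative",
}

def summarize(individuals, variable, kind):
    """Average a quantitative variable; list the categories of a categorical one."""
    values = [ind[variable] for ind in individuals]
    if kind == "quantitative":
        return mean(values)       # arithmetic makes sense here
    return sorted(set(values))    # only group membership makes sense here

for variable, kind in variable_types.items():
    print(variable, kind, summarize(students, variable, kind))
```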

Exploratory Data Analysis (EDA)

  • EDA Process: Use statistical tools and ideas to examine data and describe their main features. Steps include:

    • Examining each variable by itself.

    • Studying relationships among variables.

    • Starting with graphs, then providing numerical summaries.

Distribution of Variables

  • Categorical Variables: The distribution lists the categories and gives the count or percent of individuals in each category.

    • Pie Charts: Visual display of categorical distributions showing proportionate slices.

    • Bar Graphs: Represent categorical data through bars showing counts/percents for each category.
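As a minimal sketch of how such a distribution might be tabulated, the Python below counts the individuals in each category, converts the counts to percents, and prints a rough text bar graph. The data and the function names `categorical_distribution` and `text_bar_graph` are made up for illustration.

```python
from collections import Counter

def categorical_distribution(values):
    """Return {category: (count, percent)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

def text_bar_graph(values):
    """Build a text bar graph: one row per category, one '#' per individual."""
    dist = categorical_distribution(values)
    width = max(len(cat) for cat in dist)
    return "\n".join(
        f"{cat.ljust(width)} | {'#' * n} {n} ({pct:.1f}%)"
        for cat, (n, pct) in dist.items()
    )

# Made-up data for illustration only
majors = ["Business", "Biology", "Business", "Engineering", "Biology", "Business"]
print(text_bar_graph(majors))
```

The percents across all categories should add up to 100 (apart from rounding), which is the property a pie chart relies on. A bar graph does not need it.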

Examples of Categorical Data

  • Pie Chart Example: Full-time first-year students' intended majors, showing percent distribution across categories (e.g., Biological sciences, Business).

  • Bar Graph Example: Sources used by Americans aged 12-34 for music, depicting frequency of use for each platform.

Quantitative Data

  • Distribution Overview: The distribution of a quantitative variable tells what values the variable takes and how often it takes them.

    • Histograms: Show distributions with bars; height indicates count of observations in intervals.

    • Stemplots: Display data by separating observations into stems and leaves while retaining actual values.

Histograms

  • Creating Histograms:

    • Use for quantitative variables, especially in large datasets.

    • Divide data into equal-width classes, count observations in each, and display as bars.

  • Interpretation:

    • Describe the overall pattern by its shape, center, and variability, and look for outliers that fall outside that pattern.
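The class-counting step can be sketched as follows. This is one simple way to do it, assuming the classes span the smallest to the largest observation and that the observations are not all equal. The function name `histogram_counts` and the data are made up for illustration.

```python
def histogram_counts(data, num_classes):
    """Count observations in equal-width classes spanning min(data) to max(data)."""
    low, high = min(data), max(data)
    width = (high - low) / num_classes  # assumes high > low
    counts = [0] * num_classes
    for x in data:
        # The largest value belongs to the last class, not one past the end.
        index = min(int((x - low) / width), num_classes - 1)
        counts[index] += 1
    edges = [low + i * width for i in range(num_classes + 1)]
    return edges, counts

# Made-up observations for illustration only
data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 12]
edges, counts = histogram_counts(data, 5)
for i, n in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'#' * n}")
```

In this made-up data set, the empty class between 8 and 10 followed by a single observation at 12 is the kind of gap that suggests a possible outlier.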

Shape of Distributions

  • Types of Shapes:

    • Symmetric: Mirror-like left and right sides.

    • Right-skewed: Longer right tail; higher frequencies of lower values.

    • Left-skewed: Longer left tail; higher frequencies of higher values.

Stemplots (Stem-and-Leaf Plots)

  • Creating Stemplots:

    • Divide each observation into a stem (all but the last digit) and a leaf (the last digit).

    • Write the stems in a vertical column, smallest at the top, and list each stem's leaves in increasing order to its right.
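The two steps above can be sketched in Python. This version assumes non-negative whole-number observations. The function name `stemplot` and the data are made up for illustration.

```python
from collections import defaultdict

def stemplot(data):
    """Return stemplot rows for non-negative whole numbers."""
    leaves = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)  # stem = all but the last digit, leaf = last digit
        leaves[stem].append(leaf)
    rows = []
    # Include stems that have no leaves so gaps in the distribution stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = "".join(str(leaf) for leaf in leaves.get(stem, []))
        rows.append(f"{stem:>2} | {row}".rstrip())
    return rows

# Made-up observations for illustration only
for row in stemplot([23, 12, 47, 15, 21, 23]):
    print(row)
```

Unlike a histogram, each row still shows the actual values: the stem 2 with leaves 1, 3, 3 stands for the observations 21, 23, and 23.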

Time Plots

  • Purpose: Show data trends over time.

  • Structure: Time on the horizontal axis, measured variable on the vertical.

    • Look for trends and seasonal patterns in the data.

  • Examples:

    • Time Plot of Water Levels: Showing mean daily gauge height over years.

    • CO2 Concentration Plot: Patterns of atmospheric CO2 concentration over decades.
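A rough text version of a time plot can be sketched without any plotting library. The function name `text_time_plot` and the values are made up for illustration. They are not the water-level or CO2 data mentioned above.

```python
def text_time_plot(times, values, height=5):
    """Return rows of a text time plot: time runs left to right, values bottom to top."""
    # Order the observations by time so the horizontal axis is chronological.
    pairs = sorted(zip(times, values))
    ordered = [v for _, v in pairs]
    low, high = min(ordered), max(ordered)
    span = (high - low) or 1  # avoid dividing by zero when all values are equal
    # Map each value to a row index from 0 (lowest) to height - 1 (highest).
    levels = [round((v - low) / span * (height - 1)) for v in ordered]
    rows = []
    for level in range(height - 1, -1, -1):
        rows.append("".join("*" if lv == level else " " for lv in levels).rstrip())
    rows.append("-" * len(pairs))  # the time axis
    return rows

# Made-up measurements: an upward trend with a repeating rise and dip
times = [1, 2, 3, 4, 5, 6, 7, 8, 9]
values = [1, 2, 3, 2, 3, 4, 3, 4, 5]
print("\n".join(text_time_plot(times, values)))
```

Reading the output from left to right, the points drift upward overall (a trend) while repeating a short rise-and-dip cycle, which is the sort of regular pattern to look for when checking for seasonal variation.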

