Summary Notes on Categorical Data

Descriptive vs. Inferential Statistics

  • Descriptive Statistics: Methods for organizing and summarizing data.

  • Inferential Statistics: Used to draw conclusions about a population from a sample.

Steps of Analysis

  1. Visualize the Distribution of Sample Data

  2. Describe the Distribution of Sample Data

  3. Measure Characteristics of Sample Distribution

  4. Infer from Sample About the Population

Distribution Representation

  • Types of Distribution:

    • Frequency Distribution: Lists the number of observations in each category.

    • Relative Frequency Distribution: Gives each category's count divided by the total number of observations; the proportions sum to 1.

  • Graphical Representations:

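    • Bar Chart: Displays counts or proportions of categorical data as separated bars; useful for comparing categories.

    • Pie Chart: Circular graphic showing each category as a share of the whole; small differences between slices are hard to judge.

    • Histograms: Graph for quantitative data grouped into intervals (bins); resembles a bar chart, but the bars touch because the scale is continuous.

    • Dotplots: Show each observation as a dot stacked above its value, so the height of a stack gives the frequency.

    • Stem-and-Leaf Diagrams: Show the shape of a distribution like a histogram while keeping every individual data value.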
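The two distribution types listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the sample data and the function names are made up for the example.

```python
from collections import Counter

# Hypothetical sample: favorite color reported by 10 respondents.
responses = ["red", "blue", "blue", "green", "red",
             "blue", "green", "blue", "red", "blue"]


def frequency_distribution(values):
    """Count how many observations fall in each category."""
    return dict(Counter(values))


def relative_frequency_distribution(values):
    """Divide each category's count by the total number of observations."""
    counts = Counter(values)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


freq = frequency_distribution(responses)
rel_freq = relative_frequency_distribution(responses)

# Print one row per category: name, count, proportion.
for category in sorted(freq):
    print(f"{category:<6} {freq[category]:>3} {rel_freq[category]:>6.2f}")
```

The relative frequencies always sum to 1 (up to floating-point rounding), which is a quick check that no observation was dropped.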

Organizing Categorical Variables

  • Key methods:

    1. Frequency Distribution

    2. Relative Frequency Distribution

    3. Pie Chart (use with caution for small differences)

    4. Bar Chart (for clearer comparison between categories)
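To illustrate why a bar chart makes category comparison easy, here is a rough text-only sketch that draws one bar per category. The data and the helper name `text_bar_chart` are hypothetical; a plotting library would normally be used instead.

```python
from collections import Counter


def text_bar_chart(values, symbol="#"):
    """Return one line per category, with bar length equal to the count.

    Categories are ordered from most to least frequent, and every bar
    starts at zero, so bar lengths can be compared directly.
    """
    counts = Counter(values)
    label_width = max(len(str(category)) for category in counts)
    lines = []
    for category, count in counts.most_common():
        lines.append(f"{str(category):<{label_width}} | {symbol * count} {count}")
    return lines


# Hypothetical sample of categorical responses.
colors = ["red", "blue", "blue", "green", "red",
          "blue", "green", "blue", "red", "blue"]

chart_lines = text_bar_chart(colors)
print("\n".join(chart_lines))
```

Sorting the bars by frequency is a design choice made here, not a requirement; for ordered categories (such as class year) the natural order is usually kept.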

Principles of Good Graphing

  1. Area Principle: The area occupied by each part of a graph should be proportional to the value it represents.

  2. Fill the Canvas: Efficient use of space for visual comparisons.

  3. Labeling: Clearly label axes and data points for context.
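A common violation of the area principle is a pictogram whose width and height are both scaled by the data value. The sketch below works through the arithmetic; the function name and the example values are assumptions chosen for illustration.

```python
def apparent_area_ratio(value_a, value_b, scale_both_dimensions):
    """Return how many times larger shape B looks than shape A.

    If only the height is scaled (as in a bar chart with equal-width bars),
    the area ratio equals the data ratio. If width and height are both
    scaled by the data ratio, the area grows with the square of that ratio.
    """
    ratio = value_b / value_a
    return ratio ** 2 if scale_both_dimensions else ratio


# Hypothetical values: B is twice A.
honest = apparent_area_ratio(10, 20, scale_both_dimensions=False)
distorted = apparent_area_ratio(10, 20, scale_both_dimensions=True)
print(honest, distorted)
```

In this example a value that is twice as large is drawn with four times the area when both dimensions are scaled, which overstates the difference.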

Working with Two Categorical Variables

  • Contingency Tables: Cross-tabulate the counts for every combination of categories of two categorical variables.

    • Marginal Distribution: The row or column totals (as counts or proportions) for one variable, ignoring the other variable.

    • Conditional Distribution: The distribution of one variable within a single category of the other variable.
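The contingency-table ideas above can be sketched as follows. The paired observations and the variable names are hypothetical, and only the standard library is used.

```python
from collections import Counter

# Hypothetical paired observations: (class year, preferred study mode).
observations = [
    ("first-year", "online"), ("first-year", "online"),
    ("first-year", "in-person"), ("first-year", "online"),
    ("senior", "in-person"), ("senior", "in-person"),
    ("senior", "online"), ("senior", "in-person"),
]

# Contingency table: count of each (row category, column category) pair.
cell_counts = Counter(observations)
rows = sorted({year for year, _ in observations})
cols = sorted({mode for _, mode in observations})
total = len(observations)

# Marginal distributions: proportions for one variable, ignoring the other.
row_marginal = {r: sum(cell_counts[(r, c)] for c in cols) / total for r in rows}
col_marginal = {c: sum(cell_counts[(r, c)] for r in rows) / total for c in cols}

# Conditional distribution of study mode within each class year:
# each cell count divided by its row total.
conditional = {}
for r in rows:
    row_total = sum(cell_counts[(r, c)] for c in cols)
    conditional[r] = {c: cell_counts[(r, c)] / row_total for c in cols}

print(row_marginal)
print(col_marginal)
print(conditional)
```

Here the marginal distribution of study mode is an even split, yet the conditional distributions differ sharply by class year, which is the kind of relationship a contingency table is meant to expose.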

Misleading Graphs

  • Read graphs critically to catch misrepresentation; notably, the count or proportion axis of a bar chart should start at zero, because a truncated axis exaggerates the differences between bars.
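The effect of a truncated axis can be checked with simple arithmetic. The following sketch compares drawn bar heights under two axis baselines; the function name and the numbers are hypothetical.

```python
def drawn_height_ratio(value_a, value_b, axis_start=0.0):
    """Return the ratio of bar B's drawn height to bar A's drawn height.

    A bar's drawn height is its value minus the point where the axis starts,
    so a nonzero axis start changes the visual ratio between two bars.
    """
    if value_a <= axis_start or value_b <= axis_start:
        raise ValueError("both values must lie above the axis start")
    return (value_b - axis_start) / (value_a - axis_start)


# Hypothetical counts for two categories: 50 and 55 (a 10% difference).
ratio_from_zero = drawn_height_ratio(50, 55)                # axis starts at 0
ratio_truncated = drawn_height_ratio(50, 55, axis_start=45)  # axis starts at 45
print(ratio_from_zero, ratio_truncated)
```

With the axis starting at 45, the second bar is drawn twice as tall as the first even though the underlying count is only 10% larger.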