Summary Notes on Categorical Data
Descriptive vs. Inferential Statistics
Descriptive Statistics: Methods for organizing and summarizing data.
Inferential Statistics: Used to draw conclusions about a population from a sample.
Steps of Analysis
Visualize the Distribution of Sample Data
Describe the Distribution of Sample Data
Measure Characteristics of Sample Distribution
Infer from Sample About the Population
Distribution Representation
Types of Distribution:
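Frequency Distribution: Lists each distinct value together with the number of times it occurs (its count).
Relative Frequency Distribution: Gives, for each category, its count divided by the total number of observations; the proportions sum to 1.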
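As a minimal sketch of the two distributions just defined, the following Python snippet tallies a small made-up sample. The `responses` data and both function names are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical sample: favorite color reported by 10 respondents.
responses = ["red", "blue", "red", "green", "blue",
             "red", "blue", "red", "green", "red"]


def frequency_distribution(values):
    """Count how many times each category occurs."""
    return dict(Counter(values))


def relative_frequency_distribution(values):
    """Divide each category's count by the total number of observations."""
    total = len(values)
    return {category: count / total
            for category, count in Counter(values).items()}


freq = frequency_distribution(responses)
rel_freq = relative_frequency_distribution(responses)

# Print one row per category: name, count, proportion.
for category in freq:
    print(f"{category:<6} {freq[category]:>3} {rel_freq[category]:.2f}")
```

Here red occurs 5 times out of 10, so its relative frequency is 0.5; blue and green come out at 0.3 and 0.2.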
Graphical Representations:
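Bar Chart: One bar per category, with height proportional to the count or proportion; useful for comparing categories.
Pie Chart: Circular graphic showing each category as a slice of the whole; small differences between slices are hard to judge by eye.
Histograms: Graph for quantitative (continuous) data grouped into intervals; resembles a bar chart, but the bars touch because the horizontal axis is a numeric scale.
Dotplots: Place one dot per observation above its value, so the height of each stack shows the frequency.
Stem-and-Leaf Diagrams: Combine features of a histogram and a dotplot while retaining every individual data value.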
Organizing Categorical Variables
Key methods:
Frequency Distribution
Relative Frequency Distribution
Pie Chart (use with caution for small differences)
Bar Chart (for clearer comparison between categories)
Principles of Good Graphing
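Area Principle: The area a graphic devotes to a value should be proportional to the value it represents.
Fill the Canvas: Choose scales so the data occupy most of the plotting area, which makes visual comparison easier.
Labeling: Clearly label axes, categories, and units so the graph can be read without outside context.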
Working with Two Categorical Variables
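Contingency Tables: Cross-tabulate two categorical variables, with one variable's categories as rows, the other's as columns, and a count in each cell.
Marginal Distribution: The row totals or column totals (as counts or proportions), describing one variable while ignoring the other.
Conditional Distribution: The distribution of one variable restricted to a single category of the other, computed by dividing each cell in a row (or column) by that row's (or column's) total.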
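The three ideas above can be sketched in a few lines of Python. The `observations` data (group membership paired with a yes/no answer) and all function names are hypothetical, introduced only to illustrate the definitions.

```python
from collections import Counter

# Hypothetical paired observations: (group, answer).
observations = [
    ("student", "yes"), ("student", "yes"), ("student", "no"),
    ("student", "yes"), ("staff", "no"), ("staff", "no"),
    ("staff", "yes"), ("staff", "no"),
]


def contingency_table(pairs):
    """Build a nested dict: table[row_category][column_category] = count."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    return {r: {c: counts[(r, c)] for c in cols} for r in rows}


def row_marginals(table):
    """Row totals: distribution of the row variable, ignoring the columns."""
    return {r: sum(cells.values()) for r, cells in table.items()}


def column_marginals(table):
    """Column totals: distribution of the column variable, ignoring the rows."""
    totals = Counter()
    for cells in table.values():
        totals.update(cells)
    return dict(totals)


def conditional_on_row(table, row):
    """Distribution of the column variable within one row category."""
    total = sum(table[row].values())
    return {c: n / total for c, n in table[row].items()}


table = contingency_table(observations)
print(table)
print(row_marginals(table))
print(column_marginals(table))
print(conditional_on_row(table, "student"))
```

In this made-up sample the marginal distribution of answers is evenly split (4 yes, 4 no), yet the conditional distribution for students is 75% yes. A gap like that between marginal and conditional distributions suggests the two variables are associated.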
Misleading Graphs
Read graphs critically to avoid being misled. In a bar chart in particular, the frequency axis should start at zero; a truncated axis exaggerates the differences between bars.
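A small arithmetic sketch shows how much a truncated axis can distort a comparison. The function name `apparent_ratio` and the example values are hypothetical; the function compares the heights of two bars as they would be drawn when the axis starts at a given baseline.

```python
def apparent_ratio(value_a, value_b, baseline=0.0):
    """Ratio of the drawn bar heights when the axis starts at `baseline`."""
    if min(value_a, value_b) <= baseline:
        raise ValueError("baseline must be below both values")
    return (value_b - baseline) / (value_a - baseline)


# Two hypothetical category counts that differ by 10%.
true_ratio = apparent_ratio(50, 55)                    # axis starts at 0
distorted_ratio = apparent_ratio(50, 55, baseline=45)  # axis starts at 45

print(true_ratio, distorted_ratio)
```

With the axis starting at zero, the second bar is drawn 1.1 times as tall as the first, matching the data. With the axis starting at 45, the drawn heights are 5 and 10, so the second bar looks twice as tall even though the counts differ by only 10%.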