Constructing Graphical and Tabular Displays of Data
Variable Classification
Categorical Variable: Consists of names or labels identifying groups of individuals.
Examples: Academic major, ZIP code of a home in Arkansas, and variables where numbers represent categories (e.g., for man, for woman).
Numerical Variable: Consists of measurable quantities.
Examples: Age of a hip-hop artist (e.g., Travis Scott at years or Kendrick Lamar at years) and maximum speed of a car (e.g., Toyota Camry at or Porsche 911 at ).
Frequency and Distributions
Frequency: The number of observations in a specific category.
Frequency Table: Lists all categories and their respective frequencies.
Relative Frequency: The proportion of total observations belonging to a category, calculated as:
Sum of Relative Frequencies: For any categorical variable, the sum of relative frequencies across all categories always equals .
Example Calculations (Movie Genres):
Action: Frequency of out of observations leads to a relative frequency of ().
Comedy: Largest relative frequency at ().
Logical Grouping (AND and OR)
AND: Refers to the intersection of criteria. For November 2020, dates that are in the third week AND are Thursdays includes only the date .
OR: Refers to the union of criteria. Dates in the third week OR Thursdays in November 2020 include: , , , , , , , , , and .
Graphical Data Interpretation
Frequency Bar Graph: Displays categories on the horizontal axis and frequencies on the vertical axis with bars representing the count for each category.
Multiple Bar Graph: Used to compare proportions between different groups (e.g., men vs. women).
Political Survey Analysis (1960 adults):
Women identifying as Democrats: proportion.
Men identifying as Independents: (the greatest proportion for men).
Proportion vs. Count: A smaller proportion can represent a larger count if the total group size is larger. For example, a proportion of women () is more individuals than a proportion of men ().
Sampling Error: Findings from a sample (e.g., of men surveyed are Republicans) cannot be generalized as an exact percentage for the entire population (e.g., all American men) due to potential fluctuations in data.