Section 2.1 Categorical Variables
One Categorical Variable
Summary Statistics:
Frequency Table
Proportion
Visualization:
Bar Chart
Pie Chart
Two Categorical Variables
Summary Statistics:
Two-Way Table
Difference in Proportions
Visualization:
Segmented Bar Chart
Side-by-Side Bar Chart
Observational Data and Generalization
Causality:
Causality cannot be concluded from observational data collected from surveys.
Generalization:
The sample may not represent the larger population if it was not selected at random.
Summarizing Data
Need methods to summarize and visualize data for better understanding.
Type of Summary Statistics and Visualization Methods:
Depends on type of variable (categorical or quantitative).
Descriptive Statistics:
Also known as exploratory data analysis; aims to summarize and visualize variables and the relationships between them.
Summarizing Data for One Categorical Variable
Example: Cell Phone Ownership Survey (2012)
Categories: Android, iPhone, Blackberry, Non-smartphone, No cell phone.
Frequency Table
Displays survey results in counts per category:
Android: 458
iPhone: 437
Blackberry: 141
Non-smartphone: 924
No cell phone: 293
Total: 2253
Proportion Calculation
Proportion found using formula:
Formula: p = (number in category) / (total sample size)
Notation:
Proportion for a Sample: p̂ (read "p-hat")
Proportion for a Population: p
Example Calculation of Proportion
No Cell Phone:
p̂ = 293 / 2253 ≈ 0.13
Equivalent Percentage: 13%
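The proportion formula can be checked with a few lines of Python. This is a minimal sketch: the function name `sample_proportion` is a hypothetical helper, and the numbers come from the frequency table above (293 respondents with no cell phone out of 2253 surveyed).

```python
def sample_proportion(count_in_category: int, sample_size: int) -> float:
    """Return p-hat = (number in category) / (total sample size)."""
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= count_in_category <= sample_size:
        raise ValueError("count_in_category must lie between 0 and sample_size")
    return count_in_category / sample_size


# 293 of the 2253 respondents reported having no cell phone.
p_hat = sample_proportion(293, 2253)
print(f"p-hat = {p_hat:.3f} ({p_hat:.1%})")  # p-hat = 0.130 (13.0%)
```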
Relative Frequency Table
Shows proportion of cases in each category:
Android: 0.203
iPhone: 0.194
Blackberry: 0.063
Non-smartphone: 0.410
No cell phone: 0.130
Total of Table: Sums to 1 or 100%.
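As a rough illustration, a relative frequency table can be derived from the counts by dividing each count by the total. The sketch below assumes the counts from the frequency table earlier in this section; the variable names are illustrative only.

```python
# Counts from the 2012 cell phone ownership frequency table.
counts = {
    "Android": 458,
    "iPhone": 437,
    "Blackberry": 141,
    "Non-smartphone": 924,
    "No cell phone": 293,
}

total = sum(counts.values())

# Relative frequency = count in category / total sample size.
relative_frequencies = {label: n / total for label, n in counts.items()}

for label, proportion in relative_frequencies.items():
    print(f"{label}: {proportion:.3f}")
print(f"Total: {sum(relative_frequencies.values()):.3f}")
```

The proportions sum to 1 up to floating-point rounding, which is a quick sanity check on any relative frequency table.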
Examples of Proportions with Notation
Problem a: For community college statistics, proportion of students preferring UCLA:
Total: 45, Preferred: 19
Answer: \( p = \frac{19}{45} \approx 0.42 \)
Problem b: For apples, with 2 out of 11 being rotten:
Proportion: \( \hat{p} = \frac{2}{11} \approx 0.18 \)
Relative Frequency Construction
Water Taste Preference Example:
Categories: Tap, Aquafina, Fiji, Sam's Choice
Total surveyed: 100
Relative Frequencies calculated as follows:
Tap: 0.10
Aquafina: 0.25
Fiji: 0.41
Sam's Choice: 0.24
Visualizing Data for One Categorical Variable
Bar Chart:
The height of each bar corresponds to the number of cases in that category.
Pie Chart:
Relative area of each slice represents proportions per category.
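To show how bar length tracks the count in each category, here is a small text-only sketch using the cell phone counts from earlier in this section. The scale of one `#` per 25 respondents and the helper `text_bar` are assumptions made for illustration; in practice a plotting library would normally draw the chart.

```python
# Counts from the 2012 cell phone ownership frequency table.
counts = {
    "Android": 458,
    "iPhone": 437,
    "Blackberry": 141,
    "Non-smartphone": 924,
    "No cell phone": 293,
}

# Hypothetical scale chosen for illustration: one '#' per 25 respondents.
RESPONDENTS_PER_SYMBOL = 25


def text_bar(count: int, per_symbol: int = RESPONDENTS_PER_SYMBOL) -> str:
    """Return a bar whose length is proportional to the count."""
    return "#" * round(count / per_symbol)


label_width = max(len(label) for label in counts)
for label, count in counts.items():
    # Left-align the labels so all bars start in the same column.
    print(f"{label:<{label_width}} | {text_bar(count)} {count}")
```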
Summarizing Data for Two Categorical Variables
Student Relationship Example: Surveyed university students on gender and relationship status.
Two-Way Table
For relationship and gender:
                     Female   Male   Total
In a Relationship        32     10      42
It's Complicated         12      7      19
Single                   63     45     108
Total                   107     62     169
Proportion Calculations
Proportions can be derived for categories using totals from the two-way table:
Sample Proportions:
Students in a relationship: \( \frac{42}{169} \approx 0.249 \)
Female students: \( \frac{107}{169} \approx 0.633 \)
Students in a relationship who are female: \( \frac{32}{42} \approx 0.762 \)
Caution on Proportion Interpretation
The denominator determines what a proportion means, so the wording matters:
The proportion of females who are in a relationship (32/107 ≈ 0.299) is not the same as the proportion of students in a relationship who are female (32/42 ≈ 0.762).
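To make the caution concrete, the sketch below computes both proportions from the two-way table of relationship status and gender. The nested-dictionary layout and variable names are assumptions made for illustration; the only difference between the two results is which total is used as the denominator.

```python
# Two-way table: relationship status (rows) by gender (columns).
table = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

# Row totals: number of students in each relationship status.
row_totals = {status: sum(by_gender.values()) for status, by_gender in table.items()}

# Column totals: number of students of each gender.
col_totals = {}
for by_gender in table.values():
    for gender, n in by_gender.items():
        col_totals[gender] = col_totals.get(gender, 0) + n

grand_total = sum(row_totals.values())

in_relationship_and_female = table["In a Relationship"]["Female"]

# Of all females, what proportion are in a relationship? (denominator: 107)
females_in_relationship = in_relationship_and_female / col_totals["Female"]

# Of all students in a relationship, what proportion are female? (denominator: 42)
relationship_female = in_relationship_and_female / row_totals["In a Relationship"]

print(f"Females who are in a relationship: {females_in_relationship:.3f}")
print(f"Students in a relationship who are female: {relationship_female:.3f}")
```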
Difference in Proportions Concept
A difference in proportions can reveal an association between two categorical variables.
Example Calculations for Differences
For singles by gender:
Female proportion: \( \frac{63}{107} \approx 0.589 \)
Male proportion: \( \frac{45}{62} \approx 0.726 \)
Difference: \( 0.589 - 0.726 = -0.137 \)
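The same calculation can be sketched in Python. The helper `difference_in_proportions` is a hypothetical name introduced here; it subtracts the second group's proportion from the first, so the sign of the result depends on the order of the groups.

```python
def difference_in_proportions(count_1: int, size_1: int,
                              count_2: int, size_2: int) -> float:
    """Return p-hat_1 - p-hat_2 for two groups."""
    return count_1 / size_1 - count_2 / size_2


# Singles by gender: 63 of 107 females and 45 of 62 males are single.
difference = difference_in_proportions(63, 107, 45, 62)
print(f"Difference (female - male): {difference:.3f}")  # -0.137
```

A negative value here means that a smaller proportion of females than males in the sample are single.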
Example 5: Comparing Proportions of Female Students
Calculate:
\( \hat{p}_1 = \frac{32}{42} \approx 0.762 \) (students in a relationship)
\( \hat{p}_2 = \frac{63}{108} \approx 0.583 \) (single students)
Difference: \( 0.762 - 0.583 = 0.179 \)
Visualizing Two Categorical Variables
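Side-by-Side Bar Chart:
Draws a separate bar for each combination of categories, grouped so the categories of one variable can be compared across the categories of the other.
Segmented Bar Chart:
Draws one bar per category of the first variable, split into stacked segments that show the counts or proportions of the second variable.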
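As an illustration of what the segments of a segmented bar chart represent, the sketch below computes the segment proportions for the relationship status and gender data. It assumes one bar per gender, each split by relationship status; the notes do not say which variable defines the bars, so that choice is an assumption.

```python
# Two-way table: relationship status (rows) by gender (columns).
table = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

segments_by_gender = {}
for gender in ("Female", "Male"):
    # Total number of students of this gender (the full height of the bar).
    bar_total = sum(row[gender] for row in table.values())
    # Each segment is the share of this gender in one relationship status.
    segments_by_gender[gender] = {
        status: row[gender] / bar_total for status, row in table.items()
    }

for gender, segments in segments_by_gender.items():
    print(gender)
    for status, share in segments.items():
        print(f"  {status}: {share:.3f}")
```

Within each bar the segment proportions sum to 1, which is what makes bars for groups of different sizes directly comparable.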
Example 6: Therapy and Sleep Improvement Study
Two-Way Table Creation:
Groups: Therapy vs. No Treatment
Proportion Calculations:
Therapy Group Improvements: \( \hat{p}_1 = \frac{14}{20} = 0.7 \)
No Treatment Improvements: \( \hat{p}_2 = \frac{3}{20} = 0.15 \)
Difference: \( \hat{p}_1 - \hat{p}_2 = 0.7 - 0.15 = 0.55 \)
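A possible way to build the two-way table for this example and recompute the difference is sketched below. The group sizes and improvement counts come from the calculations above; the "Not Improved" counts are derived by subtraction, and the row and column labels are illustrative assumptions.

```python
# (number improved, group size) for each group, taken from the example.
group_results = {
    "Therapy": (14, 20),
    "No Treatment": (3, 20),
}

# Build the two-way table; "Not Improved" is derived as size - improved.
two_way = {}
for group, (improved, size) in group_results.items():
    two_way[group] = {
        "Improved": improved,
        "Not Improved": size - improved,
        "Total": size,
    }

# Proportion improved within each group.
p_hat = {group: row["Improved"] / row["Total"] for group, row in two_way.items()}

difference = p_hat["Therapy"] - p_hat["No Treatment"]

for group, row in two_way.items():
    print(group, row, f"p-hat = {p_hat[group]:.2f}")
print(f"Difference in proportions: {difference:.2f}")
```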