Section 2.1 Categorical Variables

One Categorical Variable

  • Summary Statistics:

    • Frequency Table

    • Proportion

  • Visualization:

    • Bar Chart

    • Pie Chart

Two Categorical Variables

  • Summary Statistics:

    • Two-Way Table

    • Difference in Proportions

  • Visualization:

    • Segmented Bar Chart

    • Side-by-Side Bar Chart

Observational Data and Generalization

  • Causality:

    • Causality cannot be concluded from observational data collected from surveys.

  • Generalization:

    • A sample that is not randomly selected may not represent the larger population.

Summarizing Data

  • Need methods to summarize and visualize data for better understanding.

  • Type of Summary Statistics and Visualization Methods:

    • Depends on type of variable (categorical or quantitative).

  • Descriptive Statistics:

    • Also known as exploratory data analysis; focuses on summarizing and visualizing variables and the relationships between them.

Summarizing Data for One Categorical Variable

  • Example: Cell Phone Ownership Survey (2012)

    • Categories: Android, iPhone, Blackberry, Non-smartphone, No cell phone.

Frequency Table

  • Displays survey results in counts per category:

    • Android: 458

    • iPhone: 437

    • Blackberry: 141

    • Non-smartphone: 924

    • No cell phone: 293

    • Total: 2253
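A frequency table is a tally of category labels. The minimal sketch below uses Python's standard-library `collections.Counter` to tally a short hypothetical list of responses (invented for illustration, not the survey's raw data, which these notes do not include) and then checks that the survey counts above add up to the stated total.

```python
from collections import Counter

def frequency_table(responses):
    """Return a Counter mapping each category to the number of responses in it."""
    return Counter(responses)

# Hypothetical raw responses, used only to illustrate the tally.
sample_responses = ["Android", "iPhone", "Android", "No cell phone", "Android"]
print(frequency_table(sample_responses)["Android"])  # 3

# Counts from the 2012 cell phone survey frequency table.
survey_counts = {
    "Android": 458,
    "iPhone": 437,
    "Blackberry": 141,
    "Non-smartphone": 924,
    "No cell phone": 293,
}
print(sum(survey_counts.values()))  # 2253
```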

Proportion Calculation

  • Proportion found using the formula:

    • Formula: $\hat{p} = \frac{\text{number in category}}{\text{total sample size}}$

  • Notation:

    • Proportion for a Sample: $\hat{p}$ ("p-hat")

    • Proportion for a Population: $p$

Example Calculation of Proportion

  • No Cell Phone Owners:

    • $\hat{p} = \frac{293}{2253} \approx 0.13$

    • Equivalent Percentage: 13%

Relative Frequency Table

  • Shows proportion of cases in each category:

    • Android: 0.203

    • iPhone: 0.194

    • Blackberry: 0.063

    • Non-smartphone: 0.410

    • No cell phone: 0.130

  • Total of Table: Sums to 1 (100%), up to rounding.
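Each relative frequency is a category count divided by the total count. A minimal sketch, assuming the counts are stored in a Python dictionary (a representation chosen here for illustration):

```python
def relative_frequencies(counts, digits=3):
    """Divide each category count by the total count, rounding for display."""
    total = sum(counts.values())
    return {category: round(n / total, digits) for category, n in counts.items()}

# Counts from the 2012 cell phone survey frequency table.
survey_counts = {
    "Android": 458,
    "iPhone": 437,
    "Blackberry": 141,
    "Non-smartphone": 924,
    "No cell phone": 293,
}

table = relative_frequencies(survey_counts)
for category, proportion in table.items():
    print(f"{category}: {proportion:.3f}")
```

The "No cell phone" entry reproduces $\hat{p} = 293/2253 \approx 0.130$. Because each value is rounded separately, rounded proportions can in general sum to slightly more or less than 1; here they total 1.000.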

Examples of Proportions with Notation

  • Problem a: Proportion of community college statistics students who prefer UCLA:

    • Total: 45, Prefer UCLA: 19

    • Answer: $p = \frac{19}{45} \approx 0.42$

  • Problem b: Proportion of rotten apples, with 2 out of 11 apples rotten:

    • Proportion: $\hat{p} = \frac{2}{11} \approx 0.18$

Relative Frequency Construction

  • Water Taste Preference Example:

    • Categories: Tap, Aquafina, Fiji, Sam's Choice

    • Total surveyed: 100

    • Relative frequencies (each count divided by the total of 100):

      • Tap: 0.10

      • Aquafina: 0.25

      • Fiji: 0.41

      • Sam's Choice: 0.24

Visualizing Data for One Categorical Variable

  • Bar Chart:

    • Bar height corresponds to counts of cases in each category.

  • Pie Chart:

    • The relative area of each slice represents the proportion of cases in that category.
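A bar chart and a pie chart encode the same table in different ways. The sketch below is plain Python with no plotting library (in practice a package such as matplotlib would draw the charts; that choice is outside these notes). It draws a text bar chart whose bar lengths are proportional to the counts, and converts proportions to pie-slice angles, since a slice's share of the circle equals its proportion times 360 degrees. The function names are hypothetical.

```python
def text_bar_chart(counts, width=40):
    """One row of '#' per category; the largest count gets `width` characters."""
    largest = max(counts.values())
    rows = []
    for category, n in counts.items():
        bar = "#" * round(width * n / largest)  # length proportional to the count
        rows.append(f"{category:<15} {bar} {n}")
    return "\n".join(rows)

def pie_slice_angles(proportions):
    """Convert each category's proportion to a slice angle in degrees (p * 360)."""
    return {category: p * 360 for category, p in proportions.items()}

# Cell phone survey counts (bar chart).
survey_counts = {
    "Android": 458,
    "iPhone": 437,
    "Blackberry": 141,
    "Non-smartphone": 924,
    "No cell phone": 293,
}
print(text_bar_chart(survey_counts))

# Water taste preference relative frequencies (pie chart).
water = {"Tap": 0.10, "Aquafina": 0.25, "Fiji": 0.41, "Sam's Choice": 0.24}
for category, degrees in pie_slice_angles(water).items():
    print(f"{category}: {degrees:.1f} degrees")
```

For the water data the slices come out to 36.0, 90.0, 147.6, and 86.4 degrees, which together fill the full 360-degree circle.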

Summarizing Data for Two Categorical Variables

  • Student Relationship Example: University students were surveyed about their gender and relationship status.

Two-Way Table

  • Relationship status by gender (counts):

                           Female  Male  Total
      In a Relationship        32    10     42
      It’s Complicated         12     7     19
      Single                   63    45    108
      Total                   107    62    169
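The "Total" row and column of a two-way table are sums of the interior cells. A minimal sketch, assuming the interior counts are stored as a nested dictionary (rows are relationship status, columns are gender; the layout and function name are illustrative choices):

```python
# Interior counts of the two-way table: row = relationship status, column = gender.
two_way = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

def add_margins(table):
    """Return the row totals, column totals, and grand total of a two-way table."""
    row_totals = {row: sum(cols.values()) for row, cols in table.items()}
    col_totals = {}
    for cols in table.values():
        for col, n in cols.items():
            col_totals[col] = col_totals.get(col, 0) + n
    grand_total = sum(row_totals.values())
    return row_totals, col_totals, grand_total

rows, cols, total = add_margins(two_way)
print(rows)   # {'In a Relationship': 42, "It's Complicated": 19, 'Single': 108}
print(cols)   # {'Female': 107, 'Male': 62}
print(total)  # 169
```

Both margins must sum to the same grand total (42 + 19 + 108 = 107 + 62 = 169), which is a quick check for transcription errors.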

Proportion Calculations

  • Proportions can be derived for categories using totals from the two-way table:

    • Sample Proportions:

      • Students in a relationship: $\frac{42}{169} \approx 0.249$

      • Female students: $\frac{107}{169} \approx 0.633$

      • Students in a relationship who are female: $\frac{32}{42} \approx 0.762$

Caution on Proportion Interpretation

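  • Always check which group a proportion is computed within:

    • The proportion of female students who are in a relationship ($\frac{32}{107} \approx 0.299$) differs from the proportion of students in a relationship who are female ($\frac{32}{42} \approx 0.762$).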
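The caution above comes down to which total sits in the denominator. A minimal sketch, assuming the table is stored as a nested dictionary (an illustrative representation; the helper names are hypothetical), computes both directions of the same cell count of 32:

```python
# Interior counts of the two-way table: row = relationship status, column = gender.
two_way = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

def row_total(table, row):
    """Total count in one row (e.g. all students in a relationship)."""
    return sum(table[row].values())

def column_total(table, col):
    """Total count in one column (e.g. all female students)."""
    return sum(r[col] for r in table.values())

grand_total = sum(row_total(two_way, r) for r in two_way)
both = two_way["In a Relationship"]["Female"]  # female AND in a relationship

# Proportion of all students who are in a relationship.
print(round(row_total(two_way, "In a Relationship") / grand_total, 3))  # 0.249
# Proportion of all students who are female.
print(round(column_total(two_way, "Female") / grand_total, 3))          # 0.633
# Among students in a relationship, proportion who are female.
print(round(both / row_total(two_way, "In a Relationship"), 3))         # 0.762
# Among female students, proportion who are in a relationship.
print(round(both / column_total(two_way, "Female"), 3))                 # 0.299
```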

Difference in Proportions Concept

  • A difference in proportions compares how often a category occurs in two groups; a large difference suggests that the two categorical variables are related.

Example Calculations for Differences

  • For singles by gender:

    • Female proportion: $\frac{63}{107} \approx 0.589$

    • Male proportion: $\frac{45}{62} \approx 0.726$

    • Difference: $0.589 - 0.726 = -0.137$

Example 5: Comparing Proportions of Female Students

  • Calculate:

    • $\hat{p}_1 = \frac{32}{42} \approx 0.762$ (proportion female among students in a relationship)

    • $\hat{p}_2 = \frac{63}{108} \approx 0.583$ (proportion female among single students)

    • Difference: $\hat{p}_1 - \hat{p}_2 = 0.762 - 0.583 = 0.179$
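Both examples use the same computation: $\hat{p}_1 - \hat{p}_2$, where each $\hat{p}$ is a count divided by its own group total. A minimal sketch (the function name is hypothetical):

```python
def difference_in_proportions(count1, n1, count2, n2):
    """Return p-hat_1 - p-hat_2, where each p-hat is a count over its group size."""
    return count1 / n1 - count2 / n2

# Proportion single: female (63 of 107) minus male (45 of 62).
print(round(difference_in_proportions(63, 107, 45, 62), 3))  # -0.137
# Proportion female: in a relationship (32 of 42) minus single (63 of 108).
print(round(difference_in_proportions(32, 42, 63, 108), 3))  # 0.179
```

The order of the groups matters only for the sign: swapping them turns -0.137 into 0.137.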

Visualizing Two Categorical Variables

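  • Side-by-Side Bar Chart:

    • Places the bars for each group next to each other within each category, so counts can be compared directly.

  • Segmented Bar Chart:

    • Stacks the categories within one bar per group; when relative frequencies are used, each bar totals 1 (100%).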
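A segmented bar chart of relative frequencies stacks, for each group, the proportion of that group falling in each category. A minimal sketch for the relationship data, assuming one bar per gender (the grouping choice and function name are illustrative), computes the segment sizes:

```python
# Interior counts of the two-way table: row = relationship status, column = gender.
two_way = {
    "In a Relationship": {"Female": 32, "Male": 10},
    "It's Complicated": {"Female": 12, "Male": 7},
    "Single": {"Female": 63, "Male": 45},
}

def column_proportions(table):
    """For each column (bar), the share of that column's total in each row category.

    Assumes every row has the same column keys.
    """
    columns = next(iter(table.values())).keys()
    shares = {}
    for col in columns:
        col_total = sum(row[col] for row in table.values())
        shares[col] = {status: row[col] / col_total for status, row in table.items()}
    return shares

for gender, segments in column_proportions(two_way).items():
    text = ", ".join(f"{status} {p:.3f}" for status, p in segments.items())
    print(f"{gender}: {text}")
```

This gives segments of about 0.299, 0.112, and 0.589 for female students and 0.161, 0.113, and 0.726 for male students; each bar's segments sum to 1.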

Example 6: Therapy and Sleep Improvement Study

  • Two-Way Table Creation:

    • Groups (Therapy vs. No Treatment) by outcome (sleep improved or not)

  • Proportion Calculations:

    • Therapy group, proportion improved: $\hat{p}_1 = \frac{14}{20} = 0.70$

    • No treatment group, proportion improved: $\hat{p}_2 = \frac{3}{20} = 0.15$

    • Difference: $\hat{p}_1 - \hat{p}_2 = 0.70 - 0.15 = 0.55$
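The full two-way table is not listed above, but assuming each group had 20 participants, as the denominators 14/20 and 3/20 indicate, the "not improved" cells follow by subtraction (6 and 17; these are derived here, not quoted from the study). A minimal sketch with a hypothetical helper:

```python
# Group -> (number improved, group size), from the proportions above.
groups = {"Therapy": (14, 20), "No Treatment": (3, 20)}

def two_way_from_counts(groups):
    """Build Improved / Not Improved / Total cells for each group by subtraction."""
    return {
        name: {"Improved": improved, "Not Improved": size - improved, "Total": size}
        for name, (improved, size) in groups.items()
    }

table = two_way_from_counts(groups)
print(f"{'':<13} {'Improved':>8} {'Not Improved':>12} {'Total':>5}")
for name, row in table.items():
    print(f"{name:<13} {row['Improved']:>8} {row['Not Improved']:>12} {row['Total']:>5}")

p_therapy = table["Therapy"]["Improved"] / table["Therapy"]["Total"]
p_none = table["No Treatment"]["Improved"] / table["No Treatment"]["Total"]
print(round(p_therapy - p_none, 2))  # 0.55
```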