Chapter 8: Comparing More Than Two Proportions

Introduction

  • Focuses on inference for categorical variables that have more than two categories.

  • Aims to explore associations between categorical variables in depth.

  • This chapter shifts focus towards theoretical approaches for inference.

  • Chapter 9 will address Mean Group Differences.


Section 8.1: Comparing Multiple Proportions

  • Qualitative or categorical measurements include examples such as:

    • M&M colors (6 possible colors)

    • Airline ticket classes (coach, business, first)

    • Survey responses (strongly disagree to strongly agree)

  • Such data can be recorded as counts across categories, representing a multinomial experiment.

  • Binomial experiments are limited to two categories.


Exact Multinomial Tests (Simulation)

  • For experiments with two categories, we can model the data with a weighted coin flip.

  • For k > 2 categories, use a weighted die to simulate data with a specific probability for each category.

  • Use the observed frequencies to form the sample statistic (n1, n2, …, nk).

  • The p-value is the sum of the probabilities, under H0, of all possible outcomes that are as likely as or less likely than the observed outcome.

  • Larger samples require more computational power, leading to a preference for theoretical methods.
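
The weighted-die simulation described above can be sketched in Python (the notes themselves use R). This is only an illustrative sketch: the function names are made up for this example, the counts in the usage lines are hypothetical rather than data from the notes, and every category probability is assumed to be greater than 0.

```python
import random
from math import lgamma, log

def log_multinomial_prob(counts, probs):
    """Log of the multinomial probability of one table of counts."""
    n = sum(counts)
    out = lgamma(n + 1)
    for c, p in zip(counts, probs):
        out += c * log(p) - lgamma(c + 1)
    return out

def simulated_p_value(observed, probs, reps=10_000, seed=1):
    """Share of simulated tables that are as likely as or less likely
    than the observed table when H0 is true."""
    rng = random.Random(seed)
    n = sum(observed)
    k = len(probs)
    obs_lp = log_multinomial_prob(observed, probs)
    hits = 0
    for _ in range(reps):
        # n rolls of a k-sided die weighted by the H0 probabilities
        rolls = rng.choices(range(k), weights=probs, k=n)
        counts = [rolls.count(i) for i in range(k)]
        if log_multinomial_prob(counts, probs) <= obs_lp + 1e-9:
            hits += 1
    return hits / reps

# Hypothetical counts for four categories (not data from the notes)
null_probs = [0.25, 0.40, 0.20, 0.15]
print(simulated_p_value([9, 5, 4, 2], null_probs))
```

Because the p-value is estimated from random rolls, it changes slightly with the seed and becomes more stable as reps grows.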


Example: Ice Cream Preferences

  • A local pharmacy analyzes its ice cream sales data to check whether flavor preferences have changed from five years ago:

    • Previous proportions: Strawberry (25%), Chocolate (40%), Vanilla (20%), Butterscotch (15%).

    • The owner collects customer preference data over one day.

    • The plan is to evaluate the evidence of a preference change at the 10% significance level.


Using R

  • The analysis is a multinomial experiment with one variable.

  • Use the xmulti function from the XNomial package in R to perform inference.

  • Example output shows that 4960 different tables of counts are possible; the observed table has a probability of 0.002638 under H0.
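
The count of 4960 tables can be checked by hand. Splitting n observations among k categories can be done in C(n + k - 1, k - 1) ways, and with k = 4 flavors this equals 4960 when n = 29. The sample size of 29 is inferred from the reported count and is not stated in the notes. The Python sketch below (function names are illustrative, and the observed counts are hypothetical) counts the tables and shows how an exact p-value could be summed over them.

```python
from math import comb, factorial, prod

def number_of_tables(n, k):
    """Ways to split n observations among k categories."""
    return comb(n + k - 1, k - 1)

def all_tables(n, k):
    """Yield every tuple of k non-negative counts that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in all_tables(n - first, k - 1):
            yield (first,) + rest

def table_probability(counts, probs):
    """Multinomial probability of one table of counts under H0."""
    coef = factorial(sum(counts))
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p ** c for p, c in zip(probs, counts))

def exact_p_value(observed, probs):
    """Sum the probabilities of all tables that are as likely as or
    less likely than the observed table."""
    p_obs = table_probability(observed, probs)
    total = 0.0
    for table in all_tables(sum(observed), len(probs)):
        p = table_probability(table, probs)
        if p <= p_obs * (1 + 1e-9):
            total += p
    return total

print(number_of_tables(29, 4))  # 4960

# Hypothetical counts for 29 customers (not the pharmacy's data)
print(exact_p_value((7, 12, 6, 4), [0.25, 0.40, 0.20, 0.15]))
```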


Hypothesis Test — Ice Cream Preferences Solution

  1. Assumption: Simple random sample from a multinomial distribution.

  2. Hypotheses:

    • H0: Ice cream preference remains the same.

    • Ha: Ice cream preference differs.

  3. Test Statistic: 7

  4. P-value: Simulated p = 0.5666

  5. Conclusions: p > 0.10

    • Fail to reject H0; insufficient evidence for a preference difference.


The Hypergeometric (Simulation) Method (Fisher's Exact Test)

  • This test allows exact inference for 2x2 tables using the hypergeometric distribution; larger tables use the multivariate hypergeometric distribution.

  • The test statistic is the observed contingency table, and the p-value is found by summing the probabilities of all tables with the same row and column totals that are as likely as or less likely than the observed table.

  • Larger samples often warrant theoretical tests instead because of the computational cost.
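
For a 2x2 table the calculation is small enough to write out directly. The Python sketch below is illustrative only: the function name and the example table are hypothetical, and it uses the common two-sided rule of summing every table that is no more probable than the observed one.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when the row and column totals are held fixed
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_obs * (1 + 1e-9))

# Hypothetical table with every row and column total equal to 4
print(fisher_exact_two_sided(3, 1, 1, 3))  # 34/70, about 0.4857
```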


Section 8.3: Chi-Square Goodness-of-Fit Test

  • Focuses on theory-based inference for a single qualitative variable with two or more categories.

  • Traditional z-procedures and t-procedures do not fit scenarios with multiple categories, so chi-square tests are used instead.

  • The Chi-Square distribution is introduced, leading to the Goodness-of-Fit Test.


The Chi-Square Distribution

  • The chi-square distribution is right-skewed, and its shape is determined by its degrees of freedom (df).

  • Notation: χ²(df),α denotes the critical χ² value that has area α to its right under the curve with df degrees of freedom.

  • Basic properties:

    1. Total area under the curve = 1.

    2. Begins at 0 and extends to the right indefinitely.

    3. Right-skewed curve.

    4. As df increases, the curve becomes more symmetric and looks more like a normal curve.
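
These properties can be explored numerically. The Python sketch below is illustrative and relies on two standard facts that the notes do not state: a chi-square value with df degrees of freedom can be generated as the sum of df squared standard normal values, and the skewness of the distribution is sqrt(8 / df). The function names are made up for this example.

```python
import random
from math import sqrt

def chi_square_draw(df, rng):
    """One simulated value: the sum of df squared standard normal values."""
    return sum(rng.gauss(0, 1) ** 2 for _ in range(df))

def theoretical_skewness(df):
    """Skewness of the chi-square distribution; it shrinks toward 0 as df grows."""
    return sqrt(8 / df)

rng = random.Random(0)
for df in (2, 10, 50):
    draws = [chi_square_draw(df, rng) for _ in range(5_000)]
    sample_mean = sum(draws) / len(draws)
    # The sample mean should land near df, and every draw is non-negative
    print(df, round(sample_mean, 2), min(draws) >= 0,
          round(theoretical_skewness(df), 2))
```

The shrinking skewness is the numerical counterpart of property 4.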


Exploring the Goodness-of-Fit Test Example

  • M&M's are produced in claimed color proportions; a sample is analyzed to see whether the observed distribution agrees with the claimed one.

  • Null Hypothesis (H0): The M&M color distribution matches the company's claimed proportions.

  • Alternative Hypothesis (Ha): The distribution differs from the claimed proportions.


Setting up the Problem

  • Hypotheses expressed as:

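    • H0: π_1 = π_{1,0}, π_2 = π_{2,0}, ..., π_k = π_{k,0}

    • Ha: At least one π_i ≠ π_{i,0}

  • Here π_{i,0} is the hypothesized proportion for category i; the aim is to show that at least one category's proportion differs from its hypothesized value.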


How to Reject H0

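  • Compare the observed counts (O_i) with the expected counts (E_i) under H0.

  • Compute expected counts using the formula: E_i = n * π_{i,0}

  • Combine the differences into the test statistic χ² = Σ (O_i − E_i)² / E_i, summed over all k categories; a large value is evidence against H0.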
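
The comparison of observed and expected counts can be sketched in Python as follows. The statistic computed here is the usual Pearson chi-square sum of (O_i − E_i)² / E_i over the categories. The function names and the three-category data are hypothetical and serve only to show the arithmetic.

```python
def expected_counts(n, null_probs):
    """E_i = n * pi_{i,0} for each category."""
    return [n * p for p in null_probs]

def chi_square_statistic(observed, null_probs):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    expected = expected_counts(sum(observed), null_probs)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical data: 100 observations, claimed proportions 0.5, 0.3, 0.2
observed = [45, 35, 20]
claimed = [0.5, 0.3, 0.2]
print(expected_counts(sum(observed), claimed))  # about 50, 30, 20
print(chi_square_statistic(observed, claimed))  # about 1.333
print(len(observed) - 1)                        # df = k - 1 = 2
```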


Validity Conditions for Goodness-of-Fit Test

  • The chi-square test statistic approximately follows a chi-square distribution when:

    1. The data come from a simple random sample.

    2. The sample size is large enough that each expected frequency E_i ≥ 5.

  • Some texts relax the second condition, for example by requiring every E_i ≥ 1 and no more than 20% of the E_i below 5.
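
The expected-count condition is easy to check before running the test. The Python sketch below uses the ice cream proportions from the earlier example. The sample size of 29 is an inference, not a figure stated in the notes: it is the value of n for which four categories give the 4960 possible tables reported in the R output. The function name is made up for this example.

```python
def expected_counts_ok(n, null_probs, minimum=5):
    """Return (ok, expected), where ok is True only when every
    expected count n * pi_{i,0} is at least `minimum`."""
    expected = [n * p for p in null_probs]
    return all(e >= minimum for e in expected), expected

flavors = [0.25, 0.40, 0.20, 0.15]  # Strawberry, Chocolate, Vanilla, Butterscotch

ok_small, expected_small = expected_counts_ok(29, flavors)
print(ok_small, expected_small)  # False: Butterscotch expects about 4.35

ok_large, expected_large = expected_counts_ok(100, flavors)
print(ok_large, expected_large)  # True: about 25, 40, 20, 15
```

Under that assumed sample size the condition fails, which is consistent with the notes using an exact test for the ice cream data.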


Additional Notes

  • If the hypothesized proportions are correct, the differences between observed and expected counts should be small (χ² ≈ 0).

  • Large differences produce a large χ² value and suggest that the hypothesized proportions are wrong.

  • Degrees of freedom are calculated as k - 1, where k = number of categories.


Hypothesis Test — M&M Example Continued

  • R output used for hypothesis testing:

    1. H0: M&M color distribution matches company claims.

    2. Ha: Distribution differs.

    3. Test Statistic: χ² = 1.2468, 5 df

    4. P-value: p = 0.9403

    5. p > 0.05 ➔ Fail to reject H0; insufficient evidence that the color distribution differs from the company's claims.
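
The reported p-value can be checked without statistical software, because the upper-tail area of a chi-square curve has a closed form for whole-number df. The Python sketch below is an illustrative standard-library implementation with a made-up function name; it is not the routine R uses.

```python
from math import erfc, exp, pi, sqrt

def chi_square_upper_tail(x, df):
    """Area to the right of x under the chi-square curve with df degrees of freedom."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum of (x/2)^j / j! for j = 0 .. df/2 - 1
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of correction terms
    term = sqrt(2 * x / pi) * exp(-half)
    total = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return erfc(sqrt(half)) + total

# Test statistic and df from the M&M example
print(round(chi_square_upper_tail(1.2468, 5), 3))  # 0.94
```

The result agrees with the p = 0.9403 reported in the R output.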


Section 8.2: Chi-Square Test for Independence

  • Introduces the concept of contingency tables to assess the relationship between two categorical variables.

  • Questions focus on whether variables are associated.

  • If two variables are independent, one variable provides no information on the other.


Example: Happiness and Income

  • The survey investigates the relationship between happiness and family income.

  • Compare the observed counts with the counts expected under the null hypothesis of independence; large differences suggest dependence.
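
The comparison of observed and expected counts for a two-way table can be sketched in Python. The sketch uses the standard formulas, which the notes do not spell out: the expected count for a cell is (row total × column total) / n, and the degrees of freedom are (rows − 1)(columns − 1). The function name is made up for this example, and the table is hypothetical rather than the survey's data.

```python
def independence_test_statistic(table):
    """Return (chi-square statistic, df, expected counts) for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / n
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected

# Hypothetical 2 x 3 table of counts (not the survey's data)
stat, df, expected = independence_test_statistic([[10, 20, 30],
                                                  [20, 20, 20]])
print(round(stat, 3), df)  # 5.333 2
print(expected)            # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
```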


Validity Conditions for Chi-Square Test for Independence

  • The conditions mirror those for the Goodness-of-Fit Test:

    1. Simple random sample.

    2. Sufficiently large sample size (judged by the expected counts, as in the Goodness-of-Fit Test).

  • Reject H0 if the test statistic is large enough.


Conclusion of Chi-Square Test for Independence Example

  • Use results from the General Social Survey (GSS) to determine whether perceived happiness is associated with income at the chosen significance level.

  • The chi-square tests are only procedures; their results still require careful interpretation.
