
Statistics and Testing Concepts

The File Drawer Effect

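  • Definition: The tendency not to publish uninteresting (non-significant) results, typically studies in which researchers fail to reject the null hypothesis (FTR Ho). Because these studies stay "in the file drawer," the published literature over-represents significant findings.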

Central Limit Theorem (CLT)

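  • Statement of CLT: For a sufficiently large sample size n, the sample mean of n independent observations from a population with mean µ and standard deviation σ is approximately normally distributed with mean µ and standard deviation σ/√n, whatever the shape of the population distribution.
  • Example: For a sample of size n = 64 drawn from a distribution with mean µ = 50 and standard deviation σ = 12, the sample mean X̄ is approximately normally distributed with:
    • Mean: ( \mu_{\bar{X}} = \mu = 50 )
    • Standard Deviation (standard error): ( \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{64}} = 1.5 )
  • Z-Score Calculation: For a sample mean of 48, the z-score is:
    • ( z = \frac{\bar{x} - \mu}{\sigma_{\bar{X}}} = \frac{48 - 50}{1.5} = -1.33 )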
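A quick numerical check of the CLT example: the sketch below computes the standard error, the z-score, and the approximate probability P(X̄ < 48), then simulates many sample means of size 64. The uniform parent distribution in the simulation is an assumption, chosen only because it is clearly non-normal; its bounds 50 ± 12√3 give it mean 50 and standard deviation 12.

```python
import math
import random
import statistics
from statistics import NormalDist

mu, sigma, n = 50, 12, 64
se = sigma / math.sqrt(n)                  # standard error of the mean: 12 / 8 = 1.5
z = (48 - mu) / se                         # z-score for a sample mean of 48
p_below_48 = NormalDist(mu, se).cdf(48)    # approximate P(X-bar < 48) under the CLT

# Simulation check (ASSUMED parent: uniform on [50 - 12*sqrt(3), 50 + 12*sqrt(3)],
# which has mean 50 and standard deviation 12 but is not normal).
random.seed(0)
half_width = sigma * math.sqrt(3)
sample_means = [
    statistics.fmean(random.uniform(mu - half_width, mu + half_width) for _ in range(n))
    for _ in range(5000)
]

print(f"SE = {se}, z = {z:.2f}, P(X-bar < 48) = {p_below_48:.4f}")
print(f"simulated mean = {statistics.fmean(sample_means):.2f}, "
      f"simulated SD = {statistics.stdev(sample_means):.3f}")
```

The simulated standard deviation of the sample means should land close to σ/√n = 1.5 even though the parent distribution is flat, which is the point of the CLT.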

Type I and Type II Errors

  • Type I Error (α): Rejecting the null hypothesis (Ho) when it is true.
    • P(Type I error) = α, the significance level of the test.
  • Type II Error (β): Failing to reject the null hypothesis when it is false.
    • P(Type II error) = β.
  • Power of a Test: The probability of correctly rejecting a false null hypothesis: Power = 1 - β.
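To make α, β, and power concrete, the sketch below reuses the CLT setting (σ = 12, n = 64) for a one-sided z-test of Ho: µ = 50 against Ha: µ > 50 at α = 0.05. The one-sided test and the "true" mean of 53 are hypothetical choices for illustration only; they do not come from the notes.

```python
import math
from statistics import NormalDist

# Setting from the CLT example; the one-sided test and mu_true are HYPOTHETICAL.
mu0, sigma, n, alpha = 50, 12, 64, 0.05
mu_true = 53                                 # hypothetical true mean when Ho is false
se = sigma / math.sqrt(n)                    # 1.5

# One-sided z-test of Ho: mu = 50 vs Ha: mu > 50; reject Ho when x-bar > x_crit.
z_crit = NormalDist().inv_cdf(1 - alpha)
x_crit = mu0 + z_crit * se

type_i = 1 - NormalDist(mu0, se).cdf(x_crit)   # P(reject Ho | Ho true), equals alpha
beta = NormalDist(mu_true, se).cdf(x_crit)     # P(fail to reject Ho | mu = 53)
power = 1 - beta                               # P(reject Ho | mu = 53)

print(f"reject Ho if x-bar > {x_crit:.3f}")
print(f"alpha = {type_i:.3f}, beta = {beta:.3f}, power = {power:.3f}")
```

Increasing n, increasing α, or moving the true mean farther from 50 shrinks β and therefore raises the power.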

Benford's Law

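  • Use: Screening data for possible fabrication or errors by comparing the observed distribution of leading (first) digits with the logarithmic distribution that Benford's Law predicts, in which smaller leading digits are more common.
  • Formula: ( P(\text{first digit} = d) = \log_{10}\left(1 + \frac{1}{d}\right), \quad d = 1, 2, \ldots, 9 )
    • For example, P(d = 1) ≈ 0.301 while P(d = 9) ≈ 0.046.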
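A minimal sketch of a Benford comparison: compute the expected first-digit probabilities from the formula, extract leading digits from a data set, and compare proportions. The data set (the powers 2^0 through 2^999) is an illustrative assumption, picked because its leading digits are known to follow Benford's Law closely; `first_digit` is a hypothetical helper that handles positive integers only.

```python
import math
from collections import Counter

# Benford probabilities: P(first digit = d) = log10(1 + 1/d), d = 1..9
benford = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit(n):
    """Leading digit of a positive integer (hypothetical helper)."""
    return int(str(n)[0])

# ILLUSTRATIVE data, not from the notes: powers of 2 from 2**0 to 2**999.
data = [2 ** k for k in range(1000)]
counts = Counter(first_digit(x) for x in data)

for d in range(1, 10):
    observed = counts[d] / len(data)
    print(f"{d}: observed {observed:.3f}, Benford {benford[d]:.3f}")
```

With real data, the observed counts would typically be compared to the Benford expectations with a chi-square goodness-of-fit test like the one in Example 4.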

Chi-Square Tests Example 1: Commuter Status vs. Yankee Fans

  • Data Summary:
    • Commuters: 20 fans, 25 non-fans (Total: 45)
    • Non-Commuters: 10 fans, 50 non-fans (Total: 60)
  • Hypotheses:
    • Ho: Commuter status is independent of being a Yankee fan.
    • Ha: Commuter status is not independent of being a Yankee fan.
  • Chi-Square Result:
    • Test Statistic = 9.72, Degrees of Freedom (DF) = 1, P-Value = 0.0018
    • Since P-Value < α (0.05), reject Ho.
  • Conclusion: Evidence suggests a relationship between commuter status and being a Yankee fan.
  • Odds Ratio Calculation:
    • ( \frac{20 \cdot 50}{10 \cdot 25} = 4 )
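    • Interpretation: The odds of a commuter being a Yankee fan (20/25 = 0.8) are 4 times the odds for a non-commuter (10/50 = 0.2).
  • Expected Count for Commuters who are Non-Yankee Fans (row total × column total / grand total, with 25 + 50 = 75 non-fans in total):
    • ( \frac{45 \cdot 75}{105} \approx 32.14 )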
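The numbers in Example 1 can be reproduced with a short Pearson chi-square computation. This sketch uses no continuity correction, which matches the reported statistic of 9.72; for one degree of freedom the p-value has the closed form P(χ² > x) = erfc(√(x/2)). The function name `chi_square_independence` is hypothetical. If SciPy is available, `scipy.stats.chi2_contingency(table, correction=False)` should give the same statistic (its default applies Yates' correction to 2 × 2 tables).

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic, df, and expected counts for an r x c table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df, expected

# Rows: commuters, non-commuters; columns: Yankee fans, non-fans
table = [[20, 25],
         [10, 50]]
stat, df, expected = chi_square_independence(table)
p_value = math.erfc(math.sqrt(stat / 2))        # upper tail for df = 1 only
odds_ratio = (table[0][0] * table[1][1]) / (table[1][0] * table[0][1])

print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.4f}")
print(f"odds ratio = {odds_ratio:.1f}")
print(f"expected commuter non-fans = {expected[0][1]:.2f}")
```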

Chi-Square Tests Example 2: Party Affiliation and Candidate Support

  • Data Summary:
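    • Counts of Democrats, Republicans, and Independents supporting each of 4 candidates (a 3 × 4 table).
  • Hypotheses:
    • Ho: No relationship between party affiliation and candidate supported (they are independent).
    • Ha: A relationship exists (they are not independent).
  • Chi-Square Result:
    • Test Statistic = 37.73, Degrees of Freedom (DF) = (3 - 1)(4 - 1) = 6, P-Value < 0.0001
    • Since P-Value < α (0.05), reject Ho: there is evidence of a relationship between party affiliation and candidate supported.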
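The reported P-Value can be checked from the statistic and the table's dimensions alone. For an even number of degrees of freedom the chi-square upper tail has an exact closed form, P(X > x) = e^{-x/2} Σ_{i=0}^{df/2-1} (x/2)^i / i!, which the sketch below uses; the helper name `chi2_sf_even_df` is hypothetical.

```python
import math

def chi2_sf_even_df(x, df):
    """P(X > x) for X ~ chi-square(df); exact closed form valid only for even df."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(df // 2))

n_parties, n_candidates = 3, 4
df = (n_parties - 1) * (n_candidates - 1)   # (3 - 1)(4 - 1) = 6
stat = 37.73                                 # test statistic reported in the notes
p_value = chi2_sf_even_df(stat, df)
print(f"df = {df}, p = {p_value:.2e}")      # far below 0.0001
```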

Chi-Square Tests Example 3: Car Size and Shirt Size

  • Data Summary:
    • Relationship studied between car size categories and shirt sizes (Small, Medium, Large, X-Large).
  • Hypotheses:
    • Ho: Shirt size and car size are independent.
    • Ha: They are not independent.
  • Chi-Square Result:
    • Test Statistic = 360.76, P-Value < 0.0001
    • Reject Ho, indicating a relationship.
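  • Gamma Correlation Coefficient (Goodman and Kruskal's gamma, suitable because both variables are ordinal):
    • ( \gamma = \frac{C - D}{C + D} = 0.54953 ), where C and D are the numbers of concordant and discordant pairs. This indicates a fairly strong positive association: larger cars tend to go with larger shirt sizes.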
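A minimal sketch of how gamma is computed from an ordered contingency table: every pair of observations in different rows and different columns is either concordant (higher on both variables) or discordant (higher on one, lower on the other), and tied pairs are ignored. The table below is HYPOTHETICAL, since the notes do not give the car size and shirt size counts, so its gamma will not match 0.54953.

```python
def goodman_kruskal_gamma(table):
    """Goodman and Kruskal's gamma for a table whose rows and columns are both ordered low to high."""
    rows, cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(rows):
        for j in range(cols):
            # Pair cell (i, j) with every cell in a strictly higher row.
            for k in range(i + 1, rows):
                for m in range(cols):
                    if m > j:      # higher on both variables: concordant pair
                        concordant += table[i][j] * table[k][m]
                    elif m < j:    # higher row but lower column: discordant pair
                        discordant += table[i][j] * table[k][m]
                    # m == j is a tie on the column variable and is ignored
    return (concordant - discordant) / (concordant + discordant)

# HYPOTHETICAL counts (not the data from Example 3):
# rows = car size (small, medium, large); columns = shirt size (S, M, L, XL)
table = [
    [30, 20, 10, 5],
    [15, 35, 25, 10],
    [5, 15, 30, 30],
]
print(f"gamma = {goodman_kruskal_gamma(table):.3f}")
```

As a hand check, the 2 × 2 table [[10, 5], [5, 10]] has C = 100 and D = 25, so γ = 75 / 125 = 0.6.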

Chi-Square Tests Example 4: Favorite Crayon Color Distribution

  • Data Summary:
    • Frequency of respondents' favorite colors.
  • Hypotheses:
    • Ho: Favorite colors are uniformly distributed.
    • Ha: They are not.
  • Chi-Square Result:
    • Test Statistic = 23.72, P-Value = 0.0013
    • Reject Ho, indicating favorite colors are not uniformly distributed.
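A goodness-of-fit test against a uniform distribution compares each observed count with n/k, where k is the number of categories, and uses k - 1 degrees of freedom. The sketch below computes the statistic and the p-value; since df can be odd, the p-value comes from the power series for the regularized lower incomplete gamma function. The color counts and the helper name `chi2_sf` are HYPOTHETICAL (the notes do not list the crayon data). If SciPy is available, `scipy.stats.chisquare(f_obs)` tests against equal expected frequencies by default.

```python
import math

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), via the series for the regularized lower incomplete gamma function."""
    if x <= 0:
        return 1.0
    s, z = df / 2, x / 2
    term = total = 1 / s
    n = 0
    while term > total * 1e-16:
        n += 1
        term *= z / (s + n)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)

# HYPOTHETICAL counts of favorite crayon colors (not the data from Example 4)
observed = {"red": 30, "blue": 45, "green": 25, "yellow": 15, "purple": 20, "orange": 15}
n = sum(observed.values())
expected = n / len(observed)            # uniform Ho: every color equally likely
stat = sum((o - expected) ** 2 / expected for o in observed.values())
df = len(observed) - 1                  # k categories -> k - 1 degrees of freedom
p_value = chi2_sf(stat, df)
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.5f}")
```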

Shapiro-Wilk Test

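  • Purpose: Tests Ho: the data come from a normally distributed population, against Ha: they do not. A small p-value (below α) leads to rejecting normality; a large p-value means there is no evidence against normality, not proof that the data are normal.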