Statistics and Testing Concepts
The File Drawer Effect
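- Definition: The tendency for uninteresting (typically non-significant) results to go unpublished, most often when researchers fail to reject the null hypothesis (FTR Ho). As a result, the published record overstates how often effects are found.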
Central Limit Theorem (CLT)
- Statement of CLT (worked example): For a random sample of size n = 64 drawn from a distribution with mean (µ) = 50 and standard deviation (σ) = 12, the sample mean (X̄) is approximately normally distributed with:
- Mean: 50
- Standard Deviation (standard error of the mean): \( \sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}} = \frac{12}{\sqrt{64}} = 1.5 \)
- Z-Score Calculation: For a sample mean of 48, the z-score is calculated as:
- \( z = \frac{48 - 50}{1.5} \approx -1.33 \)
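The arithmetic above can be checked with a short sketch using only Python's standard library. The lower-tail probability at the end is an added illustration (it is not part of the notes) and relies on the CLT normal approximation.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50, 12, 64            # population mean, population SD, sample size

standard_error = sigma / sqrt(n)     # SD of the sample mean: 12 / 8 = 1.5
z = (48 - mu) / standard_error       # z-score for a sample mean of 48

# Assumption: this probability is an extra illustration, not from the notes.
# It is P(sample mean < 48) under the normal approximation given by the CLT.
p_below_48 = NormalDist().cdf(z)

print(f"standard error = {standard_error}")
print(f"z = {z:.2f}")
print(f"P(sample mean < 48) = {p_below_48:.4f}")
```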
Type I and Type II Errors
- Type I Error (α): Rejecting the null hypothesis (Ho) when it is true.
- Probability of Type I error is denoted as P(Type I error) = α.
- Type II Error (β): Failing to reject the null hypothesis when it is false.
- Probability of Type II error is denoted as P(Type II error) = β.
- Power of a Test: The probability of correctly rejecting a false null hypothesis, denoted as 1 - β.
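To make α, β, and power concrete, here is a minimal sketch for a lower-tailed z-test. It reuses σ = 12 and n = 64 from the CLT example. The one-sided alternative, the true mean of 47, and the function name `lower_tail_z_test_power` are all assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

def lower_tail_z_test_power(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a lower-tailed z-test of Ho: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)                     # standard error of the sample mean
    z_crit = NormalDist().inv_cdf(alpha)     # reject Ho when z < z_crit
    xbar_crit = mu0 + z_crit * se            # the same cutoff on the sample-mean scale
    # Probability that the sample mean falls in the rejection region
    # when the true mean is mu_true.
    return NormalDist(mu_true, se).cdf(xbar_crit)

alpha = 0.05                                 # P(Type I error)
# Hypothetical scenario: Ho says mu = 50, but the true mean is 47.
power = lower_tail_z_test_power(mu0=50, mu_true=47, sigma=12, n=64, alpha=alpha)
beta = 1 - power                             # P(Type II error)

print(f"alpha = {alpha}, beta = {beta:.3f}, power = {power:.3f}")
```

When the true mean equals the null value, the same function returns α, because every rejection is then a Type I error.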
Benford's Law
- Use: To screen data for possible fabrication or manipulation by comparing the distribution of first (leading) digits against the logarithmic pattern the law predicts.
- Formula: \( P(\text{first digit} = d) = \log_{10}\left(1 + \frac{1}{d}\right) \), for d = 1, 2, ..., 9
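As a quick sketch, the formula can be tabulated for all nine possible first digits. The function name `benford_probability` is a hypothetical helper introduced here.

```python
from math import log10

def benford_probability(d):
    """Expected proportion of values whose first digit is d under Benford's Law."""
    if not 1 <= d <= 9:
        raise ValueError("first digit must be between 1 and 9")
    return log10(1 + 1 / d)

benford = {d: benford_probability(d) for d in range(1, 10)}

for d, p in benford.items():
    print(f"first digit {d}: {p:.3f}")

# The nine probabilities sum to 1 (up to floating-point rounding).
print(f"total = {sum(benford.values()):.3f}")
```

About 30% of values are expected to start with 1 and under 5% with 9, so a dataset with roughly equal first-digit frequencies would stand out.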
Chi-Square Tests Example 1: Commuter Status vs. Yankee Fans
- Data Summary:
- Commuters: 20 fans, 25 non-fans (Total: 45)
- Non-Commuters: 10 fans, 50 non-fans (Total: 60)
- Hypotheses:
- Ho: Commuter status is independent of being a Yankee fan.
- Ha: Commuter status is not independent of being a Yankee fan.
- Chi-Square Result:
- Test Statistic = 9.72, Degrees of Freedom (DF) = 1, P-Value = 0.0018
- Since P-Value < α (0.05), reject Ho.
- Conclusion: Evidence suggests a relationship between commuter status and being a Yankee fan.
- Odds Ratio Calculation:
- \( \text{OR} = \frac{20/25}{10/50} = \frac{20 \cdot 50}{10 \cdot 25} = 4 \)
- Interpretation: The odds of a commuter being a Yankee fan are 4 times the odds for a non-commuter.
- Expected Count for Commuters who are Non-Yankee Fans:
- \( \frac{\text{row total} \cdot \text{column total}}{\text{grand total}} = \frac{45 \cdot 75}{105} \approx 32.14 \)
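The figures in Example 1 can be reproduced from the 2 × 2 table with a short standard-library sketch. Variable names are illustrative, and the closed-form P-value used here is valid only for DF = 1. No continuity correction is applied, which matches the reported statistic of 9.72.

```python
from math import erfc, sqrt

# Rows: commuters, non-commuters. Columns: Yankee fans, non-fans.
observed = [[20, 25],
            [10, 50]]

row_totals = [sum(row) for row in observed]              # [45, 60]
col_totals = [sum(col) for col in zip(*observed)]        # [30, 75]
grand_total = sum(row_totals)                            # 105

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# For DF = 1 only, the chi-square upper-tail probability is erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi_square / 2))

# Cross-product odds ratio for a 2 x 2 table
odds_ratio = (observed[0][0] * observed[1][1]) / (observed[0][1] * observed[1][0])

print(f"expected (commuter, non-fan) = {expected[0][1]:.2f}")
print(f"chi-square = {chi_square:.2f}, DF = {df}, P-value = {p_value:.4f}")
print(f"odds ratio = {odds_ratio}")
```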
Chi-Square Tests Example 2: Party Affiliation and Candidate Support
- Data Summary:
- A 3 × 4 table: party affiliation (Democrat, Republican, Independent) by candidate supported (4 candidates).
- Hypotheses:
- Ho: No relationship between party affiliation and candidate supported.
- Ha: Party affiliation and candidate supported are related.
- Chi-Square Result:
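- Test Statistic = 37.73, Degrees of Freedom (DF) = (3 - 1)(4 - 1) = 6, P-Value < 0.0001
- Since the P-Value is below any conventional α (such as 0.05), reject Ho: the evidence suggests a relationship between party affiliation and candidate supported.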
Chi-Square Tests Example 3: Car Size and Shirt Size
- Data Summary:
- Relationship studied between car size categories and shirt sizes (Small, Medium, Large, X-Large).
- Hypotheses:
- Ho: Shirt size and car size are independent.
- Ha: They are not independent.
- Chi-Square Result:
- Test Statistic = 360.76, P-Value < 0.0001
- Reject Ho: the evidence suggests a relationship between car size and shirt size.
- Gamma Correlation Coefficient:
- \( \text{Gamma} = 0.54953 \) indicates a moderately strong positive association between the two ordinal variables: larger cars tend to go with larger shirt sizes.
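The notes report gamma without showing how it is computed. The Goodman-Kruskal gamma is (C - D) / (C + D), where C and D count the concordant and discordant pairs of observations. The sketch below assumes a table whose rows and columns are both listed in ascending order. The counts are hypothetical, since the original car size by shirt size table is not reproduced in these notes.

```python
def goodman_kruskal_gamma(table):
    """Gamma = (C - D) / (C + D) for a table with ordered rows and ordered columns."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = discordant = 0
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a later row.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:      # higher on both variables: concordant pairs
                        concordant += table[i][j] * table[k][m]
                    elif m < j:    # higher on one, lower on the other: discordant pairs
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical 2 x 2 counts (NOT the data from Example 3).
hypothetical_table = [[30, 10],
                      [10, 30]]
gamma = goodman_kruskal_gamma(hypothetical_table)
print(f"gamma = {gamma}")
```

Gamma ranges from -1 (all untied pairs discordant) to +1 (all untied pairs concordant). Pairs tied on either variable are ignored.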
Chi-Square Tests Example 4: Favorite Crayon Color Distribution
- Data Summary:
- Frequency of respondents' favorite colors.
- Hypotheses:
- Ho: Favorite colors are uniformly distributed.
- Ha: Favorite colors are not uniformly distributed.
- Chi-Square Result:
- Test Statistic = 23.72, P-Value = 0.0013
- Reject Ho, indicating favorite colors are not uniformly distributed.
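A goodness-of-fit statistic against a uniform distribution can be sketched as follows. The counts are hypothetical, since the crayon frequencies are not listed in these notes, and `uniform_gof_statistic` is a helper name introduced for illustration.

```python
def uniform_gof_statistic(counts):
    """Chi-square goodness-of-fit statistic and DF against equal expected counts."""
    total = sum(counts)
    expected = total / len(counts)             # uniform: same expected count per category
    statistic = sum((o - expected) ** 2 / expected for o in counts)
    df = len(counts) - 1                       # number of categories minus 1
    return statistic, df

# Hypothetical counts for three categories (NOT the crayon data).
hypothetical_counts = [30, 20, 10]
statistic, df = uniform_gof_statistic(hypothetical_counts)
print(f"chi-square = {statistic}, DF = {df}")
```

Turning the statistic into a P-Value requires the chi-square distribution with the given DF, from a table or statistical software.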
Shapiro-Wilk Test
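- Use: To assess whether a dataset is plausibly drawn from a normal distribution.
- Ho: The data are normally distributed.
- Ha: The data are not normally distributed.
- Interpretation: A small P-Value (< α) leads to rejecting Ho, which is evidence against normality. A test statistic W close to 1 is consistent with normality.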