Understanding Statistical Significance
Introduction
The purpose is to examine what information is needed to decide whether the groups in an experiment really differ, focusing on averages and variability.
Averages and Variability
Mean: The average of a group, used to represent central tendency.
Importance of Variability:
Variability provides insight into how spread out the scores are around the average.
Without a measure of variability, the mean alone does not give enough information to support predictions or hypotheses.
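A minimal sketch of why the mean alone is not enough. The two score lists are made-up values chosen so that both average 80; only the spread differs. The population standard deviation (pstdev) is used here as one possible measure of variability.

```python
from statistics import mean, pstdev

# Hypothetical scores: both groups average 80, but the spread differs.
tight_scores = [79, 80, 80, 80, 81]
spread_scores = [60, 70, 80, 90, 100]

tight_mean, tight_sd = mean(tight_scores), pstdev(tight_scores)
spread_mean, spread_sd = mean(spread_scores), pstdev(spread_scores)

# Same mean, very different standard deviations.
print(f"tight:  mean={tight_mean}, sd={tight_sd:.2f}")
print(f"spread: mean={spread_mean}, sd={spread_sd:.2f}")
```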
Hypothetical Experiment Scenarios
Universe One Scenario
Visual Representation: Two distributions:
Group allowed to text (light green).
Group not allowed to text (purple).
Averages:
Texting group: average = 81
Non-texting group: average = 78.
Low Variability:
Distributions are peaked around the mean.
Low overlap between scores of two groups.
Conclusion:
More confidence in stating the groups are different due to little overlap.
Universe Two Scenario
Visual Representation: Two distributions:
Group allowed to text (light green).
Group not allowed to text (purple).
Averages:
Both groups still average 81 and 78, respectively.
High Variability:
Distributions are spread out.
Substantial overlap between the two groups' scores.
Conclusion:
Less confidence in stating groups are different due to high overlap.
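The contrast between the two universes can be sketched numerically. The means (81 and 78) come from the scenarios above; the standard deviations (1 and 10) and the assumption that scores are normally distributed are illustrative choices, not values from the notes. NormalDist.overlap returns the shared area of two distributions, from 0 (no overlap) to 1 (identical).

```python
from statistics import NormalDist

# Means come from the scenarios; the standard deviations are assumed values.
LOW_SD, HIGH_SD = 1.0, 10.0

def group_overlap(sd):
    """Shared area (0 to 1) of the two score distributions for a given spread."""
    texting = NormalDist(mu=81, sigma=sd)
    non_texting = NormalDist(mu=78, sigma=sd)
    return texting.overlap(non_texting)

universe_one = group_overlap(LOW_SD)   # low variability
universe_two = group_overlap(HIGH_SD)  # high variability

print(f"Universe One overlap: {universe_one:.2f}")
print(f"Universe Two overlap: {universe_two:.2f}")
```

The same 3-point gap between averages yields little shared area when scores are tightly clustered and a great deal when they are spread out.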
Statistical Methods Post-Data Collection
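Hypothesis Testing: A statistical method used to judge whether an observed difference between groups is larger than chance variation alone would plausibly produce.
P-Value: The probability of observing a difference at least as large as the one found, assuming there is no real difference between the groups (the null hypothesis).
A smaller p-value means the data are less compatible with the null hypothesis; below the chosen threshold, the difference is called statistically significant.
The common threshold for significance is 0.05: if there were truly no difference, a result this extreme would occur less than 5% of the time.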
Outcomes of Hypothesis Testing:
Low p-value: Stronger evidence against the assumption that the groups do not differ.
High p-value: Insufficient evidence to claim a difference between groups.
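One way to make the p-value concrete is a permutation test, sketched below. This is an illustrative choice of test, not necessarily the one used in the lecture. It shuffles the group labels many times and counts how often a difference at least as large as the observed one appears by chance. The function name and the score lists are hypothetical; the scores were made up so that the groups average 81 and 78.

```python
import random
from statistics import mean

def permutation_p_value(group_a, group_b, n_shuffles=5000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_shuffles):
        # Shuffling mimics a world where group labels are unrelated to scores.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    return (at_least_as_extreme + 1) / (n_shuffles + 1)

# Hypothetical scores, not data from the lecture.
texting = [80, 82, 81, 83, 79, 81, 82, 80]
non_texting = [77, 79, 78, 76, 80, 78, 79, 77]

p = permutation_p_value(texting, non_texting)
print(f"p = {p:.4f}")
```

Because these made-up scores are tightly clustered, shuffled labels rarely reproduce a 3-point gap, so the p-value comes out small.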
Interpretation of Results
When the p-value is low (e.g., < 0.05), as is likely in a situation like Universe One, conclude that the difference is statistically significant.
When results show high variability and greater overlap (as in Universe Two), the p-value is likely to be high; conclude that the difference is not statistically significant.
Limitations of Hypothesis Testing
Multiple Testing Concern
False Positives: As the number of tests increases, the likelihood of false positives (claiming a difference when there isn't one) also increases.
Example: If conducting 100 tests at a 5% error rate when no real differences exist, expect about 5 false positives on average.
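The arithmetic behind this example can be sketched as follows. The function names are hypothetical, and the second calculation additionally assumes the tests are independent of one another.

```python
def expected_false_positives(n_tests, alpha):
    """Average number of false positives when every null hypothesis is true."""
    return n_tests * alpha

def prob_at_least_one_false_positive(n_tests, alpha):
    """Chance of one or more false positives across independent tests."""
    return 1 - (1 - alpha) ** n_tests

n_tests, alpha = 100, 0.05
expected = expected_false_positives(n_tests, alpha)
at_least_one = prob_at_least_one_false_positive(n_tests, alpha)

print(f"Expected false positives: {expected:.1f}")
print(f"Chance of at least one:   {at_least_one:.3f}")
```

Under these assumptions, at least one false positive across 100 tests is close to certain.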
Example of Missing Evidence
Scenario: Scientists studying the relationship of jelly beans to acne.
The initial test gives p > 0.05, so there is no evidence of a link.
The scientists then test each jelly bean color separately until one test comes out significant (e.g., green jelly beans).
Conclusion: Media coverage often highlights the one false significant result and ignores the nonsignificant tests, skewing perception.
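A rough simulation of the jelly bean scenario, assuming no color has any real effect. It relies on the fact that, when the null hypothesis is true, the p-value of a continuous test is uniformly distributed between 0 and 1, so each p-value is simply drawn at random. The number of colors (20) and the function name are assumptions made for illustration.

```python
import random

def count_false_alarms(n_colors, alpha=0.05, rng=random):
    """Count colors that look 'significant' when no color has any real effect."""
    # Under a true null hypothesis, a p-value is uniform on [0, 1].
    p_values = [rng.random() for _ in range(n_colors)]
    return sum(p < alpha for p in p_values)

rng = random.Random(42)
N_COLORS = 20        # assumed number of colors tested
N_STUDIES = 10_000   # number of imaginary repeat studies

studies_with_a_hit = sum(
    count_false_alarms(N_COLORS, rng=rng) > 0 for _ in range(N_STUDIES)
)
share = studies_with_a_hit / N_STUDIES
print(f"Share of studies with at least one 'significant' color: {share:.2f}")
```

Under these assumptions, well over half of such studies would turn up at least one "significant" color purely by chance.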
Arbitrary Nature of p-values
Criticism of relying solely on p-value thresholds (e.g., 0.05).
Example: A p-value of 0.06 is often dismissed even though it represents evidence only slightly weaker than a p-value of 0.05.
Impact of Sample Size
Larger sample sizes make it more likely that even a very small difference will be statistically significant, whether or not it is meaningful.
Example: Language skills between men (blue distribution) and women (pink distribution):
A small difference reaches statistical significance because the sample sizes are large, but it lacks practical significance.
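The effect of sample size can be sketched with a simple two-sample z-test, holding the difference fixed and increasing the number of people per group. The half-point difference and the standard deviation of 10 are assumed values, not figures from the language-skills example, and the z-test itself (which treats the standard deviation as known) is an illustrative simplification.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value for a difference in means, assuming a known common SD."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Assumed values: a half-point gap on a scale with a standard deviation of 10.
MEAN_DIFF, SD = 0.5, 10.0

for n in (100, 1_000, 10_000, 100_000):
    p = two_sample_z_p_value(MEAN_DIFF, SD, n)
    print(f"n per group = {n:>7}: p = {p:.4f}")
```

The difference between the groups never changes, yet the p-value drops below 0.05 once the samples are large enough.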
Alternative Approaches
Effect Sizes
Definition: A calculated statistic indicating the magnitude of differences between groups, moving beyond the yes-no significance decision.
Usefulness: Provides nuanced information about the size of the effect, especially in the presence of overlap.
Interpretation of Effect Size:
Small effect size: The groups differ little, even if the difference is statistically significant.
Large effect size: The groups differ substantially.
Suggested approach: Combine the significance from hypothesis testing with effect size for a comprehensive analysis.
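The notes do not name a specific effect size statistic. One widely used option is Cohen's d, the difference in means divided by the pooled standard deviation, sketched below. The score lists are hypothetical, constructed so that both pairs of groups average 81 and 78 but differ in spread.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Difference in means expressed in pooled standard deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)

# Hypothetical scores with low variability (means 81 and 78).
texting_tight = [80, 82, 81, 83, 79, 81, 82, 80]
non_texting_tight = [77, 79, 78, 76, 80, 78, 79, 77]

# Hypothetical scores with high variability (same means, 81 and 78).
texting_wide = [66, 96, 81, 91, 71, 81, 86, 76]
non_texting_wide = [63, 93, 78, 88, 68, 78, 83, 73]

d_tight = cohens_d(texting_tight, non_texting_tight)
d_wide = cohens_d(texting_wide, non_texting_wide)
print(f"low variability:  d = {d_tight:.2f}")
print(f"high variability: d = {d_wide:.2f}")
```

The 3-point gap is identical in both cases, but it is a much larger effect relative to the spread when variability is low.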
Conclusion
While p-values are a common tool in data analysis, understanding their limitations is crucial in scientific research.
Incorporating effect sizes can provide a deeper understanding of the observed data and its practical implications for real-world scenarios.