Understanding Statistical Significance

Introduction

  • Purpose: To examine what information is needed to draw conclusions from an experiment, focusing on averages and variability.

Averages and Variability

  • Mean: The average of a group, used to represent central tendency.

  • Importance of Variability:

    • Variability describes how spread out the scores are around the average.

    • The mean alone, without a measure of variability, does not give enough information to evaluate predictions or hypotheses.
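As a small illustration of why the mean alone is not enough, the sketch below compares two made-up sets of scores that share the same average but differ in spread. The score lists and the `summarize` helper are hypothetical and not part of the original notes; the sample standard deviation is used here as one common measure of variability.

```python
from statistics import mean, stdev

# Hypothetical scores: both lists average 80, but the spread differs.
tight = [79, 80, 80, 81, 80]
wide = [60, 70, 80, 90, 100]

def summarize(scores):
    """Return (mean, sample standard deviation) for a list of scores."""
    return mean(scores), stdev(scores)

tight_mean, tight_sd = summarize(tight)
wide_mean, wide_sd = summarize(wide)

print(f"tight: mean={tight_mean:.1f}, sd={tight_sd:.2f}")
print(f"wide:  mean={wide_mean:.1f}, sd={wide_sd:.2f}")
```

The two means are identical, so only the standard deviation shows that the second set of scores is far more scattered.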

Hypothetical Experiment Scenarios

Universe One Scenario

  • Visual Representation: Two distributions:

    • Group allowed to text (light green).

    • Group not allowed to text (purple).

  • Averages:

    • Texting group: average = 81

    • Non-texting group: average = 78.

  • Low Variability:

    • Distributions are peaked around the mean.

    • Little overlap between the two groups' scores.

  • Conclusion:

    • More confidence in stating that the groups are different, because their scores overlap very little.

Universe Two Scenario

  • Visual Representation: Two distributions:

    • Group allowed to text (light green).

    • Group not allowed to text (purple).

  • Averages:

    • Same as Universe One: the texting group averages 81 and the non-texting group averages 78.

  • High Variability:

    • Distributions are spread out.

    • Substantial overlap between the two groups' scores.

  • Conclusion:

    • Less confidence in stating groups are different due to high overlap.
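The two universes can be sketched numerically. The example below assumes the scores in each group are normally distributed and picks standard deviations of 1 (Universe One) and 10 (Universe Two); both choices are assumptions made for illustration, since the notes give only the means. `NormalDist.overlap` from Python's standard library returns the shared area under the two curves, from 0 (no overlap) to 1 (identical).

```python
from statistics import NormalDist

# Group means come from the notes; the standard deviations are hypothetical.
TEXTING_MEAN, NON_TEXTING_MEAN = 81, 78

def group_overlap(sd):
    """Overlapping area (0 to 1) of the two groups' score distributions."""
    texting = NormalDist(mu=TEXTING_MEAN, sigma=sd)
    non_texting = NormalDist(mu=NON_TEXTING_MEAN, sigma=sd)
    return texting.overlap(non_texting)

overlap_universe_one = group_overlap(sd=1)   # low variability
overlap_universe_two = group_overlap(sd=10)  # high variability

print(f"Universe One overlap: {overlap_universe_one:.2f}")
print(f"Universe Two overlap: {overlap_universe_two:.2f}")
```

The same 3-point gap between the means produces a small overlap when scores cluster tightly and a large one when they are spread out.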

Statistical Methods Post-Data Collection

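  • Hypothesis Testing: A statistical method used to assess whether an observed difference between groups is larger than chance variation alone would plausibly produce.

  • P-Value: The probability of observing a difference between groups at least as large as the one found, assuming there is no true difference (the null hypothesis).

    • A smaller p-value means the observed difference would be less likely under chance alone; when it falls below the chosen threshold, the difference is called statistically significant.

    • The common threshold for significance is 0.05: if there were no true difference, a result this extreme would occur less than 5% of the time.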

  • Outcomes of Hypothesis Testing:

    • Low p-value: Stronger evidence that the difference between groups is not just chance, though not necessarily that it is large or important.

    • High p-value: Insufficient evidence to claim a difference; this is not proof that the groups are the same.
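One way to make the p-value concrete is a permutation test, sketched below. The notes do not name a specific test, so this choice is an assumption made because it needs only the standard library. The simulated scores, group size of 30, standard deviations, and the helper names are all hypothetical.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign scores to groups at random, as if group labels did not matter.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add one to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)

def simulate_group(rng, center, sd, size=30):
    """Hypothetical scores drawn from a normal distribution."""
    return [rng.gauss(center, sd) for _ in range(size)]

rng = random.Random(42)
p_universe_one = permutation_p_value(
    simulate_group(rng, 81, 1), simulate_group(rng, 78, 1))
p_universe_two = permutation_p_value(
    simulate_group(rng, 81, 10), simulate_group(rng, 78, 10))

print(f"Universe One (sd=1):  p = {p_universe_one:.4f}")
print(f"Universe Two (sd=10): p = {p_universe_two:.4f}")
```

The p-value here is simply the share of random relabelings that produce a gap at least as large as the observed one. With tightly clustered scores almost no relabeling does, so the p-value is very small; with widely spread scores the result varies with the particular sample drawn.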

Interpretation of Results

  • When the p-value is low (e.g., < 0.05), as in a situation like Universe One, conclude that the difference is statistically significant.

  • When high variability and greater overlap (as in Universe Two) produce a high p-value, conclude that the difference is not statistically significant.

Limitations of Hypothesis Testing

Multiple Testing Concern

  • False Positives: As the number of tests increases, the likelihood of false positives (claiming a difference when there isn't one) also increases.

  • Example: Running 100 tests at a 5% significance level when no true differences exist yields about 5 false positives on average.
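The arithmetic behind this example can be sketched as follows, assuming the 100 tests are independent and that no true difference exists in any of them. The Bonferroni-adjusted threshold at the end is one common correction; it is not mentioned in the notes and is shown only as an illustration.

```python
alpha = 0.05   # significance threshold per test
n_tests = 100  # number of tests, all with no true difference (assumed)

# Average number of tests that come out "significant" purely by chance.
expected_false_positives = alpha * n_tests

# Probability that at least one test is a false positive (independent tests).
prob_at_least_one = 1 - (1 - alpha) ** n_tests

# Bonferroni correction: a stricter per-test threshold that keeps the
# chance of any false positive at or below alpha.
bonferroni_alpha = alpha / n_tests

print(f"Expected false positives: {expected_false_positives:.1f}")
print(f"P(at least one false positive): {prob_at_least_one:.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_alpha:.4f}")
```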

Example of Missing Evidence

  • Scenario: Scientists studying the relationship of jelly beans to acne.

    • The initial test gives p > 0.05: no significant evidence of a link.

    • Each jelly bean color is then tested separately, producing many tests, until one gives a significant result by chance (e.g., green jelly beans).

  • Conclusion: Media coverage often highlights the one significant result, which is likely a false positive, skewing public perception.

Arbitrary Nature of p-values

  • Criticism: Relying solely on a fixed p-value threshold (e.g., 0.05) treats an arbitrary cutoff as a sharp boundary.

  • Example: A p-value of 0.06 is often dismissed as non-significant even though it represents nearly the same strength of evidence as 0.05.

Impact of Sample Size

  • Larger sample sizes make it more likely that even very small differences reach statistical significance, whether or not they are practically meaningful.

  • Example: Language skills between men (blue distribution) and women (pink distribution):

    • With large samples, a small difference between the groups is statistically significant but not practically significant.
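The effect of sample size can be sketched by holding a small difference fixed and letting only the group size grow. The example assumes a two-sample z-test with an equal, known standard deviation in both groups; the 0.5-point difference, the standard deviation of 10, and the sample sizes are hypothetical and do not come from the language-skills example.

```python
import math

def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value of a two-sample z-test with equal, known SD."""
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = abs(mean_diff) / standard_error
    # For a standard normal Z, P(|Z| >= z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

MEAN_DIFF, SD = 0.5, 10  # hypothetical: a tiny gap relative to the spread

p_values = {n: two_sided_p_value(MEAN_DIFF, SD, n)
            for n in (50, 500, 5000, 50000)}

for n, p in p_values.items():
    print(f"n per group = {n:>5}: p = {p:.4g}")
```

The difference between the groups never changes, yet the p-value shrinks steadily as the sample grows, which is why significance alone says little about practical importance.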

Alternative Approaches

Effect Sizes

  • Definition: A calculated statistic indicating the magnitude of differences between groups, moving beyond the yes-no significance decision.

  • Usefulness: Provides nuanced information about how large the effect is, especially when the groups' distributions overlap.

  • Interpretation of Effect Size:

    • Small effect size: Indicates a small difference between groups, even if it is statistically significant.

    • Large effect size: Indicates a substantial difference between groups.

  • Suggested approach: Combine the significance from hypothesis testing with effect size for a comprehensive analysis.
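The notes do not name a particular effect size statistic. One widely used option is Cohen's d, the difference in means divided by the pooled standard deviation. The sketch below applies it to the two universes, assuming standard deviations of 1 and 10 and 30 participants per group; those numbers and the `cohens_d` helper are hypothetical.

```python
import math

def cohens_d(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Cohen's d: mean difference in units of the pooled standard deviation."""
    pooled_variance = (
        ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    )
    return (mean_a - mean_b) / math.sqrt(pooled_variance)

# Means are from the notes; standard deviations and group sizes are assumed.
d_universe_one = cohens_d(81, 78, sd_a=1, sd_b=1, n_a=30, n_b=30)
d_universe_two = cohens_d(81, 78, sd_a=10, sd_b=10, n_a=30, n_b=30)

print(f"Universe One: d = {d_universe_one:.2f}")
print(f"Universe Two: d = {d_universe_two:.2f}")
```

Unlike a p-value, d does not grow with sample size: it reports how far apart the groups are relative to their spread, so it complements the yes-no significance decision.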

Conclusion

  • While p-values are a common tool in data analysis, understanding their limitations is crucial in scientific research.

  • Reporting effect sizes alongside p-values can provide a deeper understanding of the data and its practical implications for real-world scenarios.