Evaluating Experiments: Statistics and Research Methods in Psychology

Experiments Focus on Differences

  • Experiments aim to determine if changing a variable makes a significant difference.

  • This is assessed by comparing conditions, such as Condition A and Condition B, where an independent variable (IV) is altered.

  • The effect is measured through a dependent variable (DV), and the means (M) of the DV in each condition are compared (e.g., M = 18.5 in Condition A vs. M = 20.2 in Condition B).

  • The core question is whether the observed numerical difference is a genuine effect or simply due to chance variation.

Determining Significance

  • To ascertain whether a difference between conditions is statistically significant, an ANOVA (Analysis of Variance) or a t-test is employed.

  • These statistical tests help determine if the observed difference is greater than what would be predicted by chance.
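A minimal sketch of such a comparison, using `scipy.stats.ttest_ind` on made-up DV scores chosen to match the example means above (M ≈ 18.5 vs. M ≈ 20.2); the data and the 0.05 threshold are assumptions for illustration:

```python
# Hypothetical sketch: comparing DV means across two conditions with an
# independent-samples t-test (data values are invented for illustration).
from scipy import stats

condition_a = [18, 17, 20, 19, 18, 19]  # DV scores, Condition A (M = 18.5)
condition_b = [21, 20, 19, 22, 20, 19]  # DV scores, Condition B (M ≈ 20.2)

t_stat, p_value = stats.ttest_ind(condition_a, condition_b)
if p_value < 0.05:
    print("Difference is statistically significant")
else:
    print("Difference could plausibly be due to chance")
```

The p-value estimates how likely a difference at least this large would be if only chance variation were at work.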

The Problem of Multiple Comparisons

  • When conducting multiple statistical comparisons, the probability of a Type I error (false alarm) increases.

  • The Type I error rate is typically denoted by α, often set at 0.05, meaning a 5% chance of a false positive.

  • Example: Comparing memory across three music genres (Triphop, Country, Smooth Jazz) with an individual α = 0.05 for each pairwise comparison.

  • If each comparison uses α = 0.05, the overall risk is roughly additive: 0.05 + 0.05 + 0.05 = 0.15 (more precisely, 1 − (1 − 0.05)³ ≈ 0.14), an inflated risk of committing at least one Type I error.
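The inflation can be computed directly. This short sketch contrasts the additive approximation with the exact familywise error rate for independent tests (the value of k = 3 comes from the three-genre example):

```python
# Sketch: why Type I error risk inflates across k independent comparisons.
alpha = 0.05
k = 3  # three pairwise genre comparisons

naive_total = k * alpha            # additive approximation: 0.15
familywise = 1 - (1 - alpha) ** k  # exact for independent tests: ~0.143

print(f"Additive approximation: {naive_total:.3f}")
print(f"Familywise error rate:  {familywise:.3f}")
```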

New Analysis

  • To address multiple comparisons, advanced statistical methods are used.

  • The goal remains to compare the difference between means (e.g., is M1 different from M2?).

  • A Student’s t-test is suitable for comparing two means, while an Analysis of Variance (ANOVA) can compare more than two means simultaneously by using the F ratio.
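A one-way ANOVA across three groups can be run with `scipy.stats.f_oneway`; the genre data below are invented for illustration, and `f_oneway` returns the F ratio along with its p-value:

```python
# Sketch: one-way ANOVA across three hypothetical music-genre groups
# (memory scores are made up for illustration).
from scipy import stats

triphop     = [14, 16, 15, 17, 15]
country     = [12, 13, 12, 14, 13]
smooth_jazz = [18, 17, 19, 18, 20]

f_ratio, p_value = stats.f_oneway(triphop, country, smooth_jazz)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")
```

A single omnibus F test keeps α at its nominal level, which is exactly what running three separate t-tests would not do.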

Post-Hoc Pairwise Comparisons

  • If the F ratio in ANOVA is significant, it indicates that there are differences among the groups, but it doesn't specify which groups differ.

  • Post-hoc pairwise comparisons involve comparing two conditions at a time to pinpoint specific differences.

  • These comparisons must be corrected for multiple analyses to control the Type I error rate.

  • The Bonferroni correction is a common method: α is divided by the number of comparisons to obtain a corrected α.

  • For example, with 3 comparisons, 0.05 / 3 ≈ 0.017. Thus the Bonferroni-corrected value is 0.017, and 0.017 + 0.017 + 0.017 ≈ 0.05.
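The post-hoc procedure above can be sketched as follows, again with invented genre data; each pairwise t-test is judged against the Bonferroni-corrected α of 0.05 / 3:

```python
# Sketch: Bonferroni-corrected post-hoc pairwise t-tests for three
# hypothetical genre groups (data invented for illustration).
from itertools import combinations
from scipy import stats

groups = {
    "Triphop":     [14, 16, 15, 17, 15],
    "Country":     [12, 13, 12, 14, 13],
    "Smooth Jazz": [18, 17, 19, 18, 20],
}

n_comparisons = 3                        # three pairwise tests
corrected_alpha = 0.05 / n_comparisons   # ≈ 0.017

for (name1, data1), (name2, data2) in combinations(groups.items(), 2):
    t_stat, p = stats.ttest_ind(data1, data2)
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{name1} vs {name2}: p = {p:.4f} ({verdict})")
```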

Statistical vs. Practical Significance

  • Statistical Significance: Indicates that an effect is unlikely to have occurred by chance, determined by the p-value associated with a test statistic.

  • Practical Significance: Reflects the real-world impact or importance of an effect and is determined by measures of effect size.

Effect Size

  • Effect size measures the magnitude of an effect, indicating its practical importance.

  • For differences between groups, effect size scales the mean difference by the variability in the data.

  • Cohen’s d (alongside alternatives such as Hedges’ g or η²) is a common measure: Cohen’s d = (M₁ − M₂) / SD_pooled, i.e., the difference between the means divided by the pooled standard deviation.
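A worked sketch of that formula, using only the standard-library `statistics` module and made-up data; the pooled SD weights each group's variance by its degrees of freedom:

```python
# Sketch: Cohen's d for two groups via the pooled standard deviation
# (data values are invented for illustration).
import statistics

group1 = [18, 17, 20, 19, 18, 19]
group2 = [21, 20, 19, 22, 20, 19]

m1, m2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)

# Pooled SD: df-weighted average of the two sample variances.
pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
cohens_d = (m2 - m1) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")
```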

Categorizing Effect Size

  • Effect sizes are categorized to provide context for their magnitude:

    • Small: |r| = 0.1, Cohen’s d = 0.20

    • Small medium: |r| = 0.2, Cohen’s d = 0.41

    • Medium: |r| = 0.3, Cohen’s d = 0.63

    • Medium large: |r| = 0.4, Cohen’s d = 0.87

    • Large: |r| = 0.5, Cohen’s d = 1.15

    • Really large: |r| = 0.6, Cohen’s d = 1.50

    • Gigantic: |r| = 0.7, Cohen’s d = 1.96

  • Based on Lakens & Evers, 2014.

Power and Effect Size

  • Power is the probability that a particular study will detect an effect when that effect truly exists.

  • Alpha Level: A more generous (larger) alpha level increases power (e.g., changing from 0.05 to 0.1), but it also increases the Type I error rate.

  • Sample Size: Larger sample sizes provide more power.

  • Effect Size: Larger effects are easier to detect.

Sample Size and Power

  • The sample size needed for 90% power varies with effect size:

    • Small (|r| = 0.1, Cohen’s d = 0.20): N = 527

    • Small medium (|r| = 0.2, Cohen’s d = 0.41): N = 126

    • Medium (|r| = 0.3, Cohen’s d = 0.63): N = 54

    • Medium large (|r| = 0.4, Cohen’s d = 0.87): N = 29

    • Large (|r| = 0.5, Cohen’s d = 1.15): N = 17

    • Really large (|r| = 0.6, Cohen’s d = 1.50): N = 11

    • Gigantic (|r| = 0.7, Cohen’s d = 1.96): N = 7

  • Based on Lakens & Evers, 2014.
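Required sample sizes like those in the table can be approximated with a normal-approximation formula, n ≈ 2(z₁₋α/2 + z_power)² / d² per group; this is a sketch under that assumption, and it undershoots exact (noncentral-t) tables like the one above by a participant or two:

```python
# Sketch: approximate per-group N for 90% power in a two-sample t-test
# via the normal approximation n = 2 * (z_{1-alpha/2} + z_power)^2 / d^2.
import math
from scipy.stats import norm

def approx_n_per_group(d, alpha=0.05, power=0.90):
    z_alpha = norm.ppf(1 - alpha / 2)  # critical z for two-sided alpha
    z_power = norm.ppf(power)          # z corresponding to desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

for d in (0.20, 0.63, 1.15):
    print(f"d = {d}: n ≈ {approx_n_per_group(d)} per group")
```

Note how steeply the required N grows as the effect shrinks: halving d roughly quadruples the sample needed.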

Results: Experimental Study

  • Data is divided into groups based on the level of the independent variable (IV).

  • Descriptive statistics (e.g., means, standard deviations) are reported for the dependent variable (DV) within each group.

  • An ANOVA is conducted to compare all groups (e.g., the three genre conditions) simultaneously.

  • Post-hoc t-tests are performed to compare individual group differences and identify specific effects, while controlling for multiple comparisons.
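The full pipeline described in this section can be sketched end to end; the three-level IV, the group data, and the memory-score DV below are all hypothetical:

```python
# End-to-end sketch of the analysis pipeline: descriptives, omnibus
# ANOVA, then Bonferroni-corrected post-hoc tests (data invented).
import statistics
from itertools import combinations
from scipy import stats

groups = {
    "Triphop":     [14, 16, 15, 17, 15],
    "Country":     [12, 13, 12, 14, 13],
    "Smooth Jazz": [18, 17, 19, 18, 20],
}

# 1. Descriptive statistics for the DV within each group.
for name, scores in groups.items():
    print(f"{name}: M = {statistics.mean(scores):.1f}, "
          f"SD = {statistics.stdev(scores):.2f}")

# 2. Omnibus ANOVA across all three groups.
f_ratio, p_omnibus = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_ratio:.2f}, p = {p_omnibus:.4f}")

# 3. Post-hoc pairwise t-tests at the Bonferroni-corrected alpha.
corrected_alpha = 0.05 / 3
for (name1, data1), (name2, data2) in combinations(groups.items(), 2):
    _, p = stats.ttest_ind(data1, data2)
    verdict = "significant" if p < corrected_alpha else "n.s."
    print(f"{name1} vs {name2}: p = {p:.4f} ({verdict})")
```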