Evaluating Experiments: Statistics and Research Methods in Psychology
Experiments Focus on Differences
Experiments aim to determine if changing a variable makes a significant difference.
This is assessed by comparing conditions, such as Condition A and Condition B, where an independent variable (IV) is altered.
The effect is measured through a dependent variable (DV), and the means (M) of the DV in each condition are compared (e.g., M = 18.5 in Condition A vs. M = 20.2 in Condition B).
The core question is whether the observed numerical difference is a genuine effect or simply due to chance variation.
Determining Significance
To ascertain whether a difference between conditions is statistically significant, an ANOVA (Analysis of Variance) or a t-test is employed.
These statistical tests help determine if the observed difference is greater than what would be predicted by chance.
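The two tests can be run in a few lines with SciPy. This is an illustrative sketch with simulated data (the condition names, means, and sample sizes are made up for the example, echoing the M = 18.5 vs. M = 20.2 comparison above):

```python
# Sketch of significance testing with SciPy (simulated example data;
# group labels, means, and ns are illustrative assumptions).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
condition_a = rng.normal(loc=18.5, scale=4.0, size=30)  # DV scores, Condition A
condition_b = rng.normal(loc=20.2, scale=4.0, size=30)  # DV scores, Condition B

# Two conditions: Student's t-test on the difference between means.
t_stat, p_t = stats.ttest_ind(condition_a, condition_b)

# Three or more conditions: one-way ANOVA, which yields an F ratio.
condition_c = rng.normal(loc=19.0, scale=4.0, size=30)
f_stat, p_f = stats.f_oneway(condition_a, condition_b, condition_c)

print(f"t = {t_stat:.2f}, p = {p_t:.3f}")
print(f"F = {f_stat:.2f}, p = {p_f:.3f}")
```

A p-value below the chosen alpha level (conventionally 0.05) is taken as evidence that the difference exceeds what chance alone would predict.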
The Problem of Multiple Comparisons
When conducting multiple statistical comparisons, the probability of a Type I error (false alarm) increases.
The Type I error rate is typically denoted by α (alpha), often set at 0.05, meaning a 5% chance of a false positive on any single test.
Example: Comparing memory across three music genres (Triphop, Country, Smooth Jazz), with α = 0.05 for each comparison.
If each comparison uses α = 0.05, the total risk approaches 0.05 + 0.05 + 0.05 = 0.15, an inflated probability of a Type I error across the family of tests.
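The sum 0.15 is a convenient upper bound; for independent tests the exact familywise error rate is 1 − (1 − α)^k, which a few lines of Python can confirm:

```python
# Familywise Type I error rate for k comparisons at level alpha.
# The naive sum slightly overstates the exact rate for independent tests.
alpha, k = 0.05, 3
naive = alpha * k                 # upper-bound approximation: 0.15
exact = 1 - (1 - alpha) ** k      # exact rate for independent tests
print(f"naive = {naive:.3f}, exact = {exact:.4f}")
```

Either way, the chance of at least one false alarm is roughly three times the per-test rate, which is why a correction is needed.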
New Analysis
To address multiple comparisons, advanced statistical methods are used.
The goal remains to compare the difference between means (e.g., is M1 different from M2?).
A Student’s t-test is suitable for comparing two means, while an Analysis of Variance (ANOVA) can compare more than two means simultaneously by using the F ratio.
Post-Hoc Pairwise Comparisons
If the F ratio in ANOVA is significant, it indicates that there are differences among the groups, but it doesn't specify which groups differ.
Post-hoc pairwise comparisons involve comparing two conditions at a time to pinpoint specific differences.
These comparisons must be corrected for multiple analyses to control the Type I error rate.
The Bonferroni correction is a common method, where α is divided by the number of comparisons to obtain a corrected α.
For example, with 3 comparisons, 0.05 / 3 ≈ 0.017. Thus the Bonferroni-corrected α is 0.017 per comparison, and 0.017 + 0.017 + 0.017 ≈ 0.05, keeping the familywise error rate near the original α.
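Applied to the music-genre example, the correction looks like this (a sketch; the genre labels just enumerate the pairwise comparisons):

```python
# Bonferroni correction: alpha divided by the number of pairwise
# comparisons. Three genres yield three pairs.
from itertools import combinations

alpha = 0.05
genres = ["Triphop", "Country", "Smooth Jazz"]
pairs = list(combinations(genres, 2))
alpha_corrected = alpha / len(pairs)
print(f"{len(pairs)} comparisons, corrected alpha = {alpha_corrected:.3f}")
```

Each post-hoc t-test is then judged against 0.017 rather than 0.05.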
Statistical vs. Practical Significance
Statistical Significance: Indicates that an effect is unlikely to have occurred by chance, determined by the p-value associated with a test statistic.
Practical Significance: Reflects the real-world impact or importance of an effect and is determined by measures of effect size.
Effect Size
Effect size measures the magnitude of an effect, indicating its practical importance.
For differences between groups, effect size scales the raw mean difference by the variability in the data.
Cohen’s d (or related measures such as Hedges’ g) is a common standardized measure: d = (M1 − M2) / SDpooled, the difference between the two means divided by the pooled standard deviation.
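The formula translates directly into code. A minimal sketch (the function name and the example data are illustrative):

```python
# Cohen's d for two independent groups: mean difference divided by
# the pooled standard deviation.
import numpy as np

def cohens_d(x, y):
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * np.var(x, ddof=1) +
                  (ny - 1) * np.var(y, ddof=1)) / (nx + ny - 2)
    return (np.mean(x) - np.mean(y)) / np.sqrt(pooled_var)

# Illustrative data: groups differing by 1 point with SD = 1.58.
x = [1, 2, 3, 4, 5]
y = [2, 3, 4, 5, 6]
print(f"d = {cohens_d(x, y):.2f}")
```

The sign of d simply reflects which mean is subtracted from which; its magnitude is what the categories below interpret.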
Categorizing Effect Size
Effect sizes are categorized to provide context for their magnitude:
Small: r = 0.1, Cohen’s d = 0.20
Small medium: r = 0.2, Cohen’s d = 0.41
Medium: r = 0.3, Cohen’s d = 0.63
Medium large: r = 0.4, Cohen’s d = 0.87
Large: r = 0.5, Cohen’s d = 1.15
Really large: r = 0.6, Cohen’s d = 1.50
Gigantic: r = 0.7, Cohen’s d = 1.96
Based on Lakens & Evers, 2014.
Power and Effect Size
Power is the probability that a particular study will detect an effect, given that the effect truly exists.
Alpha Level: A more generous (larger) alpha level increases power (e.g., changing from 0.05 to 0.1), but it also increases the Type I error rate.
Sample Size: Larger sample sizes provide more power.
Effect Size: Larger effects are easier to detect.
Sample Size and Power
The sample size needed per group for 90% power varies with effect size:
Small (r = 0.1, Cohen’s d = 0.20): N = 527
Small medium (r = 0.2, Cohen’s d = 0.41): N = 126
Medium (r = 0.3, Cohen’s d = 0.63): N = 54
Medium large (r = 0.4, Cohen’s d = 0.87): N = 29
Large (r = 0.5, Cohen’s d = 1.15): N = 17
Really large (r = 0.6, Cohen’s d = 1.50): N = 11
Gigantic (r = 0.7, Cohen’s d = 1.96): N = 7
Based on Lakens & Evers, 2014.
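These sample sizes can be approximated with the standard normal-approximation formula n ≈ 2((z_{α/2} + z_{power}) / d)². This sketch uses that approximation, which runs a point or two below the exact t-based values in the table (the function name is illustrative):

```python
# Approximate per-group n for a two-sample t-test via the normal
# approximation: n = 2 * ((z_{alpha/2} + z_{power}) / d)^2.
# Exact t-based calculations give slightly larger n than this.
import math
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.90):
    z_a = norm.ppf(1 - alpha / 2)   # critical value for two-sided alpha
    z_b = norm.ppf(power)           # quantile for the desired power
    return math.ceil(2 * ((z_a + z_b) / d) ** 2)

for d in (0.20, 0.63, 1.15):
    print(f"d = {d}: n per group ≈ {n_per_group(d)}")
```

For d = 0.20 this gives n ≈ 526, close to the table's 527; the pattern is clear either way: halving the effect size roughly quadruples the required sample.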
Results: Experimental Study
Data is divided into groups based on the level of the independent variable (IV).
Descriptive statistics (e.g., means, standard deviations) are reported for the dependent variable (DV) within each group.
ANOVA is conducted to compare all three groups.
Post-hoc t-tests are performed to compare individual group differences and identify specific effects, while controlling for multiple comparisons.
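The full sequence above — descriptives per group, omnibus ANOVA, then Bonferroni-corrected post-hoc t-tests — can be sketched end to end. The data are simulated and the group labels reuse the music-genre example; none of the numbers below come from a real study:

```python
# End-to-end sketch of the analysis sequence: descriptives, ANOVA,
# then Bonferroni-corrected post-hoc t-tests (simulated data).
from itertools import combinations
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
groups = {
    "Triphop": rng.normal(20, 5, 30),
    "Country": rng.normal(18, 5, 30),
    "Smooth Jazz": rng.normal(23, 5, 30),
}

# 1. Descriptive statistics for the DV within each level of the IV.
for name, scores in groups.items():
    print(f"{name}: M = {scores.mean():.1f}, SD = {scores.std(ddof=1):.1f}")

# 2. Omnibus ANOVA comparing all three groups.
f_stat, p_omnibus = stats.f_oneway(*groups.values())
print(f"F = {f_stat:.2f}, p = {p_omnibus:.4f}")

# 3. Post-hoc pairwise t-tests at the Bonferroni-corrected alpha.
pairs = list(combinations(groups, 2))
alpha_corr = 0.05 / len(pairs)
for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    print(f"{a} vs {b}: t = {t:.2f}, p = {p:.4f}, "
          f"significant at {alpha_corr:.3f}: {p < alpha_corr}")
```

Only pairs whose p-value falls below the corrected alpha (0.017) are reported as significant, which controls the familywise Type I error rate discussed earlier.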