Evaluating Experiments: Statistics and Research Methods in Psychology

Experiments Focus on Differences

  • Experiments aim to determine if changing a variable makes a significant difference.

  • This is assessed by comparing conditions, such as Condition A and Condition B, where an independent variable (IV) is altered.

  • The effect is measured through a dependent variable (DV), and the means (M) of the DV in each condition are compared (e.g., M = 18.5 in Condition A vs. M = 20.2 in Condition B).

  • The core question is whether the observed numerical difference is a genuine effect or simply due to chance variation.

Determining Significance

  • To ascertain whether a difference between conditions is statistically significant, an ANOVA (Analysis of Variance) or a t-test is employed.

  • These statistical tests help determine if the observed difference is greater than what would be predicted by chance.
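A minimal sketch of such a comparison, using `scipy.stats.ttest_ind` on made-up DV scores chosen to match the example means above (M ≈ 18.5 vs. M ≈ 20.2); the data and the 0.05 threshold are assumptions for illustration:

```python
# Hypothetical sketch: comparing DV means across two conditions with an
# independent-samples t-test (data values are invented for illustration).
from scipy import stats

condition_a = [18, 17, 20, 19, 18, 19]  # DV scores, Condition A (M = 18.5)
condition_b = [21, 20, 19, 22, 20, 19]  # DV scores, Condition B (M ≈ 20.2)

t_stat, p_value = stats.ttest_ind(condition_a, condition_b)
if p_value < 0.05:
    print("Difference is statistically significant")
else:
    print("Difference could plausibly be due to chance")
```

The p-value estimates how likely a difference at least this large would be if only chance variation were at work.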

The Problem of Multiple Comparisons

  • When conducting multiple statistical comparisons, the probability of a Type I error (false alarm) increases.

  • The Type I error rate is typically denoted by α, often set at 0.05, meaning a 5% chance of a false positive.

  • Example: Comparing memory across three music genres (Triphop, Country, Smooth Jazz) with an individual α = 0.05 for each pairwise comparison.

  • If each comparison uses α = 0.05, the overall risk is roughly additive: 0.05 + 0.05 + 0.05 = 0.15 (more precisely, 1 − (1 − 0.05)³ ≈ 0.14), an inflated risk of committing at least one Type I error.
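The inflation can be computed directly. This short sketch contrasts the additive approximation with the exact familywise error rate for independent tests (the value of k = 3 comes from the three-genre example):

```python
# Sketch: why Type I error risk inflates across k independent comparisons.
alpha = 0.05
k = 3  # three pairwise genre comparisons

naive_total = k * alpha            # additive approximation: 0.15
familywise = 1 - (1 - alpha) ** k  # exact for independent tests: ~0.143

print(f"Additive approximation: {naive_total:.3f}")
print(f"Familywise error rate:  {familywise:.3f}")
```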

New Analysis

  • To address multiple comparisons, advanced statistical methods are used.

  • The goal remains to compare the difference between means (e.g., is M1 different from M2?).

  • A Student’s t-test is suitable for comparing two means, while an Analysis of Variance (ANOVA) can compare more than two means simultaneously by using the F ratio.
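A one-way ANOVA across three groups can be run with `scipy.stats.f_oneway`; the genre data below are invented for illustration, and `f_oneway` returns the F ratio along with its p-value:

```python
# Sketch: one-way ANOVA across three hypothetical music-genre groups
# (memory scores are made up for illustration).
from scipy import stats

triphop     = [14, 16, 15, 17, 15]
country     = [12, 13, 12, 14, 13]
smooth_jazz = [18, 17, 19, 18, 20]

f_ratio, p_value = stats.f_oneway(triphop, country, smooth_jazz)
print(f"F = {f_ratio:.2f}, p = {p_value:.4f}")
```

A single omnibus F test keeps α at its nominal level, which is exactly what running three separate t-tests would not do.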

Post-Hoc Pairwise Comparisons

  • If the F ratio in ANOVA is significant, it indicates that there are differences among the groups, but it doesn't specify which groups differ.

  • Post-hoc pairwise comparisons involve comparing two conditions at a time to pinpoint specific differences.

  • These comparisons must be corrected for multiple analyses to control the Type I error rate.

  • The Bonferroni correction is a common method: α is divided by the number of comparisons to obtain a corrected α.

  • For example, with 3 comparisons, 0.05 / 3 ≈ 0.017. Thus the Bonferroni-corrected value is 0.017, and 0.017 + 0.017 + 0.017 ≈ 0.05.
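The post-hoc procedure above can be sketched as follows, again with invented genre data; each pairwise t-test is judged against the Bonferroni-corrected α of 0.05 / 3:

```python
# Sketch: Bonferroni-corrected post-hoc pairwise t-tests for three
# hypothetical genre groups (data invented for illustration).
from itertools import combinations
from scipy import stats

groups = {
    "Triphop":     [14, 16, 15, 17, 15],
    "Country":     [12, 13, 12, 14, 13],
    "Smooth Jazz": [18, 17, 19, 18, 20],
}

n_comparisons = 3                        # three pairwise tests
corrected_alpha = 0.05 / n_comparisons   # ≈ 0.017

for (name1, data1), (name2, data2) in combinations(groups.items(), 2):
    t_stat, p = stats.ttest_ind(data1, data2)
    verdict = "significant" if p < corrected_alpha else "not significant"
    print(f"{name1} vs {name2}: p = {p:.4f} ({verdict})")
```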

Statistical vs. Practical Significance

  • Statistical Significance: Indicates that an effect is unlikely to have occurred by chance, determined by the p-value associated with a test statistic.

  • Practical Significance: Reflects the real-world impact or importance of an effect and is determined by measures of effect size.

Effect Size

  • Effect size measures the magnitude of an effect, indicating its practical importance.

  • For differences between groups, effect size scales the mean difference by the variability in the data.

  • Cohen’s d (alongside alternatives such as Hedges’ g or η²) is a common measure: Cohen’s d = (M₁ − M₂) / SD_pooled, i.e., the difference between the means divided by the pooled standard deviation.
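A worked sketch of that formula, using only the standard-library `statistics` module and made-up data; the pooled SD weights each group's variance by its degrees of freedom:

```python
# Sketch: Cohen's d for two groups via the pooled standard deviation
# (data values are invented for illustration).
import statistics

group1 = [18, 17, 20, 19, 18, 19]
group2 = [21, 20, 19, 22, 20, 19]

m1, m2 = statistics.mean(group1), statistics.mean(group2)
s1, s2 = statistics.stdev(group1), statistics.stdev(group2)
n1, n2 = len(group1), len(group2)

# Pooled SD: df-weighted average of the two sample variances.
pooled_sd = (((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)) ** 0.5
cohens_d = (m2 - m1) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")
```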

Categorizing Effect Size

  • Effect sizes are categorized to provide context for their magnitude:

    • Small: |r| = 0.1, Cohen’s d = 0.20

    • Small medium: |r| = 0.2, Cohen’s d = 0.41

    • Medium: |r| = 0.3, Cohen’s d = 0.63

    • Medium large: |r| = 0.4, Cohen’s d = 0.87

    • Large: |r| = 0.5, Cohen’s d = 1.15

    • Really large: |r| = 0.6, Cohen’s d = 1.50

    • Gigantic: |r| = 0.7, Cohen’s d = 1.96

  • Based on Lakens & Evers, 2014.

Power and Effect Size

  • Power is the probability that a particular study will detect an effect when that effect truly exists.

  • Alpha Level: A more generous (larger) alpha level increases power (e.g., changing from 0.05 to 0.1), but it also increases the Type I error rate.

  • Sample Size: Larger sample sizes provide more power.

  • Effect Size: Larger effects are easier to detect.

Sample Size and Power

  • The sample size needed for 90% power varies with effect size:

    • Small (|r| = 0.1, Cohen’s d = 0.20): N = 527

    • Small medium (|r| = 0.2, Cohen’s d = 0.41): N = 126

    • Medium (|r| = 0.3, Cohen’s d = 0.63): N = 54

    • Medium large (|r| = 0.4, Cohen’s d = 0.87): N = 29

    • Large (|r| = 0.5, Cohen’s d = 1.15): N = 17

    • Really large (|r| = 0.6, Cohen’s d = 1.50): N = 11

    • Gigantic (|r| = 0.7, Cohen’s d = 1.96): N = 7

  • Based on Lakens & Evers, 2014.
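Required sample sizes like those in the table can be approximated with a normal-approximation formula, n ≈ 2(z₁₋α/2 + z_power)² / d² per group; this is a sketch under that assumption, and it undershoots exact (noncentral-t) tables like the one above by a participant or two:

```python
# Sketch: approximate per-group N for 90% power in a two-sample t-test
# via the normal approximation n = 2 * (z_{1-alpha/2} + z_power)^2 / d^2.
import math
from scipy.stats import norm

def approx_n_per_group(d, alpha=0.05, power=0.90):
    z_alpha = norm.ppf(1 - alpha / 2)  # critical z for two-sided alpha
    z_power = norm.ppf(power)          # z corresponding to desired power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / d ** 2)

for d in (0.20, 0.63, 1.15):
    print(f"d = {d}: n ≈ {approx_n_per_group(d)} per group")
```

Note how steeply the required N grows as the effect shrinks: halving d roughly quadruples the sample needed.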

Results: Experimental Study

  • Data is divided into groups based on the level of the independent variable (IV).

  • Descriptive statistics (e.g., means, standard deviations) are reported for the dependent variable (DV) within each group.

  • An ANOVA is conducted to compare all groups (e.g., the three genre conditions) simultaneously.

  • Post-hoc t-tests are performed to compare individual group differences and identify specific effects, while controlling for multiple comparisons.
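The full pipeline described in this section can be sketched end to end; the three-level IV, the group data, and the memory-score DV below are all hypothetical:

```python
# End-to-end sketch of the analysis pipeline: descriptives, omnibus
# ANOVA, then Bonferroni-corrected post-hoc tests (data invented).
import statistics
from itertools import combinations
from scipy import stats

groups = {
    "Triphop":     [14, 16, 15, 17, 15],
    "Country":     [12, 13, 12, 14, 13],
    "Smooth Jazz": [18, 17, 19, 18, 20],
}

# 1. Descriptive statistics for the DV within each group.
for name, scores in groups.items():
    print(f"{name}: M = {statistics.mean(scores):.1f}, "
          f"SD = {statistics.stdev(scores):.2f}")

# 2. Omnibus ANOVA across all three groups.
f_ratio, p_omnibus = stats.f_oneway(*groups.values())
print(f"ANOVA: F = {f_ratio:.2f}, p = {p_omnibus:.4f}")

# 3. Post-hoc pairwise t-tests at the Bonferroni-corrected alpha.
corrected_alpha = 0.05 / 3
for (name1, data1), (name2, data2) in combinations(groups.items(), 2):
    _, p = stats.ttest_ind(data1, data2)
    verdict = "significant" if p < corrected_alpha else "n.s."
    print(f"{name1} vs {name2}: p = {p:.4f} ({verdict})")
```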