Exhaustive Study Notes on Experimental Research Methods

Basics of Experimental Research Methods and Descriptive Statistics

Experimental Research Methods Defined: These are techniques used to analyze experimental data to draw scientific conclusions.
Descriptive Statistics: The primary purpose of descriptive statistics is to summarize data. Data is defined as numerical information pertaining to a population or a sample. Descriptive statistics are necessary because raw data lists are often unclear and difficult to interpret.
Methods to Summarize Data: * Distribution: Data points are summarized by grouping those with the same score. This is visualized via frequency distribution tables or histograms. Knowing the SPSS syntax that generates these is relevant for the exam. * Sample Statistics: Data is summarized using characteristic features of the distribution.
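Example (Python sketch): The grouping of identical scores into a frequency distribution can be illustrated outside SPSS as well. The scores below are hypothetical and serve only to show the idea; the notes themselves rely on SPSS output.

```python
from collections import Counter

# Hypothetical sample of scores (illustrative only, not taken from the notes).
scores = [3, 5, 4, 3, 5, 5, 2, 4, 5, 3]

# Group data points with the same score and count how often each occurs.
frequency_table = Counter(scores)

# Print a frequency distribution table with a crude text histogram.
for score in sorted(frequency_table):
    count = frequency_table[score]
    print(f"score {score}: frequency {count} {'#' * count}")
```

Each row of the printout corresponds to one row of a frequency distribution table, and the run of `#` characters plays the role of a histogram bar.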

Characteristic Features of a Distribution

Central Tendency: This represents the most characteristic score of a distribution. The measures include: * Mean: The sum of all scores divided by the total number of scores ( $N$ ). Individual scores are denoted as $x$ . * Median: The middle score in a distribution. * Mode: The most frequently occurring score.
Dispersion (Variability): This measures how much individual scores deviate from the central tendency. Measures include: * Range: The difference between the highest and lowest scores. * Variance ( $s^2$ ): Calculated as the sum of all squared deviation scores divided by the number of scores minus one: $s^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$ * Standard Deviation ( $s$ ): The square root of the variance: $s = \sqrt{s^2}$
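Example (Python sketch): The measures of central tendency and dispersion can be computed by hand on a small data set. The scores are hypothetical; the variance is written out with the $N - 1$ denominator so that it mirrors the formula for $s^2$.

```python
import statistics

# Hypothetical scores (illustrative only).
scores = [2, 4, 4, 5, 7, 8]
n = len(scores)

# Central tendency.
mean = sum(scores) / n                # sum of all scores divided by N
median = statistics.median(scores)    # middle score (average of the two middle scores for even N)
mode = statistics.mode(scores)        # most frequently occurring score

# Dispersion.
score_range = max(scores) - min(scores)
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)  # s^2 with N - 1
standard_deviation = variance ** 0.5                       # s = sqrt(s^2)

print(mean, median, mode, score_range, variance, standard_deviation)
```

The hand-written variance should agree with `statistics.variance`, which also divides by $N - 1$.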

Inferential Statistics Principles

Context: Descriptive statistics are sufficient only when data encompasses the entire population. However, researchers almost always use samples because population data collection is too expensive, takes too long, or is physically impossible.
Goal: Inferential statistics allow researchers to draw conclusions about a population based on a representative sample.
The Three Procedures of Inferential Statistics: 1. Hypothesis testing. 2. Point estimation. 3. Interval estimation (Confidence Intervals).

Hypothesis Testing Logic and Procedures

Hypotheses Framework: * Hypotheses must be mutually exclusive (only one can be true) and exhaustive (all possibilities must be included). * Null Hypothesis ( $H_0$ ): Usually assumes no effect or equality (e.g., $H_0: \mu = 2.5$ ). It always contains the "=" sign. * Alternative Hypothesis ( $H_1$ / $H_a$ ): Contains the researcher's expectation (e.g., $H_1: \mu \neq 2.5$ for two-sided; $H_1: \mu > 2.5$ or $H_1: \mu < 2.5$ for one-sided).
Decision Logic: * Assume $H_0$ is true. Determine the sampling distribution of the sample mean ( $\bar{x}$ ). * The sampling distribution mean is $\mu$ , and the variance is $\frac{\sigma^2}{N}$ . * P-value: The probability that the observed sample statistic ( $\bar{x}$ ) or a more extreme value occurs, assuming $H_0$ is true. * Rule: If $p < \alpha$ (significance level), the probability is too small to trust $H_0$ , so $H_0$ is rejected. * Rule: If $p \geq \alpha$ , there is not enough evidence to doubt $H_0$ , so it is not rejected.
Steps in Hypothesis Testing: 1. Formulate $H_0$ and $H_1$ . 2. Determine the decision rule ( $p < \alpha$ ). 3. Determine the p-value based on SPSS output. 4. Make the decision regarding significance and formulate a conclusion.
Assumptions: One-sample tests assume a "simple random sample," meaning all cases have an equal chance to be sampled and cases are selected independently.
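Example (Python sketch): The decision logic can be traced with a one-sample z-test. This is a simplification: it assumes the population standard deviation $\sigma$ is known, whereas the notes read the p-value from SPSS output (typically a t-test). The function name and all numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu_0, sigma, n, alpha=0.05):
    """Two-sided z-test of H0: mu = mu_0, assuming sigma is known."""
    # Under H0 the sampling distribution of the mean has variance sigma^2 / N.
    standard_error = sigma / sqrt(n)
    z = (sample_mean - mu_0) / standard_error
    # Probability of a sample mean at least this extreme, assuming H0 is true.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Decision rule: reject H0 if p < alpha.
    return z, p_value, p_value < alpha


z, p_value, reject_h0 = one_sample_z_test(sample_mean=2.9, mu_0=2.5, sigma=1.0, n=25)
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

With these hypothetical values the p-value falls just below $\alpha = 0.05$, so $H_0: \mu = 2.5$ would be rejected in favor of the two-sided alternative.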

Point and Interval Estimation

Point Estimation: Provides the "best guess" of a population parameter based on sample statistics. * For the population mean $\mu$ , the best estimate is the sample mean $\bar{x}$ . * For the population variance $\sigma^2$ , the best estimate is the sample variance $s^2$ .
Interval Estimation (Confidence Intervals): Provides a range in which the parameter lies with a specific confidence (usually 95%). * Formula for 95% CI of $\mu$ : $\bar{x} \pm t_{CV} \times \frac{s}{\sqrt{N}}$ * Relation to Testing: If a hypothetical value ( $\mu_0$ ) falls inside the 95% CI, $H_0$ cannot be rejected at $\alpha = 0.05$ . If it falls outside, $H_0$ can be rejected in favor of a two-sided alternative. * Interpretation: A 95% CI means that if many samples were drawn and a CI computed from each, about 95% of those intervals would contain the true population mean (roughly 95 out of every 100 in the long run). When $H_0$ is true, the remaining 5% of samples, whose intervals miss $\mu$ , lead to a rejection of $H_0$ , which is a Type I error.
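Example (Python sketch): The CI formula $\bar{x} \pm t_{CV} \times \frac{s}{\sqrt{N}}$ can be applied directly once $t_{CV}$ is known. The Python standard library has no t-distribution, so the critical value is passed in as a tabled number (2.571 for $df = 5$, two-sided 95%); treat it as a looked-up assumption. The data and the function name are hypothetical.

```python
from math import sqrt
import statistics


def confidence_interval(scores, t_cv):
    """CI for mu: sample mean plus/minus t_cv * s / sqrt(N)."""
    n = len(scores)
    mean = statistics.mean(scores)      # point estimate of mu
    s = statistics.stdev(scores)        # sample standard deviation (N - 1)
    margin = t_cv * s / sqrt(n)
    return mean - margin, mean + margin


scores = [2, 4, 4, 5, 7, 8]   # hypothetical sample, N = 6, df = 5
T_CV_DF5 = 2.571              # tabled two-sided 95% critical t for df = 5 (assumed lookup)

lower, upper = confidence_interval(scores, T_CV_DF5)
mu_0 = 2.5
reject_h0 = not (lower <= mu_0 <= upper)   # outside the CI -> reject at alpha = 0.05
print(f"95% CI: [{lower:.2f}, {upper:.2f}], reject H0 (mu = {mu_0}): {reject_h0}")
```

Here the hypothetical value $\mu_0 = 2.5$ lies just below the lower limit, so the interval and the two-sided test lead to the same decision.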

Statistical Power and Errors

Types of Errors: * Type I Error ( $\alpha$ ): Rejecting $H_0$ when it is actually true. * Type II Error ( $\beta$ ): Failing to reject $H_0$ when it is actually false.
Power ( $1 - \beta$ ): The probability of correctly rejecting the null hypothesis when it is actually false. Researchers prefer highly powered tests so that effects are detected when they exist.
Four Factors Influencing Power: 1. Significance Level ( $\alpha$ ): A larger $\alpha$ (e.g., 0.10 vs 0.05) increases the rejection region, making it easier to reject $H_0$ and thus increasing power. 2. Sample Size ( $N$ ): Increasing $N$ decreases the standard error, which increases the z-score/t-score, making it easier to reject $H_0$ . 3. Population Standard Deviation ( $\sigma$ ): If $\sigma$ decreases, the standard error decreases, increasing power. 4. True Population Mean ( $\mu_{H1}$ ): A larger effect size (distance between $\mu_0$ and the true $\mu$ ) increases power.
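Example (Python sketch): The four factors can be explored numerically for a one-sided (upper-tail) z-test. The setup assumes a known $\sigma$ and a normal sampling distribution; the function name and the baseline numbers are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def power_one_sided_z(mu_0, mu_true, sigma, n, alpha=0.05):
    """Power of an upper-tail z-test of H0: mu = mu_0 when the true mean is mu_true."""
    std_normal = NormalDist()
    z_cv = std_normal.inv_cdf(1 - alpha)            # critical z for the rejection region
    shift = (mu_true - mu_0) / (sigma / sqrt(n))    # true effect in standard-error units
    return 1 - std_normal.cdf(z_cv - shift)         # P(reject H0 | mu = mu_true)


baseline = power_one_sided_z(mu_0=2.5, mu_true=3.0, sigma=1.0, n=25)
larger_alpha = power_one_sided_z(2.5, 3.0, 1.0, 25, alpha=0.10)
larger_n = power_one_sided_z(2.5, 3.0, 1.0, 50)
smaller_sigma = power_one_sided_z(2.5, 3.0, 0.5, 25)
larger_effect = power_one_sided_z(2.5, 3.5, 1.0, 25)

print(f"baseline power: {baseline:.3f}")
print(f"alpha = 0.10:   {larger_alpha:.3f}")
print(f"N = 50:         {larger_n:.3f}")
print(f"sigma = 0.5:    {smaller_sigma:.3f}")
print(f"mu_true = 3.5:  {larger_effect:.3f}")
```

Each of the four changes raises power relative to the baseline, matching the direction described for $\alpha$ , $N$ , $\sigma$ , and the effect size.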

Effect Size Concepts and Formulas

Importance: Statistical significance ( $p < 0.05$ ) does not prove a systematic effect exists (could be Type I error) and does not mean the effect is practically or clinically relevant. Large samples can make even tiny, uninteresting differences significant.
Measures of Effect Size: * Cohen's d: Standardizes the difference between group means. * One-group formula: $d = \frac{t}{\sqrt{N}}$ * Two-group formula: $d = t \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$ * (Partial) Explained Variance ( $\eta^2$ ): How much of the variance in the dependent variable (DV) is explained by group membership. * Formula: $\eta^2 = \frac{t^2}{t^2 + df_w}$ * Degrees of Freedom ( $df_w$ ) for two groups: $df_w = n_1 + n_2 - 2$ .
Rules of Thumb for Cohen's d: * Small Effect ( $d = 0.2$ ): About 58% of the experimental group scores higher than the average of the control group. * Medium Effect ( $d = 0.5$ ): About 69% of the experimental group scores higher than the control mean. * Large Effect ( $d = 0.8$ ): About 79% of the experimental group scores higher than the control mean.
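Example (Python sketch): The effect size formulas can be evaluated from a reported t-value, and the percentages in the rules of thumb can be reproduced from the standard normal distribution. The sketch assumes Cohen's conventional benchmarks ( $d = 0.2$ , $0.5$ , $0.8$ ) and normally distributed scores with equal variances in both groups; function names and the t-value are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def cohens_d_one_group(t, n):
    return t / sqrt(n)


def cohens_d_two_groups(t, n1, n2):
    return t * sqrt(1 / n1 + 1 / n2)


def eta_squared_from_t(t, df_w):
    return t ** 2 / (t ** 2 + df_w)


def proportion_above_control_mean(d):
    """Share of the experimental group scoring above the control mean (normality assumed)."""
    return NormalDist().cdf(d)


# Hypothetical two-group result: t = 2.5 with 20 participants per group.
t, n1, n2 = 2.5, 20, 20
d = cohens_d_two_groups(t, n1, n2)
eta_sq = eta_squared_from_t(t, df_w=n1 + n2 - 2)
print(f"d = {d:.3f}, eta squared = {eta_sq:.3f}")

for benchmark in (0.2, 0.5, 0.8):
    print(f"d = {benchmark}: {proportion_above_control_mean(benchmark):.0%} above the control mean")
```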

One-Way Analysis of Variance (ANOVA)

Purpose: Compares means across $K \geq 2$ populations. It originated from Fisher's experimental designs.
Terminology: * Between-subjects: Independent samples for each condition. * Within-subjects: The same subjects are exposed to multiple conditions (Repeated Measures). * Factors: Categorical independent variables. * Levels: Categories within factors. * Factorial Design: Designs with multiple factors (e.g., a 3x3 fully crossed factorial design has 9 conditions).
Hypotheses in ANOVA: * $H_0: \mu_1 = \mu_2 = ... = \mu_k$ * $H_1:$ Not $H_0$ (at least one pair of means differs).
Inflated Type I Error: If multiple t-tests are used for $K$ groups, the risk of at least one Type I error is: $1 - (1 - \alpha)^C$ , where $C = \frac{K \times (K-1)}{2}$ . ANOVA (the omnibus test) protects against this.
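Example (Python sketch): The inflation formula $1 - (1 - \alpha)^C$ is easy to tabulate. The calculation treats the $C$ tests as independent, which is the simplification built into the formula; the function names are hypothetical.

```python
def number_of_pairwise_tests(k):
    """C = K * (K - 1) / 2 pairwise comparisons among K groups."""
    return k * (k - 1) // 2


def familywise_error(alpha, k):
    """Risk of at least one Type I error across all pairwise t-tests."""
    c = number_of_pairwise_tests(k)
    return 1 - (1 - alpha) ** c


for k in (2, 3, 4, 5):
    print(f"K = {k}: C = {number_of_pairwise_tests(k)}, risk = {familywise_error(0.05, k):.3f}")
```

With $K = 5$ groups and $\alpha = 0.05$ the risk already reaches about 0.40, which is why a single omnibus test is preferred.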

Logic and Numerical Modeling of ANOVA

Variance Decomposition: ANOVA splits total variance into variance between groups and variance within groups. * Variance Between: Differences between grand mean and group means; explained by group membership. * Variance Within (Residual/Error): Differences between observed scores and group means; cannot be explained by group membership; represents random variation.
Linear Additive Model: $Y_{ik} = \mu + \alpha_k + \epsilon_{ik}$ * $Y_{ik}$ : Score of person $i$ in group $k$ . * $\mu$ : Grand mean population parameter. * $\alpha_k$ : Group effect ( $\mu_k - \mu$ ). * $\epsilon_{ik}$ : Residual error ( $Y_{ik} - \mu_k$ ).
F-Test Statistic: $F = \frac{MS_{Between}}{MS_{Within}}$ * $MS_{Between} = \frac{SS_{Between}}{df_{Between}}$ , where $df_{Between} = K - 1$ . * $MS_{Within} = \frac{SS_{Within}}{df_{Within}}$ , where $df_{Within} = N - K$ . * Under $H_0$ , we expect $F \approx 1$ , because both mean squares then estimate the same error variance. An $F$ value well above 1 provides evidence against $H_0$ . The observed value is checked against the F-distribution using critical value $F(df_1, df_2)$ , with $df_1 = df_{Between}$ and $df_2 = df_{Within}$ .
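Example (Python sketch): The variance decomposition and the F-ratio can be computed by hand for a small data set. The three groups are hypothetical; the p-value is left out because it would require the F-distribution, which the notes obtain from SPSS.

```python
def one_way_anova(groups):
    """Return SS, df, MS and F for a one-way between-subjects ANOVA."""
    all_scores = [y for group in groups for y in group]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(group) / len(group) for group in groups]

    # Between: differences between group means and the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within: differences between observed scores and their own group mean.
    ss_within = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    return {
        "ss_between": ss_between, "ss_within": ss_within,
        "df_between": df_between, "df_within": df_within,
        "ms_between": ms_between, "ms_within": ms_within,
        "F": ms_between / ms_within,
    }


# Hypothetical scores for K = 3 groups of 3 participants each.
result = one_way_anova([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(result)
```

For these data $SS_{Between} = 26$ and $SS_{Within} = 6$ , so $F = \frac{26 / 2}{6 / 6} = 13$ .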

Assumptions of ANOVA

Independent Observations: Violated in repeated measures. ANOVA is not robust to this violation.
Normality of Residuals ( $\epsilon$ ): ANOVA is robust if $n_k \geq 30$ . If not, use Kruskal-Wallis.
Homogeneity of Variance: Tested via Levene's test. ANOVA is robust if the highest $n$ is less than 1.5 times the lowest $n$ . If not met, use Welch's test.
Quantitative Scores: The DV must be quantitative to allow mean computation.
No Outliers: This is a general requirement for all statistical methods.

Contrasts and Specific Group Comparisons

Defined: Specific comparisons between individual group means used to detect exactly where differences lie after a significant omnibus ANOVA.
Types: * A priori (Planned): Specified before data collection/inspection. Preferred for hypothesis testing. * Post hoc: Exploratory; used after a significant ANOVA.
Simple vs. Complex: * Simple (Pairwise): Comparing two means (e.g., $\mu_1 = \mu_2$ ). * Complex: Comparing combined groups (e.g., $\mu_1 = \frac{\mu_2 + \mu_3}{2}$ ).
Contrast Value ( $\psi$ ): $\psi = c_1\mu_1 + c_2\mu_2 + c_3\mu_3$ * Restriction: The sum of contrast coefficients must be zero ( $\sum c_k = 0$ ). * Unequal Sample Size Formula: Within each side of the comparison, groups are weighted by their sample size, $c_k = \pm \frac{n_k}{\sum n_i}$ , where the sum runs over the groups on the same side of the comparison and the two sides take opposite signs.
Orthogonal Contrasts: Contrasts that are independent/uncorrelated. * Condition for equal $n$ : $\sum (c_{1k} \times c_{2k}) = 0$ . * A set of $K-1$ orthogonal contrasts can completely explain $SS_{Between}$ .
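Example (Python sketch): Contrast values, the zero-sum restriction, and the orthogonality condition for equal $n$ can be checked with a few lines. The group means and function names are hypothetical; the two coefficient sets encode $\mu_1$ versus the average of $\mu_2$ and $\mu_3$ , and $\mu_2$ versus $\mu_3$ .

```python
def contrast_value(coefficients, means):
    """psi = sum of c_k * mean_k."""
    return sum(c * m for c, m in zip(coefficients, means))


def is_valid_contrast(coefficients, tol=1e-9):
    """Contrast coefficients must sum to zero."""
    return abs(sum(coefficients)) < tol


def are_orthogonal(c1, c2, tol=1e-9):
    """Equal-n condition: the sum of products of coefficients is zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < tol


group_means = [2.0, 3.0, 6.0]          # hypothetical group means
complex_contrast = [1, -0.5, -0.5]     # group 1 vs. average of groups 2 and 3
simple_contrast = [0, 1, -1]           # group 2 vs. group 3

print(contrast_value(complex_contrast, group_means))
print(contrast_value(simple_contrast, group_means))
print(are_orthogonal(complex_contrast, simple_contrast))
```

With $K = 3$ groups, these $K - 1 = 2$ contrasts form an orthogonal set.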

Post-Hoc Comparison Methods

Tukey Honestly Significant Difference (HSD): * Tests all possible pairs of means. * Best for equal sample sizes. * Uses the studentized range statistic $q$ with critical value $q_{CV}$ . As $K$ increases, $q_{CV}$ increases to control experiment-wise error.
Scheffé Test: * Allows for complex contrasts. * The most conservative post-hoc test. * Critical value: $(K - 1) \times F_{CV}$ , where $F_{CV}$ is based on ANOVA degrees of freedom. * If ANOVA is significant, at least one Scheffé contrast must be significant; if ANOVA is non-significant, no Scheffé contrast will be significant.
Bonferroni Correction: Used for planned multiple contrasts. $\alpha_{Bonferroni} = \frac{\alpha}{C}$ , where $C$ is the number of contrasts.
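Example (Python sketch): The Bonferroni level and the Scheffé critical value follow directly from the formulas. The p-values are hypothetical, and the value $F_{CV} = 5.14$ (for $df = 2, 6$ at $\alpha = 0.05$ ) is a tabled number supplied as an assumption that should be verified against an F table.

```python
def bonferroni_alpha(alpha, n_contrasts):
    """Per-contrast significance level: alpha / C."""
    return alpha / n_contrasts


def scheffe_critical_value(k, f_cv):
    """Scheffe critical value: (K - 1) * F_CV."""
    return (k - 1) * f_cv


# Hypothetical p-values for C = 3 planned contrasts.
p_values = [0.010, 0.020, 0.200]
alpha_per_contrast = bonferroni_alpha(0.05, len(p_values))
significant = [p < alpha_per_contrast for p in p_values]
print(f"Bonferroni alpha: {alpha_per_contrast:.4f}, significant: {significant}")

# Assumed tabled critical F for df = (2, 6) at alpha = 0.05.
F_CV = 5.14
print(f"Scheffe critical value for K = 3: {scheffe_critical_value(3, F_CV):.2f}")
```

Note that the second contrast ( $p = 0.020$ ) would be significant at the uncorrected $\alpha = 0.05$ but not at the Bonferroni level of about 0.0167.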

Two-Way ANOVA and Interaction Effects

Defined: An ANOVA with two independent categorical variables (Factor A and Factor B). A factorial design (e.g., A x B fully crossed).
Hypotheses Tested: 1. Main Effect of A: Differences in means across levels of Factor A. 2. Main Effect of B: Differences in means across levels of Factor B. 3. Interaction Effect (A*B): Whether the effect of Factor A depends on the level of Factor B.
Interaction Visualization: Interaction exists if the lines in a profile plot are not parallel. * Ordinal Interaction: Lines do not intersect; the order of group means remains consistent. * Disordinal Interaction: Lines intersect; the order of group means changes across levels.
Simple Effects: If an interaction is significant, main effects can be deceiving. Researchers must perform simple effects analysis (one-way ANOVA for one factor at each individual level of the other factor).
Balanced vs. Unbalanced Designs: * Balanced: Equal cell sizes ( $n_{jk}$ ) or proportional cell sizes. $SS_{Between} = SS_A + SS_B + SS_{AB}$ . * Unbalanced: Cell sizes are not proportional. $SS_A$ , $SS_B$ , and $SS_{AB}$ do not sum perfectly to $SS_{Between}$ . Partial $\eta^2$ is used as the effect size measure to show additional variance explained by a factor after accounting for others.
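Example (Python sketch): For a balanced design with equal cell sizes, the decomposition $SS_{Between} = SS_A + SS_B + SS_{AB}$ can be verified by hand. The 2x2 data, the level labels, and the function name are hypothetical, and the code assumes every cell has the same $n$ .

```python
def two_way_sums_of_squares(cells):
    """cells maps (a_level, b_level) -> list of scores; equal cell sizes are assumed."""
    a_levels = sorted({a for a, _ in cells})
    b_levels = sorted({b for _, b in cells})
    n = len(next(iter(cells.values())))  # common cell size

    cell_mean = {key: sum(v) / len(v) for key, v in cells.items()}
    all_scores = [y for v in cells.values() for y in v]
    grand = sum(all_scores) / len(all_scores)
    a_mean = {a: sum(cell_mean[(a, b)] for b in b_levels) / len(b_levels) for a in a_levels}
    b_mean = {b: sum(cell_mean[(a, b)] for a in a_levels) / len(a_levels) for b in b_levels}

    ss_a = n * len(b_levels) * sum((a_mean[a] - grand) ** 2 for a in a_levels)
    ss_b = n * len(a_levels) * sum((b_mean[b] - grand) ** 2 for b in b_levels)
    # Interaction: what remains of each cell mean after removing both main effects.
    ss_ab = n * sum((cell_mean[(a, b)] - a_mean[a] - b_mean[b] + grand) ** 2
                    for a in a_levels for b in b_levels)
    ss_between = n * sum((m - grand) ** 2 for m in cell_mean.values())
    return ss_a, ss_b, ss_ab, ss_between


# Hypothetical 2x2 design with n = 2 per cell.
cells = {
    ("a1", "b1"): [1, 3], ("a1", "b2"): [3, 5],
    ("a2", "b1"): [5, 7], ("a2", "b2"): [11, 13],
}
ss_a, ss_b, ss_ab, ss_between = two_way_sums_of_squares(cells)
print(ss_a, ss_b, ss_ab, ss_between)
```

In this example the effect of Factor A is 4 points at b1 and 8 points at b2, so the profile lines are not parallel and $SS_{AB}$ is greater than zero.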

Analysis of Covariance (ANCOVA)

Defined: Analysis of variance used to compare group means while controlling for a continuous variable known as a covariate ( $X$ ).
Goals: 1. Elimination of Bias: Correcting for mean differences on the covariate that existed before the experiment (critical in quasi-experiments). 2. Reduction of Error Variance: Removing variance in the DV explained by the covariate, thereby decreasing $MS_{Within}$ and increasing the power of the test for the main factor.
Experimental vs. Quasi-experimental Contexts: * Experiments: Random assignment usually makes group covariate means equal. ANCOVA primarily reduces error variance. * Quasi-experiments: Groups often differ on the covariate initially. ANCOVA removes the bias of the covariate and reduces error variance.
Additional Assumptions for ANCOVA: * The covariate is measured before the experimental manipulation. * The covariate is measured without error (use reliable measures). * The relationship between the covariate ( $X$ ) and dependent variable ( $Y$ ) is linear. * Homogeneity of Regression Slopes: The relation between $X$ and $Y$ must be the same for all treatment levels (no interaction between the factor and the covariate).
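Example (Python sketch): The bias-removal goal of ANCOVA can be illustrated with covariate-adjusted group means. The notes do not give a formula for this, so the sketch uses the common textbook adjustment $\bar{y}_k - b_w(\bar{x}_k - \bar{x})$ , where $b_w$ is the pooled within-group regression slope; this presupposes linearity and homogeneous regression slopes. Data and function names are hypothetical.

```python
def pooled_within_slope(groups):
    """groups: list of (x_values, y_values) pairs, one per treatment level."""
    sum_xy = 0.0
    sum_xx = 0.0
    for xs, ys in groups:
        x_bar = sum(xs) / len(xs)
        y_bar = sum(ys) / len(ys)
        # Deviations are taken from each group's own means (within-group variation).
        sum_xy += sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        sum_xx += sum((x - x_bar) ** 2 for x in xs)
    return sum_xy / sum_xx


def adjusted_means(groups):
    """Group means on Y corrected for group differences on the covariate X."""
    slope = pooled_within_slope(groups)
    all_x = [x for xs, _ in groups for x in xs]
    grand_x = sum(all_x) / len(all_x)
    return [sum(ys) / len(ys) - slope * (sum(xs) / len(xs) - grand_x) for xs, ys in groups]


# Hypothetical quasi-experiment: the treatment group starts higher on the covariate.
control = ([1, 2, 3], [2, 4, 6])       # (covariate X, dependent variable Y)
treatment = ([3, 4, 5], [8, 10, 12])

slope = pooled_within_slope([control, treatment])
adj_control, adj_treatment = adjusted_means([control, treatment])
print(slope, adj_control, adj_treatment)
```

The raw means differ by 6 points (4 versus 10), but after correcting for the initial covariate difference the adjusted means differ by only 2 points (6 versus 8).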

Repeated Measures ANOVA (Within-Subjects Design)

Context: Used when there is more than one measurement on the DV for each subject (e.g., longitudinal studies, test-retest, diary studies).
Subject as Own Control: This design acts like an ANCOVA by using the subject as a control to reduce error variance. It partitions out the variance attributable to individual subject differences ( $SS_{Subject}$ ).
Sphericity Assumption: An extension of the homogeneity of variance. It requires that the variances of the difference scores between any two levels of the within-subjects factor are equal. * Violation: Tested via Mauchly's test. If violated, F-values are artificially inflated, and p-values are too small. * Corrections: Greenhouse-Geisser or Huynh-Feldt corrections adjust the degrees of freedom via factor $\epsilon$ to correct the p-value. Lower-bound is the most extreme correction ( $\epsilon = \frac{1}{K-1}$ ).
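Example (Python sketch): Sphericity concerns the variances of the difference scores between every pair of levels, which can be inspected directly. The sketch only computes these variances and the lower-bound $\epsilon$ ; it does not implement Mauchly's test or the Greenhouse-Geisser and Huynh-Feldt estimates. Data and function names are hypothetical.

```python
from itertools import combinations
import statistics


def difference_score_variances(data):
    """data: one row per subject, one column per level of the within-subjects factor."""
    k = len(data[0])
    return {
        (i, j): statistics.variance([row[i] - row[j] for row in data])
        for i, j in combinations(range(k), 2)
    }


def lower_bound_epsilon(k):
    """Most extreme correction factor: 1 / (K - 1)."""
    return 1 / (k - 1)


# Hypothetical scores of 4 subjects measured at K = 3 time points.
data = [
    [1, 2, 4],
    [2, 4, 5],
    [3, 3, 7],
    [4, 5, 8],
]

variances = difference_score_variances(data)
for pair, var in variances.items():
    print(f"levels {pair}: variance of difference scores = {var:.3f}")
print(f"lower-bound epsilon: {lower_bound_epsilon(3)}")
```

Clearly unequal variances across the pairs, as in this toy data set, are the pattern that the sphericity assumption rules out.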
Mixed-Effects Models: These often combine within-subjects factors (e.g., time) and between-subjects factors (e.g., education level). The analysis then tests for the main effects of both and the interaction between them.

Designing an Experiment: Validity and Reliability

Goal: Establish causal relationships between IV and DV via manipulation of the experimental group and use of a control group.
Internal Validity: The degree to which the relationship between the IV and DV reflects only that relationship, without contamination from confounding variables.
External Validity: The degree to which results generalize to other people, locations, and contexts.
Sampling Challenges: * Simple Random Sample: Preferable for high external validity but difficult to obtain. * Convenience Sample: Frequently used in psychology (often university students). Leads to volunteer bias or subject sophistication, threatening external validity.
Confounding Variables: Variables that differ between conditions (e.g., IQ in a teaching study). They decrease internal validity by offering alternative explanations and decrease power by increasing $MS_{Within}$ .

Methods to Control Participant Variables

Random Assignment: Best for balancing unknown variables in large samples; less effective in small samples.
Systematic Balancing: Measuring a confound (e.g., IQ) before manipulation and ensuring groups are balanced by matching category counts.
Matched Group Design: Measuring a confound and pairing participants with identical scores across conditions. Highly effective but difficult to find matches for large groups.
Limiting the Population: Including only participants with specific values on a confound (e.g., only IQ 100-110). This increases internal validity and power but significantly reduces external validity.
Counterbalancing in Within-Subjects Designs: Used to control for order effects (practice, fatigue, carry-over, or response sets): * Latin Square Design: A partial balance where every condition appears in each ordinal position, but sequence follows a specific pattern (e.g., B always follows A). * Random Balance: Each participant receives a unique random order. * Full Balance: All possible orders of conditions are presented across the sample. Requires a very large $N$ .
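Example (Python sketch): The three counterbalancing schemes can be generated programmatically. The Latin square here is the simple cyclic version, which is one possible construction and an assumption of this sketch; condition labels and function names are hypothetical.

```python
from itertools import permutations
import random


def latin_square(conditions):
    """Cyclic Latin square: each condition appears once in every ordinal position."""
    k = len(conditions)
    return [[conditions[(row + pos) % k] for pos in range(k)] for row in range(k)]


def full_balance(conditions):
    """All possible orders of the conditions (K! sequences)."""
    return [list(order) for order in permutations(conditions)]


def random_balance(conditions, rng):
    """One random order, to be drawn separately for each participant."""
    order = list(conditions)
    rng.shuffle(order)
    return order


conditions = ["A", "B", "C"]
square = latin_square(conditions)
all_orders = full_balance(conditions)
one_random_order = random_balance(conditions, random.Random(0))

print(square)
print(len(all_orders))
print(one_random_order)
```

In the cyclic square B directly follows A in every row where A is not last, which shows why a Latin square is only a partial balance, while full balance already needs 6 orders for 3 conditions and 24 for 4.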
Threats in WS Designs: Subject history (events outside the study), maturation (growing older/changing), and subject mortality (attrition or dropout before completion).