Exhaustive Study Notes on Experimental Research Methods
Basics of Experimental Research Methods and Descriptive Statistics
Experimental Research Methods Defined: These are techniques used to analyze experimental data to draw scientific conclusions.
Descriptive Statistics: The primary purpose of descriptive statistics is to summarize data. Data is defined as numerical information pertaining to a population or a sample. Descriptive statistics are necessary because raw data lists are often unclear and difficult to interpret.
Methods to Summarize Data:
* Distribution: Data points are summarized by grouping those with the same score. This is visualized via frequency distribution tables or histograms. The SPSS syntax for generating these is relevant for examinations.
* Sample Statistics: Data is summarized using characteristic features of the distribution.
Characteristic Features of a Distribution
Central Tendency: This represents the most characteristic score of a distribution. The measures include:
* Mean: The sum of all scores divided by the total number of scores (N). Individual scores are denoted as x.
* Median: The middle score in a distribution.
* Mode: The most frequently occurring score.
Dispersion (Variance): This measures how much individual scores deviate from the central tendency. Measures include:
* Range: The difference between the highest and lowest scores.
* Variance ($s^2$): Calculated as the sum of all squared deviation scores divided by the number of scores minus one:
$s^2 = \frac{\sum (x - \bar{x})^2}{N - 1}$
* Standard Deviation ($s$): The square root of the variance:
$s = \sqrt{s^2}$
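As a rough illustration of the central tendency and dispersion measures above, the following sketch computes them with Python's standard `statistics` module. The scores are made-up example values, not data from these notes.

```python
import statistics

# Hypothetical example scores (made up for illustration)
scores = [2, 4, 4, 5, 7, 8]

n = len(scores)
mean = sum(scores) / n                    # sum of all scores divided by N
median = statistics.median(scores)        # middle score (mean of the two middle scores when N is even)
mode = statistics.mode(scores)            # most frequently occurring score
score_range = max(scores) - min(scores)   # highest score minus lowest score

# Sample variance: sum of squared deviations from the mean, divided by N - 1
variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
sd = variance ** 0.5                      # standard deviation = square root of the variance

print(mean, median, mode, score_range, variance, sd)
```

The hand-computed variance and standard deviation should agree with `statistics.variance` and `statistics.stdev`, which also divide by N − 1.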
Inferential Statistics Principles
Context: Descriptive statistics are sufficient only when data encompasses the entire population. However, researchers almost always use samples because population data collection is too expensive, takes too long, or is physically impossible.
Goal: Inferential statistics allow researchers to draw conclusions about a population based on a representative sample.
The Three Procedures of Inferential Statistics:
1. Hypothesis testing.
2. Point estimation.
3. Interval estimation (Confidence Intervals).
Hypothesis Testing Logic and Procedures
Hypotheses Framework:
* Hypotheses must be mutually exclusive (only one can be true) and exhaustive (all possibilities must be included).
* Null Hypothesis (H0): Usually assumes no effect or equality (e.g., H0: μ = 2.5). It always contains the "=" sign.
* Alternative Hypothesis (H1/Ha): Contains the researcher's expectation (e.g., H1: μ ≠ 2.5 for two-sided; H1: μ > 2.5 or H1: μ < 2.5 for one-sided).
Decision Logic:
* Assume H0 is true. Determine the sampling distribution of the sample mean ($\bar{x}$).
* The sampling distribution has mean $\mu$ and variance $\frac{\sigma^2}{N}$.
* P-value: The probability that the observed sample statistic ($\bar{x}$) or a more extreme value occurs, assuming H0 is true.
* Rule: If p<α (significance level), the probability is too small to trust H0, so H0 is rejected.
* Rule: If p≥α, there is not enough evidence to doubt H0, so it is not rejected.
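The decision logic above can be sketched as a two-sided one-sample z-test. This is a simplification that assumes the population standard deviation σ is known, so the standard normal distribution can be used; in practice the notes rely on SPSS output for the p-value. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

def z_test_two_sided(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-sided one-sample z-test, assuming the population sigma is known."""
    standard_error = sigma / n ** 0.5           # square root of sigma^2 / N
    z = (sample_mean - mu0) / standard_error    # distance from mu0 in standard errors
    # Probability of a result at least this extreme (in either tail) if H0 is true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    reject_h0 = p_value < alpha                 # decision rule: reject H0 if p < alpha
    return z, p_value, reject_h0

# Hypothetical numbers: H0: mu = 2.5, observed mean 2.8, sigma = 1.0, N = 50
z, p, reject = z_test_two_sided(2.8, 2.5, 1.0, 50)
print(f"z = {z:.3f}, p = {p:.4f}, reject H0: {reject}")
```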
Steps in Hypothesis Testing:
1. Formulate H0 and H1.
2. Determine the decision rule (p<α).
3. Determine the p-value based on SPSS output.
4. Make the decision regarding significance and formulate a conclusion.
Assumptions: One-sample tests assume a "simple random sample," meaning all cases have an equal chance to be sampled and cases are selected independently.
Point and Interval Estimation
Point Estimation: Provides the "best guess" of a population parameter based on sample statistics.
* For the population mean $\mu$, the best estimate is the sample mean $\bar{x}$.
* For the population variance $\sigma^2$, the best estimate is the sample variance $s^2$.
Interval Estimation (Confidence Intervals): Provides a range in which the parameter lies with a specific confidence (usually 95%).
* Formula for 95% CI of $\mu$: $\bar{x} \pm t_{CV} \times \frac{s}{\sqrt{N}}$
* Relation to Testing: If a hypothetical value (μ0) falls inside the 95% CI, H0 cannot be rejected at α=0.05. If it falls outside, H0 can be rejected in favor of a two-sided alternative.
* Interpretation: A 95% CI means that if many samples were drawn and a CI computed from each, about 95% of those intervals would contain the true population mean. When H0 is true, the remaining 5% of samples produce a CI that does not contain μ, which leads to a rejection of H0 and therefore to a Type I error.
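A minimal sketch of interval estimation and its relation to testing, using the formula x̄ ± t_CV × s/√N. The critical value is passed in as an argument because the Python standard library has no t-distribution; the value 2.262 is the two-sided 95% critical value for df = 9. The sample and μ0 are made-up values.

```python
import statistics

def confidence_interval(scores, t_cv):
    """CI for the mean: x_bar +/- t_cv * s / sqrt(N).

    t_cv must be looked up for df = N - 1 and the desired confidence level.
    """
    n = len(scores)
    x_bar = statistics.mean(scores)
    s = statistics.stdev(scores)        # sample SD (N - 1 in the denominator)
    margin = t_cv * s / n ** 0.5
    return x_bar - margin, x_bar + margin

# Hypothetical sample with N = 10, so df = 9 and t_cv = 2.262 for a 95% CI
scores = [2.1, 2.9, 3.4, 2.6, 3.0, 2.8, 3.3, 2.4, 3.1, 2.7]
lower, upper = confidence_interval(scores, t_cv=2.262)

mu0 = 2.5                               # hypothetical value under H0
reject_h0 = not (lower <= mu0 <= upper) # outside the CI -> reject at alpha = 0.05 (two-sided)
print(f"95% CI: [{lower:.3f}, {upper:.3f}], reject H0: {reject_h0}")
```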
Statistical Power and Errors
Types of Errors:
* Type I Error (α): Rejecting H0 when it is actually true.
* Type II Error (β): Failing to reject H0 when it is actually false.
Power (1−β): The probability of correctly rejecting the null hypothesis when it is actually false. Researchers prefer highly powered tests to ensure effects are detected when they exist.
Four Factors Influencing Power:
1. Significance Level (α): A larger α (e.g., 0.10 vs 0.05) increases the rejection region, making it easier to reject H0 and thus increasing power.
2. Sample Size (N): Increasing N decreases the standard error, which increases the z-score/t-score, making it easier to reject H0.
3. Population Standard Deviation (σ): If σ decreases, the standard error decreases, increasing power.
4. True Population Mean ($\mu_{H1}$): A larger effect size (distance between $\mu_0$ and the true $\mu$) increases power.
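The four factors can be explored numerically. The sketch below assumes a one-sided z-test (H1: μ > μ0) with known σ, for which power has a closed form; the function name and all numbers are hypothetical.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 against H1: mu > mu0."""
    std_normal = NormalDist()
    standard_error = sigma / n ** 0.5
    z_cv = std_normal.inv_cdf(1 - alpha)         # critical z that bounds the rejection region
    shift = (mu_true - mu0) / standard_error     # true distance from mu0 in standard errors
    return 1 - std_normal.cdf(z_cv - shift)      # P(reject H0 | true mean is mu_true)

base = power_one_sided_z(mu0=100, mu_true=105, sigma=15, n=25)
print(f"baseline:              {base:.3f}")
print(f"larger alpha (0.10):   {power_one_sided_z(100, 105, 15, 25, alpha=0.10):.3f}")
print(f"larger N (100):        {power_one_sided_z(100, 105, 15, 100):.3f}")
print(f"smaller sigma (10):    {power_one_sided_z(100, 105, 10, 25):.3f}")
print(f"larger effect (110):   {power_one_sided_z(100, 110, 15, 25):.3f}")
```

Each change moves power in the direction described in the list. When the true mean equals μ0, the function returns α, which is the Type I error rate.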
Effect Size Concepts and Formulas
Importance: Statistical significance (p<0.05) does not prove a systematic effect exists (could be Type I error) and does not mean the effect is practically or clinically relevant. Large samples can make even tiny, uninteresting differences significant.
Measures of Effect Size:
* Cohen's d: Standardizes the difference between group means.
* One-group formula: $d = \frac{t}{\sqrt{N}}$
* Two-group formula: $d = t \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$
* (Partial) Explained Variance ($\eta^2$): How much of the variance in the dependent variable (DV) is explained by group membership.
* Formula: $\eta^2 = \frac{t^2}{t^2 + df_w}$
* Degrees of Freedom ($df_w$) for two groups: $df_w = n_1 + n_2 - 2$.
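The effect size formulas translate directly into small helper functions. The function names and the t-value and group sizes below are hypothetical and serve only to illustrate the arithmetic.

```python
def cohens_d_one_group(t, n):
    """d = t / sqrt(N) for a one-sample t-test."""
    return t / n ** 0.5

def cohens_d_two_groups(t, n1, n2):
    """d = t * sqrt(1/n1 + 1/n2) for an independent-samples t-test."""
    return t * (1 / n1 + 1 / n2) ** 0.5

def eta_squared(t, df_w):
    """eta^2 = t^2 / (t^2 + df_w): proportion of DV variance explained by group membership."""
    return t ** 2 / (t ** 2 + df_w)

# Hypothetical result: t = 2.5 from two groups of 20 participants each
n1, n2 = 20, 20
df_w = n1 + n2 - 2
d = cohens_d_two_groups(2.5, n1, n2)
eta2 = eta_squared(2.5, df_w)
print(f"d = {d:.3f}, eta^2 = {eta2:.3f}")
```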
Rules of Thumb for Cohen's d:
* Small Effect (d ≈ 0.2): 58% of the experimental group scores higher than the average of the control group.
* Medium Effect (d ≈ 0.5): 69% of the experimental group scores higher than the control mean.
* Large Effect (d ≈ 0.8): 79% of the experimental group scores higher than the control mean.
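These percentages can be checked with the standard normal distribution. The sketch assumes normally distributed scores with equal variances in both groups and uses Cohen's conventional benchmarks of d = 0.2, 0.5, and 0.8 for small, medium, and large effects; the function name is hypothetical.

```python
from statistics import NormalDist

def percent_above_control_mean(d):
    """Percentage of the experimental group scoring above the control mean,
    assuming normal distributions with equal variances."""
    return 100 * NormalDist().cdf(d)

for label, d in [("small", 0.2), ("medium", 0.5), ("large", 0.8)]:
    print(f"{label} (d = {d}): {percent_above_control_mean(d):.0f}%")
```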
One-Way Analysis of Variance (ANOVA)
Purpose: Compares means across K≥2 populations. It originated from Fisher's experimental designs.
Terminology:
* Between-subjects: Independent samples for each condition.
* Within-subjects: The same subjects are exposed to multiple conditions (Repeated Measures).
* Factors: Categorical independent variables.
* Levels: Categories within factors.
* Factorial Design: Designs with multiple factors (e.g., a 3x3 fully crossed factorial design has 9 conditions).
Hypotheses in ANOVA:
* H0:μ1=μ2=...=μk
* H1: Not H0 (at least one pair of means differs).
Inflated Type I Error: If multiple t-tests are used for K groups, the risk of at least one Type I error is $1 - (1 - \alpha)^C$, where $C = \frac{K \times (K - 1)}{2}$ is the number of pairwise comparisons. ANOVA (the omnibus test) protects against this.
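A short sketch of how quickly this risk grows with the number of groups. It applies the formula 1 − (1 − α)^C with C = K(K − 1)/2, which treats the pairwise tests as independent; the function name is hypothetical.

```python
def familywise_error(k, alpha=0.05):
    """Risk of at least one Type I error across all pairwise t-tests among k groups."""
    c = k * (k - 1) // 2              # number of pairwise comparisons
    return 1 - (1 - alpha) ** c

for k in (2, 3, 4, 5):
    print(f"K = {k}: {familywise_error(k):.3f}")
```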
Logic and Numerical Modeling of ANOVA
Variance Decomposition: ANOVA splits total variance into variance between groups and variance within groups.
* Variance Between: Differences between grand mean and group means; explained by group membership.
* Variance Within (Residual/Error): Differences between observed scores and group means; cannot be explained by group membership; represents random variation.
Linear Additive Model:
$Y_{ik} = \mu + \alpha_k + \epsilon_{ik}$
* $Y_{ik}$: Score of person i in group k.
* $\mu$: Grand mean population parameter.
* $\alpha_k$: Group effect ($\mu_k - \mu$).
* $\epsilon_{ik}$: Residual error ($Y_{ik} - \mu_k$).
F-Test Statistic:
$F = \frac{MS_{Between}}{MS_{Within}}$
* $MS_{Between} = \frac{SS_{Between}}{df_{Between}}$, where $df_{Between} = K - 1$.
* $MS_{Within} = \frac{SS_{Within}}{df_{Within}}$, where $df_{Within} = N - K$.
* Under H0, we expect F to be close to 1. An F clearly larger than 1 provides evidence against H0. The observed value is compared with the critical value $F(df_1, df_2)$ from the F-distribution.
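The variance decomposition and the F-ratio can be computed by hand in a few lines. This is a minimal sketch with made-up scores; it returns F and the degrees of freedom but does not look up the critical value or p-value.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of scores."""
    all_scores = [y for g in groups for y in g]
    n_total, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # SS_between: group means vs. grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # SS_within: each score vs. its own group mean
    ss_within = sum((y - m) ** 2
                    for g, m in zip(groups, group_means) for y in g)

    df_between, df_within = k - 1, n_total - k
    f = (ss_between / df_between) / (ss_within / df_within)   # MS_between / MS_within
    return f, df_between, df_within

# Hypothetical scores for K = 3 groups of 3 participants
groups = [[4, 5, 6], [6, 7, 8], [8, 9, 10]]
f, df1, df2 = one_way_anova(groups)
print(f"F({df1}, {df2}) = {f:.2f}")
```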
Assumptions of ANOVA
Independent Observations: Violated in repeated measures. ANOVA is not robust to this violation.
Normality of Residuals ($\epsilon$): ANOVA is robust if $n_k \geq 30$. If not, use the Kruskal-Wallis test.
Homogeneity of Variance: Tested via Levene's test. ANOVA is robust if the highest n is less than 1.5 times the lowest n. If not met, use Welch's test.
Quantitative Scores: The DV must be quantitative to allow mean computation.
No Outliers: Extreme scores distort means and variances, so this is a general requirement for statistical methods that are based on them.
Contrasts and Specific Group Comparisons
Defined: Specific comparisons between individual group means used to detect exactly where differences lie after a significant omnibus ANOVA.
Types:
* A priori (Planned): Specified before data collection/inspection. Preferred for hypothesis testing.
* Post hoc: Exploratory; used after a significant ANOVA.
Simple vs. Complex:
* Simple (Pairwise): Comparing two means (e.g., $\mu_1 = \mu_2$).
* Complex: Comparing combined groups (e.g., $\mu_1 = \frac{\mu_2 + \mu_3}{2}$).
Contrast Value (ψ):
$\psi = c_1\mu_1 + c_2\mu_2 + c_3\mu_3$
* Restriction: The sum of contrast coefficients must be zero ($\sum c_k = 0$).
* Unequal Sample Size Formula:
ck=∑nink×nk (adjusted for sample sizes on each side of the comparison).
Orthogonal Contrasts: Contrasts that are independent/uncorrelated.
* Condition for equal n: $\sum (c_{1k} \times c_{2k}) = 0$.
* A set of $K - 1$ orthogonal contrasts can completely explain $SS_{Between}$.
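The contrast rules can be checked mechanically. The sketch below assumes equal sample sizes and uses made-up group means as estimates of the population means; the function names are hypothetical.

```python
def is_valid_contrast(coefs, tol=1e-9):
    """A contrast is valid when its coefficients sum to zero."""
    return abs(sum(coefs)) < tol

def are_orthogonal(c1, c2, tol=1e-9):
    """Equal-n condition: the sum of the products of the coefficients is zero."""
    return abs(sum(a * b for a, b in zip(c1, c2))) < tol

def contrast_value(coefs, means):
    """psi = c1*mean1 + c2*mean2 + ... (estimated from sample means)."""
    return sum(c * m for c, m in zip(coefs, means))

means = [5.0, 7.0, 9.0]          # hypothetical group means
complex_c = [1, -0.5, -0.5]      # group 1 vs. the average of groups 2 and 3
simple_c = [0, 1, -1]            # group 2 vs. group 3

print(is_valid_contrast(complex_c), is_valid_contrast(simple_c))
print(are_orthogonal(complex_c, simple_c))
print(contrast_value(complex_c, means), contrast_value(simple_c, means))
```

With K = 3 groups, these two contrasts form a set of K − 1 = 2 orthogonal contrasts. In comparison, the pairwise contrasts `[1, -1, 0]` and `[0, 1, -1]` are both valid but not orthogonal to each other.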
Post-Hoc Comparison Methods
Tukey Honestly Significant Difference (HSD):
* Tests all possible pairs of means.
* Best for equal sample sizes.
* Uses the studentized range statistic $q$ with critical value $q_{CV}$. As K increases, $q_{CV}$ increases to control experiment-wise error.
Scheffé Test:
* Allows for complex contrasts.
* The most conservative post-hoc test.
* Critical value: $(K - 1) \times F_{CV}$, where $F_{CV}$ is based on the ANOVA degrees of freedom.
* If ANOVA is significant, at least one Scheffé contrast must be significant; if ANOVA is non-significant, no Scheffé contrast will be significant.
Bonferroni Correction: Used for planned multiple contrasts.
$\alpha_{Bonferroni} = \frac{\alpha}{C}$, where C is the number of contrasts.
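A minimal sketch of the Bonferroni correction: each contrast is tested against α divided by the number of contrasts. The p-values are made up and the function name is hypothetical.

```python
def bonferroni_decisions(p_values, alpha=0.05):
    """Return the adjusted alpha and a reject/retain decision per contrast."""
    alpha_adjusted = alpha / len(p_values)      # alpha divided by the number of contrasts C
    return alpha_adjusted, [p < alpha_adjusted for p in p_values]

# Hypothetical p-values for C = 3 planned contrasts
alpha_adj, decisions = bonferroni_decisions([0.010, 0.020, 0.040])
print(f"adjusted alpha = {alpha_adj:.4f}, reject H0: {decisions}")
```

In this example only the first contrast stays significant, although all three p-values are below the uncorrected α of 0.05.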
Two-Way ANOVA and Interaction Effects
Defined: An ANOVA with two categorical independent variables (Factor A and Factor B). It is a factorial design (e.g., A x B fully crossed).
Hypotheses Tested:
1. Main Effect of A: Differences in means across levels of Factor A.
2. Main Effect of B: Differences in means across levels of Factor B.
3. Interaction Effect (A*B): Whether the effect of Factor A depends on the level of Factor B.
Interaction Visualization: Interaction exists if the lines in a profile plot are not parallel.
* Ordinal Interaction: Lines do not intersect; the order of group means remains consistent.
* Disordinal Interaction: Lines intersect; the order of group means changes across levels.
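For a 2 x 2 design, the pattern in a profile plot can be read off the cell means. The sketch below imagines Factor A on the x-axis with one line per level of Factor B, and it only describes the pattern in the means; it does not test whether the interaction is significant. The function name and the cell means are hypothetical.

```python
def classify_interaction(cell_means, tol=1e-9):
    """Classify a 2 x 2 pattern of cell means.

    cell_means[a][b] is the mean for level a of Factor A and level b of Factor B.
    """
    gap_at_a1 = cell_means[0][1] - cell_means[0][0]   # distance between the B lines at A1
    gap_at_a2 = cell_means[1][1] - cell_means[1][0]   # distance between the B lines at A2
    if abs(gap_at_a1 - gap_at_a2) < tol:
        return "no interaction (parallel lines)"
    if gap_at_a1 * gap_at_a2 < 0:
        return "disordinal (lines cross)"             # order of the B means reverses
    return "ordinal (lines not parallel, no crossing)"

print(classify_interaction([[10, 12], [14, 16]]))
print(classify_interaction([[10, 12], [14, 20]]))
print(classify_interaction([[10, 14], [14, 10]]))
```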
Simple Effects: If an interaction is significant, main effects can be deceiving. Researchers must perform simple effects analysis (one-way ANOVA for one factor at each individual level of the other factor).
Balanced vs. Unbalanced Designs:
* Balanced: Equal cell sizes ($n_{jk}$) or proportional cell sizes. $SS_{Between} = SS_A + SS_B + SS_{AB}$.
* Unbalanced: Cell sizes are not proportional. $SS_A$, $SS_B$, and $SS_{AB}$ do not sum perfectly to $SS_{Between}$. Partial $\eta^2$ is used as the effect size measure to show additional variance explained by a factor after accounting for others.
Analysis of Covariance (ANCOVA)
Defined: Analysis of variance used to compare group means while controlling for a continuous variable known as a covariate (X).
Goals:
1. Elimination of Bias: Correcting for mean differences on the covariate that existed before the experiment (critical in quasi-experiments).
2. Reduction of Error Variance: Removing variance in the DV explained by the covariate, thereby decreasing $MS_{Within}$ and increasing the power of the test for the main factor.
Experimental vs. Quasi-experimental Contexts:
* Experiments: Random assignment usually makes group covariate means equal. ANCOVA primarily reduces error variance.
* Quasi-experiments: Groups often differ on the covariate initially. ANCOVA removes the bias of the covariate and reduces error variance.
Additional Assumptions for ANCOVA:
* The covariate is measured before the experimental manipulation.
* The covariate is measured without error (use reliable measures).
* The relationship between the covariate (X) and dependent variable (Y) is linear.
* Homogeneity of Regression Slopes: The relation between X and Y must be the same for all treatment levels (no interaction between the factor and the covariate).
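To make the bias-correction goal concrete, the sketch below computes covariate-adjusted group means. It uses the standard textbook adjustment, which is an assumption here because the notes do not state it: the pooled within-group slope b_w of Y on X is estimated, and each group mean is adjusted as Ȳ_k − b_w(X̄_k − X̄), where X̄ is the grand mean of the covariate. A single pooled slope presupposes homogeneity of regression slopes. The function name and data are hypothetical.

```python
def ancova_adjusted_means(groups):
    """groups: list of (x_scores, y_scores) pairs, one pair per group.

    Returns the pooled within-group slope and the covariate-adjusted means.
    """
    def mean(values):
        return sum(values) / len(values)

    grand_x = mean([x for xs, _ in groups for x in xs])   # grand mean of the covariate

    sum_xy = sum_xx = 0.0
    for xs, ys in groups:
        mx, my = mean(xs), mean(ys)
        sum_xy += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sum_xx += sum((x - mx) ** 2 for x in xs)
    b_w = sum_xy / sum_xx                                  # pooled within-group slope

    adjusted = [mean(ys) - b_w * (mean(xs) - grand_x) for xs, ys in groups]
    return b_w, adjusted

# Hypothetical quasi-experiment: group 2 already scores higher on the covariate X
group1 = ([1, 2, 3], [2, 4, 6])
group2 = ([3, 4, 5], [7, 9, 11])
b_w, adjusted = ancova_adjusted_means([group1, group2])
print(f"slope = {b_w}, adjusted means = {adjusted}")
```

In this made-up example the raw group means on Y differ by 5 points (4 vs. 9), but most of that difference reflects the initial difference on the covariate; after adjustment the means differ by 1 point.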
Repeated Measures ANOVA (Within-Subjects Design)
Context: Used when for each subject, there is more than one measurement on the DV (e.g., longitudinal studies, test-retest, diary studies).
Subject as Own Control: This design acts like an ANCOVA by using the subject as a control to reduce error variance. It partitions out the variance attributable to individual subject differences ($SS_{Subject}$).
Sphericity Assumption: An extension of the homogeneity of variance. It requires that the variances of the difference scores between any two levels of the within-subjects factor are equal.
* Violation: Tested via Mauchly's test. If violated, F-values are artificially inflated, and p-values are too small.
* Corrections: Greenhouse-Geisser or Huynh-Feldt corrections adjust the degrees of freedom via a factor $\epsilon$ to correct the p-value. Lower-bound is the most extreme correction ($\epsilon = \frac{1}{K - 1}$).
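The sphericity assumption can be inspected descriptively by computing the variance of the difference scores for every pair of levels. The sketch below does only that; it does not perform Mauchly's test or compute the Greenhouse-Geisser or Huynh-Feldt estimates. The function names and data are hypothetical.

```python
import statistics
from itertools import combinations

def difference_score_variances(data):
    """data: one row per subject, one column per level of the within-subjects factor.

    Returns {(i, j): variance of the difference scores between levels i and j}.
    """
    k = len(data[0])
    result = {}
    for i, j in combinations(range(k), 2):
        diffs = [row[i] - row[j] for row in data]
        result[(i, j)] = statistics.variance(diffs)   # sample variance (N - 1)
    return result

def lower_bound_epsilon(k):
    """Most extreme correction factor: 1 / (K - 1)."""
    return 1 / (k - 1)

# Hypothetical scores of 4 subjects measured at K = 3 time points
data = [[5, 6, 8], [4, 6, 9], [6, 7, 8], [5, 8, 10]]
variances = difference_score_variances(data)
for pair, var in variances.items():
    print(pair, round(var, 3))
print("lower-bound epsilon:", lower_bound_epsilon(3))
```

Sphericity holds in the population when these variances are equal; clearly unequal sample values suggest the assumption deserves a formal test.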
Mixed-Effects Models: These often combine within-subjects factors (e.g., time) and between-subjects factors (e.g., education level). The analysis then tests for the main effects of both factors and the interaction between them.
Designing an Experiment: Validity and Reliability
Goal: Establish causal relationships between IV and DV via manipulation of the experimental group and use of a control group.
Internal Validity: The degree to which the relationship between the IV and DV reflects only that relationship, without contamination from confounding variables.
External Validity: The degree to which results generalize to other people, locations, and contexts.
Sampling Challenges:
* Simple Random Sample: Preferable for high external validity but difficult to obtain.
* Convenience Sample: Frequently used in psychology (often university students). Leads to volunteer bias or subject sophistication, threatening external validity.
Confounding Variables: Variables other than the IV that differ between conditions (e.g., IQ in a teaching study). They decrease internal validity by offering alternative explanations and decrease power by increasing $MS_{Within}$.
Methods to Control Participant Variables
Random Assignment: Best for balancing unknown variables in large samples; less effective in small samples.
Systematic Balancing: Measuring a confound (e.g., IQ) before manipulation and ensuring groups are balanced by matching category counts.
Matched Group Design: Measuring a confound and pairing participants with identical scores across conditions. Highly effective but difficult to find matches for large groups.
Limiting the Population: Including only participants with specific values on a confound (e.g., only IQ 100-110). This increases internal validity and power but significantly reduces external validity.
Counterbalancing in Within-Subjects Designs: Used to control for order effects (practice, fatigue, carry-over, or response sets):
* Latin Square Design: A partial balance where every condition appears in each ordinal position, but sequence follows a specific pattern (e.g., B always follows A).
* Random Balance: Each participant receives a unique random order.
* Full Balance: All possible orders of conditions are presented across the sample. Requires a very large N.
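The three counterbalancing schemes can be sketched as follows. The Latin square here is the simple cyclic construction, which is one common choice and is assumed for illustration; the function names are hypothetical.

```python
import random
from itertools import permutations

def latin_square(conditions):
    """Cyclic Latin square: row r is the condition list rotated by r positions,
    so every condition appears once in each ordinal position."""
    k = len(conditions)
    return [[conditions[(r + c) % k] for c in range(k)] for r in range(k)]

def full_balance(conditions):
    """All possible orders of the conditions (K! orders)."""
    return [list(p) for p in permutations(conditions)]

def random_order(conditions, rng):
    """One random order for a single participant."""
    order = list(conditions)
    rng.shuffle(order)
    return order

conditions = ["A", "B", "C"]
for row in latin_square(conditions):
    print(row)
print(len(full_balance(conditions)), "orders under full balance")
print(random_order(conditions, random.Random()))
```

The number of orders under full balance grows as K!, which is why it requires a very large N: 3 conditions give 6 orders, 4 give 24, and 5 give 120.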
Threats in WS Designs: Subject history (events outside the study), maturation (growing older/changing), and subject mortality (attrition or dropout before completion).