Stats Test 3
I. Repeated-Measures t-Test
A. Concepts and Definitions
Repeated-Measures Design: A research design where the same group of participants is measured on the same dependent variable at two different time points or under two different conditions. Each participant provides a pair of scores.
Difference Score (D): The score obtained by subtracting one of the paired measurements from the other for each individual participant (D = X₂ - X₁). The analysis in a repeated-measures t-test is performed on these difference scores.
Null Hypothesis (H₀): For a repeated-measures t-test, the null hypothesis typically states that there is no consistent or systematic difference between the two measurement occasions or conditions in the population (µ_D = 0).
Alternative Hypothesis (H₁): The alternative hypothesis states that there is a systematic difference between the two measurement occasions or conditions in the population, resulting in a non-zero mean difference (µ_D ≠ 0 for a two-tailed test; µ_D > 0 or µ_D < 0 for a one-tailed test).
Estimated Standard Error of M_D (s_MD): An estimate of the standard deviation of the sampling distribution of mean difference scores. It is calculated as the sample standard deviation of the difference scores (s_D) divided by the square root of the number of participants (√n).
Repeated-Measures t-Statistic: A statistic used to test hypotheses about the population mean difference (µ_D) based on the sample mean difference (M_D) and the estimated standard error of M_D. The formula is t = (M_D - µ_D) / s_MD, which simplifies to t = M_D / s_MD under the null hypothesis (µ_D = 0).
Degrees of Freedom (df): For the repeated-measures t-test, the degrees of freedom are calculated as the number of participants minus one (df = n - 1). These are used to determine the critical t-value.
Testing Effects: A disadvantage of repeated-measures designs where exposure to the first condition influences performance in the second condition (e.g., practice, fatigue).
Floor Effects: A disadvantage where participants' scores in the first condition are so low that they can only increase in the second condition, regardless of the treatment effect.
Ceiling Effects: A disadvantage where participants' scores in the first condition are so high that they can only decrease in the second condition, regardless of the treatment effect.
Effect Size (Cohen's d): A measure of the magnitude of the treatment effect, calculated as Cohen's d = M_D / s_D.
B. Advantages and Disadvantages
Advantages: Requires fewer participants compared to independent-measures designs.
Eliminates individual differences as a source of variability, reducing error variance and increasing statistical power.
Well-suited for examining changes over time (e.g., learning, development).
Disadvantages: Susceptible to testing effects (practice, fatigue, carry-over effects).
Potential for floor and ceiling effects to skew results.
C. Hypothesis Testing Procedure
State the null (H₀: µ_D = 0) and alternative (H₁: µ_D ≠ 0 or directional) hypotheses and select the alpha (α) level.
Determine the degrees of freedom (df = n - 1) and locate the critical region using the t-distribution table based on the chosen alpha level and degrees of freedom.
Compute the difference score (D) for each participant, calculate the sample mean difference (M_D) and the sample standard deviation of the difference scores (s_D). Then calculate the estimated standard error of the mean difference (s_MD = s_D / √n) and the repeated-measures t-statistic (t = M_D / s_MD).
Make a decision by comparing the obtained t-statistic to the critical t-value. If the absolute value of the obtained t-statistic is greater than or equal to the critical t-value, reject the null hypothesis.
Interpret the results in the context of the research question and, if appropriate, calculate and interpret the effect size.
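The steps above can be sketched in Python using only the standard library. The pre/post scores are hypothetical, and the critical value t = 3.182 (df = 3, α = .05, two-tailed) is taken from a standard t table for this illustration.

```python
import math

# Hypothetical pre- and post-treatment scores for n = 4 participants.
pre = [5, 4, 6, 5]
post = [8, 6, 9, 7]

n = len(pre)
D = [x2 - x1 for x1, x2 in zip(pre, post)]  # difference scores, D = X2 - X1

M_D = sum(D) / n                            # sample mean difference
ss_D = sum((d - M_D) ** 2 for d in D)       # SS of the difference scores
s_D = math.sqrt(ss_D / (n - 1))             # sample standard deviation of D
s_MD = s_D / math.sqrt(n)                   # estimated standard error of M_D

t = M_D / s_MD                              # t-statistic under H0: mu_D = 0
cohens_d = M_D / s_D                        # effect size

df = n - 1
t_crit = 3.182  # two-tailed critical value for df = 3, alpha = .05 (t table)
print(f"M_D = {M_D}, t({df}) = {t:.2f}, d = {cohens_d:.2f}")
print("Reject H0" if abs(t) >= t_crit else "Fail to reject H0")
```

With these scores the difference scores are highly consistent (D = 3, 2, 3, 2), so the standard error is small and the obtained t far exceeds the critical value.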
II. Introduction to Analysis of Variance (ANOVA)
A. Concepts and Definitions
Analysis of Variance (ANOVA): A hypothesis testing procedure used to evaluate mean differences between two or more populations. It examines the variance within and between groups.
Factor: The independent variable that defines the groups being compared.
Levels: The individual conditions or values that make up a factor. The number of levels is denoted by k.
F-ratio: The test statistic for ANOVA, calculated as a ratio of two variance estimates: F = MS_between / MS_within.
Mean Square Between Groups (MS_between): An estimate of the variance between the sample means. It measures the variability caused by differences between the groups (and potentially the treatment effect). Calculated as SS_between / df_between.
Mean Square Within Groups (MS_within): An estimate of the variance within each of the samples. It measures the variability that is not due to differences between the groups (error variance). Calculated as SS_within / df_within.
Sum of Squares Between Groups (SS_between): A measure of the total variability between the group means and the grand mean.
Sum of Squares Within Groups (SS_within): A measure of the total variability within each of the treatment conditions.
Sum of Squares Total (SS_total): A measure of the total variability in the entire dataset. SS_total = SS_between + SS_within.
Degrees of Freedom Between Groups (df_between): Calculated as the number of levels minus one (df_between = k - 1).
Degrees of Freedom Within Groups (df_within): Calculated as the total number of participants minus the number of levels (df_within = N - k).
Null Hypothesis (H₀): In ANOVA, the null hypothesis states that all the population means are equal (µ₁ = µ₂ = µ₃ = ... = µ_k).
Alternative Hypothesis (H₁): The alternative hypothesis states that there is at least one mean difference among the populations. ANOVA does not specify which means are different.
Post Hoc Tests: Statistical tests conducted after a significant ANOVA result to determine which specific group means are significantly different from each other (e.g., Tukey's HSD, Scheffé's test).
Effect Size (η² - Eta-squared): A measure of the proportion of variance in the dependent variable that is accounted for by the independent variable. Calculated as η² = SS_between / SS_total.
B. Logic of ANOVA
ANOVA works by partitioning the total variability in the data into different sources of variation. If the independent variable has an effect, the variability between the group means should be larger than the variability within the groups, resulting in a large F-ratio and rejection of the null hypothesis.
C. Hypothesis Testing Procedure
State the null (H₀: µ₁ = µ₂ = ... = µ_k) and alternative (H₁: at least one mean is different) hypotheses and select the alpha (α) level. ANOVA hypotheses are always non-directional.
Calculate the degrees of freedom between groups (df_between = k - 1) and within groups (df_within = N - k). Locate the critical region using the F-distribution table with the determined degrees of freedom and alpha level.
Compute the sums of squares (SS_total, SS_within, SS_between). Calculate the mean squares (MS_between = SS_between / df_between, MS_within = SS_within / df_within). Compute the F-ratio (F = MS_between / MS_within).
Make a decision by comparing the obtained F-ratio to the critical F-value. If the obtained F-ratio is greater than or equal to the critical F-value, reject the null hypothesis.
If the null hypothesis is rejected, conduct post hoc tests to determine which specific means differ. Calculate and interpret the effect size (η²).
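The one-way partition above can be sketched with the standard library. The three groups of scores are hypothetical, chosen so the sums of squares work out to round numbers, and the critical value F = 4.26 (df = 2, 9, α = .05) is taken from an F table.

```python
# Hypothetical scores for k = 3 treatment conditions, n = 4 per group.
groups = [[1, 2, 3, 2], [4, 5, 6, 5], [7, 8, 9, 8]]

k = len(groups)
N = sum(len(g) for g in groups)
all_scores = [x for g in groups for x in g]
G = sum(all_scores) / N                       # grand mean

group_means = [sum(g) / len(g) for g in groups]
SS_between = sum(len(g) * (m - G) ** 2 for g, m in zip(groups, group_means))
SS_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
SS_total = sum((x - G) ** 2 for x in all_scores)  # = SS_between + SS_within

df_between, df_within = k - 1, N - k
MS_between = SS_between / df_between
MS_within = SS_within / df_within
F = MS_between / MS_within
eta_sq = SS_between / SS_total                # proportion of variance explained

F_crit = 4.26  # critical value for df = (2, 9), alpha = .05 (F table)
print(f"F({df_between}, {df_within}) = {F:.2f}, eta^2 = {eta_sq:.3f}")
print("Reject H0" if F >= F_crit else "Fail to reject H0")
```

Here the group means (2, 5, 8) differ far more than the scores vary within any group, so MS_between dwarfs MS_within and the F-ratio is large.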
III. Two-Factor ANOVA
A. Concepts and Definitions
Two-Factor ANOVA: An ANOVA procedure examining the effects of two independent variables (factors) on a single dependent variable.
Main Effect: The effect of one independent variable on the dependent variable, averaged across the levels of the other independent variable. In a two-factor ANOVA, there is a main effect for Factor A and a main effect for Factor B.
Interaction Effect: Occurs when the effect of one independent variable on the dependent variable depends on the level of the other independent variable. The combined effect of the two factors is different from what would be predicted by their main effects alone.
Cell: A specific combination of the levels of the two factors in a two-factor design.
Sum of Squares for Factor A (SS_A): The variability attributed to the main effect of Factor A.
Sum of Squares for Factor B (SS_B): The variability attributed to the main effect of Factor B.
Sum of Squares for the Interaction (SS_AxB): The variability attributed to the interaction between Factor A and Factor B.
Degrees of Freedom for Factor A (df_A): Number of levels of Factor A minus 1.
Degrees of Freedom for Factor B (df_B): Number of levels of Factor B minus 1.
Degrees of Freedom for the Interaction (df_AxB): (df_A) * (df_B).
B. Logic of Two-Factor ANOVA
Two-factor ANOVA allows researchers to examine not only the individual effects of two independent variables but also their combined or interactive effects on the dependent variable. It partitions the variance into main effects for each factor and an interaction effect.
C. Hypothesis Testing Procedure
State the null and alternative hypotheses for each of the three effects being tested:
Main effect of Factor A (H₀: no main effect; H₁: there is a main effect).
Main effect of Factor B (H₀: no main effect; H₁: there is a main effect).
Interaction effect (H₀: no interaction; H₁: there is an interaction). Select the alpha (α) level.
Determine the degrees of freedom for each effect (df_A, df_B, df_AxB) and the degrees of freedom within groups (df_within). Locate the critical F-values for each test using the F-distribution table with the appropriate degrees of freedom and alpha level.
Compute the sums of squares (SS_total, SS_within, SS_A, SS_B, SS_AxB). Calculate the mean squares (MS_A = SS_A / df_A, MS_B = SS_B / df_B, MS_AxB = SS_AxB / df_AxB, MS_within = SS_within / df_within). Compute the F-ratios for each effect (F_A = MS_A / MS_within, F_B = MS_B / MS_within, F_AxB = MS_AxB / MS_within).
Make a decision for each hypothesis by comparing the obtained F-ratio to the corresponding critical F-value. Reject the null hypothesis if the obtained F is greater than or equal to the critical F.
Interpret the results. If there is a significant interaction, the main effects should be interpreted with caution or by examining simple main effects. If there is no significant interaction, the main effects can be interpreted independently.
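The two-factor partition can be sketched the same way for a balanced design, where SS for each main effect uses the level means and SS_AxB is what remains of the between-cells variability. The 2×2 dataset below is hypothetical, constructed so the effect of Factor B reverses across the levels of Factor A: a pure interaction with no main effects.

```python
# Hypothetical balanced 2x2 design, n = 2 scores per cell, built so the
# effect of B reverses across levels of A (interaction, no main effects).
cells = {
    ("A1", "B1"): [1, 3], ("A1", "B2"): [5, 7],
    ("A2", "B1"): [5, 7], ("A2", "B2"): [1, 3],
}

all_scores = [x for s in cells.values() for x in s]
N = len(all_scores)
G = sum(all_scores) / N                       # grand mean

def level_mean(factor_index, level):
    # Mean of all scores at one level of a factor, collapsing the other factor.
    scores = [x for key, s in cells.items() if key[factor_index] == level for x in s]
    return sum(scores) / len(scores)

a_levels, b_levels = ["A1", "A2"], ["B1", "B2"]
n_per_a = N // len(a_levels)                  # scores per level of A (balanced)
n_per_b = N // len(b_levels)

SS_A = sum(n_per_a * (level_mean(0, a) - G) ** 2 for a in a_levels)
SS_B = sum(n_per_b * (level_mean(1, b) - G) ** 2 for b in b_levels)

cell_means = {key: sum(s) / len(s) for key, s in cells.items()}
SS_cells = sum(len(s) * (cell_means[k] - G) ** 2 for k, s in cells.items())
SS_AxB = SS_cells - SS_A - SS_B               # interaction variability
SS_within = sum((x - cell_means[k]) ** 2 for k, s in cells.items() for x in s)

df_A, df_B = len(a_levels) - 1, len(b_levels) - 1
df_AxB = df_A * df_B
df_within = N - len(cells)
MS_within = SS_within / df_within

F_A = (SS_A / df_A) / MS_within
F_B = (SS_B / df_B) / MS_within
F_AxB = (SS_AxB / df_AxB) / MS_within
print(f"F_A = {F_A}, F_B = {F_B}, F_AxB = {F_AxB}")
```

Averaged across B, the two levels of A have identical means (and likewise for B), so both main-effect F-ratios are zero even though the cell means clearly differ — exactly the pattern that makes interpreting main effects alone misleading when an interaction is present.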
Repeated-Measures t-Test and ANOVA Quiz
What is the primary purpose of using a repeated-measures t-test? Briefly explain why it is advantageous in certain research situations.
Explain the concept of a difference score in the context of a repeated-measures t-test. How are these scores used in the subsequent analysis?
State the typical null and alternative hypotheses for a two-tailed repeated-measures t-test. What does the alternative hypothesis suggest about the population?
What are two potential disadvantages of using a repeated-measures design? Provide a brief example for each.
In the formula for the repeated-measures t-statistic, what does the numerator typically represent under the null hypothesis, and what does the denominator represent?
What is the main goal of Analysis of Variance (ANOVA)? How does it differ from a t-test, and what advantage does it offer?
Define the terms "factor" and "levels" as they are used in the context of ANOVA. Provide a brief example to illustrate these terms.
Explain the concept of the F-ratio in ANOVA. What does a large F-ratio suggest about the differences between group means?
What is the purpose of conducting post hoc tests after obtaining a significant F-ratio in ANOVA? Give an example of a common post hoc test.
What is the key difference between a main effect and an interaction effect in a two-factor ANOVA? Explain what a significant interaction suggests about the influence of the independent variables.
Repeated-Measures t-Test and ANOVA Quiz - Answer Key
The primary purpose of a repeated-measures t-test is to evaluate the mean difference between two sets of scores obtained from the same group of individuals. It is advantageous when researchers want to examine changes within subjects over time or across different conditions, as it controls for individual differences.
A difference score in a repeated-measures t-test is the result of subtracting one measurement from the other for each participant. These difference scores (D = X₂ - X₁) become the data that are analyzed to determine if there is a significant average difference between the two conditions.
The typical null hypothesis (H₀) for a two-tailed repeated-measures t-test states that there is no difference between the two population means, so the population mean difference is zero (µ_D = 0). The alternative hypothesis (H₁) states that there is a difference between the two population means (µ_D ≠ 0).
Two potential disadvantages of repeated-measures designs are testing effects and floor/ceiling effects. An example of a testing effect is improved performance on a second memory test due to practice from the first test. A floor effect occurs if initial scores are very low, limiting potential decreases in a second condition.
In the repeated-measures t-statistic formula, the numerator (M_D - µ_D) reduces to the observed sample mean difference under the null hypothesis (µ_D = 0). The denominator (s_MD) represents the estimated standard error of the mean difference, reflecting the variability expected in sample mean differences due to chance alone.
The main goal of ANOVA is to compare the means of two or more populations to determine if there are any statistically significant differences among them. Unlike a t-test, which is typically limited to two groups, ANOVA can simultaneously analyze differences between multiple group means, reducing the risk of Type I errors associated with multiple pairwise comparisons.
A "factor" in ANOVA is an independent variable that is manipulated or observed to determine its effect on the dependent variable (e.g., type of therapy). "Levels" are the specific values or categories of a factor (e.g., cognitive-behavioral therapy, psychodynamic therapy, control group).
The F-ratio in ANOVA is the ratio of the variance between the sample means (MS_between) to the variance within the samples (MS_within). A large F-ratio (significantly greater than 1) suggests that the variability between the group means is larger than what would be expected by chance, indicating a likely significant difference between the population means.
Post hoc tests are conducted after a significant ANOVA result to identify which specific pairs of group means are significantly different from each other, as ANOVA only indicates that at least one mean difference exists. An example of a common post hoc test is Tukey's Honestly Significant Difference (HSD) test.
A main effect in a two-factor ANOVA refers to the independent effect of one factor on the dependent variable, averaged across the levels of the other factor. An interaction effect occurs when the effect of one factor on the dependent variable depends on the specific level of the other factor, suggesting that the factors' combined influence is not simply additive.
Essay Format Questions
Compare and contrast the experimental designs for a repeated-measures t-test and an independent-measures t-test. Discuss the advantages and disadvantages of each design in terms of controlling for individual differences and potential confounding variables.
Explain the logic behind the F-ratio in ANOVA. How do the concepts of variance between groups and variance within groups contribute to the interpretation of the test statistic and the decision-making process regarding the null hypothesis?
Describe the steps involved in conducting a hypothesis test using the repeated-measures t-statistic. Be sure to include the formulation of hypotheses, determination of the critical region, calculation of the test statistic, and the decision-making process.
Discuss the interpretation of results in a two-factor ANOVA, paying particular attention to the difference between main effects and interaction effects. How does a significant interaction influence the interpretation of the main effects? Provide an example of a study where an interaction might be expected.
Imagine a research scenario where you want to compare the effectiveness of three different teaching methods on student test scores. Outline the appropriate statistical analysis you would use, justify your choice, and describe the steps you would take to conduct and interpret the results of this analysis, including any necessary follow-up tests.
Glossary of Key Terms
Alpha Level (α): The probability of making a Type I error (rejecting a true null hypothesis). Often set at .05.
Alternative Hypothesis (H₁): A statement that contradicts the null hypothesis, proposing that there is a significant effect or difference.
ANOVA (Analysis of Variance): A statistical test used to compare the means of two or more groups.
Between-Groups Variability: Differences in scores that exist between the different treatment groups or conditions.
Ceiling Effect: A limitation in which scores are clustered at the high end of the measurement scale, preventing detection of further increases.
Cohen's d: A measure of effect size for t-tests, indicating the standardized difference between two means.
Critical Region: The range of values for a test statistic that would lead to the rejection of the null hypothesis.
Degrees of Freedom (df): The number of independent pieces of information available to estimate a parameter.
Dependent Samples: Samples in which the observations in one sample are related to or paired with the observations in the other sample (e.g., repeated measures).
Difference Score (D): The value obtained by subtracting one score in a pair from the other.
Effect Size: A statistical measure that indicates the magnitude of a treatment effect or the strength of a relationship.
Estimated Standard Error: An estimate of the standard deviation of a sampling distribution.
F-ratio: The test statistic in ANOVA, calculated as the ratio of MS_between to MS_within.
Factor: An independent variable in ANOVA.
Floor Effect: A limitation in which scores are clustered at the low end of the measurement scale, preventing detection of further decreases.
Grand Mean (G): The overall mean of all the scores in a study.
Independent Samples: Samples in which the observations in one sample are not related to the observations in the other sample.
Interaction Effect: In two-factor ANOVA, the combined effect of two independent variables that differs from what their main effects alone would predict.
Levels: The different values or categories of an independent variable (factor) in ANOVA.
Main Effect: The overall effect of one independent variable on the dependent variable, averaging across the levels of other independent variables.
Mean Square (MS): An estimate of variance, calculated by dividing the sum of squares by its corresponding degrees of freedom.
Null Hypothesis (H₀): A statement of no effect or no difference, which the researcher aims to disprove.
Post Hoc Tests: Statistical tests conducted after a significant ANOVA to determine which specific group means differ significantly.
Power: The probability of correctly rejecting a false null hypothesis (avoiding a Type II error).
Repeated-Measures Design: A research design where the same participants are measured multiple times.
Repeated-Measures t-Test: A statistical test used to compare two means from a within-subjects design.
Standard Deviation: A measure of the variability or dispersion of a set of data points around their mean.
Standard Error of the Mean (s_M): The standard deviation of the sampling distribution of the mean.
Sum of Squares (SS): A measure of the total variability within a set of data.
t-distribution: A probability distribution that is used when the population standard deviation is unknown and estimated from the sample.
t-statistic: A test statistic used in t-tests to determine if there is a significant difference between sample means and a population mean or between two sample means.
Testing Effects: Changes in a participant's performance due to repeated exposure to the experimental conditions.
Type I Error: Rejecting the null hypothesis when it is actually true (false positive).
Type II Error: Failing to reject the null hypothesis when it is actually false (false negative).
Variance: A measure of the average squared deviation of scores from their mean.
Within-Groups Variability: Differences in scores that exist within each of the treatment groups or conditions in ANOVA.