Introduction to t-tests
The t-test is an inferential statistical test used to determine whether two group means, or a sample mean and a hypothesized population mean, differ significantly. It is essential when the population standard deviation is unknown and sample sizes are small (N < 30).
Normal Distribution: A statistical concept where data values are symmetrically distributed around the mean, forming a bell-shaped curve. Many natural phenomena and sample statistics exhibit this distribution.
Paranormal Distribution: Not a standard statistical term; likely refers to non-normal distributions. Data normality is crucial for t-tests and other parametric tests.
Learning Objectives
These objectives guide understanding and application of t-tests:
7-1: Conduct a t-test for a single sample to compare a sample mean to a known or hypothesized population mean.
7-2: Conduct a t-test for dependent means (paired-samples t-test) using difference scores to compare two related samples.
7-3: Outline the main assumptions required for valid application of t-tests.
7-4: Calculate and interpret effect size for a t-test for dependent means.
7-5: Calculate and interpret the confidence interval for a t-test for dependent means.
7-6: Explain the advantages and disadvantages associated with repeated measures designs.
7-7: Summarize t-test results effectively in research reports following standard guidelines.
Logic of the t-test
The t-test compares observed sample data against null hypothesis expectations by assessing the ratio of the difference between means (or a mean and a hypothesized value) to data variability. This ratio yields a t-statistic to determine the probability of such a difference if the null hypothesis were true.
Compares sample mean with population mean: In a one-sample t-test, the statistic shows the sample mean's distance from the hypothesized population mean in standard errors.
Determines if population mean equals a set value: This forms the null hypothesis in a single-sample t-test (e.g., H_0: \mu = \text{specific value}).
Assumptions: For t-test results to be valid:
Homogeneity of variance: Assumes equal population variances for independent samples t-tests (less critical for one-sample or paired).
Interval or ratio data: Dependent variable must be on a continuous scale for mean/SD calculation.
No significant outliers: Outliers unduly influence means and variance, potentially leading to incorrect conclusions; address them.
Samples as Representatives of Populations
Sample statistics (e.g., mean, variance) infer population parameters. However, samples rarely capture a population's full variability.
Sample variance estimates population variance: A sample's variance is used to estimate the population variance, but the uncorrected (descriptive) sample variance is a biased estimate.
Sample variance typically underestimates population variance: Samples are less variable than populations. Using the sample mean to calculate variance (instead of the unknown population mean) inherently yields a smaller estimate.
Estimating Population Variance
To correct for sample variance underestimation, 'degrees of freedom' are used when estimating population variance from a sample.
Biased formula: Sample variance (descriptive) is SD^2 = \frac{SS}{N}, where SS is the sum of squared deviations from the mean.
Degrees of freedom (df): The number of values in a statistic's calculation free to vary. For sample variance, df = N - 1. One df is lost because the sample mean estimates the population mean in SS calculation.
Corrected (unbiased) formula: The unbiased population variance estimate, S^2, is S^2 = \frac{SS}{df}. This yields a slightly larger, more accurate estimate of true population variability.
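A minimal sketch of the two formulas, assuming NumPy; the sample values are invented for illustration. NumPy's ddof ("delta degrees of freedom") argument switches between the biased and unbiased versions:

```python
import numpy as np

# Hypothetical sample of N = 6 reaction times (ms); values are illustrative.
sample = np.array([312.0, 287.0, 301.0, 295.0, 320.0, 305.0])

n = sample.size
ss = np.sum((sample - sample.mean()) ** 2)  # SS: sum of squared deviations

biased_var = ss / n          # descriptive sample variance, SD^2 = SS / N
unbiased_var = ss / (n - 1)  # population estimate, S^2 = SS / df

# np.var reproduces both: ddof=0 divides by N, ddof=1 divides by N - 1.
assert np.isclose(biased_var, np.var(sample, ddof=0))
assert np.isclose(unbiased_var, np.var(sample, ddof=1))

print(f"biased SD^2 = {biased_var:.2f}, unbiased S^2 = {unbiased_var:.2f}")
```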
t Distribution
Student's t-distribution is a probability distribution used for estimating a normally distributed population mean when sample size is small and population standard deviation is unknown. It's symmetrical and bell-shaped, like the normal distribution, but with heavier tails.
Unique for each degree of freedom (df): The t-distribution's shape varies with df. Smaller df means fatter tails, indicating more uncertainty and greater probability of extreme values from smaller samples.
Closer to normal curve with greater df: As df (sample size) increases, the t-distribution approaches a standard normal (Z) distribution. Larger samples provide more reliable estimates of population variance.
Infinite df equates to normal curve: With infinite degrees of freedom, the t-distribution becomes identical to the normal distribution.
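The convergence can be checked numerically. A short sketch assuming SciPy (the df values chosen are arbitrary), comparing two-tailed .05 critical values of t against the normal-curve value:

```python
from scipy import stats

# As df grows, the t critical value shrinks toward the z critical value.
z_crit = stats.norm.ppf(0.975)  # two-tailed .05 cutoff for the normal curve
for df in (5, 10, 30, 100, 1000):
    t_crit = stats.t.ppf(0.975, df)
    print(f"df = {df:>4}: t_crit = {t_crit:.3f}  (z_crit = {z_crit:.3f})")
```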
t Test for a Single Sample
A one-sample t-test determines if a sample mean significantly differs from a known or hypothesized population mean when the population standard deviation is unknown.
Estimates the unknown population variance from the sample: The test estimates the standard error of the mean from sample data. The single-sample t-statistic is t = \frac{M - \mu}{S_M}, where M is the sample mean, \mu is the hypothesized population mean, and S_M = \frac{S}{\sqrt{N}} is the estimated standard error of the mean (S is the estimated population standard deviation and N is the sample size).
Hypothesis testing involved with calculated t score: The calculated t-score is compared to a critical t-value (based on df and significance level) or used to find a p-value, to decide on rejecting the null hypothesis.
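A sketch of the full calculation, assuming SciPy; the sleep-hours data and the hypothesized mean of 8 are invented for illustration. The hand calculation is checked against scipy.stats.ttest_1samp:

```python
import numpy as np
from scipy import stats

# Hypothetical data: does mean nightly sleep differ from mu = 8 hours?
sample = np.array([7.2, 6.8, 7.5, 8.1, 6.9, 7.4, 7.0, 7.8])
mu = 8.0

m = sample.mean()
s = sample.std(ddof=1)          # S: unbiased estimate of the population SD
s_m = s / np.sqrt(sample.size)  # S_M: estimated standard error of the mean
t_by_hand = (m - mu) / s_m
df = sample.size - 1

# SciPy reproduces the hand calculation and supplies the two-tailed p-value.
t_stat, p_value = stats.ttest_1samp(sample, popmean=mu)
assert np.isclose(t_by_hand, t_stat)
print(f"t({df}) = {t_stat:.2f}, p = {p_value:.3f}")
```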
Writing t-test Results
Reporting t-test results in academic contexts requires following established guidelines (e.g., APA) for clarity and completeness.
Format: A typical template:
"A one-sample t-test was performed to compare the sample mean to [hypothesized value]. The sample mean (M = [mean value], SD = [standard deviation]) was significantly [higher/lower/different] than the hypothesized value, t([df]) = [t-value], p = [p-value]."
Include crucial information: Report the sample mean (M), sample standard deviation (SD), calculated t-statistic with degrees of freedom (t(df)), and the p-value. The p-value indicates the probability of observing the data (or more extreme data) if the null hypothesis were true; e.g., p < .05 signifies statistical significance.
Paired Samples t-Test
The paired samples t-test (dependent samples or repeated measures t-test) is used when each participant provides two scores, as in 'before and after' studies or with matched individuals.
Used with two scores per person: Appropriate when observations are dependent, from the same individual or naturally paired.
Assumes population mean of differences = 0: The null hypothesis is that the true population mean of difference scores between related measurements is zero (H_0: \mu_{diff} = 0).
Conducts hypothesis testing on difference scores: Involves calculating a 'difference score' for each pair (e.g., Post - Pre) and performing a t-test on these differences, comparing their mean to zero.
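A sketch assuming SciPy, with invented pre/post scores; it also verifies that the paired test is equivalent to a one-sample t-test of the difference scores against zero:

```python
import numpy as np
from scipy import stats

# Hypothetical pre/post anxiety scores for the same 8 participants.
pre = np.array([24, 19, 30, 22, 27, 25, 21, 28])
post = np.array([20, 18, 25, 21, 22, 24, 19, 23])

diff = post - pre  # one difference score per pair
t_stat, p_value = stats.ttest_rel(post, pre)

# Equivalent: a one-sample t-test comparing the differences to zero.
t_check, p_check = stats.ttest_1samp(diff, popmean=0.0)
assert np.isclose(t_stat, t_check) and np.isclose(p_value, p_check)

print(f"M_diff = {diff.mean():.2f}, t({diff.size - 1}) = {t_stat:.2f}, "
      f"p = {p_value:.4f}")
```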
Assumptions of t Tests
Adhering to these assumptions ensures accurate t-test results, though minor deviations may be tolerated:
Interval or Ratio Level Data: The dependent variable must be on a continuous scale (e.g., temperature, height) for meaningful difference interpretation.
Random Sampling: Data must come from a random population sample, ensuring representativeness and no systematic bias.
Independence of Observations: Observations within each group (for most t-tests) must be independent. For paired t-tests, pairs must be independent, even if observations within a pair are dependent.
Normal Population Distribution: The population from which samples are drawn (or difference scores for paired t-tests) should be approximately normally distributed.
Robust to moderate violations: T-tests yield reasonably accurate p-values even with some non-normality, especially with large sample sizes (N > 30) due to the Central Limit Theorem. Severe non-normality or small samples can cause unreliable results.
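A small simulation sketch of this robustness (assuming NumPy and SciPy; the exponential population and replication count are arbitrary choices). Under a true null hypothesis a valid test should reject at about the alpha = .05 rate, and the rate moves closer to .05 as N grows even though the population is skewed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Draw skewed (exponential) samples whose true mean really is 1.0 and
# count how often the one-sample t-test falsely rejects at alpha = .05.
for n in (10, 50):
    rejections = 0
    for _ in range(5000):
        sample = rng.exponential(scale=1.0, size=n)  # population mean = 1.0
        _, p = stats.ttest_1samp(sample, popmean=1.0)
        rejections += p < 0.05
    print(f"N = {n}: empirical Type I error rate = {rejections / 5000:.3f}")
```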
Effect Size for Dependent Means
Effect size quantifies the magnitude (practical importance) of an observed effect, beyond statistical significance (p-value). For dependent means, Cohen's d is common.
Calculated using mean of differences: For a paired samples t-test, Cohen's d is d = \frac{M_{diff}}{SD_{diff}}, the mean of the difference scores divided by their standard deviation; a sketch of the calculation follows the conventions below. This provides a standardized, sample-size-independent effect measure.
Conventions for Cohen's d:
d = .2: Small effect (means differ by 0.2 standard deviations).
d = .5: Medium effect (means differ by 0.5 standard deviations).
d = .8: Large effect (means differ by 0.8 standard deviations). A larger d indicates a more substantial difference.
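A minimal sketch of the calculation, reusing the invented difference scores from the paired-samples example above:

```python
import numpy as np

def cohens_d_dependent(diff):
    """Cohen's d for dependent means: mean of the difference scores
    divided by their (unbiased) standard deviation."""
    diff = np.asarray(diff, dtype=float)
    return diff.mean() / diff.std(ddof=1)

# Hypothetical difference scores (post - pre) for 8 participants.
diff = np.array([-4, -1, -5, -1, -5, -1, -2, -5])
print(f"d = {cohens_d_dependent(diff):.2f}")  # |d| >= .8 counts as large
```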
Confidence Intervals for Dependent Means
Confidence intervals (CIs) provide a range likely containing the true population parameter (e.g., mean difference) at a specified confidence level. They offer more information than p-values by showing estimate precision.
Estimate the standard error: First, estimate the standard error of the mean of differences: SE_{M_{diff}} = \frac{SD_{diff}}{\sqrt{N}}, where SD_{diff} is the standard deviation of the difference scores and N is the number of pairs.
Determine the number of standard errors for the desired confidence level: For a given confidence level (e.g., 95%, 99%) and degrees of freedom (df = N - 1), obtain a critical t-value (t_{critical}) from the t-distribution. This t_{critical} indicates how many standard errors from the mean are needed to capture the desired percentage of the distribution.
Calculation: CI = M_{diff} \pm (t_{critical} \times SE_{M_{diff}}), where M_{diff} is the sample mean of the differences.
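A sketch of the three steps, again using the invented difference scores from earlier; scipy.stats.t.ppf supplies the critical t-value:

```python
import numpy as np
from scipy import stats

# Hypothetical difference scores (post - pre), as in the paired example.
diff = np.array([-4.0, -1.0, -5.0, -1.0, -5.0, -1.0, -2.0, -5.0])
n = diff.size

m_diff = diff.mean()
se = diff.std(ddof=1) / np.sqrt(n)     # SE of the mean of differences
t_crit = stats.t.ppf(0.975, df=n - 1)  # two-tailed 95% critical value

lower, upper = m_diff - t_crit * se, m_diff + t_crit * se
print(f"95% CI for the mean difference: [{lower:.2f}, {upper:.2f}]")
```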
Advantages and Disadvantages of Repeated Measures
Repeated measures designs (analyzed by paired samples t-tests) use the same participants across all conditions, offering benefits and challenges.
Advantages:
Higher power: Controlling for individual differences (a major source of error in between-subjects designs) reduces error variance, increasing test sensitivity and power to detect true effects.
Low standard deviation for difference scores: Participants serve as their own control, leading to smaller variability in difference scores and thus higher power.
Requires fewer participants: Increased power allows similar statistical power with fewer participants than independent samples designs, enhancing resource efficiency.
Disadvantages:
Weak without a control group: Repeated measures designs are prone to order effects, which can confound results if not mitigated (e.g., by counterbalancing):
Practice effects: Improved performance in later conditions due to familiarity.
Fatigue effects: Decreased performance in later conditions due to boredom or tiredness.
Carryover effects: One condition's effect influences subsequent conditions.
Experimental Demand: Participants may guess the hypothesis, causing biased responses.
Practical limitations: Unsuitable for some research (e.g., irreversible interventions). Non-random participant attrition can also be an issue.
Reporting t Test Results
Comprehensive t-test reporting includes both descriptive (means, SDs) and inferential statistics (t-value, df, p-value, effect size, CIs) for full understanding of findings.
Especially important for dependent t-tests: Detailed reporting is crucial for paired samples t-tests, where the 'before/after' or 'condition A/B' context must be made clear, though the same guidelines apply to all t-tests.
Include crucial statistics: Always present means and standard deviations for each group/condition. Follow with the t-statistic, its degrees of freedom, exact p-value (or p < .05), and if calculated, effect size (e.g., Cohen's d) and confidence interval for the mean difference.
Example reporting format: For a paired samples t-test:
"A paired-samples t-test was conducted to compare [Variable 1] and [Variable 2]. There was a significant difference in scores for [Variable 1] (M = [Mean1], SD = [SD1]) and [Variable 2] (M = [Mean2], SD = [SD2]); t([df]) = [t-value], p = [p-value], d = [Cohen's d]. The 95% confidence interval for the mean difference was [[Lower Bound], [Upper Bound]]."