Psyc Stats Textbook Summary

  • The t-test, also known as Student's t-test, is a simple test of significance used to determine whether there is a significant difference between the means of two groups or samples.

  • Example: A researcher investigates the effects of hypnosis on memory retrieval.

    • Hypnotized participants recall 40% of studied words.

    • Unhypnotized participants recall 38% of the words.

    • It can be concluded that the 2-percentage-point difference might not be statistically significant and could be due to sampling error.

  • As in the z-test, the hypothesis-testing process with the t-test follows the same basic logic. The core idea is to formulate a null hypothesis (no difference between the groups) and an alternative hypothesis (there is a difference), then use the t-test to see if the data support rejecting the null hypothesis.

  • One important difference between the z-test and the t-test is that in the former, population means and variances are known, while in the latter, population means and variances are unknown and samples are typically small. Because the population variance is typically unknown in real-world scenarios, the t-test is more flexible.

  • If the population variance is unknown, it is estimated from the sample. This introduces additional uncertainty, which is accounted for by using the t-distribution instead of the normal distribution.

  • The chapter examines:

    • The t-test for a single sample, used for simple test designs. This test compares the mean of a single sample to a known population mean.

    • The t-test for independent means, used when two different samples are being compared. This test determines whether there is a significant difference between the means of two independent groups.

    • The t-test for dependent means, used when samples are related or correlated. This test is used when the data comes from related pairs (e.g., before and after measurements on the same subject).

The t-Test for a Single Sample
  • The t-test for a single sample or the one-sample t-test is designed to test hypotheses about population means and whether sample means are different from them. It is used to determine if the mean of a single sample is significantly different from a known or hypothesized population mean.

  • Depending on the research hypothesis, predicted differences could be directional (one-tailed test) or nondirectional (two-tailed test). A one-tailed test is used when the researcher has a specific prediction about the direction of the difference, while a two-tailed test is used when the researcher is only interested in whether there is any difference at all.

  • Example: Comparing a sample of well-educated professionals from a close-knit community to a population known for its extreme conspiratorial views.

    • One hypothesis could be that the sample’s community closeness might prevent them from espousing extreme conspiratorial beliefs, predicting significantly lower conspiratorial belief scores. This is an example of a directional hypothesis, so a one-tailed test would be appropriate.

  • In general, the two tests of significance share similar formulas:

    • t = \frac{\bar{X} - \mu}{S_{\bar{X}}}

    • Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}

    • Where:

      • \bar{X} = sample mean

      • \mu = population mean

      • \sigma_{\bar{X}} = standard error of the mean

      • S_{\bar{X}} = estimated standard error of the mean.

  • One critical difference between these tests is that, unlike the z-test, the population variance, \sigma^2, for the t-test is unknown. Because the population variance is unknown, it must be estimated from the sample data.

  • If \sigma^2 is unknown, it is estimated from the sample, in which case S^2 and S indicate the estimated nature of the population variance and standard deviation, respectively. These estimates are used in the calculation of the t-statistic.

  • The estimated standard error of the mean, S_{\bar{X}}, appears in the denominator of the t-test formula, as opposed to the standard error of the mean, \sigma_{\bar{X}}, in the z-test formula. The estimated standard error reflects the uncertainty introduced by estimating the population variance.

  • If \sigma^2 is unknown, and the only information we have is from the sample, we use the sample to estimate the population variance and the standard error of the mean. This is a common situation in research.

  • If sample selection and group assignment are performed at random, the sample is a good representation of the population whence it came. Random sampling helps to ensure that the sample is representative of the population, which is crucial for making valid inferences.

Estimating Population Variance

  • Representing rating responses (X) on a 5-point scale ranging from 1 (disagree) to 5 (agree):

    • Column 1: rating responses (X). These are the individual data points collected in the sample.

    • Column 2: the rating sample mean, \bar{X} = 4.13. Step 1. The sample mean is the average of the rating responses.

    • Column 3: deviation from the mean score, X - \bar{X}. Step 2. This measures how far each individual data point is from the sample mean.

    • Column 4: squared deviation from the mean, (X - \bar{X})^2. Step 3. Squaring the deviations ensures that all values are positive, which is necessary for calculating the variance.

  • Step 4: Total sum of column 4 = sum of squared deviations from the mean, SS = \sum(X - \bar{X})^2 = 25.7 (Sum of Squares). The sum of squares measures the total variability in the sample data.

  • Sample variance formula: SS / n. In this case, 25.7 / 15 = 1.71. This is a biased estimate because it tends to underestimate the population variance: the deviations are measured from the sample mean rather than the unknown population mean, which makes the sum of squares as small as it can be.

  • The unbiased estimate of the population variance is the sum of squares divided by the degrees of freedom (df) or n - 1.

    • SS / df = SS / (n - 1) = 25.7 / (15 - 1) = 25.7 / 14 = 1.84.

  • Degrees of freedom are the number of values in a set that are free to vary. In this context, it reflects the number of independent pieces of information available to estimate the population variance. It is calculated as n-1 because one degree of freedom is lost when estimating the sample mean.

  • In a sample of fifteen individuals, 14 (or 15 − 1) of those data values are independent and free to vary. Once 14 values are known, the 15th value is determined because the sum of the deviations from the mean must equal zero.

  • The sum of the deviations from the mean is always equal to zero (step 2). This is a mathematical property of the mean.

  • Taking the square root of the unbiased estimate of the population variance or sample variance estimate, S^2 (step 6), our unbiased estimate of the population standard deviation or sample standard deviation is S = \sqrt{S^2} = \sqrt{1.84} = 1.36. The sample standard deviation measures the spread of the data around the sample mean.
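The biased and unbiased variance estimates above can be sketched in Python; the ratings list here is a hypothetical example, not the actual data from Table 8.1:

```python
import statistics

def variance_estimates(ratings):
    """Return (biased, unbiased) estimates of the population variance.

    biased   = SS / n       (tends to underestimate the population variance)
    unbiased = SS / (n - 1) (divides by the degrees of freedom)
    """
    n = len(ratings)
    mean = sum(ratings) / n
    ss = sum((x - mean) ** 2 for x in ratings)  # sum of squared deviations (SS)
    return ss / n, ss / (n - 1)

# Hypothetical 5-point ratings for 15 respondents (not the book's Table 8.1 data)
ratings = [4, 5, 3, 5, 4, 5, 2, 5, 5, 4, 5, 3, 5, 4, 5]
biased, unbiased = variance_estimates(ratings)
s = unbiased ** 0.5  # unbiased estimate of the population standard deviation
```

Note that the unbiased estimate matches Python's `statistics.variance`, which also divides by n − 1.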

Estimated Standard Error of the Mean

  • The next step is to figure the estimated standard error of the sampling distribution of means, or more simply, the estimated standard error of the mean (S_{\bar{X}}) or estimated standard error. The standard error of the mean is a measure of the precision with which the sample mean estimates the population mean.

  • The estimated standard error is an estimate of the population standard error, \sigma_{\bar{X}} = \sigma / \sqrt{n} (from Chapter 6), when it is unknown. Because the population standard deviation is typically unknown, the estimated standard error is used instead.

  • The formula for the estimated standard error is:

  • S_{\bar{X}} = \frac{S}{\sqrt{n}}

  • It represents the error (or sampling error), on average, between the population mean and its sample estimate. A smaller standard error indicates that the sample mean is a more precise estimate of the population mean.

  • From Table 8.1 (step 6), the sample standard deviation for the conspiracist belief scale ratings for fifteen highly educated individuals is 1.36.

  • Plugging this information, S = 1.36, n = 15, into the formula, the estimated standard error is 0.35:

  • S_{\bar{X}} = \frac{1.36}{\sqrt{15}} = \frac{1.36}{3.87} = 0.35
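The standard-error calculation is a one-liner; a quick sketch using the chapter's numbers (S = 1.36, n = 15):

```python
import math

def estimated_standard_error(s, n):
    """Estimated standard error of the mean: S / sqrt(n)."""
    return s / math.sqrt(n)

se = estimated_standard_error(1.36, 15)  # chapter's values: S = 1.36, n = 15
```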

Mean Differences

  • The t-test is the difference between the sample mean and the population mean divided by the estimated standard error. It quantifies the difference between the sample mean and the population mean in terms of standard error units.

  • It is a ratio indicating the number of standard error units from the mean: t = \frac{\bar{X} - \mu}{S_{\bar{X}}}

  • Researchers are interested in a difference between means that results from a treatment effect or experimental manipulation, not simply from sampling error or chance. The goal is to determine whether the observed difference reflects a real effect or mere random variation.

  • For our hypothesis, we want to compare the conspiratorial average ratings of the well-educated professionals from the closely knit community to the \mu of the well-educated professionals known for their extreme views. This comparison will help us determine whether the community closeness influences conspiratorial beliefs.

  • Using Table 8.1; sample mean, \bar{X} = 4.13; estimated standard error, S_{\bar{X}} = 0.35; and \mu = 5, from the population with extreme conspiratorial views, we can now figure our t-score for our sample mean:

  • t = \frac{4.13 - 5}{0.35} = \frac{-0.87}{0.35} = -2.49
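The t-score computation above, sketched with the chapter's values:

```python
def one_sample_t(xbar, mu, se):
    """One-sample t: (sample mean - population mean) / estimated standard error."""
    return (xbar - mu) / se

t = one_sample_t(4.13, 5.0, 0.35)  # values from Table 8.1 and the text
```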

The t-Distribution

  • The t-distribution is essentially an estimate of the normal distribution discussed in Chapters 5 and 6. It is used when the population variance is unknown and must be estimated, because it accounts for the additional uncertainty introduced by that estimation.

  • Like the normal curve, the t-distribution is unimodal and symmetrical, but flatter and with heavier tails. The flatter shape and heavier tails reflect the greater uncertainty associated with estimating the population variance.

  • The exact shape of the t-distribution varies according to the degrees of freedom. As the degrees of freedom increase, the t-distribution approaches the normal distribution.

  • As with the z-test, an observed or computed t-score must equal or exceed the critical value in magnitude for a statistically significant result. The exact cutoff values depend on the chosen significance level (alpha) and the degrees of freedom.

Interpreting the t-Distribution

  • Table 8.3 contains critical values of the t-statistic; the sample's t-score must equal or exceed the appropriate critical value to be significant at the level of significance shown across the top of the table. These critical values are used to decide whether to reject the null hypothesis.
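How critical values shrink toward the normal-curve cutoff as degrees of freedom grow can be seen in a few standard two-tailed α = .05 values. These are generic t-table values, not necessarily the entries of Table 8.3 itself:

```python
# Two-tailed critical t values at alpha = .05 (standard t-table values)
CRITICAL_T_05_TWO_TAILED = {
    5: 2.571,
    10: 2.228,
    14: 2.145,
    30: 2.042,
    120: 1.980,
    # as df -> infinity, the cutoff approaches the z cutoff of 1.96
}

def is_significant(t, df, table=CRITICAL_T_05_TWO_TAILED):
    """Reject H0 when |t| equals or exceeds the critical value for df."""
    return abs(t) >= table[df]
```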

Same or Different Populations

  • Statistical inference is the process of saying something about a population of elements based on a sample drawn from that population. The goal is to generalize findings from the sample to the larger population.

  • The sample mean, \bar{X}, is considered the best estimate of the hypothesized population mean, µ. However, it is important to remember that the sample mean is only an estimate and may not be exactly equal to the population mean.

The Hypothesis-Testing Process for the One-Sample t-Test

  1. Conceptualize the research question of interest into null (H0) and research (H1) hypotheses about the populations involved. The null hypothesis typically states that there is no difference between the sample mean and the population mean, while the research hypothesis states that there is a difference.

  2. Identify the characteristics of the comparison distribution. This includes determining the degrees of freedom and the shape of the t-distribution.

  3. Set up the critical cutoff sample score for our decision to reject the null hypothesis. This cutoff score is determined based on the chosen significance level (alpha) and the degrees of freedom.

  4. Data and results from the study of highly educated individuals from a close-knit community are summarized in Table 8.1. Compute the sample’s t-score. The t-score is calculated using the formula described earlier.

  5. Compare the computed sample's t-score from step 4 to the critical cutoff t-score of step 3 and decide whether to reject the null hypothesis. If the calculated t-score equals or exceeds the critical value in magnitude, the null hypothesis is rejected.
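The five steps above can be sketched end to end with the chapter's numbers, assuming a two-tailed test at α = .05 with df = 14:

```python
# Steps 2-3: comparison distribution is t with df = n - 1 = 14;
# the two-tailed critical cutoff at alpha = .05 is 2.145 (from a t table)
xbar, mu, se, df = 4.13, 5.0, 0.35, 14
critical = 2.145

# Step 4: compute the sample's t-score
t = (xbar - mu) / se

# Step 5: compare to the cutoff and decide
reject_null = abs(t) >= critical
```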

Effect Size

  • To compute the effect size, we essentially use the same formula as in Chapter 7. Effect size measures the magnitude of the treatment effect, independent of sample size. We modify the Cohen's d formula by substituting the sample standard deviation, S, for the population standard deviation, \sigma, in the denominator:
    d = \frac{\bar{X} - \mu}{S}

    • Where \mu = Population mean and \bar{X} = Sample mean. Cohen's d is a common measure of effect size that expresses the difference between two means in terms of standard deviation units.
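Cohen's d with the chapter's values (\bar{X} = 4.13, \mu = 5, S = 1.36) works out like so:

```python
def cohens_d(xbar, mu, s):
    """Effect size: mean difference in sample-standard-deviation units."""
    return (xbar - mu) / s

d = cohens_d(4.13, 5.0, 1.36)  # negative sign: sample mean is below mu
```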

Statistical Power

  • Statistical power (1 − β) is the probability that a statistical test will reject a false null hypothesis. Power depends on several factors, including sample size, effect size, and the significance level (alpha). Higher power means a greater chance of detecting a true effect.

Assumptions of the One-Sample t-Test

  • The assumptions of the one-sample t-test include (1) the interval or ratio level of measurement underlying the characteristic measured and (2) an approximately normal distribution associated with the observed scores or measurements. Violations of these assumptions can affect the validity of the test results.

The t-Test for Independent Means

  • The t-test for independent means, or the t-test for differences between means, is a hypothesis-testing procedure used to compare two groups with different sets of scores. It determines whether there is a significant difference between the means of two independent groups.

  • The assumption of a normally distributed population applies to all t-tests. This means the scores in each group should be approximately normally distributed.

  • The experimental design associated with the t-test for independent means is the between-subjects design, in which participants are part of only one condition or group. In a between-subjects design, each participant is exposed to only one level of the independent variable.

Estimating Population Variance

  • Sample pooled variance is the weighted average of the samples’ variances, accounting for differences in sample size or degrees of freedom. It provides a single estimate of the population variance based on the data from both samples.

  • The averaged or combined sample variance for the t-test for independent means is the pooled estimate of the population variance (S_\text{Pooled}^2), or sample pooled variance for short.

  • S_\text{Pooled}^2 = \frac{df_1 S_1^2 + df_2 S_2^2}{df_\text{Total}}
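A sketch of the pooled-variance formula; the two sample variances and sizes here are hypothetical:

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Weighted average of two sample variances, weighted by degrees of freedom."""
    df1, df2 = n1 - 1, n2 - 1
    return (df1 * s1_sq + df2 * s2_sq) / (df1 + df2)

# Hypothetical samples: variances 4.0 and 6.0 with n = 10 and n = 12
sp2 = pooled_variance(4.0, 10, 6.0, 12)
```

Because the weights are the degrees of freedom, the larger sample pulls the pooled estimate toward its own variance.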

Estimated Standard Error of the Difference between Two Means

  • The estimated standard error of the difference between two means (or estimated standard error, for short), denoted S_\text{Difference}, functions in precisely the same way as the standard error of the mean for a single sample. It measures the precision with which the difference between the two sample means estimates the true difference between the population means.

  • S_\text{Difference} = \sqrt{\frac{S_\text{Pooled}^2}{n_1} + \frac{S_\text{Pooled}^2}{n_2}}
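A sketch of the standard-error-of-the-difference formula, using a hypothetical pooled variance of 5.1 with sample sizes n1 = 10 and n2 = 12:

```python
import math

def standard_error_difference(sp2, n1, n2):
    """Estimated standard error of the difference between two independent means."""
    return math.sqrt(sp2 / n1 + sp2 / n2)

sd = standard_error_difference(5.1, 10, 12)  # hypothetical values
```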