Z-test and t-test Notes

Z-test and t-test: Core Ideas

Overview: In practice, we look at the p-value for the test statistic when applying z-tests or t-tests in software like JASP. Several effect-size measures can be reported, e.g. Pearson's correlation coefficient, Cohen's d, omega, or omega-squared.
One simple example setup (z-test):
- Population of interest: all clinical IP scores, with unknown population mean $\mu$ and a population standard deviation taken as known for this example: $\sigma = 15$.
- Small sample: $n = 25$ students from a class, with sample mean $\bar{x} = 110$.
- Null hypothesis: $H_0: \mu = 100$ (population mean of IP scores assumed to be 100 under the null).
- Goal: test whether the sample provides evidence that the population mean differs from 100.
- Rationale: the z-test compares the sample mean to the population mean assumed under $H_0$, using the known population standard deviation to gauge how far the sample mean can drift by chance alone.
- Example summary: with the numbers above, the z-statistic is computed to evaluate whether the observed sample mean could plausibly arise if $\mu = 100$.
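The example setup above can be checked numerically. The following is a minimal Python sketch, assuming the usual one-sample z-statistic $z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n})$ and a two-sided p-value taken from the standard normal distribution. The function name `one_sample_z_test` is hypothetical and chosen only for illustration; it is not part of JASP or any library.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test with known sigma."""
    standard_error = sigma / sqrt(n)          # standard deviation of the sample mean
    z = (sample_mean - mu0) / standard_error  # distance from mu0 in standard errors
    # Two-sided p-value: probability of a |Z| at least this large under N(0, 1).
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_two_sided


# Numbers from the example: x-bar = 110, mu0 = 100, sigma = 15, n = 25.
z, p = one_sample_z_test(sample_mean=110, mu0=100, sigma=15, n=25)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

Only the standard library is used here, so the sketch covers the z-test alone; a t-test p-value would need a t-distribution, which the standard library does not provide.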
How to compute the z-statistic (for the sample mean):
- For a single observation x: the z-score is $z = \frac{x - \mu}{\sigma}$
- For a sample mean with known population standard deviation, the sampling distribution of $\bar{X}$ is centered at $\mu$ with standard deviation (the standard error) $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$.
- Z-statistic for the sample mean: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$
- Under the null, $z$ follows the standard normal distribution $N(0,1)$ (exactly if the population is normal, approximately for large $n$).
- In the example: with $\bar{x} = 110$, $\mu_0 = 100$, $\sigma = 15$, and $n = 25$, $z = \frac{110 - 100}{15 / \sqrt{25}} = \frac{10}{3} \approx 3.33$, which yields a very small p-value (two-sided, about $p \approx 0.0009$). If the chosen alpha is 0.05, we would reject $H_0$.
Hypotheses, alpha, and decision regions (two-tailed example):
- Null hypothesis: $H_0: \mu = \mu_0$ (e.g., $\mu_0 = 100$).
- Alternative: depends on the research question (often two-sided, unless specified otherwise).
- Significance level: $\alpha = 0.05$ (example).
- Critical region for a two-sided test: $\alpha$ is split equally between the two tails of the standard normal, giving $\alpha/2 = 0.025$ in each tail, so the critical z-values are approximately $\pm z_{\alpha/2} = \pm 1.96$.
- Connection to the 68-95-99.7 rule: about 95% of z-values lie within $\pm 2$ standard deviations of the mean under the normal distribution (more precisely, within $\pm 1.96$), which explains the 95% region in simple z-interval reasoning.
- Practical note: software like JASP reports the p-value for the z (or t) statistic rather than raw critical values; you use the p-value to decide about rejecting $H_0$ at the chosen $\alpha$.
Key limitation of the z-test (unknown sigma in practice):
- The z-test assumes that the population standard deviation $\sigma$ is known, which is unrealistic in most real-world settings.
- Because $\sigma$ is rarely known, it has to be estimated from the sample, and the resulting test statistic no longer follows the standard normal exactly; it follows a t-distribution instead.
The t-distribution: motivation and properties
- When $\sigma$ is unknown and we estimate it with the sample standard deviation $s$, the test statistic becomes the one-sample t-statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$
- Degrees of freedom: $df = n - 1$
- The t-distribution is symmetric but heavier-tailed than the standard normal; its exact shape depends on the degrees of freedom.
- As the sample size grows, the t-distribution approaches the standard normal: for roughly $df > 30$, $t \approx z$ and $s$ approximates $\sigma$.
- In practice, you compare the calculated $t$ to a t-distribution with $df = n - 1$ to obtain a p-value.
What does the p-value tell you here?
- A small p-value (e.g., $p < 0.05$) indicates that a sample mean at least as extreme as the one observed would be unlikely if the null were true, and leads to rejection of $H_0$ at the chosen level of significance.
- In the example, a z of about 3.33 yields a p-value well below 0.05, supporting rejection of $H_0$.
Extending to two means (two-sample t-test, independent samples, equal variances):
- When comparing two samples, you can use a two-sample t-test to assess whether the two population means differ.
- Assumptions for the two-sample t-test with equal variances (pooled-variance t-test):
- Each population is normally distributed.
- The two population standard deviations are equal (homogeneity of variance): $\sigma_1 = \sigma_2$ (we do not know them; we estimate them).
- Test statistic for independent samples with equal variances:
- Pooled standard deviation: $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$
- t-statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$
- Degrees of freedom: $df = n_1 + n_2 - 2$
- Interpretation: a larger absolute value of $t$ reflects a larger difference between the two sample means relative to the pooled variability, and corresponds to a smaller p-value.
Effect size for t-tests: Pearson's correlation coefficient r
- A common way to quantify the magnitude of the effect in a t-test is via the correlation coefficient $r$, which can be derived from the t-statistic and its degrees of freedom.
- Formula: $r = \frac{t}{\sqrt{t^2 + df}}$
- The sign of $r$ matches the sign of the t-statistic, reflecting the direction of the effect.
- Note: some software (e.g., JASP) reports the t-statistic and df, and you can compute $r$ by hand if desired.
Practical reporting and software considerations
- Software like JASP reports the p-value for the test statistic (z or t) and the degrees of freedom; it may not always output the effect size (e.g., $r$) directly, so you may need to compute it yourself from $t$ and $df$.
- Interpretation hinges on both statistical significance and practical significance: with a very large sample, a p-value can be small even for a tiny effect.
The relationship between p-values and sample size (p-hacking warning)
- A key property: the p-value is sensitive to sample size. Increasing the sample size can shrink the p-value even if the effect size remains tiny.
- This can lead to "p-hacking", or fishing for significance, by simply accumulating more observations.
- Caution: while larger samples increase power to detect real effects, they can also produce statistically significant results that are practically meaningless if the effect size is trivial.
Takeaway notes
- Use the z-test when the population standard deviation $\sigma$ is known; otherwise, use the t-test with $s$ as an estimate of variability.
- For one-sample tests, use $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ with $df = n - 1$ when $\sigma$ is unknown, or $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ when $\sigma$ is known (the z-test involves no degrees of freedom).
- For two independent samples with equal variances, use the pooled-variance t-test; otherwise, the separate-variance (Welch) t-test may be used (not detailed here, but commonly needed when variances differ).
- Always report an effect size (e.g., $r$) in addition to p-values to convey practical significance.
- Be mindful of sample size: large samples can yield statistically significant results for negligible effects without substantive importance.
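The pooled-variance t-statistic and the effect size $r$ described in these notes can be sketched in a few lines of Python. This is an illustrative sketch only: the function names `pooled_t_test` and `effect_size_r` and the two small data sets are hypothetical, made up for the example. It computes $t$, $df$, and $r$ but not the p-value, which would be read from a t-distribution with $df = n_1 + n_2 - 2$ (for instance, as reported by JASP).

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_test(sample1, sample2):
    """Return (t, df) for an independent-samples t-test assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    s1_sq = variance(sample1)  # sample variance, n - 1 in the denominator
    s2_sq = variance(sample2)
    df = n1 + n2 - 2
    # Pooled standard deviation: df-weighted average of the two sample variances.
    sp = sqrt(((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / df)
    t = (mean(sample1) - mean(sample2)) / (sp * sqrt(1 / n1 + 1 / n2))
    return t, df


def effect_size_r(t, df):
    """Convert a t-statistic and its degrees of freedom into the effect size r."""
    return t / sqrt(t**2 + df)


# Hypothetical scores for two independent groups (illustration only).
group_a = [12, 14, 15, 16, 18]
group_b = [10, 11, 13, 14, 12]

t, df = pooled_t_test(group_a, group_b)
r = effect_size_r(t, df)
print(f"t({df}) = {t:.3f}, r = {r:.3f}")
```

Because $r$ depends on $t$ and $df$ together, it does not grow automatically with sample size the way $t$ does, which is one reason to report it alongside the p-value.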