Z-test and t-test Notes

Z-test and t-test: Core Ideas

  • Overview: In practice, we look at the p-value for the test statistic when applying z-tests or t-tests in software like JASP. Several effect-size measures can be reported, e.g. Pearson's correlation coefficient, Cohen's d, omega, or omega-squared.

  • One simple example setup (z-test):

    • Population of interest: all clinical IP scores, with unknown population mean $\mu$ and a population standard deviation treated as known for this example: $\sigma = 15$.

    • Small sample: $n = 25$ students from a class, with sample mean $\bar{x} = 110$.

    • Null hypothesis: $H_0: \mu = 100$ (population mean of IP scores assumed to be 100 under the null).

    • Goal: test whether the sample provides evidence that the population mean differs from 100.

    • Rationale: The z-test compares the sample mean to the hypothesized population mean, using the known population standard deviation to judge how large the difference is relative to sampling variability.

    • Example summary: With the numbers above, the z-statistic is computed to evaluate whether the observed sample mean could plausibly arise if $\mu = 100$.
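The example setup above can be checked numerically. The following is a minimal sketch in Python, assuming the usual z-statistic for a sample mean, $z = (\bar{x} - \mu_0) / (\sigma / \sqrt{n})$, and a two-sided p-value taken from the standard normal distribution. The function name `one_sample_z_test` is illustrative and does not come from any library.

```python
import math


def one_sample_z_test(sample_mean, mu0, sigma, n):
    """Return (z, two-sided p-value) for a one-sample z-test with known sigma."""
    # Standard deviation of the sample mean (standard error).
    standard_error = sigma / math.sqrt(n)
    z = (sample_mean - mu0) / standard_error
    # For Z ~ N(0, 1): P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Numbers from the example: x-bar = 110, mu0 = 100, sigma = 15, n = 25.
z, p = one_sample_z_test(sample_mean=110, mu0=100, sigma=15, n=25)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")
```

The p-value is computed with `math.erfc` so the sketch needs only the standard library; a statistics package would normally be used in practice.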

  • How to compute the z-statistic (for the sample mean):

    • For a single observation x: the z-score is $z = \frac{x - \mu}{\sigma}$

    • For a sample mean, assuming known population standard deviation, the distribution of the sample mean is centered at $\mu$ with standard deviation $\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}}$, so

    • Z-statistic for the sample mean: $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$

    • Under the null, $z$ follows the standard normal distribution $N(0,1)$.

    • In the example: with $\bar{x} = 110$, $\mu_0 = 100$, $\sigma = 15$, and $n = 25$, $z = \frac{110 - 100}{15 / \sqrt{25}} = \frac{10}{3} \approx 3.33$, which yields a very small two-sided p-value (about $p \approx 0.0009$). If the chosen alpha is 0.05, we would reject $H_0$.

  • Hypotheses, alpha, and decision regions (two-tailed example):

    • Null hypothesis: $H_0: \mu = \mu_0$ (e.g., $\mu_0 = 100$).

    • Alternative: depending on the research question (often two-sided, unless specified otherwise).

    • Significance level: $\alpha = 0.05$ (example).

    • Critical region for a two-sided test: $\alpha$ is split evenly between the two tails of the standard normal distribution, giving $\alpha/2 = 0.025$ in each tail, so the critical z-values are approximately $\pm z_{\alpha/2} = \pm 1.96$.

    • Connection to the 68-95-99.7 rule: about 95% of z-values lie within $\pm 2$ standard deviations of the mean under the normal distribution, which explains the 95% region in simple z-interval reasoning.

    • Practical note: software like JASP reports the p-value for the z (or t) statistic rather than raw critical values; you use the p-value to decide about rejecting $H_0$ at the chosen $\alpha$.

  • Key limitation of the z-test (unknown sigma in practice):

    • The z-test assumes that the population standard deviation $\sigma$ is known, which is very unrealistic in most real-world settings.

    • Because $\sigma$ is rarely known, it has to be estimated from the sample, and the resulting test statistic no longer follows the standard normal distribution exactly; it follows a t-distribution instead.

  • The t-distribution: motivation and properties

    • When $\sigma$ is unknown and we estimate it with the sample standard deviation $s$, the test statistic becomes

    • One-sample t-statistic: $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$

    • Degrees of freedom: $df = n - 1$

    • The t-distribution is symmetric but heavier-tailed than the standard normal; its exact shape depends on the degrees of freedom.

    • As the sample size grows, the t-distribution approaches the standard normal: for $df > 30$ (roughly), $t \approx z$ and $s$ approximates $\sigma$.

    • In practice, you compare the calculated $t$ to a t-distribution with $df = n - 1$ to obtain a p-value.

  • What does the p-value tell you here?

    • A small p-value (e.g., $p < 0.05$) indicates that a sample mean at least as extreme as the one observed would be unlikely if the null were true, and leads to rejection of $H_0$ at the chosen level of significance.

    • In the example, a z of about 3.33 yields a p-value well below 0.05, supporting rejection of $H_0$.

  • Extending to two means (two-sample t-test, independent samples, equal variances):

    • When comparing two samples, you can use a two-sample t-test if you want to assess whether the two population means differ.

    • Assumptions for the two-sample t-test with equal variances (pooled-variance t-test):

    • Each population is normally distributed.

    • The two population standard deviations are equal (homogeneity of variance): $\sigma_1 = \sigma_2$ (we do not know them; we estimate them from the samples).

    • Test statistic for independent samples with equal variances:

    • Pooled standard deviation ($s_p$): $s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$

    • t-statistic: $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$

    • Degrees of freedom: $df = n_1 + n_2 - 2$

    • Interpretation: a larger absolute value of $t$ reflects a larger difference between the two sample means relative to the pooled variability, and corresponds to a smaller p-value.

  • Effect size for t-tests: Pearson's correlation coefficient r

    • A common way to quantify the magnitude of the effect in a t-test is via the correlation coefficient $r$, which can be derived from the t-statistic and its degrees of freedom:

    • Formula: $r = \frac{t}{\sqrt{t^2 + df}}$

    • Sign of $r$ matches the sign of the t-statistic, reflecting the direction of the effect.

    • Note: Some software (e.g., JASP) reports the t-statistic and df, and you can compute $r$ by hand if desired.

  • Practical reporting and software considerations

    • Software like JASP reports the p-value for the test statistic (z or t) and the degrees of freedom; it may not always output the effect size (e.g., r) directly, so you may compute it yourself from t and df.

    • Interpretation hinges on both statistical significance and practical significance: a p-value can be small with a very large sample even for tiny effects.

  • The relationship between p-values and sample size (p-hacking warning)

    • A key property: the p-value is sensitive to sample size. Increasing sample size can shrink the p-value even if the effect size remains tiny.

    • This can lead to “p-hacking” or fishing for significance by simply accumulating more observations.

    • Caution: while larger samples increase power to detect real effects, they can also produce statistically significant results that are practically meaningless if the effect size is trivial.

  • Takeaway notes

    • Use the z-test when the population standard deviation $\sigma$ is known; otherwise, use the t-test with $s$ as an estimate of variability.

    • For one-sample tests, use $t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}$ (with $df = n - 1$) when $\sigma$ is unknown, or $z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$ (no degrees of freedom apply) when $\sigma$ is known.

    • For two independent samples with equal variances, use the pooled-variance t-test; otherwise, the separate-variance (Welch) t-test may be used (not detailed here but commonly needed when variances differ).

    • Always report effect size (e.g., $r$) in addition to p-values to convey practical significance.

    • Be mindful of sample size: large samples can yield statistically significant results for negligible effects without substantive importance.
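
The pooled-variance t-statistic and the effect size r described in these notes can be computed by hand. The sketch below does this in Python under stated assumptions: the two small data sets are made up for illustration (they are not from the notes), and the function names `pooled_t_test` and `effect_size_r` are hypothetical helpers, not part of any library. It computes only t, df, and r; obtaining a p-value would additionally require the t-distribution, which the standard library does not provide.

```python
import math
import statistics


def pooled_t_test(sample1, sample2):
    """Return (t, df) for an independent two-sample t-test with pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = statistics.mean(sample1), statistics.mean(sample2)
    # statistics.variance uses the n - 1 denominator (sample variance).
    var1, var2 = statistics.variance(sample1), statistics.variance(sample2)
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / df)
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, df


def effect_size_r(t, df):
    """Convert a t-statistic and its degrees of freedom to r = t / sqrt(t^2 + df)."""
    return t / math.sqrt(t ** 2 + df)


# Hypothetical scores for two independent groups (illustration only).
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [10.0, 9.0, 12.0, 8.0, 11.0]

t, df = pooled_t_test(group_a, group_b)
r = effect_size_r(t, df)
print(f"t = {t:.3f}, df = {df}, r = {r:.3f}")
```

If SciPy is installed, `scipy.stats.ttest_ind(group_a, group_b, equal_var=True)` should report the same t-statistic together with a p-value.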
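
The sensitivity of the p-value to sample size can be made concrete with the one-sample z-test. The sketch below is an idealized illustration, assuming that the observed difference between the sample mean and $\mu_0$ stays fixed at 1 point (with $\sigma = 15$) while only $n$ changes; real samples would fluctuate. The chosen difference, the sample sizes, and the helper names are assumptions made for this example.

```python
import math


def two_sided_p_from_z(z):
    """Two-sided p-value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2))


def z_for_fixed_difference(difference, sigma, n):
    """z-statistic when the observed mean difference is held fixed."""
    return difference / (sigma / math.sqrt(n))


difference = 1.0   # hypothetical observed x-bar minus mu0 (a tiny effect)
sigma = 15.0       # known population standard deviation, as in the notes
sample_sizes = [25, 100, 400, 1600, 6400]

results = []
for n in sample_sizes:
    z = z_for_fixed_difference(difference, sigma, n)
    p = two_sided_p_from_z(z)
    results.append((n, z, p))
    print(f"n = {n:5d}  z = {z:.3f}  p = {p:.6f}")
```

The standardized effect (difference divided by sigma) is the same in every row, yet the p-value falls steadily as n grows, which is why effect size should be reported alongside it.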