PSYCH307 – Lecture 3

Parametric vs. Non-parametric Statistics

Parametric statistics operate under the assumption that the underlying population follows a normal distribution fully described by parameters such as the mean and standard deviation. By contrast, non-parametric statistics make no strict distributional assumptions and are therefore well suited to irregular or small-sample data such as electroencephalography or skin-conductance readings. The lecture emphasises that all observable phenomena change over time, turning every aspect of the world into a variable. To adapt optimally, researchers need to discover the sources of variation that genuinely influence a focal outcome—for instance, factors that lengthen the human life span or interventions that reduce accident rates. Parametric analyses are powerful because, once their assumptions are satisfied, the results can be generalised to the entire population, allowing evidence-based manipulation of causal factors to attain desired goals such as greater safety or lower stress.

Problems in Determining Causal Relationships

Two broad forms of variability complicate causal inference. Random variability represents chance fluctuations between individual scores and the sample mean; its influence on sample estimates grows when samples are small. Systematic variability, better described as sampling bias than as sampling error, arises when sampling procedures favour some members of the population over others, for example when body-mass-index data are gathered only at rugby games and therefore over-represent rugby players and fans. Systematic error is best reduced by selecting a representative sample, whereas random error can only be attenuated, not eliminated, by increasing sample size.
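The claim that random error shrinks but never vanishes as samples grow can be illustrated with the standard error of the mean, $SE = s / \sqrt{n}$. The sketch below is only an illustration: the standard deviation of 15 and the sample sizes are assumed values, not figures from the lecture.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sample standard deviation over sqrt(n)."""
    return sd / math.sqrt(n)


# Assumed standard deviation of 15; quadrupling n only halves the error.
for n in (25, 100, 400):
    print(n, standard_error(15.0, n))
```

Because $n$ sits under a square root, each halving of the random error costs four times as many participants, and no finite sample brings it to zero.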

Inferential Statistics: Purpose and Basic Logic

Because every measurement is tainted by random variability, researchers rely on inferential statistics to decide whether an observed difference or relationship is real or merely an artefact of chance. The classical procedure begins with the null hypothesis, $H_0$, which states that no difference or relationship exists, and then asks how probable the observed result would be if $H_0$ is true. If that probability is sufficiently low, we doubt the null and provisionally accept that an effect is present.

Reasoning Behind Inferential Statistics

Consider vaccinating 100 individuals and leaving 100 unvaccinated, then finding that flu incidence is 10 % lower in the vaccinated group. Assuming, under $H_0$ , that both populations have identical flu rates, we compute the probability of observing a 10 % difference just by chance. If that probability equals roughly one in twenty (5 %) or less, convention dictates that we treat the result as statistically significant and infer a genuine vaccine effect. Otherwise we attribute the finding to random sampling fluctuations.
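The vaccine reasoning can be sketched with a two-proportion z-test, which is one common way to attach a probability to such a difference. The lecture gives only the size of the gap, so the case counts used here (30 of 100 unvaccinated and 20 of 100 vaccinated people catching flu) are assumptions chosen for illustration, as is the function name.

```python
import math
from statistics import NormalDist


def two_proportion_z(cases_a: int, n_a: int, cases_b: int, n_b: int):
    """Return (z, two-tailed p) for the difference between two proportions."""
    p_a = cases_a / n_a
    p_b = cases_b / n_b
    # Under H0 both groups share one flu rate, so the cases are pooled.
    pooled = (cases_a + cases_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Assumed counts: 30/100 unvaccinated vs 20/100 vaccinated with flu.
z, p = two_proportion_z(30, 100, 20, 100)
print(round(z, 2), round(p, 2))
```

With these assumed counts the p-value comes out near 0.10, so a ten-point gap in samples of 100 would not reach the 5 % convention. The same gap in larger samples would.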

Probability and p-Value

Probability is defined as the ratio of favourable outcomes to all possible, equally likely outcomes. The chance of heads in a coin toss is therefore $p = \frac{1}{2} = 0.5$ , while the probability of rolling a specific face of a die is $p = \frac{1}{6} \approx 0.17$ . In research, the p-value represents the probability of obtaining our observed statistic, or one more extreme, under the null hypothesis. A p-value below the chosen significance threshold (usually $\alpha = .05$ ) indicates that the finding is sufficiently improbable under $H_0$ to warrant rejecting the null.
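To connect the coin-toss definition with the p-value, consider a hypothetical experiment of 10 tosses that yields 8 heads. Under the null hypothesis of a fair coin, the p-value is the probability of 8 or more heads. The helper name and the 8-of-10 outcome are illustrative assumptions.

```python
from math import comb


def binomial_upper_tail(n: int, k: int, p: float = 0.5) -> float:
    """P(X >= k) when X counts successes in n independent trials."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# One-tailed p-value for 8 or more heads in 10 fair tosses: 56 / 1024.
p_one_tailed = binomial_upper_tail(10, 8)
# A two-tailed p-value also counts 2 or fewer heads, which is equally extreme.
p_two_tailed = 2 * p_one_tailed
print(p_one_tailed, p_two_tailed)
```

Neither value falls below $\alpha = .05$, so this outcome alone would not justify rejecting the hypothesis that the coin is fair.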

Hypothesis Testing Fundamentals

Borrowing from psychophysics’ Signal Detection Theory, every statistical decision involves uncertain evidence and four potential outcomes: a hit, a miss (Type II error), a false alarm (Type I error) or a correct rejection. A Type I error occurs when we reject a true null hypothesis, whereas a Type II error arises when we fail to reject a false null. Because science prizes caution against proclaiming effects that do not exist, researchers focus on limiting the Type I error rate, traditionally to 5 %. The p-value is the quantity compared against this limit: it gives the probability of a result at least as extreme as the one observed if the null is true, and the null is rejected only when that probability falls below the chosen rate.
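The four outcomes can be written as a small lookup, treating a real effect (a false null) as the signal. The function and its labels are an illustrative sketch, not part of the lecture.

```python
def classify_decision(null_is_true: bool, null_rejected: bool) -> str:
    """Map the state of the world and the decision to a signal-detection outcome."""
    if null_is_true:
        # No real effect exists, so any claimed effect is a false alarm.
        return "false alarm (Type I error)" if null_rejected else "correct rejection"
    # A real effect exists: detecting it is a hit, overlooking it is a miss.
    return "hit" if null_rejected else "miss (Type II error)"


for truth in (True, False):
    for rejected in (True, False):
        print(truth, rejected, classify_decision(truth, rejected))
```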

Type I and Type II Errors

The researcher predetermines the tolerable Type I error probability by setting $\alpha$ . Lowering $\alpha$ from $0.05$ to $0.01$ makes the test more conservative, decreasing the odds of falsely rejecting $H_0$ but simultaneously increasing the likelihood of a Type II error—failing to detect a real effect. Thus, the two errors trade off: when one probability drops, the other rises.
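The trade-off can be made concrete with a two-tailed z-test. The sketch assumes the true effect lies 2.5 standard errors from zero, a value picked purely for illustration, and computes the Type II error probability at each $\alpha$.

```python
from statistics import NormalDist


def type_ii_error(alpha: float, effect_in_se: float) -> float:
    """Beta for a two-tailed z-test when the true effect is effect_in_se standard errors."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Power is the chance that the statistic lands in either rejection region.
    power = (1 - std_normal.cdf(z_crit - effect_in_se)) + std_normal.cdf(-z_crit - effect_in_se)
    return 1 - power


beta_05 = type_ii_error(0.05, 2.5)
beta_01 = type_ii_error(0.01, 2.5)
print(round(beta_05, 2), round(beta_01, 2))
```

Under this assumption the miss rate rises from roughly 0.29 at $\alpha = .05$ to roughly 0.53 at $\alpha = .01$, which is the price of the stricter threshold.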

The t-Test Family

When the research question concerns the difference between two means, the t-test offers a principled solution. The independent-samples t-test compares means derived from two separate groups and is valid provided each sample represents its population and measurement is at least interval-scaled. Its core statistic is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\bar{X}_1 - \bar{X}_2}}$$

where the numerator is the observed mean difference and the denominator, often labelled “variability,” is the standard error of that difference. By quantifying how many standard errors the observed gap lies from zero (the null expectation), $t$ tells us the distance of our result from the region in which chance alone would plausibly operate.
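A minimal sketch of the independent-samples calculation follows. The lecture does not say how the standard error is obtained, so the pooled-variance form used here (which assumes equal population variances) is an assumption, and the two score lists are made-up data.

```python
import math
import statistics


def independent_t(group_a, group_b):
    """Return (t, df) for two independent groups using a pooled variance."""
    n_a, n_b = len(group_a), len(group_b)
    mean_diff = statistics.mean(group_a) - statistics.mean(group_b)
    # Pooled variance: both sample variances weighted by their degrees of freedom.
    pooled_var = (
        (n_a - 1) * statistics.variance(group_a)
        + (n_b - 1) * statistics.variance(group_b)
    ) / (n_a + n_b - 2)
    se_diff = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return mean_diff / se_diff, n_a + n_b - 2


# Hypothetical scores for two separate groups of five people.
t, df = independent_t([5, 7, 8, 6, 9], [3, 4, 6, 5, 2])
print(t, df)
```

For these made-up scores the mean difference is 3, the standard error is 1, and so $t(8) = 3.0$: the observed gap lies three standard errors from zero.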

Sampling Distributions and the t-Distribution

If we repeatedly drew random samples and computed each sample’s mean, those means would themselves form an approximately normal curve called the sampling distribution. When sample sizes are small to moderate and the standard error must be estimated from the data, the standardised mean difference follows the t-distribution, which is symmetrical like the z-distribution but has heavier tails. Assuming identical populations under $H_0$ , most mean differences cluster near zero, with only about 5 % falling beyond the critical boundaries that mark the region of rejection.
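The sampling distribution under $H_0$ can be approximated by simulation. The sketch below repeatedly draws two groups of five from the same standard normal population (an assumed population), computes a pooled-variance t for each pair, and counts how often it falls beyond the tabled two-tailed critical value for 8 degrees of freedom, 2.306. Group size, repetitions and seed are arbitrary choices.

```python
import math
import random


def sample_variance(values):
    """Unbiased sample variance (divides by n - 1)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def simulate_null_t(n_per_group=5, reps=10000, seed=0):
    """t statistics for pairs of samples drawn from one and the same population."""
    rng = random.Random(seed)
    t_values = []
    for _ in range(reps):
        a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        pooled_var = (sample_variance(a) + sample_variance(b)) / 2  # equal group sizes
        se_diff = math.sqrt(pooled_var * 2 / n_per_group)
        t_values.append((sum(a) / n_per_group - sum(b) / n_per_group) / se_diff)
    return t_values


t_values = simulate_null_t()
share_beyond_t = sum(abs(t) > 2.306 for t in t_values) / len(t_values)
share_beyond_z = sum(abs(t) > 1.96 for t in t_values) / len(t_values)
print(share_beyond_t, share_beyond_z)
```

The share beyond 2.306 should sit close to 5 %, while the share beyond the z cut-off of 1.96 is larger, which is the heavier tail of the t-distribution showing up in small samples.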

Decision Rules and Critical Regions

For a two-tailed test at $\alpha = .05$ , the critical t- (or z-) values cut off the outer 2.5 % in each tail. In z-units the boundaries are $z = \pm 1.96$ ; for t-tests the cut-points depend on degrees of freedom. If the obtained statistic exceeds the positive critical value or lies below the negative counterpart, the null is rejected. Otherwise it survives, though strictly speaking we never “accept” $H_0$ ; we merely note insufficient evidence against it.
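The decision rule reduces to a comparison of the obtained statistic with the critical value. In this sketch the function name is illustrative, the statistic 1.26 on 9 degrees of freedom is the example reported later in these notes, and 2.262 is the tabled two-tailed critical t for 9 degrees of freedom at $\alpha = .05$.

```python
def two_tailed_decision(statistic: float, critical_value: float) -> str:
    """Reject H0 only if the statistic lies beyond either critical boundary."""
    if abs(statistic) > critical_value:
        return "reject H0"
    # H0 is never 'accepted'; the evidence against it is merely insufficient.
    return "fail to reject H0"


print(two_tailed_decision(1.26, 2.262))   # t(9) = 1.26 against the df = 9 cut-off
print(two_tailed_decision(-2.10, 1.96))   # a z statistic in the lower tail
```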

Reporting t-Test Results

Transparent reporting requires the means, standard deviations, the mean difference, degrees of freedom, the obtained t-value and the exact p-value: for example, $t(9) = 1.26,\, p = 0.24$ , which one might summarise as “not significant at the 0.05 level.” By convention the pre-specified significance level is mentioned in the methods or analysis section.
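A small formatting helper shows how the reported pieces fit together. It is a hypothetical convenience function that follows the notation used in these notes, not a formal style guide.

```python
def format_t_report(t_value: float, df: int, p_value: float, alpha: float = 0.05) -> str:
    """Build a report string such as 't(9) = 1.26, p = 0.24 (...)'."""
    verdict = "significant" if p_value < alpha else "not significant"
    return f"t({df}) = {t_value:.2f}, p = {p_value:.2f} ({verdict} at the {alpha} level)"


print(format_t_report(1.26, 9, 0.24))
```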

Dependent-Samples (Paired) t-Test

When the same individuals provide two measurements—such as hunger ratings before and after viewing fast-food commercials—the paired t-test evaluates whether the mean change differs from zero. This design controls for between-subject variability, often boosting statistical power. The assumptions remain that the dependent variable is interval-level (or better) and approximately normally distributed.
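The paired test is a one-sample t-test on the difference scores. The sketch below uses invented hunger ratings for five people before and after viewing commercials; the data and the function name are assumptions for illustration.

```python
import math
import statistics


def paired_t(before, after):
    """Return (t, df) for paired measurements, testing mean change against zero."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    # Standard error of the mean difference: sd of the differences over sqrt(n).
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1


# Hypothetical hunger ratings for the same five people.
before = [4, 5, 3, 6, 5]
after = [6, 7, 4, 8, 5]
t, df = paired_t(before, after)
print(round(t, 2), df)
```

Here the mean change is 1.4 with a standard error of 0.4, giving $t(4) = 3.5$. Only the spread of the change scores enters the denominator, which is why stable differences between people do not dilute the test.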

Statistical Power

Power, the probability of correctly rejecting a false null hypothesis, increases when the standard deviation is small, the true effect size is large, or the sample size is ample. Each factor narrows the sampling distribution or shifts group means farther apart, making real differences easier to detect. Prior evidence that a specific group differs in a known direction allows the researcher to specify a directional hypothesis, thereby using a one-tailed test and increasing power further.
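Each of these influences can be checked with an approximate power calculation. The sketch uses a normal approximation for two equal-sized independent groups and ignores the negligible chance of rejecting in the wrong tail; all numeric inputs are assumed values.

```python
import math
from statistics import NormalDist


def approx_power(mean_diff, sd, n_per_group, alpha=0.05, one_tailed=False):
    """Approximate power to detect mean_diff between two groups of equal size."""
    std_normal = NormalDist()
    se_diff = sd * math.sqrt(2 / n_per_group)
    tail_area = alpha if one_tailed else alpha / 2
    z_crit = std_normal.inv_cdf(1 - tail_area)
    # Probability that the statistic exceeds the critical value given the true effect.
    return 1 - std_normal.cdf(z_crit - mean_diff / se_diff)


baseline = approx_power(mean_diff=5, sd=10, n_per_group=30)
print("baseline      ", round(baseline, 2))
print("larger sample ", round(approx_power(5, 10, 60), 2))
print("smaller sd    ", round(approx_power(5, 5, 30), 2))
print("larger effect ", round(approx_power(8, 10, 30), 2))
print("one-tailed    ", round(approx_power(5, 10, 30, one_tailed=True), 2))
```

Every variation raises power above the baseline, matching the factors listed above.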

One-Tailed versus Two-Tailed Tests

A one-tailed test confines the rejection region to one tail of the distribution. At $\alpha = .05$ the critical z-value for a one-tailed test is $z = 1.65$ instead of $1.96$ , reflecting a more lenient boundary when the effect is predicted in advance. However, using a one-tailed test when the effect could logically appear in either direction is inappropriate and risks inflating Type I errors. Lowering $\alpha$ to $0.01$ makes the test more conservative, shifting the one-tailed critical value to $z = 2.33$ and the two-tailed to $z = 2.58$ .
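The four critical z-values quoted above can be recovered from the standard normal distribution. This sketch uses Python's standard-library `statistics.NormalDist`; the helper name is illustrative.

```python
from statistics import NormalDist


def critical_z(alpha: float, tails: int) -> float:
    """Critical z leaving alpha / tails in the upper tail (tails is 1 or 2)."""
    return NormalDist().inv_cdf(1 - alpha / tails)


for alpha in (0.05, 0.01):
    for tails in (1, 2):
        print(alpha, tails, round(critical_z(alpha, tails), 3))
```

The one-tailed value at $\alpha = .05$ is 1.645 to three decimals, which tables commonly round to 1.65.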

Beyond t-Tests: Analysis of Variance (ANOVA)

When more than two groups or levels are involved, running multiple t-tests would inflate the Type I error rate. Instead, researchers switch to Analysis of Variance (ANOVA), which will be covered in the subsequent lecture by Dr Andrew Evelo. ANOVA partitions total variability into components attributable to systematic group differences and unsystematic error, providing a single omnibus $F$ statistic that guards the overall Type I error at the chosen $\alpha$ level.
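The inflation from multiple t-tests can be quantified. If each of $k$ tests is run at level $\alpha$ and the tests are treated as independent (a simplifying assumption, since pairwise comparisons share data), the chance of at least one false alarm is $1 - (1 - \alpha)^k$.

```python
from math import comb


def pairwise_comparisons(n_groups: int) -> int:
    """Number of distinct pairs of groups, each needing its own t-test."""
    return comb(n_groups, 2)


def familywise_error(alpha: float, n_tests: int) -> float:
    """Chance of at least one Type I error across n_tests independent tests."""
    return 1 - (1 - alpha) ** n_tests


for groups in (3, 5):
    k = pairwise_comparisons(groups)
    print(groups, k, round(familywise_error(0.05, k), 3))
```

Three groups already push the overall rate to about 14 %, and five groups to about 40 %, which is why a single omnibus test is preferred.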

Key Quantitative Expressions and Definitions

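The probability of an outcome, under equal likelihood, is $p = \frac{\text{favourable cases}}{\text{all cases}}$ . The t-statistic for independent samples is

$$t = \frac{\bar{X}_1 - \bar{X}_2}{SE_{\bar{X}_1 - \bar{X}_2}}$$

where $\bar{X}_1 - \bar{X}_2$ is the observed difference between the two sample means and $SE_{\bar{X}_1 - \bar{X}_2}$ is the standard error of that difference.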

Practical, Ethical and Real-World Considerations

Reducing Type I errors is vital when false positives carry serious consequences, such as approving a medical treatment that is ineffective or harmful; such contexts warrant an $\alpha$ of $0.01$ or even $0.001$ . Conversely, when failing to detect a genuine effect would be costly—say, overlooking a dangerous side-effect—researchers might tolerate a higher Type I risk to safeguard against Type II errors. The lecture situates these statistical choices within the broader mission of improving human well-being, whether by increasing safety, mitigating stress or extending life expectancy.

Connections to Previous and Future Material

The concepts introduced connect back to fundamental probability, descriptive statistics and measurement theory, all prerequisites for grasping inferential testing. They also lay the groundwork for forthcoming topics such as ANOVA, regression and multivariate techniques, each of which extends the same core logic—quantifying signal relative to noise—to more complex designs and larger parameter spaces.

Concluding Remarks

Statistical inference is a structured way to convert uncertain sample information into cautious statements about populations. By mastering assumptions, error types, and decision rules, researchers can draw conclusions that are both defensible and practically valuable, always mindful that statistics offer likelihood, never certainty.