Concise Notes on Hypothesis Testing

Introduction to Hypothesis Testing

Hypothesis testing is fundamental for statistical analyses and interpreting clinical studies.
Essential for study design and review processes.
Framework for comparing effects or treatments in a structured manner.

Key Statistical Concepts

Parameter: Descriptive measure from a population (e.g., population mean, median, standard deviation).
Statistic: Descriptive measure from a sample (e.g., sample mean, median, standard deviation).
Standard Error: Standard deviation of the sampling distribution of a statistic (most commonly the sample mean, where it equals $\sigma / \sqrt{n}$).
Statistical Inference: Making inferences about a population based on sample statistics.
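
As a small illustration of the difference between a statistic and its standard error, the sketch below computes a sample mean, a sample standard deviation, and the estimated standard error of the mean. The data values are made up for illustration and do not come from any study.

```python
import math
import statistics

# Hypothetical sample (illustrative values only, not from any study).
sample = [4.0, 7.0, 6.0, 9.0, 5.0, 8.0, 10.0, 7.0]

n = len(sample)
sample_mean = statistics.mean(sample)      # statistic estimating the population mean
sample_sd = statistics.stdev(sample)       # sample SD (n - 1 denominator)
standard_error = sample_sd / math.sqrt(n)  # estimated SD of the sample mean

print(f"mean = {sample_mean:.3f}, sd = {sample_sd:.3f}, se = {standard_error:.3f}")
```

The standard error shrinks as n grows, while the sample standard deviation does not.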

Statistical Estimation

Point Estimation: Determining a specific value for a population parameter.
Example: Baseline mean of 7.26 lesions per month in the beta-interferon study.
Interval Estimation: Quantifying uncertainty with an interval (e.g., 95% Confidence Interval).
Example: 95% CI of (3.83, 10.67) for baseline mean number of lesions in the beta-interferon study.
CIs convey how precisely a quantity, such as a treatment effect, has been estimated: the wider the interval, the greater the uncertainty.

Basic Concepts in Hypothesis Testing

Null Hypothesis (H0): Typically a statement of no effect or equality between groups. The negation of the research question.
- Example: $H_0: \mu_1 = \mu_2$
Alternative Hypothesis (H1 or HA): States that the null hypothesis is not true.
- Two-Sided Test: $H_A: \mu_1 \neq \mu_2$ (detects any difference).
- One-Sided Test: $H_A: \mu_1 > \mu_2$ (detects difference in one direction only).
Test Statistic: A value calculated from sample data to compare with a known distribution under the null hypothesis.
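- General form: $\text{Test statistic} = \frac{\text{point estimate of } \mu - \text{target value of } \mu}{\text{known value or point estimate of } \sigma_{\bar{x}}}$, where $\sigma_{\bar{x}}$ is the standard error of the point estimate.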

Errors in Hypothesis Testing

Type I Error: Rejecting the null hypothesis when it is true.
- Probability denoted by $\alpha$ (significance level).
Type II Error: Failing to reject the null hypothesis when the alternative hypothesis is true.
- $\beta = P(\text{Type II error})$.
Power: Probability of rejecting the null hypothesis when the alternative hypothesis is true.
- $\text{Power} = 1 - \beta = 1 - P(\text{Type II error})$
P-value: The probability of observing a test statistic as extreme or more extreme than observed if the null hypothesis is true.
- If p-value < $\alpha$, reject the null hypothesis.
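
The definitions above can be sketched for the simple case of a two-sided z-test with known standard deviation. This is an illustrative sketch only: the function names, the observed statistic, and the effect size, sigma, and n passed to the power calculation are all assumptions chosen for the example.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def two_sided_p_value(z: float) -> float:
    """P(|Z| >= |z|) when Z is standard normal under H0."""
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))


def z_test_power(effect: float, sigma: float, n: int, alpha: float = 0.05) -> float:
    """Power of a two-sided one-sample z-test when the true mean differs
    from the null value by `effect` and sigma is known."""
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    shift = effect / (sigma / n ** 0.5)  # mean of Z under the alternative
    # Probability that Z lands in either rejection region.
    return (1.0 - STD_NORMAL.cdf(z_crit - shift)) + STD_NORMAL.cdf(-z_crit - shift)


alpha = 0.05
z_observed = 2.1  # hypothetical observed test statistic
p = two_sided_p_value(z_observed)
print(f"p = {p:.4f}, reject H0: {p < alpha}")
print(f"power = {z_test_power(effect=1.0, sigma=2.0, n=32):.3f}")
```

When the effect is zero, the power equals $\alpha$, which is the Type I error rate.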

One-Sample Hypothesis Tests

Used when comparing a statistic from one group to a known value.

Tests for Normal Continuous Data

Null and Alternative Hypotheses: $H_0: \mu_x = \mu_0$ vs. $H_A: \mu_x \neq \mu_0$
Z-test: Used when $\sigma_x$ is known.
- Test statistic: $Z = \frac{\bar{x} - \mu_0}{\sigma_x / \sqrt{n}}$
T-test: Used when $\sigma_x$ is unknown.
- Test statistic: $T = \frac{\bar{x} - \mu_0}{s_x / \sqrt{n}}$, where $s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2}$
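
A minimal sketch of both one-sample statistics, using only the Python standard library. The function names and the data are hypothetical. The standard library has no t distribution, so this sketch stops at the statistic and its degrees of freedom; a p-value would need a statistics package (for example, SciPy's `scipy.stats.ttest_1samp`).

```python
import math
import statistics


def one_sample_z(sample, mu0, sigma):
    """Z statistic for H0: mu = mu0 when the population SD sigma is known."""
    n = len(sample)
    return (statistics.mean(sample) - mu0) / (sigma / math.sqrt(n))


def one_sample_t(sample, mu0):
    """T statistic and degrees of freedom for H0: mu = mu0 when sigma is unknown."""
    n = len(sample)
    s_x = statistics.stdev(sample)  # uses the n - 1 denominator
    t = (statistics.mean(sample) - mu0) / (s_x / math.sqrt(n))
    return t, n - 1


# Hypothetical data and null value (illustrative only).
data = [5.1, 4.9, 5.6, 5.8, 6.0, 5.2, 5.4, 5.5]
t_stat, df = one_sample_t(data, mu0=5.0)
print(f"T = {t_stat:.3f} on {df} df")
```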

Determining Statistical Significance

Critical Values: Cut points used to determine statistical significance.
- Compare the observed test statistic to the critical values.
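
For a z-test, the critical values come from the quantiles of the standard normal distribution. The sketch below is one possible way to compute them; the function names are hypothetical, and the one-sided case assumes an upper-tail alternative such as $H_A: \mu > \mu_0$.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_sided: bool = True) -> float:
    """Upper critical value of the standard normal for significance level alpha."""
    tail = alpha / 2.0 if two_sided else alpha
    return NormalDist().inv_cdf(1.0 - tail)


def reject_h0(z: float, alpha: float = 0.05, two_sided: bool = True) -> bool:
    """Compare an observed Z statistic with the critical value."""
    crit = z_critical(alpha, two_sided)
    return abs(z) > crit if two_sided else z > crit


print(round(z_critical(0.05), 3))         # two-sided 5% cut point
print(round(z_critical(0.05, False), 3))  # one-sided 5% cut point
print(reject_h0(2.1), reject_h0(1.7), reject_h0(1.7, two_sided=False))
```

The same statistic (1.7 here) can be significant in a one-sided test and not in a two-sided test, which is why the choice must be made before looking at the data.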

Confidence Intervals

For a general $\alpha$, a $100(1 - \alpha)\%$ CI for a population parameter is formed around the point estimate of interest.
- If variance is known: $[\bar{x} - z_{1-\alpha/2} \frac{\sigma_x}{\sqrt{n}}, \bar{x} + z_{1-\alpha/2} \frac{\sigma_x}{\sqrt{n}}]$
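
The known-variance interval can be sketched as follows. The function name, the data, and the value of sigma are assumptions made for the example.

```python
import math
from statistics import NormalDist, mean


def z_confidence_interval(sample, sigma, alpha=0.05):
    """100 * (1 - alpha)% CI for the mean when the population SD sigma is known."""
    n = len(sample)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * sigma / math.sqrt(n)
    center = mean(sample)
    return center - half_width, center + half_width


# Hypothetical data; sigma = 2.0 is an assumed known value.
low, high = z_confidence_interval([4.0, 7.0, 6.0, 9.0, 5.0, 8.0, 10.0, 7.0], sigma=2.0)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

When the variance is unknown, the usual approach replaces sigma with the sample standard deviation and the normal quantile with a t quantile on n - 1 degrees of freedom.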

Binary Data

Data with two possible outcomes (success/failure).
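Test Statistic: $Z = \frac{\hat{p}_1 - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$, where $\hat{p}_1$ is the observed sample proportion and $p_0$ is the proportion specified by $H_0$.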
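
A short sketch of this test using the normal approximation. The function name and the counts are hypothetical.

```python
import math
from statistics import NormalDist


def one_sample_proportion_z(successes: int, n: int, p0: float):
    """Z statistic and two-sided p-value for H0: p = p0 (normal approximation)."""
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1.0 - p0) / n)  # standard error under H0
    z = (p_hat - p0) / se0
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: 60 successes in 100 trials, null value 0.5.
z, p = one_sample_proportion_z(60, 100, 0.5)
print(f"Z = {z:.2f}, p = {p:.4f}")
```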

Exact Tests

Useful for smaller sample sizes, when the normal approximation justified by the Central Limit Theorem (CLT) is suspect.
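
One way to carry out an exact test for a binomial proportion is to sum binomial probabilities directly rather than rely on a normal approximation. The sketch below uses one common two-sided convention (add up every outcome that is no more likely than the observed one under $H_0$); other conventions exist, and the function names and counts are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def exact_binomial_p_value(successes: int, n: int, p0: float) -> float:
    """Two-sided exact p-value: total probability, under H0, of every outcome
    that is no more likely than the observed one."""
    observed = binom_pmf(successes, n, p0)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        prob
        for prob in (binom_pmf(k, n, p0) for k in range(n + 1))
        if prob <= observed * (1.0 + 1e-7)
    )


# Hypothetical counts: 9 successes in 10 trials, null value 0.5.
print(f"exact p = {exact_binomial_p_value(9, 10, 0.5):.4f}")
```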

Confidence Intervals

The Clopper-Pearson interval is a classical exact approach to binomial CIs; it guarantees at least the nominal coverage, at the cost of being conservative.
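
The Clopper-Pearson limits are the proportions at which each binomial tail probability of the observed count equals $\alpha/2$. The sketch below finds them by bisection using only the standard library; statistics packages typically use beta distribution quantiles instead. The function names are hypothetical.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _bisect(f, lo=0.0, hi=1.0, iterations=100):
    """Root of a decreasing function that is positive at lo and negative at hi."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(successes: int, n: int, alpha: float = 0.05):
    """Exact 100 * (1 - alpha)% CI for a binomial proportion."""
    k = successes
    # Lower limit: p at which P(X >= k) = alpha / 2 (taken as 0 when k = 0).
    lower = 0.0 if k == 0 else _bisect(
        lambda p: alpha / 2.0 - (1.0 - binom_cdf(k - 1, n, p)))
    # Upper limit: p at which P(X <= k) = alpha / 2 (taken as 1 when k = n).
    upper = 1.0 if k == n else _bisect(
        lambda p: binom_cdf(k, n, p) - alpha / 2.0)
    return lower, upper


# Hypothetical counts: 5 successes in 10 trials.
low, high = clopper_pearson(5, 10)
print(f"95% CI: ({low:.3f}, {high:.3f})")
```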

Two-Sample Hypothesis Tests

Tests for Comparing the Means of Two Normal Populations

Paired Data

Suitable for data like the beta-interferon/MRI trial (measurements before and after treatment).
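Test statistic: $T = \frac{\bar{d}}{s_d / \sqrt{n}}$, where $\bar{d}$ is the mean of the $n$ paired differences and $s_d$ is their standard deviation; under $H_0$, $T$ has Student's t distribution with $n-1$ df.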
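
A paired analysis reduces to a one-sample test on the within-pair differences, with the standard deviation of those differences in the denominator. The sketch below shows this; the function name and the before/after values are made up and are not taken from the beta-interferon trial.

```python
import math
import statistics


def paired_t(before, after):
    """T statistic and df for H0: mean paired difference = 0."""
    if len(before) != len(after):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    d_bar = statistics.mean(diffs)
    s_d = statistics.stdev(diffs)  # SD of the differences, n - 1 denominator
    return d_bar / (s_d / math.sqrt(n)), n - 1


# Hypothetical measurements on the same five subjects.
before = [8, 6, 9, 7, 10]
after = [5, 5, 6, 6, 6]
t_stat, df = paired_t(before, after)
print(f"T = {t_stat:.2f} on {df} df")
```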

Unpaired Data

$H_0: \mu_1 = \mu_2 \text{ vs. } H_A: \mu_1 \neq \mu_2$
When the common standard deviation $\sigma$ is known, $Z = \frac{\bar{x} - \bar{y}}{\sigma \sqrt{\frac{1}{n} + \frac{1}{m}}}$ has the standard normal distribution under $H_0$.

Otherwise, $\sigma$ is estimated from the data by the pooled standard deviation $s_p = \sqrt{\frac{(n-1)s_x^2 + (m-1)s_y^2}{n+m-2}}$, giving $T = \frac{\bar{x} - \bar{y}}{s_p \sqrt{\frac{1}{n} + \frac{1}{m}}}$,
which has Student’s t distribution with $n+m-2$ df.
Equal variance in the two groups may not be a good assumption; in that case, one can perform Welch's t-test, which does not assume equal variances.
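
The equal-variance statistic and Welch's alternative can be compared side by side. In this sketch, Welch's degrees of freedom use the Welch-Satterthwaite approximation; the function names and data are hypothetical.

```python
import math
import statistics


def pooled_t(x, y):
    """Equal-variance two-sample T statistic and its df (n + m - 2)."""
    n, m = len(x), len(y)
    var_x, var_y = statistics.variance(x), statistics.variance(y)
    s_p = math.sqrt(((n - 1) * var_x + (m - 1) * var_y) / (n + m - 2))
    t = (statistics.mean(x) - statistics.mean(y)) / (s_p * math.sqrt(1 / n + 1 / m))
    return t, n + m - 2


def welch_t(x, y):
    """Welch's T statistic with Welch-Satterthwaite approximate df."""
    n, m = len(x), len(y)
    vx, vy = statistics.variance(x) / n, statistics.variance(y) / m
    t = (statistics.mean(x) - statistics.mean(y)) / math.sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (n - 1) + vy ** 2 / (m - 1))
    return t, df


# Hypothetical groups with visibly different spreads.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0]
print("pooled:", pooled_t(x, y))
print("welch: ", welch_t(x, y))
```

With equal group sizes the two statistics coincide, but Welch's test uses fewer degrees of freedom when the variances differ, which makes it more cautious.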

Tests for Comparing Two Population Proportions

The data should be binary.
Test statistic: $Z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}_1(1-\hat{p}_1)/n + \hat{p}_2(1-\hat{p}_2)/m}}$
This has approximately the standard normal distribution.
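
A sketch of the two-proportion statistic with the unpooled standard error, as in the formula above. The function name and counts are hypothetical; note that many texts instead use a pooled estimate of the common proportion under $H_0$.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n: int, x2: int, m: int):
    """Z statistic (unpooled standard error) and two-sided p-value for H0: p1 = p2."""
    p1, p2 = x1 / n, x2 / m
    se = math.sqrt(p1 * (1 - p1) / n + p2 * (1 - p2) / m)
    z = (p1 - p2) / se
    return z, 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Hypothetical counts: 60/100 successes in group 1, 40/100 in group 2.
z, p = two_proportion_z(60, 100, 40, 100)
print(f"Z = {z:.3f}, p = {p:.4f}")
```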

Common Mistakes in Hypothesis Testing

Ignoring pairing or dependence between observations.
Assuming equal variances without verification.
Applying a t-test (a parametric test) to highly skewed data, where a non-parametric test may be more appropriate.

Misstatements and Misconceptions

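Misstatement: failing to reject the null hypothesis means that it is true. In fact, it only means the data did not provide sufficient evidence against it.
Misstatement: a small p-value means that the two sample means ($\bar{x}$ and $\bar{y}$) are significantly different from each other. The hypotheses concern the population means ($\mu_1$ and $\mu_2$), not the sample means.
Both statistical significance and clinical significance are needed to interpret the data.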

Comparing More Than Two Groups: One-Way Analysis of Variance

One-way ANOVA compares the means of more than two populations; it is used when the interest is in detecting whether any differences exist among the treatment groups.
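
The one-way ANOVA F statistic is the ratio of between-group to within-group variability. The sketch below computes it with the standard library; the function name and data are hypothetical, and the p-value (which needs the F distribution) is left to a statistics package.

```python
import statistics


def one_way_anova_f(groups):
    """F statistic with (between, within) degrees of freedom for one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum(sum((v - statistics.mean(g)) ** 2 for v in g) for g in groups)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Three hypothetical treatment groups.
f_stat, df1, df2 = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
print(f"F = {f_stat:.2f} on ({df1}, {df2}) df")
```

A large F only indicates that at least one group mean differs; it does not say which one.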

Simple and Multiple Linear Regression

Hypothesis: $H_0: b_1 = 0$ vs. $H_A: b_1 \neq 0$
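
For simple linear regression, the test of a zero slope uses the least-squares estimate divided by its standard error, referred to a t distribution with n - 2 df. The sketch below covers only the single-predictor case; the function name and data are hypothetical.

```python
import math
import statistics


def slope_t_test(x, y):
    """Least-squares slope b1, its standard error, T statistic, and df (n - 2)."""
    n = len(x)
    x_bar, y_bar = statistics.mean(x), statistics.mean(y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = s_xy / s_xx
    b0 = y_bar - b1 * x_bar
    # Residual sum of squares around the fitted line.
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2) / s_xx)
    return b1, se_b1, b1 / se_b1, n - 2


# Hypothetical predictor and response values.
b1, se_b1, t_stat, df = slope_t_test([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 5.0, 4.0, 5.0])
print(f"b1 = {b1:.2f}, se = {se_b1:.3f}, T = {t_stat:.3f} on {df} df")
```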

Multiple Comparisons: When doing multiple comparisons/hypothesis tests, the chance of at least one false positive grows with the number of tests.

Solution: Choose a lower significance level to prevent false positive conclusions or to “control the false discovery rate.”
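
Two widely used adjustments illustrate the idea; neither is named in these notes, so treat them as examples rather than the prescribed method. The Bonferroni correction lowers the per-test significance level to $\alpha / k$ for $k$ tests, and the Benjamini-Hochberg procedure controls the false discovery rate. The function names and p-values below are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject H0_i when p_i < alpha / k, where k is the number of tests."""
    k = len(p_values)
    return [p < alpha / k for p in p_values]


def benjamini_hochberg_reject(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate at q."""
    k = len(p_values)
    order = sorted(range(k), key=lambda i: p_values[i])
    # Largest rank r (1-based) with p_(r) <= r * q / k; reject ranks 1..r.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / k:
            cutoff_rank = rank
    reject = [False] * k
    for idx in order[:cutoff_rank]:
        reject[idx] = True
    return reject


# Hypothetical p-values from five tests.
p_values = [0.20, 0.001, 0.045, 0.012, 0.025]
print(bonferroni_reject(p_values))
print(benjamini_hochberg_reject(p_values))
```

Bonferroni is the stricter of the two here: it rejects one hypothesis, while Benjamini-Hochberg rejects three.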