Untitled Flashcard Set
Problem 1 — Sign Test / Median Hypothesis Testing
Dataset: 1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 7 (n = 11)
Part (a) — Why analyze the median instead of the mean?
What to know: The mean is sensitive to outliers and skewness. The median is a better measure of center when data is skewed or has extreme values. Look at the dataset — the value 7 is a potential outlier that would pull the mean upward. The distribution appears right-skewed.
Answer strategy: Point out the outlier (7) or the skewness — the data is bunched on the low end with a long right tail.
Part (b) — Sign Test: Binomial Probability
What to know: This is the Sign Test. The idea is: if the true median is 4.5, then each data point has a 50% chance of being above or below it.
Steps:
Count how many values are below 4.5 → values: 1,2,2,2,2,3,3,4,4 = 9 values below
Count how many are above 4.5 → values: 5, 7 = 2 values above
No values equal 4.5 exactly, so n = 11
Under H₀ (median = 4.5), each observation is below with p = 0.5
The "extreme" result is getting 2 or fewer above (or 9 or more below)
Calculate: P(X ≤ 2) where X ~ Binomial(n=11, p=0.5)
Formula:
P(X = k) = C(n, k)(0.5)^n, where X ~ Binomial(n = 11, p = 0.5)
Calculate P(X=0) + P(X=1) + P(X=2), then double it (two-tailed test):
P(X=0) = C(11,0)(0.5)^11 = 1/2048
P(X=1) = C(11,1)(0.5)^11 = 11/2048
P(X=2) = C(11,2)(0.5)^11 = 55/2048
One-tailed p = 67/2048 ≈ 0.0327
Two-tailed p ≈ 0.0654
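The binomial arithmetic above can be checked in a few lines of Python (a standard-library sketch; the variable names are my own):

```python
from math import comb

data = [1, 2, 2, 2, 2, 3, 3, 4, 4, 5, 7]
m0 = 4.5                              # hypothesized median

below = sum(x < m0 for x in data)     # 9 values below 4.5
above = sum(x > m0 for x in data)     # 2 values above 4.5
n = below + above                     # ties with m0 would be dropped; none here

# Under H0 each observation falls above m0 with p = 0.5, so the count above
# is Binomial(n, 0.5). Sum the tail out to the observed extreme, then double.
k = min(below, above)
p_one = sum(comb(n, i) for i in range(k + 1)) / 2 ** n   # 67/2048
p_two = 2 * p_one

print(f"one-tailed p = {p_one:.4f}, two-tailed p = {p_two:.4f}")
```

This reproduces the 67/2048 ≈ 0.0327 one-tailed value from the steps above.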
Part (c) — Is there evidence against median = 4.5 at α = 5%?
What to know: Compare your p-value to α = 0.05.
Answer: p ≈ 0.065 > 0.05, so you fail to reject H₀. Not sufficient evidence at the 5% level.
Part (d) — Why can't we do this test for the mean?
What to know: The sign test works for the median because, by definition, exactly 50% of values fall above/below the true median — giving us a known probability (p = 0.5) for the binomial. For the mean, there's no such fixed probability for a data point to fall above or below the hypothesized mean without knowing the full distribution shape.
Part (e) — Why is the Wilcoxon Signed Rank p-value "better"? Will it always be smaller?
What to know: The Wilcoxon Signed Rank test uses both the sign and the magnitude (rank) of deviations from the hypothesized median, giving it more statistical power than the pure sign test. By incorporating how far each observation is from 4.5, it uses more information → more precise p-value.
Will it always be smaller? No. For a given dataset the Wilcoxon p-value can come out larger or smaller than the sign-test p-value; the Wilcoxon test simply tends to be more powerful when its symmetry assumption holds.
Problem 2 — Steps for the Wilcoxon Signed Rank Test Statistic (T)
What to know: Memorize these ordered steps (assume D₀ = 0, meaning the hypothesized median/mean difference is 0):
Calculate differences: Dᵢ = Xᵢ − D₀
Discard any Dᵢ = 0
Take the absolute values |Dᵢ|
Rank the absolute values from smallest (rank 1) to largest
Handle ties by assigning the average rank
Attach the original sign (+/−) to each rank
Calculate T⁺ = sum of positive ranks, T⁻ = sum of negative ranks
T = min(T⁺, T⁻) (or use T⁺ alone depending on the one/two-tailed context)
Compare T to a critical value table (or compute a z-approximation for large n)
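The steps above can be sketched in plain Python (an illustrative implementation; `signed_rank_T` and its internals are my own names, not a library API):

```python
def signed_rank_T(data, d0=0.0):
    # Steps 1-2: differences from the hypothesized value, zeros discarded
    diffs = [x - d0 for x in data if x - d0 != 0]
    # Steps 3-5: rank |D_i| from smallest to largest, average ranks for ties
    abs_sorted = sorted(abs(d) for d in diffs)
    def avg_rank(v):
        first = abs_sorted.index(v) + 1            # 1-based first position
        last = first + abs_sorted.count(v) - 1     # 1-based last position
        return (first + last) / 2
    # Steps 6-7: attach the original signs and sum each side
    t_plus = sum(avg_rank(abs(d)) for d in diffs if d > 0)
    t_minus = sum(avg_rank(abs(d)) for d in diffs if d < 0)
    # Step 8: two-tailed statistic is the smaller rank sum
    return min(t_plus, t_minus), t_plus, t_minus
```

On the Problem 1 data with D₀ = 4.5 this gives T⁺ = 10, T⁻ = 56, so T = 10; a handy sanity check is T⁺ + T⁻ = n(n+1)/2 = 66.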
Problem 3 — Steps for the Wilcoxon Rank Sum Test Statistic (T)
What to know: This test compares two independent groups. Steps:
Combine all observations from both groups into one list
Rank all values from smallest (1) to largest
Handle ties with average ranks
Sum the ranks for the smaller group → this is T (or W)
Compare T to a critical value (or use a normal approximation for large samples)
Key difference from Signed Rank: No signs, no differences — you're ranking all data together.
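A matching sketch for the rank sum statistic (again pure Python; names are illustrative):

```python
def rank_sum_T(group1, group2):
    # Steps 1-3: pool both groups and rank everything together,
    # assigning average ranks to ties
    combined = sorted(group1 + group2)
    def avg_rank(v):
        first = combined.index(v) + 1
        last = first + combined.count(v) - 1
        return (first + last) / 2
    # Step 4: T is the rank sum of the smaller group
    smaller = group1 if len(group1) <= len(group2) else group2
    return sum(avg_rank(v) for v in smaller)
```

Note there are no differences and no signs here, only the pooled ranking.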
Problem 4 — Steps for the Kruskal-Wallis Test Statistic (H')
What to know: This is the non-parametric version of one-way ANOVA (for 3+ groups). Steps:
Combine all observations from all k groups into one dataset
Rank all N values from 1 to N (average ranks for ties)
For each group i, compute the sum of ranks Rᵢ
Compute H:
H = 12/(N(N+1)) × Σᵢ₌₁ᵏ (Rᵢ²/nᵢ) − 3(N+1)
Apply the tie correction if needed: divide H by C = 1 − Σ(tⱼ³ − tⱼ)/(N³ − N), where tⱼ is the number of observations in the j-th tied set; the corrected statistic is H′
Compare H' to a chi-squared distribution with df = k − 1
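The H formula can be coded directly from the steps (an illustrative pure-Python sketch, without the tie correction):

```python
def kruskal_H(*groups):
    # Steps 1-2: pool all observations and rank them (average ranks for ties)
    pooled = sorted(x for g in groups for x in g)
    N = len(pooled)
    def avg_rank(v):
        first = pooled.index(v) + 1
        last = first + pooled.count(v) - 1
        return (first + last) / 2
    # Steps 3-4: rank sum R_i per group, then plug into the H formula
    total = sum(sum(avg_rank(x) for x in g) ** 2 / len(g) for g in groups)
    return 12 / (N * (N + 1)) * total - 3 * (N + 1)
```

The result is then compared to a chi-squared distribution with k − 1 degrees of freedom, as in the final step above.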
Problem 5 & 6 — When to Use Wilcoxon Signed Rank Test
(Note: Problems 5 and 6 appear to be the same question — likely one is meant to be about the Rank Sum test.)
When to use Wilcoxon Signed Rank:
Experimental situation: One sample tested against a hypothesized median, OR paired data (before/after, matched pairs) where you look at the difference within each pair
Conditions required:
Data is continuous (or at least ordinal)
The distribution of differences is approximately symmetric (not necessarily normal)
Observations are independent
Typical trigger: a t-test is off the table because normality is violated and the sample is too small for the CLT to rescue it
Problem 7 — When to Use Kruskal-Wallis Test
When to use:
Experimental situation: You have 3 or more independent groups and want to test whether they come from the same distribution (analogous to one-way ANOVA)
Conditions required:
Data is at least ordinal
Groups are independent
Each group should have similar distributional shape (so differences reflect location, not shape)
Normality is not required — use this when ANOVA's normality assumption is violated
Problem 8 — Why Does Large s²_B Relative to s²_W Imply Different Group Means?
What to know:
s²_B (between-group variance): Measures how much group means vary around the overall grand mean
s²_W (within-group variance): Measures natural variability of individuals within the same group
Key logic: If all group means are equal, the variation between groups should be no larger than what's expected from random sampling (i.e., ≈ s²_W). When s²_B is much larger than s²_W, it means the spread of the group means is too large to be explained by chance alone — implying at least one group mean is truly different.
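A tiny numeric illustration of this logic (hypothetical data; `f_ratio` is a made-up helper, not a library function):

```python
def f_ratio(groups):
    # F = s²_B / s²_W, computed from scratch
    k = len(groups)
    N = sum(len(g) for g in groups)
    grand = sum(x for g in groups for x in g) / N
    means = [sum(g) / len(g) for g in groups]
    ssb = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ssb / (k - 1)) / (ssw / (N - k))

# Same spread within each group, but very different group means: F is huge
print(f_ratio([[1, 2, 3], [5, 6, 7], [9, 10, 11]]))   # 48.0
# Identical group means: all variation is within-group, F collapses to 0
print(f_ratio([[4, 5, 6], [5, 4, 6], [6, 5, 4]]))     # 0.0
```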
Problem 9 — Reading an ANOVA Table
What you need to know cold:
| Source | SS | df | MS = SS/df | F = MSB/MSW |
|---|---|---|---|---|
| Between (B) | SSB | k−1 | s²_B | F |
| Within (W/Error) | SSW | N−k | s²_W (= MSE) | |
| Total | SST | N−1 | | |
SST = SSB + SSW
s²_B = SSB / (k−1), s²_W = SSW / (N−k)
MSE = s²_W (same thing)
Pooled variance s²_P = SSW / (N−k); this is the same quantity as s²_W and MSE, whether or not the design is balanced
F = s²_B / s²_W
With 5 groups: df_between = 4, read N from total df + 1
Practice: Fill in any two of SSB/SSW/SST and you can find the third.
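The fill-in practice looks like this in code (the SS values below are invented purely for illustration):

```python
# Hypothetical ANOVA table: k = 5 groups, N = 30 observations
k, N = 5, 30
SSB, SST = 100.0, 400.0

SSW = SST - SSB        # SST = SSB + SSW, so SSW = 300
MSB = SSB / (k - 1)    # s²_B = 100/4  = 25
MSW = SSW / (N - k)    # s²_W = 300/25 = 12  (this is MSE)
F = MSB / MSW          # 25/12 ≈ 2.083
```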
Problem 10 — Post-Hoc Methods: Most to Least Conservative
Ranking (most to least conservative):
Scheffé > Bonferroni > Tukey > Fisher's LSD
What "conservative" means: A conservative test makes it harder to reject H₀, reducing Type I error (false positives) but increasing Type II error (missing real differences).
Problem with over-conservatism: You lose power — you're more likely to miss true differences between group means (increased Type II error / false negatives).
Problem 11 — Scheffé's Advantages and Disadvantage
Two advantages:
Valid for any type of comparison — not just pairwise, but complex contrasts (e.g., mean of groups 1+2 vs. group 3)
Strictly controls the family-wise error rate (FWER) across all possible contrasts
Major disadvantage:
It is the most conservative of all post-hoc methods, meaning it has the lowest power — it's hardest to detect real differences
Problem 12 — Interpreting Confidence Intervals
CI = (2, 4):
F-based CI for ratio of two population variances (σ₁²/σ₂²): We are 95% confident the true variance ratio is between 2 and 4. Since the interval doesn't include 1, the variances are likely unequal.
t-based CI for difference between two population means (μ₁ − μ₂): We are 95% confident the true difference in means is between 2 and 4. Since the interval doesn't include 0, the means are likely significantly different.
Problem 13 — What Are Post-Hoc Methods and Why Are They Needed?
What to know:
They are called "post-hoc" (Latin: "after this") because they are applied after the overall F-test has already rejected H₀
They are needed because the F-test only tells you that at least one group differs — it doesn't tell you which groups differ from which
Running multiple individual t-tests would inflate the Type I error rate (the multiple comparisons problem), so post-hoc methods adjust for this
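The inflation is easy to quantify. Assuming the individual tests were independent (an approximation), the chance of at least one false positive across m tests at α = 0.05 is 1 − (1 − α)^m:

```python
from math import comb

k = 5                              # number of groups (hypothetical)
m = comb(k, 2)                     # 10 pairwise comparisons
alpha = 0.05

# Family-wise error rate if each test runs at alpha (independence assumed)
fwer = 1 - (1 - alpha) ** m        # far above the nominal 0.05
bonferroni_alpha = alpha / m       # Bonferroni's per-test fix

print(f"{m} tests, FWER = {fwer:.3f}, Bonferroni alpha = {bonferroni_alpha}")
```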
Problem 14 — The One-Way ANOVA Model
Model: yᵢⱼ = μ + τᵢ + εᵢⱼ
| Term | Meaning |
|---|---|
| yᵢⱼ | The j-th observation in the i-th group |
| μ | Overall (grand) population mean |
| τᵢ | Treatment effect for group i (deviation of group i's mean from μ) |
| εᵢⱼ | Random error for observation j in group i |
Why must a constraint be imposed (e.g., Στᵢ = 0)? The parameters μ and τᵢ are not uniquely identifiable without one — you could add a constant to μ and subtract it from all τᵢ and get the same fit. The constraint removes this redundancy.
Three assumptions of ANOVA:
Normality — εᵢⱼ are normally distributed
Homoscedasticity — equal variances across all groups (σ²₁ = σ²₂ = ... = σ²_k)
Independence — observations are independent of each other
Problem 15 — Choosing the Right Post-Hoc Test
| Scenario | Use |
|---|---|
| (a) Only pairwise comparisons, don't want to be too aggressive on Type I error | Tukey's HSD — designed exactly for all pairwise comparisons, well-balanced |
| (b) Few comparisons, some complex (non-pairwise) | Bonferroni — best when you have a small, pre-specified set of comparisons |
| (c) Many comparisons, some complex | Scheffé — handles any contrast type, controls FWER across all possible contrasts |
| (d) Willing to accept elevated Type I error to increase power | Fisher's LSD — least conservative, highest power, but inflates error rate |
| (e) No plan, peeking at data first | This is data dredging / p-hacking — no legitimate post-hoc method justifies this. Scheffé is sometimes cited as the only one robust to this, but the practice itself is statistically invalid. |