# Chapter 13: F Distribution and One-Way ANOVA

## Introduction

• Analysis of Variance (ANOVA): a hypothesis test for comparing the means of more than two groups

## 13.1 One-Way ANOVA

• ANOVA test: determines whether there is a statistically significant difference among several group means.

• The test uses variances to help determine whether the means are equal or not.

• ANOVA Conditions

• Each population from which a sample is taken is assumed to be normal.

• All samples are randomly selected and independent.

• The populations are assumed to have equal standard deviations (or variances).

• The factor is a categorical variable.

• The response is a numerical variable.

• Ho: μ1 = μ2 = μ3 = ... = μk

• Ha: At least two of the group means μ1, μ2, μ3, ..., μk are not equal. That is, μi ≠ μj for some i ≠ j.

• The null hypothesis: is simply that all the group population means are the same.

• The alternative hypothesis: is that at least one pair of means is different.

• Ho is true: All means are the same; the differences are due to random variation.

• Ho is NOT true: All means are not the same; the differences are too large to be due to random variation.
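As a sketch of the test described above, a one-way ANOVA can be run with `scipy.stats.f_oneway` (the three groups below are made-up data for illustration):

```python
# One-way ANOVA comparing the means of three hypothetical groups.
# f_oneway returns the F statistic and the p-value for H0: all means equal.
from scipy import stats

group1 = [4, 6, 8, 5, 7]
group2 = [6, 7, 9, 8, 7]
group3 = [9, 10, 8, 11, 9]

f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_value:.4f}")
# A small p-value (e.g. below 0.05) means at least two group means differ.
```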

## 13.2 The F Distribution and the F-Ratio

• F-distribution: theoretical distribution that compares the variances of two populations

• There are two sets of degrees of freedom; one for the numerator and one for the denominator.

• To calculate the F ratio, two estimates of the variance are made.

1. Variance between samples: An estimate of σ² that is the variance of the sample means multiplied by n (when the sample sizes are the same).

2. Variance within samples: An estimate of σ² that is the average of the sample variances (also known as a pooled variance).

1. SSbetween: the sum of squares that represents the variation among the different samples

2. SSwithin: the sum of squares that represents the variation within samples that is due to chance.

• MS means: "mean square."

• MSbetween: is the variance between groups

• MSwithin: is the variance within groups.

### Calculation of Sum of Squares and Mean Square

• k: the number of different groups

• nj: the size of the jth group

• sj: the sum of the values in the jth group

• n: total number of all the values combined (total sample size: ∑nj)

• x: one value→ ∑x = ∑sj

• Sum of squares of all values from every group combined: ∑x²

• Total sum of squares: SStotal = ∑x² – (∑x)² / n

• Explained variation: sum of squares representing variation among the different samples → SSbetween = ∑[(sj)² / nj] – (∑sj)² / n

• Unexplained variation: sum of squares representing variation within samples due to chance → SSwithin = SStotal – SSbetween

• df's for the different groups (df's for the numerator): dfbetween = k – 1

• dfwithin = n – k: degrees of freedom for errors within samples (df's for the denominator)

• MSbetween = SSbetween / dfbetween: Mean square (variance estimate) explained by the different groups

• MSwithin = SSwithin / dfwithin: Mean square (variance estimate) that is due to chance (unexplained)

• Null hypothesis is true: MSbetween and MSwithin should both estimate the same value.

• The alternate hypothesis: at least two of the sample groups come from populations with different normal distributions.

• The null hypothesis: all groups are samples from populations having the same normal distribution
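The sum-of-squares formulas above can be checked numerically. This is a sketch using made-up data with three equal-sized groups:

```python
# Manual computation of SStotal, SSbetween, SSwithin, the mean squares,
# and the F ratio, following the formulas above (hypothetical data).
groups = [[4, 6, 8, 5, 7], [6, 7, 9, 8, 7], [9, 10, 8, 11, 9]]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total sample size
all_x = [x for g in groups for x in g]   # every value from every group

ss_total = sum(x**2 for x in all_x) - sum(all_x)**2 / n
ss_between = sum(sum(g)**2 / len(g) for g in groups) - sum(all_x)**2 / n
ss_within = ss_total - ss_between        # unexplained variation

ms_between = ss_between / (k - 1)        # df_between = k - 1
ms_within = ss_within / (n - k)          # df_within = n - k
f_ratio = ms_between / ms_within
print(f_ratio)                           # ≈ 8.588 for this data
```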

### F-Ratio or F Statistic

• F = MSbetween / MSwithin

• F-ratio formula when the groups are the same size: F = n · sx̄² / s²pooled

• where ...

• n: the sample size

• dfnumerator: k – 1

• dfdenominator: n – k

• s²pooled: the mean of the sample variances (pooled variance)

• sx̄²: the variance of the sample means
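When all groups have the same size n, this shortcut formula gives the same F ratio as MSbetween / MSwithin. A sketch with made-up data (three groups of n = 5):

```python
# Equal-group-size shortcut: F = n * (variance of sample means) / (pooled variance).
# Hypothetical data: three groups, each with n = 5 observations.
import statistics

groups = [[4, 6, 8, 5, 7], [6, 7, 9, 8, 7], [9, 10, 8, 11, 9]]
n = len(groups[0])                                   # common group size

s2_xbar = statistics.variance(statistics.mean(g) for g in groups)
s2_pooled = statistics.mean(statistics.variance(g) for g in groups)

f = n * s2_xbar / s2_pooled
print(f)                                             # ≈ 8.588
```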

## 13.3 Facts About the F Distribution

• Here are some facts about the F distribution.

• The curve is not symmetrical but skewed to the right.

• There is a different curve for each set of dfs.

• The F statistic is greater than or equal to zero.

• As the degrees of freedom for the numerator and for the denominator get larger, the curve approximates the normal.
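These facts can be explored with `scipy.stats.f`. For example, a right-tail critical value for a hypothetical ANOVA with k = 3 groups and n = 15 total observations:

```python
# 95th percentile (right-tail critical value) of F with df = (k - 1, n - k).
from scipy import stats

df_num, df_den = 3 - 1, 15 - 3        # numerator df and denominator df
critical = stats.f.ppf(0.95, df_num, df_den)
print(round(critical, 2))             # ≈ 3.89

# The curve is right-skewed: its mean exceeds its median.
assert stats.f.mean(df_num, df_den) > stats.f.median(df_num, df_den)
```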

## 13.4 Test of Two Variances

• In order to perform an F test of two variances, it is important that the following are true:

• The populations from which the two samples are drawn are normally distributed.

• The two populations are independent of each other.

• F has the distribution F ~ F(n1 – 1, n2 – 1)

• where n1 – 1 are the degrees of freedom for the numerator and n2 – 1 are the degrees of freedom for the denominator.

• F is close to one: the evidence favors the null hypothesis (the two population variances are equal)

• F is much larger than one: the evidence is against the null hypothesis

• A test of two variances may be left-tailed, right-tailed, or two-tailed.
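A right-tailed version of this test can be sketched as follows (the two samples are made up; the F statistic is the ratio of the two sample variances):

```python
# F test of H0: sigma1^2 = sigma2^2 against Ha: sigma1^2 > sigma2^2.
# Hypothetical samples from two independent, normally distributed populations.
import statistics
from scipy import stats

sample1 = [10.2, 9.8, 11.1, 10.5, 9.9, 10.8]
sample2 = [10.0, 10.1, 9.9, 10.2, 10.0, 9.8]

s1_sq = statistics.variance(sample1)    # sample variance of sample 1
s2_sq = statistics.variance(sample2)    # sample variance of sample 2

f = s1_sq / s2_sq                       # F ~ F(n1 - 1, n2 - 1) under H0
p = stats.f.sf(f, len(sample1) - 1, len(sample2) - 1)  # right-tail area
print(f"F = {f:.3f}, p = {p:.4f}")
# F far from one (here, much larger) is evidence against equal variances.
```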