Chapter 13: F Distribution and One-Way ANOVA

**Analysis of Variance (ANOVA)**: used for hypothesis tests comparing means among more than two groups.

**ANOVA Test**: determines whether a statistically significant difference exists among several group means.

**Variances**: comparing variances helps determine whether the means are equal or not.

**ANOVA Conditions**:

- Each population from which a sample is taken is assumed to be normal.
- All samples are randomly selected and independent.
- The populations are assumed to have equal standard deviations (or variances).
- The factor is a categorical variable.
- The response is a numerical variable.

**Ho**: μ1 = μ2 = μ3 = ... = μk

**Ha**: At least two of the group means μ1, μ2, μ3, ..., μk are not equal; that is, μi ≠ μj for some i ≠ j.

**The null hypothesis** is simply that all the group population means are the same. **The alternative hypothesis** is that at least one pair of means is different.

**If Ho is true**: all means are the same; the differences are due to random variation.

**If Ho is not true**: not all means are the same; the differences are too large to be due to random variation.

**F-distribution**: a theoretical distribution used to compare two populations. There are two sets of degrees of freedom: one for the numerator and one for the denominator.

To calculate the **F ratio**, two estimates of the variance are made.

**Variance between samples**: an estimate of σ² that is the variance of the sample means multiplied by n (when the sample sizes are the same).

**Variance within samples**: an estimate of σ² that is the average of the sample variances (also known as a pooled variance).

**SSbetween**: the sum of squares that represents the variation among the different samples.

**SSwithin**: the sum of squares that represents the variation within samples that is due to chance.

**MS** means "mean square."

**MSbetween**: the variance between groups.

**MSwithin**: the variance within groups.

**k**: the number of different groups

**nj**: the size of the jth group

**sj**: the sum of the values in the jth group

**n**: the total number of all the values combined (total sample size): n = Σnj

**x**: one value: Σx = Σsj

**Sum of squares of all values from every group combined**: Σx²

**Total sum of squares**: SStotal = Σx² − (Σx)² / n

**Explained variation** (sum of squares representing variation among the different samples): SSbetween = Σ[(sj)² / nj] − (Σsj)² / n

**Unexplained variation** (sum of squares representing variation within samples due to chance): SSwithin = SStotal − SSbetween

**df's for the different groups (df's for the numerator)**: dfbetween = k − 1

**Equation for errors within samples (df's for the denominator)**: dfwithin = n − k

**Mean square (variance estimate) explained by the different groups**: MSbetween = SSbetween / dfbetween

**Mean square (variance estimate) that is due to chance (unexplained)**: MSwithin = SSwithin / dfwithin

**If the null hypothesis is true**: MSbetween and MSwithin should both estimate the same value.

**The null hypothesis says**: all groups are samples from populations having the same normal distribution.

**The alternate hypothesis says**: at least two of the sample groups come from populations with different normal distributions.

**F-Ratio**: F = MSbetween / MSwithin
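The sum-of-squares steps above can be sketched in a few lines of Python; the three groups below are made-up data for illustration only.

```python
# A minimal sketch of the one-way ANOVA computation above,
# using three small made-up groups.
groups = [
    [4, 5, 6],   # group 1
    [5, 7, 9],   # group 2
    [8, 9, 10],  # group 3
]

k = len(groups)                        # number of groups
nj = [len(g) for g in groups]          # size of each group
sj = [sum(g) for g in groups]          # sum of the values in each group
n = sum(nj)                            # total sample size
all_x = [x for g in groups for x in g]

# SStotal = sum(x^2) - (sum(x))^2 / n
ss_total = sum(x ** 2 for x in all_x) - sum(all_x) ** 2 / n
# SSbetween = sum(sj^2 / nj) - (sum(sj))^2 / n
ss_between = sum(s ** 2 / m for s, m in zip(sj, nj)) - sum(sj) ** 2 / n
# SSwithin = SStotal - SSbetween
ss_within = ss_total - ss_between

ms_between = ss_between / (k - 1)      # dfbetween = k - 1
ms_within = ss_within / (n - k)        # dfwithin = n - k
F = ms_between / ms_within

print(F)  # 6.0 for this made-up data
```

With dfnumerator = k − 1 = 2 and dfdenominator = n − k = 6, this F statistic would be compared against the F(2, 6) distribution to find the p-value.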

**F-Ratio formula when the groups are the same size**: F = n · s_x̄² / s²pooled, where:

- **n**: the sample size
- **dfnumerator**: k − 1
- **dfdenominator**: n − k
- **s²pooled**: the mean of the sample variances (pooled variance)
- **s_x̄²**: the variance of the sample means
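When every group has the same size, the shortcut formula gives the same F as MSbetween / MSwithin. A quick check on made-up data (three groups of three values each):

```python
import statistics

# Sketch of the equal-group-size shortcut F = n * s_xbar^2 / s^2_pooled,
# checked against F = MSbetween / MSwithin. Data are made up; every
# group has the same size.
groups = [[4, 5, 6], [5, 7, 9], [8, 9, 10]]
k = len(groups)
n_per = len(groups[0])                 # common group size

# Shortcut: variance of the sample means over mean of the sample variances.
s_xbar_sq = statistics.variance(statistics.mean(g) for g in groups)
s_pooled_sq = statistics.mean(statistics.variance(g) for g in groups)
F_shortcut = n_per * s_xbar_sq / s_pooled_sq

# Long way: F = MSbetween / MSwithin via the sum-of-squares formulas.
n_total = k * n_per
sj = [sum(g) for g in groups]
all_x = [x for g in groups for x in g]
ss_total = sum(x ** 2 for x in all_x) - sum(all_x) ** 2 / n_total
ss_between = sum(s ** 2 / n_per for s in sj) - sum(sj) ** 2 / n_total
F_long = (ss_between / (k - 1)) / ((ss_total - ss_between) / (n_total - k))

print(F_shortcut, F_long)  # both 6.0 for this data
```

The agreement holds only when the group sizes are equal; otherwise the general sum-of-squares formulas must be used.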

Here are some facts about the F distribution:

- The curve is not symmetrical but skewed to the right.
- There is a different curve for each set of dfs.
- The F statistic is greater than or equal to zero.
- As the degrees of freedom for the numerator and for the denominator get larger, the curve approximates the normal.

In order to perform an F test of two variances, it is important that the following are true:

- The populations from which the two samples are drawn are normally distributed.
- The two populations are independent of each other.

F has the distribution F ~ F(n1 − 1, n2 − 1), where n1 − 1 are the degrees of freedom for the numerator and n2 − 1 are the degrees of freedom for the denominator.

**If F is close to one**: the evidence favors the null hypothesis (the two population variances are equal).

**If F is much larger than one**: the evidence is against the null hypothesis.

A test of two variances may be left-tailed, right-tailed, or two-tailed.
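The two-variance F statistic is just the ratio of the two sample variances. A short sketch with made-up samples:

```python
import statistics

# Sketch of the F statistic for a test of two variances: F = s1^2 / s2^2,
# with df = (n1 - 1, n2 - 1). The two samples below are made up.
sample1 = [10.2, 9.8, 10.5, 10.1, 9.9, 10.4]
sample2 = [10.0, 10.1, 9.9, 10.0, 10.2]

f_stat = statistics.variance(sample1) / statistics.variance(sample2)
df_num = len(sample1) - 1   # numerator degrees of freedom
df_den = len(sample2) - 1   # denominator degrees of freedom

print(f_stat, df_num, df_den)
```

Since this f_stat comes out well above one, the evidence here would lean against equal variances; the actual conclusion would come from comparing it to the F(5, 4) distribution at the chosen significance level.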
