Chapter 13: F Distribution and One-Way ANOVA

Introduction

  • Analysis of Variance (ANOVA): a hypothesis test for comparing the means of more than two groups

13.1 One-Way ANOVA

  • ANOVA Test: determines whether there is a statistically significant difference among several group means.

  • Variances: the test compares variances to decide whether the group means are equal or not

  • ANOVA Conditions

    • Each population from which a sample is taken is assumed to be normal.

    • All samples are randomly selected and independent.

    • The populations are assumed to have equal standard deviations (or variances).

    • The factor is a categorical variable.

    • The response is a numerical variable.

  • Ho: μ1 = μ2 = μ3 = ... = μk

  • Ha: At least two of the group means μ1, μ2, μ3, ..., μk are not equal. That is, μi ≠ μj for some i ≠ j.

  • The null hypothesis: all the group population means are the same.

  • The alternative hypothesis: at least one pair of means is different.

  • Ho is true: All means are the same; the differences are due to random variation.

  • Ho is NOT true: Not all of the means are the same; the differences are too large to be due to random variation alone (see the sketch below).
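In practice the whole test can be run in one call. Below is a minimal sketch using scipy.stats.f_oneway with three hypothetical groups (the data are made up purely for illustration):

```python
# Minimal one-way ANOVA sketch; the three groups are hypothetical data.
from scipy import stats

group1 = [18.2, 20.1, 17.6, 16.8, 18.8]
group2 = [17.4, 18.7, 19.1, 16.4, 15.9]
group3 = [15.2, 18.8, 17.7, 16.5, 15.9]

# f_oneway returns the F statistic and its right-tail p-value.
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")

# If p < 0.05, reject Ho: at least one pair of group means differs.
# Otherwise the observed differences are consistent with random variation.
```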

13.2 The F Distribution and the F-Ratio

  • F-distribution: the theoretical distribution of the ratio of two variance estimates; it is used to compare two population variances

  • There are two sets of degrees of freedom; one for the numerator and one for the denominator.

  • To calculate the F ratio, two estimates of the variance are made.

    1. Variance between samples: an estimate of σ² equal to the variance of the sample means multiplied by n (when the sample sizes are the same).

    2. Variance within samples: an estimate of σ² equal to the average of the sample variances (also known as a pooled variance).

      1. SSbetween: the sum of squares that represents the variation among the different samples

      2. SSwithin: the sum of squares that represents the variation within samples that is due to chance.

  • MS stands for "mean square."

  • MSbetween: the variance between groups

  • MSwithin: the variance within groups

Calculation of Sum of Squares and Mean Square

  • k: the number of different groups

  • nj: the size of the jth group

  • sj: the sum of the values in the jth group

  • n: total number of all the values combined (total sample size: ∑nj)

  • x: one value (so ∑x = ∑sj)

  • Sum of squares of all values from every group combined: ∑x²

  • Total sum of squares: SStotal = ∑x² − (∑x)² / n

  • Explained variation: sum of squares representing variation among the different samples → SSbetween = ∑[ (sj)² / nj ] − (∑sj)² / n

  • Unexplained variation: sum of squares representing variation within samples due to chance → SSwithin = SStotal − SSbetween

  • df's for the different groups (df's for the numerator): dfbetween = k − 1

  • df's for errors within samples (df's for the denominator): dfwithin = n − k

  • MSbetween = SSbetween / dfbetween: mean square (variance estimate) explained by the different groups

  • MSwithin = SSwithin / dfwithin: mean square (variance estimate) that is due to chance (unexplained)

  • Null hypothesis is true: MSbetween and MSwithin should both estimate the same value, the common population variance σ².

  • The alternative hypothesis: at least two of the sample groups come from populations with different normal distributions.

  • The null hypothesis: all groups are samples from populations having the same normal distribution.
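The bookkeeping above is easy to mechanize. Here is a sketch that computes every quantity from these formulas for three hypothetical groups (the same made-up data as in the earlier sketch):

```python
# Computes each ANOVA quantity from the formulas above.
# The three groups are hypothetical illustration data.
groups = [
    [18.2, 20.1, 17.6, 16.8, 18.8],
    [17.4, 18.7, 19.1, 16.4, 15.9],
    [15.2, 18.8, 17.7, 16.5, 15.9],
]

k = len(groups)                                  # number of different groups
n = sum(len(g) for g in groups)                  # total sample size (sum of nj)
s_j = [sum(g) for g in groups]                   # sum of the values in each group
sum_x = sum(s_j)                                 # grand total: sum of x = sum of sj
sum_x2 = sum(x * x for g in groups for x in g)   # sum of squares of all values

ss_total = sum_x2 - sum_x**2 / n                 # SStotal
ss_between = sum(s**2 / len(g) for s, g in zip(s_j, groups)) - sum_x**2 / n
ss_within = ss_total - ss_between                # SSwithin = SStotal - SSbetween

df_between = k - 1                               # df for the numerator
df_within = n - k                                # df for the denominator
ms_between = ss_between / df_between
ms_within = ss_within / df_within
print(f"SSbetween = {ss_between:.3f}, SSwithin = {ss_within:.3f}")
print(f"MSbetween = {ms_between:.3f}, MSwithin = {ms_within:.3f}")
```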

F-Ratio or F Statistic

  • F = MSbetween / MSwithin

  • F-Ratio formula when the groups are the same size: F = n · sx̄² / s²pooled

    • where ...

      • n: the sample size of each group (all groups the same size)

      • dfnumerator: k − 1

      • dfdenominator: ntotal − k (equivalently k(n − 1) with equal group sizes)

      • s²pooled: the mean of the sample variances (pooled variance)

      • sx̄²: the variance of the sample means
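Continuing the sketch above, the F ratio and its p-value follow directly, and with equal group sizes the shortcut formula gives the same number:

```python
# Continues the previous sketch (reuses ms_between, ms_within, df_between,
# df_within, and groups). All data remain hypothetical.
from statistics import mean, variance
from scipy import stats

f_ratio = ms_between / ms_within
p_value = stats.f.sf(f_ratio, df_between, df_within)  # right-tail area
print(f"F = {f_ratio:.3f}, p = {p_value:.3f}")

# Shortcut when every group has the same size:
n_per_group = len(groups[0])
s2_xbar = variance([mean(g) for g in groups])    # variance of the sample means
s2_pooled = mean([variance(g) for g in groups])  # mean of the sample variances
print(f"shortcut F = {n_per_group * s2_xbar / s2_pooled:.3f}")  # same value
```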

13.3 Facts About the F Distribution

  • Here are some facts about the F distribution.

    • The curve is not symmetrical but skewed to the right.

    • There is a different curve for each set of dfs.

    • The F statistic is greater than or equal to zero.

    • As the degrees of freedom for the numerator and for the denominator get larger, the curve approximates the normal distribution.
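These facts can be checked numerically. The sketch below uses scipy.stats.f to show the 0.95 critical value sliding toward 1 as both degrees of freedom grow (the df pairs are arbitrary):

```python
# Numerical check of the F-distribution facts; the df pairs are arbitrary.
from scipy import stats

for df1, df2 in [(2, 10), (5, 50), (20, 200), (100, 1000)]:
    crit = stats.f.ppf(0.95, df1, df2)  # 95th-percentile critical value
    print(f"df = ({df1:>3}, {df2:>4})  F_crit(0.95) = {crit:.3f}")

# F is never negative: the density is zero for x < 0.
print(stats.f.pdf(-1.0, 5, 50))  # 0.0
```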

13.4 Test of Two Variances

  • To perform an F test of two variances, it is important that the following are true:

    • The populations from which the two samples are drawn are normally distributed.

    • The two populations are independent of each other.

F ratio

  • F has the distribution F ~ F(n1 – 1, n2 – 1)

  • where n1 – 1 is the degrees of freedom for the numerator and n2 – 1 is the degrees of freedom for the denominator.

  • F is close to one: the evidence favors the null hypothesis (the two population variances are equal)

  • F is much larger than one: the evidence is against the null hypothesis

  • A test of two variances may be left-tailed, right-tailed, or two-tailed.
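scipy has no single built-in call for this test, but it is a few lines by hand. A minimal sketch, assuming two hypothetical independent samples from normal populations:

```python
# Two-variance F test sketch; both samples are hypothetical.
from statistics import variance
from scipy import stats

sample1 = [12.1, 14.3, 11.8, 13.5, 12.9, 13.0, 14.8]
sample2 = [12.4, 12.6, 12.3, 12.8, 12.5, 12.7]

f_stat = variance(sample1) / variance(sample2)  # F = s1^2 / s2^2
df1, df2 = len(sample1) - 1, len(sample2) - 1

p_right = stats.f.sf(f_stat, df1, df2)                   # right-tailed
p_two = 2 * min(p_right, stats.f.cdf(f_stat, df1, df2))  # two-tailed
print(f"F = {f_stat:.3f}, right p = {p_right:.3f}, two-tailed p = {p_two:.3f}")
```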
