Design and Analysis of Single Factor Experiments: ANOVA - Part 2

10.1 Completely Randomized Single-Factor Experiments

10.2 The Random-Effects Model

10.2.1 Fixed Versus Random Factors

In many cases, the factor of interest has a large number of possible levels.
The goal is to draw conclusions about the entire population of factor levels.
If we randomly select $a$ levels from the population of factor levels, the factor is considered a random factor.
It is assumed that factor levels come from a population of infinite size or a size large enough to be considered infinite.
This differs from the fixed-effects case, where conclusions apply only to the factor levels used in the experiment, not the entire population.

10.2.2 ANOVA and Variance Components

The linear statistical model is defined as follows: $Y_{ij} = \mu + \tau_i + \epsilon_{ij}$ , where $i = 1, 2, \cdots , a$ and $j = 1, 2, \cdots , n$ .
- $\tau_i$ represents the treatment effects.
- $\epsilon_{ij}$ represents the errors.
- Treatment effects and errors are independent random variables.
The model's structure is identical to the fixed-effects case, but the parameters have a different interpretation.
If the variance of the treatment effects $\tau_i$ is ${\sigma_\tau}^2$ , then, by independence, the variance of the response is
$V(Y_{ij}) = {\sigma_\tau}^2 + {\sigma}^2$
The variances ${\sigma_\tau}^2$ and ${\sigma}^2$ are called variance components.
The model in Eq. (10.19) is called the components of variance model or the random-effects model.
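The variance decomposition $V(Y_{ij}) = {\sigma_\tau}^2 + {\sigma}^2$ can be checked numerically. The following is a minimal simulation sketch and is not part of the original notes: the function name `simulate_responses` and all parameter values ( $\mu = 95$ , $\sigma_\tau = 2$ , $\sigma = 1$ , 2000 sampled levels, 5 replicates) are assumptions chosen only for illustration.

```python
import random
import statistics


def simulate_responses(a, n, mu, sigma_tau, sigma, seed=0):
    """Draw Y_ij = mu + tau_i + eps_ij for `a` random levels, `n` replicates each."""
    rng = random.Random(seed)
    responses = []
    for _ in range(a):
        # One random treatment effect per sampled factor level.
        tau_i = rng.gauss(0.0, sigma_tau)
        for _ in range(n):
            # Independent error for every observation.
            responses.append(mu + tau_i + rng.gauss(0.0, sigma))
    return responses


# Assumed illustrative values: sigma_tau^2 = 4 and sigma^2 = 1.
ys = simulate_responses(a=2000, n=5, mu=95.0, sigma_tau=2.0, sigma=1.0)
total_variance = statistics.pvariance(ys)
print(round(total_variance, 2))
```

With these assumed values the printed variance should land close to ${\sigma_\tau}^2 + {\sigma}^2 = 4 + 1 = 5$ , up to sampling error.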
To test hypotheses in the random-effects model, we assume:
- The errors $\epsilon_{ij}$ are normally and independently distributed with a mean of zero and variance ${\sigma}^2$ .
- The treatment effects $\tau_i$ are normally and independently distributed with a mean of zero and variance ${\sigma_\tau}^2$ .
- The $\{\tau_i\}$ are independent random variables, which implies that the usual constraint $\sum_{i=1}^{a} \tau_i = 0$ from the fixed-effects model does not apply to the random-effects model.
Testing the hypothesis that the individual treatment effects are zero is meaningless for the random-effects model.
It is more appropriate to test hypotheses about ${\sigma_\tau}^2$ .
- $H_0: {\sigma_\tau}^2 = 0$
- $H_1: {\sigma_\tau}^2 > 0$
If ${\sigma_\tau}^2 = 0$ , all treatments are identical.
If ${\sigma_\tau}^2 > 0$ , there is variability between treatments.
The ANOVA decomposition of total variability is still valid:
$SST = SST_{Treatments} + SSE$
The expected values of the mean squares for treatments and error are different than in the fixed-effects case.
In the random-effects model for a single-factor, completely randomized experiment:
$E(MST_{Treatments}) = E\left(\frac{SST_{Treatments}}{a-1}\right) = {\sigma}^2 + n{\sigma_\tau}^2$
$E(MSE) = E\left(\frac{SSE}{a(n-1)}\right) = {\sigma}^2$
Both $MSE$ and $MST_{Treatments}$ estimate ${\sigma}^2$ when $H_0: {\sigma_\tau}^2 = 0$ is true.
$MSE$ and $MST_{Treatments}$ are independent.
The ratio $F_0 = \frac{MST_{Treatments}}{MSE}$ is an $F$ random variable with $a-1$ and $a(n-1)$ degrees of freedom when $H_0$ is true.
The null hypothesis is rejected at the $\alpha$ -level of significance if the computed value of the test statistic satisfies $f_0 > f_{\alpha, a-1, a(n-1)}$ .
The computational procedure and construction of the ANOVA table for the random-effects model are identical to the fixed-effects case.
- However, the conclusions are different because they apply to the entire population of treatments.
We usually want to estimate the variance components ( ${\sigma}^2$ and ${\sigma_\tau}^2$ ) in the model.
The procedure to estimate ${\sigma}^2$ and ${\sigma_\tau}^2$ is called the analysis of variance method, because it uses the information in the ANOVA table.
It does not require the normality assumption on the observations.
The procedure consists of equating the expected mean squares to their observed values in the ANOVA table and solving for the variance components.
When equating observed and expected mean squares in the one-way classification random-effects model, we obtain:
$MST_{Treatments} = {\sigma}^2 + n{\sigma_\tau}^2$ and $MSE = {\sigma}^2$
Therefore, the estimators of the ANOVA variance components are:
${\hat{\sigma}}^2 = MSE$
${\hat{\sigma}_\tau}^2 = \frac{MST_{Treatments} - MSE}{n}$
Sometimes the analysis of variance method produces a negative estimate of a variance component when using the above equations.
- In that case, we can set the estimate to zero, re-analyze the experiment, or take the result as evidence that the linear model does not apply and modify it.
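The estimation step can be sketched in a few lines. This is an illustrative sketch only: the function name `estimate_variance_components` is hypothetical, the mean squares passed in are made-up values, and truncating a negative estimate to zero is just one of the options listed above, chosen here as an assumption.

```python
def estimate_variance_components(ms_treatments, ms_error, n):
    """ANOVA-method estimates for the one-way random-effects model.

    Equates observed mean squares to their expectations:
        MS_treatments = sigma^2 + n * sigma_tau^2,   MS_error = sigma^2.
    A negative estimate of sigma_tau^2 is truncated to zero (an assumption;
    the text also lists re-analysis or changing the model as alternatives).
    """
    sigma2_hat = ms_error
    sigma_tau2_hat = (ms_treatments - ms_error) / n
    return sigma2_hat, max(sigma_tau2_hat, 0.0)


# Hypothetical mean squares, for illustration only.
print(estimate_variance_components(ms_treatments=12.0, ms_error=2.0, n=5))  # (2.0, 2.0)
print(estimate_variance_components(ms_treatments=1.5, ms_error=2.0, n=5))   # (2.0, 0.0)
```

The second call shows the negative-estimate case: $(1.5 - 2.0)/5 = -0.1$ is reported as zero.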

Example 10.4

This example, described in "Design and Analysis of Experiments", is a single-factor experiment from textile manufacturing that involves the random-effects model.
A textile manufacturing company weaves a fabric on a large number of looms.
The company is interested in loom-to-loom variability in tensile strength.
To investigate this variability, the engineer selects four looms ( $a = 4$ treatments) at random and makes four strength determinations ( $n = 4$ samples) on fabric samples chosen at random from each loom.
The tensile strength data are shown in Table 10.7, and the ANOVA is summarized in Table 10.8.
$y_{..} = 1527$ , $N = 16$ , and $y_{i.} = 390, 366, 383, 388$
$i = 1, 2, 3, 4$ ( $a = 4$ treatments), $n = 4$ (samples), $N = a \times n = 4 \times 4 = 16$
This example depicts an experiment that is a completely randomized design.
Use the analysis of variance (ANOVA) to test the hypothesis that there is no loom-to-loom variability in tensile strength. Use $\alpha = 0.01$ .
Because the looms are a random factor, the hypotheses are stated in terms of the variance component: $H_0: {\sigma_\tau}^2 = 0$ and $H_1: {\sigma_\tau}^2 > 0$
The sums of squares for the analysis of variance are computed from Eqs. (10.8), (10.9) & (10.10) as follows:
$SST = \sum_{i=1}^{a} \sum_{j=1}^{n} y_{ij}^2 - \frac{y_{..}^2}{N} = \sum_{i=1}^{4} \sum_{j=1}^{4} y_{ij}^2 - \frac{y_{..}^2}{N} = \sum_{i=1}^{4} \left( y_{i1}^2 + y_{i2}^2 + y_{i3}^2 + y_{i4}^2 \right) - \frac{y_{..}^2}{N}$
Sum the squares row by row, i.e., treatment by treatment.
$SST = 98^2 + 97^2 + 99^2 + 96^2 + 91^2 + 90^2 + 93^2 + 92^2 + 96^2 + 95^2 + 97^2 + 95^2 + 95^2 + 96^2 + 99^2 + 98^2 - \frac{1527^2}{16}$
$= 38030 + 33494 + 36675 + 37646 - \frac{1527^2}{16} = 145845 - 145733.0625$
$SST = 111.9375$
With treatment totals $y_{i.} = 390, 366, 383, 388$ and $a = 4$ :
$SST_{Treatments} = \sum_{i=1}^{a} \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N}$
Thus,
$SST_{Treatments} = \sum_{i=1}^{4} \frac{y_{i.}^2}{n} - \frac{y_{..}^2}{N} = \frac{y_{1.}^2 + y_{2.}^2 + y_{3.}^2 + y_{4.}^2}{n} - \frac{y_{..}^2}{N} = \frac{390^2 + 366^2 + 383^2 + 388^2}{4} - \frac{1527^2}{16}$
$= \frac{583289}{4} - \frac{1527^2}{16} = 89.1875$
$SST_{Treatments} = 89.1875$
From Eq. (10.10), we have $SSE = SST - SST_{Treatments}$
$SSE = 111.9375 - 89.1875 = 22.7500$
The test statistic is a ratio of mean squares, $F_0 = \frac{SST_{Treatments} / (a-1)}{SSE / [a(n-1)]} = \frac{MST_{Treatments}}{MSE}$ , so we first compute the two mean squares:
$MST_{Treatments} = \frac{SST_{Treatments}}{a-1} = \frac{89.1875}{4-1} = 29.7291667 \approx 29.73$
$MSE = \frac{SSE}{a(n-1)} = \frac{22.7500}{4(4-1)} = 1.89583333 = {\hat{\sigma}}^2$
From Eq. (10.7) we calculate the test statistic as
$f_0 = \frac{MST_{Treatments}}{MSE} = \frac{29.7291667}{1.89583333} = 15.6813187$
Under $H_0$ , $F_0$ has an $F$ distribution with $a-1 = 4-1 = 3$ and $a(n-1) = 4(4-1) = 12$ degrees of freedom.
From Table VI with $\alpha = 0.01$ : $f_{\alpha, a-1, a(n-1)} = f_{0.01,3,12} = 5.95$
Reject $H_0$ because $f_0 = 15.68 > f_{\alpha, a-1, a(n-1)} = f_{0.01,3,12} = 5.95$
The $P$ -value for this test statistic (via software) is $P = P(F_{3,12} > 15.6813187) \approx 1.88 \times 10^{-4}$
Because $P \approx 1.88 \times 10^{-4}$ is considerably smaller than $\alpha = 0.01$ , we have strong evidence to conclude that $H_0$ is not true.
The mean square error gives ${\hat{\sigma}}^2 = MSE = 1.8958333$
Via Eq. (10.25) we have ${\hat{\sigma}_\tau}^2 = \frac{MST_{Treatments} - MSE}{n} = \frac{29.7291667 - 1.89583333}{4} = 6.95833334$
${\hat{\sigma}_\tau}^2 \cong 6.96$
From the analysis of variance, we conclude that the looms in the plant differ significantly in their ability to produce fabric of uniform strength.
The variance components are estimated via Eqs. (10.24) & (10.25) to be ${\hat{\sigma}}^2 = MSE = 1.8958 \dots \cong 1.90$ and ${\hat{\sigma}_\tau}^2 \cong 6.96$
Therefore, the variance of strength in the manufacturing process is estimated by Eq. (10.19) as $\hat{V}(Y_{ij}) = {\hat{\sigma}_\tau}^2 + {\hat{\sigma}}^2 \approx 6.96 + 1.90 = 8.86$
Most of the variability in strength in the output product (about $6.96 / 8.86 \approx 79\%$ ) is attributable to differences between looms.
ANOVA for the tensile strength of looms via software:
- $SST = 111.9375$ , $SST_{Treatments} = 89.1875$
- $MST_{Treatments} = 29.7291667$ , $MSE = 1.89583333$
- (numerator) $dF = a - 1 = 3$ , (denominator) $dF = a(n - 1) = 12$ , (total) $dF = 15$
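The hand calculation for Example 10.4 can be reproduced with a short script. This is a sketch under stated assumptions: the 4 x 4 data grid is reconstructed from the squared terms in the $SST$ calculation (four readings per loom, in the order listed), the critical value 5.95 is the Table VI value quoted in the example, and all variable names are illustrative.

```python
# Tensile strength readings, one row per loom (reconstructed from the SST terms).
looms = [
    [98, 97, 99, 96],
    [91, 90, 93, 92],
    [96, 95, 97, 95],
    [95, 96, 99, 98],
]

a = len(looms)        # number of randomly selected looms (treatments)
n = len(looms[0])     # strength determinations per loom
N = a * n

grand_total = sum(sum(row) for row in looms)
correction = grand_total ** 2 / N                      # y..^2 / N

ss_total = sum(y * y for row in looms for y in row) - correction
ss_treatments = sum(sum(row) ** 2 for row in looms) / n - correction
ss_error = ss_total - ss_treatments

ms_treatments = ss_treatments / (a - 1)
ms_error = ss_error / (a * (n - 1))
f0 = ms_treatments / ms_error

# ANOVA-method variance component estimates.
sigma2_hat = ms_error
sigma_tau2_hat = (ms_treatments - ms_error) / n

F_CRITICAL = 5.95     # f_{0.01,3,12}, taken from Table VI as quoted in the text
reject_h0 = f0 > F_CRITICAL

print(ss_total, ss_treatments, ss_error)       # 111.9375 89.1875 22.75
print(round(f0, 2), reject_h0)                 # 15.68 True
print(round(sigma2_hat, 2), round(sigma_tau2_hat, 2))  # 1.9 6.96
```

The script uses only the computing formulas shown in the example, so each printed value can be compared line by line with the hand calculation.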

Work through exercises 13.34 to 13.41 in section 13.3

*Follow the definitions especially defining expressions (the equations).

*Complicated problems are what engineers solve.

*Most problems will follow examples on how to use the equations.

*Practice, and practice!

10.3 Randomized Complete Block Design (RCBD) Model

10.3.1 Design and Statistical Analysis

In experimental design problems, it is necessary to design the experiment so that the variability arising from a nuisance factor can be controlled.
The paired $t$ -test is a procedure for comparing two treatment means when all experimental runs cannot be made under homogeneous conditions.
The paired t-test is viewed as a method for reducing the background noise in the experiment by blocking out a nuisance factor effect.
The randomized block design is an extension of the paired $t$ -test to situations where the factor of interest has more than two levels; i.e., more than two treatments must be compared.
For example, suppose that three methods could be used to evaluate the strength readings on steel plate girders (Z-like structural elements).
These are considered as three treatments, say $t_1, t_2$ , and $t_3$ .
If we use four girders as the experimental units, a randomized complete block design (RCBD) would appear as shown in Fig. 10.9.
The design is called a RCBD because:
- Each block is large enough to hold all the treatments.
- The actual assignment of each of the three treatments within each block is done randomly.
The experimental data are recorded in a table such as Table 10.9.
The observations in this table, say, $y_{ij}$ , represent the response obtained when method $i$ is used on girder $j$ .
The general procedure for a RCBD consists of selecting $b$ blocks and running a complete replicate of the experiment in each block.
The data that result from running a RCBD for investigating a single factor with $a$ levels and $b$ blocks are shown in Table 10.10.
There are $a$ observations (one per factor level) in each block.
The order in which these observations are run is randomly assigned within the block.
The statistical analysis for the RCBD:
- Suppose that a single factor with $a$ levels is of interest and that the experiment is run in $b$ blocks.
- The observations are represented by the linear statistical model: $Y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij}$ , where $i = 1, 2, \cdots , a$ and $j = 1, 2, \cdots , b$
 - $\mu$ is an overall mean.
 - $\tau_i$ is the effect of the $i$ th treatment.
 - $\beta_j$ is the effect of the $j$ th block.
 - $\epsilon_{ij}$ is the random error term, which is assumed to be normally and independently distributed with mean zero and variance ${\sigma}^2$ .
- Furthermore, the treatment and block effects are defined as deviations from the overall mean, so $\sum_{i=1}^{a} \tau_i = 0$ and $\sum_{j=1}^{b} \beta_j = 0$ .
- This was the same type of definition used for completely randomized experiments.
- It is assumed that treatments and blocks do not interact.
- The effect of treatment $i$ is the same regardless of the block (or blocks) in which it is tested.
- It is of interest to test the equality of the treatment effects.
 $H_0: \tau_1 = \tau_2 = \cdots = \tau_a = 0$
 $H_1: \tau_i \neq 0 \text{ for at least one } i$
The analysis of variance can be extended to the RCBD.
The procedure uses a sum of squares identity that partitions the total sum of squares into three components.
ANOVA Sums of Squares Identity: Randomized Complete Block Experiment
The sum of squares identity for the randomized complete block design is:
$\sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{..})^2 = b\sum_{i=1}^{a} (\bar{y}_{i.} - \bar{y}_{..})^2 + a\sum_{j=1}^{b} (\bar{y}_{.j} - \bar{y}_{..})^2 + \sum_{i=1}^{a} \sum_{j=1}^{b} (y_{ij} - \bar{y}_{.j} - \bar{y}_{i.} + \bar{y}_{..})^2$
$SST = SST_{Treatments} + SSB_{Blocks} + SSE$
The degrees of freedom corresponding to these sums of squares are:
$ab - 1 = (a - 1) + (b - 1) + (a - 1)(b - 1)$
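The identity and its degrees-of-freedom breakdown can be verified numerically by computing each term directly from its definition. The sketch below is illustrative only: the function name `rcbd_sums_of_squares` is hypothetical, and the 3 x 4 data table (three methods, four girders) is made up for the check.

```python
import math


def rcbd_sums_of_squares(y):
    """Return (SS_total, SS_treatments, SS_blocks, SS_error) from their definitions.

    `y[i][j]` is the response for treatment i in block j.
    """
    a, b = len(y), len(y[0])
    grand_mean = sum(sum(row) for row in y) / (a * b)
    treatment_means = [sum(row) / b for row in y]
    block_means = [sum(y[i][j] for i in range(a)) / a for j in range(b)]

    ss_total = sum((y[i][j] - grand_mean) ** 2
                   for i in range(a) for j in range(b))
    ss_treatments = b * sum((m - grand_mean) ** 2 for m in treatment_means)
    ss_blocks = a * sum((m - grand_mean) ** 2 for m in block_means)
    # Residual after removing both the treatment and the block effect.
    ss_error = sum((y[i][j] - block_means[j] - treatment_means[i] + grand_mean) ** 2
                   for i in range(a) for j in range(b))
    return ss_total, ss_treatments, ss_blocks, ss_error


# Hypothetical strength readings: rows are methods, columns are girders.
readings = [
    [12.1, 11.4, 13.0, 12.5],
    [11.0, 10.6, 12.2, 11.9],
    [12.8, 11.9, 13.5, 13.1],
]
ss_total, ss_treatments, ss_blocks, ss_error = rcbd_sums_of_squares(readings)
a, b = len(readings), len(readings[0])

print(math.isclose(ss_total, ss_treatments + ss_blocks + ss_error))  # True
print(a * b - 1 == (a - 1) + (b - 1) + (a - 1) * (b - 1))            # True
```

Computing $SSE$ from the residuals, rather than by subtraction, is what makes this a genuine check of the identity.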
For the randomized block design, the relevant mean squares are:
$MST_{Treatments} = \frac{SST_{Treatments}}{a - 1}$
$MSB_{Blocks} = \frac{SSB_{Blocks}}{b - 1}$
$MSE = \frac{SSE}{(a - 1)(b - 1)}$
The expected values of these mean squares can be shown to be as follows:
$E(MST_{Treatments}) = {\sigma}^2 + \frac{b \sum_{i=1}^{a} {\tau_i}^2}{a - 1}$
$E(MSB_{Blocks}) = {\sigma}^2 + \frac{a \sum_{j=1}^{b} {\beta_j}^2}{b - 1}$
$E(MSE) = {\sigma}^2$
If the null hypothesis $H_0$ is true, so that all treatment effects $\tau_i = 0$ , $MST_{Treatments}$ is an unbiased estimator of ${\sigma}^2$ .
If $H_0$ is false, $MST_{Treatments}$ overestimates ${\sigma}^2$ .
The mean square for error is always an unbiased estimate of ${\sigma}^2$ .
To test the null hypothesis that the treatment effects are all zero, we use the ratio
$F_0 = \frac{MST_{Treatments}}{MSE}$
$F_0$ has an $F$ distribution with $a - 1$ and $(a - 1)(b - 1)$ degrees of freedom if the null hypothesis is true.
Reject the null hypothesis at the $\alpha$ -level of significance if the computed value of the test statistic satisfies $f_0 > f_{\alpha, a-1, (a-1)(b-1)}$ .
Compute $SST$ , $SST_{Treatments}$ , and $SSB_{Blocks}$ , then obtain the error sum of squares $SSE$ by subtraction.
Computing Formulas for ANOVA: Randomized Block Experiment (RCBD)
$SST = \sum_{i=1}^{a} \sum_{j=1}^{b} y_{ij}^2 - \frac{y_{..}^2}{ab}$
$SST_{Treatments} = \frac{1}{b} \sum_{i=1}^{a} y_{i.}^2 - \frac{y_{..}^2}{ab}$
$SSB_{Blocks} = \frac{1}{a} \sum_{j=1}^{b} y_{.j}^2 - \frac{y_{..}^2}{ab}$
$SSE = SST - SST_{Treatments} - SSB_{Blocks}$
The computations are usually arranged in an ANOVA table.
Computer software is usually used to perform the analysis of variance for a RCBD because it simplifies the data handling and the arithmetic.
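Since the RCBD extends the paired $t$ -test, the two procedures should agree when there are only $a = 2$ treatments: in that case $F_0 = t_0^2$ . The sketch below checks this with the computing formulas above. The function names and the five paired readings are hypothetical and serve only as an illustration.

```python
import math


def paired_t_statistic(x, y):
    """Paired t statistic: mean difference divided by its standard error."""
    d = [xi - yi for xi, yi in zip(x, y)]
    b = len(d)
    d_bar = sum(d) / b
    s_d2 = sum((di - d_bar) ** 2 for di in d) / (b - 1)
    return d_bar / math.sqrt(s_d2 / b)


def rcbd_f_statistic(y):
    """F0 = MS_treatments / MS_error using the RCBD computing formulas.

    `y[i][j]` is the response for treatment i in block j.
    """
    a, b = len(y), len(y[0])
    correction = sum(sum(row) for row in y) ** 2 / (a * b)   # y..^2 / (ab)
    ss_total = sum(v * v for row in y for v in row) - correction
    ss_treatments = sum(sum(row) ** 2 for row in y) / b - correction
    ss_blocks = sum(sum(row[j] for row in y) ** 2 for j in range(b)) / a - correction
    ss_error = ss_total - ss_treatments - ss_blocks
    return (ss_treatments / (a - 1)) / (ss_error / ((a - 1) * (b - 1)))


# Hypothetical readings: two methods applied to the same five blocks.
method_1 = [10.2, 11.1, 9.8, 12.0, 10.5]
method_2 = [9.6, 10.4, 9.9, 11.1, 10.0]

t0 = paired_t_statistic(method_1, method_2)
f0 = rcbd_f_statistic([method_1, method_2])
print(math.isclose(f0, t0 ** 2, rel_tol=1e-6))  # True
```

The agreement follows because, with two treatments, $MSE$ equals half the sample variance of the within-block differences.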

Example 10.5

An experiment was performed to determine the effect of four different chemicals on the strength of a fabric.
The chemicals are used as part of the permanent-press finishing process.
Five fabric samples were selected, and a RCBD was run by testing each chemical type once in random order on each fabric sample.
It is required to test for differences in means using an ANOVA with $\alpha = 0.01$ significance level.
Data analysis and interpretation:
- $y_{..} = 39.2$ ; $\bar{y}_{..} = 1.96$
- $y_{i.} = 5.7, 8.8, 6.9, 17.8$ ; $\bar{y}_{i.} = 1.14, 1.76, 1.38, 3.56$
- $y_{.j} = 9.2, 10.1, 3.5, 8.8, 7.6$
- $\bar{y}_{.j} = 2.3, 2.525, 0.875, 2.2, 1.9$
- $i = 1, 2, 3, 4$ ( $a = 4$ treatments); $b = 5$ (blocks, i.e., fabric samples); $N = a \times b = 4 \times 5 = 20$
- $dF: a - 1 = 3$ (numerator) & $(a - 1)(b - 1) = 12$ (denominator)
This example depicts an experiment that is a randomized complete block design.
Use the analysis of variance (ANOVA) to test the hypothesis on the means using $\alpha = 0.01$ .
$H_0: \tau_1 = \tau_2 = \tau_3 = \tau_4 = 0$ and $H_1: \tau_i \neq 0 \text{ for at least one } i$
The sums of squares for the analysis of variance are computed from:
$SST = \sum_{i=1}^{4} \sum_{j=1}^{5} y_{ij}^2 - \frac{y_{..}^2}{ab}$
Thus,
$SST = 1.3^2 + 1.6^2 + 0.5^2 + 1.2^2 + 1.1^2 + 2.2^2 + 2.4^2 + 0.4^2 + 2.0^2 + 1.8^2 + 1.8^2 + 1.7^2 + 0.6^2 + 1.5^2 + 1.3^2 + 3.9^2 + 4.4^2 + 2.0^2 + 4.1^2 + 3.4^2 - \frac{39.2^2}{20}$
$= 7.15 + 18.00 + 10.43 + 66.94 - \frac{39.2^2}{20}$
$SST = 102.52 - 76.832 = 25.688$
With treatment totals $y_{i.} = 5.7, 8.8, 6.9, 17.8$ :
$SST_{Treatments} = \sum_{i=1}^{a} \frac{y_{i.}^2}{b} - \frac{y_{..}^2}{ab} = \frac{y_{1.}^2 + y_{2.}^2 + y_{3.}^2 + y_{4.}^2}{5} - \frac{y_{..}^2}{ab} = \frac{5.7^2 + 8.8^2 + 6.9^2 + 17.8^2}{5} - \frac{39.2^2}{20}$
$= \frac{474.38}{5} - \frac{39.2^2}{20} = 18.044$
$SST_{Treatments} = 18.044$
With block totals $y_{.j} = 9.2, 10.1, 3.5, 8.8, 7.6$ :
$SSB_{Blocks} = \sum_{j=1}^{b} \frac{y_{.j}^2}{a} - \frac{y_{..}^2}{ab} = \frac{y_{.1}^2 + y_{.2}^2 + y_{.3}^2 + y_{.4}^2 + y_{.5}^2}{4} - \frac{y_{..}^2}{ab} = \frac{9.2^2 + 10.1^2 + 3.5^2 + 8.8^2 + 7.6^2}{4} - \frac{39.2^2}{20}$
$= \frac{334.10}{4} - \frac{39.2^2}{20} = 6.693$
$SSB_{Blocks} = 6.693$
$SSE = SST - SSB_{Blocks} - SST_{Treatments} = 25.688 - 6.693 - 18.044 = 0.951$
From Eq. (10.7) we calculate the test statistic:
$F_0 = \frac{SST_{Treatments} / (a-1)}{SSE / [(a-1)(b-1)]} = \frac{MST_{Treatments}}{MSE}$
$f_0 = \frac{18.044 / (4 - 1)}{0.951 / [(4 - 1)(5 - 1)]} = 75.89484753$
Under $H_0$ , $F_0$ has an $F$ distribution with $a - 1 = 4 - 1 = 3$ and $(a - 1)(b - 1) = (4 - 1)(5 - 1) = 12$ degrees of freedom.
From Table VI with $\alpha = 0.01$ : $f_{\alpha, a-1, (a-1)(b-1)} = f_{0.01,3,12} = 5.95$
Reject $H_0$ because $f_0 = 75.89 > f_{\alpha, a-1, (a-1)(b-1)} = f_{0.01,3,12} = 5.95$ , where $f_0$ is the computed value of $F_0$ from Eq. (10.7).
The $P$ -value for this test statistic is
$P = P(F_{3,12} > 75.89) = 4.5 \times 10^{-8}$
Because $P \simeq 4.5 \times 10^{-8}$ (calculator) is considerably smaller than $\alpha = 0.01$ , we conclude that there is a significant difference in the chemical types so far as their effect on strength is concerned.
The mean squares for this example are:
$MST_{Treatments} = \frac{SST_{Treatments}}{a - 1} = \frac{18.044}{4 - 1} = 6.0147$
$MSE = \frac{SSE}{(a - 1)(b - 1)} = \frac{0.951}{12} = 0.07925$
$MSB_{Blocks} = \frac{SSB_{Blocks}}{b - 1} = \frac{6.693}{5 - 1} = 1.67325$
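The calculations in Example 10.5 can be reproduced with the RCBD computing formulas. This is a sketch under stated assumptions: the 4 x 5 data grid is reconstructed from the squared terms in the $SST$ calculation (rows are chemical types, columns are fabric samples), the critical value 5.95 is the Table VI value quoted in the example, and the variable names are illustrative.

```python
# Strength data: one row per chemical type, one column per fabric sample (block).
strength = [
    [1.3, 1.6, 0.5, 1.2, 1.1],
    [2.2, 2.4, 0.4, 2.0, 1.8],
    [1.8, 1.7, 0.6, 1.5, 1.3],
    [3.9, 4.4, 2.0, 4.1, 3.4],
]

a = len(strength)       # treatments (chemical types)
b = len(strength[0])    # blocks (fabric samples)

treatment_totals = [sum(row) for row in strength]
block_totals = [sum(row[j] for row in strength) for j in range(b)]
correction = sum(treatment_totals) ** 2 / (a * b)        # y..^2 / (ab)

ss_total = sum(v * v for row in strength for v in row) - correction
ss_treatments = sum(t * t for t in treatment_totals) / b - correction
ss_blocks = sum(t * t for t in block_totals) / a - correction
ss_error = ss_total - ss_treatments - ss_blocks          # by subtraction

ms_treatments = ss_treatments / (a - 1)
ms_blocks = ss_blocks / (b - 1)
ms_error = ss_error / ((a - 1) * (b - 1))
f0 = ms_treatments / ms_error

F_CRITICAL = 5.95       # f_{0.01,3,12}, taken from Table VI as quoted in the text
reject_h0 = f0 > F_CRITICAL

# Print an ANOVA-style summary: source, sum of squares, df, mean square.
for name, ss, df, ms in [
    ("Treatments", ss_treatments, a - 1, ms_treatments),
    ("Blocks", ss_blocks, b - 1, ms_blocks),
    ("Error", ss_error, (a - 1) * (b - 1), ms_error),
]:
    print(f"{name:<11}{ss:>8.3f}{df:>4}{ms:>9.4f}")
print(f"{'Total':<11}{ss_total:>8.3f}{a * b - 1:>4}")
print(f"f0 = {f0:.2f}, reject H0: {reject_h0}")
```

The printed rows should match the hand-computed values: sums of squares 18.044, 6.693, and 0.951, and $f_0 \approx 75.89$ .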
*Practice problems at the end of section 13.4 in the textbook.

Work through exercises 13.42 to 13.69 in section 13.2.