Introduction to Fractional Factorial Designs and Half-Fractions

The primary focus of this module is experimental efficiency: learning how to perform significantly less work while obtaining approximately the same amount of information as a full factorial study.
This approach relies on a combination of educated guessing and specific underlying assumptions about the nature of systems and factor interactions.
Full Factorial Complexity Recap:
- In a system with $k$ factors where each factor has two levels (low and high), the number of experiments required for a full factorial design is $2^k$.
- As the number of factors ($k$) increases, the number of experiments grows exponentially, which often makes full factorial designs prohibitive in practical settings.
The Key Insight: It is not necessary to run all experiments in a full factorial design. One can run a subset of the experiments, provided one is willing to accept certain trade-offs in information quality.

Scalability Issues:
- A 2-factor system ($k=2$) requires $2^2 = 4$ experiments. This allows the estimation of 4 parameters: the intercept, two main effects, and one two-factor interaction.
- A 3-factor system ($k=3$) requires $2^3 = 8$ experiments, allowing for the estimation of 8 parameters.
- A 4-factor system ($k=4$) requires $2^4 = 16$ experiments, allowing for the estimation of 16 parameters.
- In real-world industrial or research systems, there are often 6, 7, or more factors, leading to a massive volume of experiments that are both time-prohibitive and cost-prohibitive.
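The growth in run count can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name `full_factorial` and the coding of low/high as -1/+1 are assumptions made for the example, not something the module prescribes.

```python
from itertools import product

def full_factorial(k):
    """Return the 2**k runs of a two-level design in standard order.

    Each run is a tuple of -1 (low) and +1 (high) levels, with the
    first factor changing fastest, as in a standard order table.
    """
    # product() varies the LAST position fastest, so each tuple is
    # reversed to make the first factor the fastest-changing one.
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=k)]

for k in range(2, 8):
    runs = full_factorial(k)
    # A full factorial with 2**k runs supports 2**k model parameters.
    print(f"k={k}: {len(runs)} runs, {len(runs)} estimable parameters")
```

Going from 4 to 7 factors takes the run count from 16 to 128, which is the scaling problem described above.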
Automation and High-Throughput Limits: Even for systems that can be highly automated—such as DNA sequencing, computer simulations, or software-based experiments—running $2^k$ configurations remains inefficient and often unnecessary.

Coefficient Sparsity: There is very little practical utility in estimating all $2^k$ coefficients. Many of these coefficients represent higher-order interactions.
The Saliency of Interactions:
- Higher-order interactions (three-factor interactions and above) are frequently non-existent in real physical systems.
- Even if they do exist, their coefficients are usually so small they are practically zero.
- Three-factor interactions: Seldom seen or significant in actual systems.
- Four-factor interactions and higher: Almost certainly do not exist in practice.
The Sacrifice of Accuracy: While higher-order terms help to fine-tune model predictions, their inclusion comes at a high cost. In many practical situations, losing the prediction accuracy provided by these terms is acceptable to save time and budget.

Scenario: Given a budget/time limit for only 4 experiments in a 3-factor system ( $k=3$ , which normally requires 8 runs).
Ineffective Selections:
- Running only the "front face" of a design cube (experiments where factor C is only at the low level) is a poor choice. This provides no data on the high level of C, leaving its effect unknown.
- Selecting the middle four rows of a standard order table is an improvement but still not the optimal strategy.
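A small check makes the weakness of both selections concrete. The sketch below assumes the usual standard order for three factors (A changing fastest), levels coded as -1/+1, and reads "the middle four rows" as rows 3 to 6; the helper `levels_seen` is hypothetical.

```python
# Assumed standard order for three factors (A changes fastest);
# -1 = low level, +1 = high level.
standard_order = [
    (-1, -1, -1), (+1, -1, -1), (-1, +1, -1), (+1, +1, -1),
    (-1, -1, +1), (+1, -1, +1), (-1, +1, +1), (+1, +1, +1),
]

front_face = [run for run in standard_order if run[2] == -1]  # C held low
middle_four = standard_order[2:6]                             # rows 3 to 6

def levels_seen(runs):
    """Map each factor name to the sorted list of levels it takes in `runs`."""
    return {name: sorted({run[i] for run in runs}) for i, name in enumerate("ABC")}

print(levels_seen(front_face))   # C only ever appears at -1
print(levels_seen(middle_four))  # every factor appears at both levels

# In the middle four rows, B and C always take opposite signs, so their
# effects cannot be told apart from these runs alone.
print(all(b == -c for _, b, c in middle_four))
```

Under these assumptions the middle rows do exercise both levels of every factor, which is why they improve on the front face, but B and C move in lockstep, which is why they are still not a good choice.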
The "Best Choice" Design (The Half-Fraction):
- In a cube visualization of a 3-factor system, this involves selecting four corners that alternate around the cube, so that no two selected points share an edge (either the four "open circles" or the four "closed circles").
The Collapse Property:
- This specific set of 4 experiments is chosen because of its robustness to insignificant factors.
- If Factor A is found to be non-significant (via a Pareto plot), it implies that its level (minus or plus) does not affect the outcome.
- By "collapsing" the minus and plus layers of the cube along the A-axis, the remaining 4 experiments form a complete full factorial design for the remaining two factors (B and C).
- This same property applies if B or C are found to be non-significant; the design collapses into a full $2^2$ factorial for the relevant variables.
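The collapse property can be verified directly. The sketch below assumes the half-fraction in which C equals the product of A and B, with levels coded as -1/+1; the function name `collapse` is hypothetical.

```python
# One half-fraction of the 3-factor design, written as (A, B, C) with
# C chosen as the product of A and B (an assumption for this sketch).
half_fraction = [(-1, -1, +1), (+1, -1, -1), (-1, +1, -1), (+1, +1, +1)]

# The four runs of a full two-level factorial in two factors.
full_2x2 = {(-1, -1), (+1, -1), (-1, +1), (+1, +1)}

def collapse(runs, drop):
    """Remove the factor at index `drop` from every run."""
    return [tuple(v for i, v in enumerate(run) if i != drop) for run in runs]

for drop, name in enumerate("ABC"):
    remaining = collapse(half_fraction, drop)
    # True means the 4 runs cover every combination of the other two factors.
    print(f"drop {name}: full 2^2 factorial = {set(remaining) == full_2x2}")
```

Whichever factor is dropped, the four remaining points cover all four combinations of the other two factors, so no runs are wasted if one factor turns out not to matter.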

Economic Context:
- In this example, each single experiment is valued at approximately $\$10,000$.
- A full factorial (8 runs) costs $\$80,000$.
- By running only a half-fraction (4 runs), the cost is reduced to $\$40,000$, representing a $\$40,000$ savings.
Experimental Selection: The software is used to select runs 2, 3, 5, and 8 from the original standard order table.
Software Analysis and "NA" Results:
- When analyzing only 4 experiments, the software can only estimate 4 coefficients.
- A model attempting to include all interactions will result in "NA" (Not Applicable) for certain terms because there is insufficient data to calculate them.
Comparison of Models (Full vs. Fractional):
- Intercept and Main Effect A: Numerically similar in both models.
- Main Effect C: Numerically similar.
- Main Effect B: The fractional-model estimate differed noticeably from the full-model estimate. This discrepancy illustrates the risk of information loss when using a reduced design.
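One way to see why only 4 coefficients can be estimated is to build the model columns for the four selected runs and look for duplicates. This is a sketch under assumptions: runs 2, 3, 5, and 8 are taken from the usual standard order with levels coded as -1/+1, and the term names are illustrative.

```python
from itertools import combinations
from math import prod

# Runs 2, 3, 5 and 8 of the assumed standard order, written as (A, B, C).
runs = [(+1, -1, -1), (-1, +1, -1), (-1, -1, +1), (+1, +1, +1)]
factors = "ABC"

# Build the column of signs for every term of the full 8-parameter model.
terms = {"Intercept": tuple(1 for _ in runs)}
for order in (1, 2, 3):
    for combo in combinations(range(3), order):
        name = "".join(factors[i] for i in combo)
        terms[name] = tuple(prod(run[i] for i in combo) for run in runs)

# Group terms whose columns are identical across the four runs.
groups = {}
for name, column in terms.items():
    groups.setdefault(column, []).append(name)

for column, names in groups.items():
    print(names, column)
print(len(groups), "distinct columns")
```

The eight terms share only four distinct columns, so a fit on these runs cannot separate the terms within a pair. In particular, the column for B is identical to the column for the AC interaction, so any AC interaction present in the system would be absorbed into the estimate labeled B. That is one plausible reason a main-effect estimate from the half-fraction can differ from the full-factorial value.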

The $2^{k-1}$ Rule: A half-fraction of a $2^k$ design is denoted as $2^{k-1}$.
Calculation for 3 Factors: $2^{3-1} = 2^2 = 4$ experiments.
Procedure for Generating the Table:
1. Write out the standard order table for the first two factors (A and B) as if it were a full factorial for 2 factors.
2. Generate the third factor (C) using the generator rule: $C = A \times B$.
3. This produces the following levels for C: $[+, -, -, +]$ (the product of the signs in columns A and B).
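The procedure above can be sketched in Python. The helper `standard_order` and the -1/+1 coding are assumptions for illustration; the sketch also checks which rows of the full 3-factor standard order table the generated runs correspond to.

```python
from itertools import product

def standard_order(k):
    """Full two-level factorial in standard order (first factor fastest)."""
    return [tuple(reversed(levels)) for levels in product((-1, +1), repeat=k)]

# Step 1: full factorial in A and B.
base = standard_order(2)

# Step 2: generate C as the product of the signs of A and B.
half_fraction = [(a, b, a * b) for a, b in base]

# Step 3: the resulting C column.
c_column = [run[2] for run in half_fraction]
print(c_column)  # [1, -1, -1, 1]

# Locate each generated run in the full 3-factor standard order table
# (run numbers are 1-based).
full = standard_order(3)
run_numbers = sorted(full.index(run) + 1 for run in half_fraction)
print(run_numbers)  # [2, 3, 5, 8]
```

Under these assumptions the generated runs are rows 2, 3, 5, and 8 of the full table, matching the runs selected in the example above.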

Screening:
- The goal is to identify which factors are significant among many possibilities.
- In this phase, reduced knowledge and lower prediction accuracy are acceptable. It is okay if interaction estimates are not perfectly known.
Optimization:
- Once the significant factors are known, more specific and accurate information is required.
- This phase requires better resolution of main effects and their interactions to find the system's optimum point.
The George Box Rule of Thumb:
- No more than roughly $25\%$ of the experimental effort and budget should be invested in the initial experimental designs.
- This ensures that time and resources remain available later to investigate the details and fine-tune models after the most important factors have been identified.

Reducing work by half results in a loss of specific accuracy, but a "smart" selection of experiments (half-fraction) provides a built-in backup strategy through the collapse property.
The next session will focus on the formal technical terminology and the mechanics involved in creating these half-fractions systematically.