Introduction to Fractional Factorial Designs and Half-Fractions
Core Objectives and the Concept of Fractional Designs
- The primary focus of this module is experimental efficiency: learning how to perform significantly less work while obtaining approximately the same amount of information as a full factorial study.
- This approach relies on a combination of educated guessing and specific underlying assumptions about the nature of systems and factor interactions.
- Full Factorial Complexity Recap:
- In a system with k factors where each factor has two levels (low and high), the number of experiments required for a full factorial design is $2^k$.
- As the number of factors (k) increases, the number of experiments grows exponentially, which often makes full factorial designs prohibitive in practical settings.
- The Key Insight: It is not necessary to run all experiments in a full factorial design. One can run fewer experiments (a subset), provided one is willing to accept certain trade-offs in information quality.
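To make the $2^k$ count concrete, the sketch below enumerates a two-level full factorial in standard order. It assumes the usual coding of -1 for the low level and +1 for the high level, with factor A changing fastest; the function name `full_factorial` is hypothetical and used only for illustration.

```python
from itertools import product

def full_factorial(k):
    """Return the 2^k runs of a two-level full factorial in standard order.

    Each run is a tuple of -1 (low) / +1 (high). Factor A alternates every
    run, factor B every 2 runs, factor C every 4 runs, and so on.
    """
    runs = []
    for levels in product((-1, 1), repeat=k):
        # product() varies the LAST position fastest, so reverse each tuple
        # to make the first factor (A) the fastest-changing column.
        runs.append(tuple(reversed(levels)))
    return runs

design = full_factorial(3)
for i, run in enumerate(design, start=1):
    print(i, run)
print(len(design))  # 8 runs for k = 3
```

Each additional factor doubles the length of this table, which is the exponential growth described above.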
Prohibitive Nature of Full Factorial Designs
- Scalability Issues:
- A 2-factor system (k=2) requires $2^2 = 4$ experiments. This allows the estimation of 4 parameters: the intercept, two main effects, and one two-factor interaction.
- A 3-factor system (k=3) requires $2^3 = 8$ experiments, allowing for the estimation of 8 parameters.
- A 4-factor system (k=4) requires $2^4 = 16$ experiments, allowing for 16 parameters.
- In real-world industrial or research systems, there are often 6, 7, or more factors, leading to a massive volume of experiments that are both time-prohibitive and cost-prohibitive.
- Automation and High-Throughput Limits: Even for systems that can be highly automated—such as DNA sequencing, computer simulations, or software-based experiments—running $2^k$ configurations remains inefficient and often unnecessary.
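The one-to-one link between runs and parameters can be checked by listing every term of the full model: the intercept plus one term for each non-empty subset of factors. This is a minimal sketch; the helper name `model_terms` and the single-letter factor labels are illustrative assumptions.

```python
from itertools import combinations

def model_terms(k):
    """List every coefficient in the full 2^k model: the intercept 'I',
    the k main effects, and every interaction (each subset of factors)."""
    factors = "ABCDEFGHIJ"[:k]  # assumes k <= 10 for single-letter names
    terms = ["I"]
    for order in range(1, k + 1):
        terms += ["".join(c) for c in combinations(factors, order)]
    return terms

print(model_terms(2))  # ['I', 'A', 'B', 'AB']
for k in range(2, 8):
    # one coefficient per run: the term count always equals 2**k
    print(k, len(model_terms(k)), 2 ** k)
```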
Assumptions and Higher-Order Interactions
- Coefficient Sparsity: There is very little practical utility in estimating all $2^k$ coefficients. Many of these coefficients represent higher-order interactions.
- The Saliency of Interactions:
- Higher-order interactions (three-factor interactions and above) are frequently non-existent in real physical systems.
- Even if they do exist, their coefficients are usually so small they are practically zero.
- Three-factor interactions: Seldom seen or significant in actual systems.
- Fourth-order interactions and higher: Almost certainly do not exist in practice.
- The Sacrifice of Accuracy: While higher-order terms help to fine-tune model predictions, their inclusion comes at a high cost. In many practical situations, losing the prediction accuracy provided by these terms is acceptable to save time and budget.
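How much of the full model consists of the rarely important higher-order terms can be counted directly: a $2^k$ model has $\binom{k}{j}$ terms of order $j$. The sketch below (helper name `terms_by_order` is hypothetical) reports the share of coefficients that are three-factor interactions or higher.

```python
from math import comb

def terms_by_order(k):
    """Number of coefficients of each interaction order in a 2^k model.
    Order 0 is the intercept, order 1 the main effects, order 2 the
    two-factor interactions, and so on; there are C(k, j) terms of order j."""
    return {j: comb(k, j) for j in range(k + 1)}

for k in (3, 5, 7):
    counts = terms_by_order(k)
    total = 2 ** k
    high = sum(n for j, n in counts.items() if j >= 3)  # three-factor and above
    print(f"k={k}: {total} coefficients, {high} of order >= 3 ({high / total:.0%})")
```

For k = 7, 99 of the 128 coefficients are interactions of order three or higher, so most of the full design's runs go toward estimating terms that are usually negligible.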
Strategic Selection of Experimental Subsets (The Half-Fraction)
- Scenario: Given a budget/time limit for only 4 experiments in a 3-factor system (k=3, which normally requires 8 runs).
- Ineffective Selections:
- Running only the "front face" of a design cube (experiments where factor C is only at the low level) is a poor choice. This provides no data on the high level of C, leaving its effect unknown.
- Selecting the middle four rows of a standard order table is an improvement but still not the optimal strategy.
- The "Best Choice" Design (The Half-Fraction):
- In a cube visualization of a 3-factor system, this involves selecting four points that form a specific geometric pattern (either the four "open circles" or the four "closed circles").
- The Collapse Property:
- This specific set of 4 experiments is chosen because of its robustness to insignificant factors.
- If Factor A is found to be non-significant (via a Pareto plot), it implies that its level (minus or plus) does not affect the outcome.
- By "collapsing" the minus and plus layers of the cube along the A-axis, the remaining 4 experiments form a complete full factorial design for the remaining two factors (B and C).
- This same property applies if B or C are found to be non-significant; the design collapses into a full $2^2$ factorial for the relevant variables.
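The collapse property can be verified by dropping each factor in turn and checking whether the remaining two columns still contain all four combinations of a $2^2$ design. This sketch assumes the half-fraction is the set of corners where $A \times B \times C = +1$ (one of the two four-point sets on the cube) and compares it with the "front face" selection, coded with -1/+1 levels.

```python
from itertools import product

# Runs as (A, B, C) with -1 = low, +1 = high.
# Half-fraction: the four corners where A*B*C = +1 (standard-order runs 2, 3, 5, 8).
half_fraction = [r for r in product((-1, 1), repeat=3) if r[0] * r[1] * r[2] == 1]
# "Front face" of the cube: every run at the low level of C.
front_face = [r for r in product((-1, 1), repeat=3) if r[2] == -1]

FULL_2x2 = {(-1, -1), (-1, 1), (1, -1), (1, 1)}

def collapses_to_full_2x2(runs, drop):
    """Drop one factor (index 0=A, 1=B, 2=C) and test whether the remaining
    two columns contain every combination of a 2^2 factorial."""
    remaining = {tuple(x for i, x in enumerate(r) if i != drop) for r in runs}
    return remaining == FULL_2x2

for name, runs in [("half-fraction", half_fraction), ("front face", front_face)]:
    print(name, [collapses_to_full_2x2(runs, d) for d in range(3)])
```

The half-fraction prints `[True, True, True]`: whichever factor turns out to be unimportant, the remaining two form a complete $2^2$ design. The front face prints `[False, False, True]`, since only dropping C (which it never varies) leaves a complete design.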
Case Study: Water Treatment Example and Cost Analysis
- Economic Context:
- In this example, each single experiment is valued at approximately $10,000.
- A full factorial (8 runs) costs $80,000.
- By running only a half-fraction (4 runs), the cost is reduced to $40,000, representing a $40,000 savings.
- Experimental Selection: The software is used to select runs 2, 3, 5, and 8 from the original standard order table.
- Software Analysis and "NA" Results:
- When analyzing only 4 experiments, the software can only estimate 4 coefficients.
- A model attempting to include all interactions will report "NA" for the terms that cannot be estimated, because four runs do not contain enough information to calculate them separately.
- Comparison of Models (Full vs. Fractional):
- Intercept and Main Effect A: Numerically similar in both models.
- Main Effect C: Numerically similar.
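- Main Effect B: Showed a clear difference in the fractional model compared to the full model. A likely reason is that, in runs 2, 3, 5, and 8, the B column is identical to the $A \times C$ product column, so the fractional estimate of B also absorbs any AC interaction. This discrepancy illustrates the risk of information loss when using a reduced design.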
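Why only 4 coefficients can be estimated, and why the B estimate can drift, can be seen from the model matrix of runs 2, 3, 5, and 8 alone, without any response data. The sketch below assumes NumPy, -1/+1 coding, and the full model with columns I, A, B, C, AB, AC, BC, ABC; the helper `model_matrix` is hypothetical.

```python
import numpy as np
from itertools import combinations

# Standard-order 2^3 design (A changes fastest); rows are runs 1..8.
full = np.array([[a, b, c] for c in (-1, 1) for b in (-1, 1) for a in (-1, 1)])
half = full[[1, 2, 4, 7]]  # runs 2, 3, 5, 8 (0-based row indices)

def model_matrix(X):
    """Columns: intercept, A, B, C, AB, AC, BC, ABC (products of level columns)."""
    names, cols = ["I"], [np.ones(len(X))]
    for order in (1, 2, 3):
        for idx in combinations(range(3), order):
            names.append("".join("ABC"[i] for i in idx))
            cols.append(np.prod(X[:, list(idx)], axis=1))
    return names, np.column_stack(cols)

names, M = model_matrix(half)
print(names)
print(np.linalg.matrix_rank(M))  # 4: only four independent columns
# B and the A*C product column are identical in these four runs, so
# a B estimate from the half-fraction also carries any AC interaction.
print(np.array_equal(M[:, names.index("B")], M[:, names.index("AC")]))
```

In these four runs every interaction column duplicates an earlier column (AB = C, AC = B, BC = A, ABC = I), which is why software that fits the full model can only report values for four terms and marks the rest as not estimable.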
Mathematical Construction of the Half-Fraction
- The $2^{k-1}$ Rule: A half-fraction of a $2^k$ design is denoted $2^{k-1}$.
- Calculation for 3 Factors:
- $2^{3-1} = 2^2 = 4$ experiments.
- Procedure for Generating the Table:
- 1. Write out the standard order table for the first two factors (A and B) as if it were a full factorial for 2 factors.
- 2. Generate the third factor (C) using the generator rule: $C = A \times B$.
- 3. This produces the following levels for C: [+,−,−,+] (the product of the signs in columns A and B).
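The three-step procedure above can be sketched in a few lines, assuming -1/+1 coding for the minus and plus signs. As a cross-check, the last lines map each generated row back to its run number in the $2^3$ standard-order table (run = 1 + 1·[A high] + 2·[B high] + 4·[C high]); this mapping is an illustration added here, not part of the original procedure.

```python
# Step 1: standard order for A and B (a full 2^2 factorial, A changes fastest).
A = [-1, 1, -1, 1]
B = [-1, -1, 1, 1]
# Step 2: generator C = A x B (row-wise product of the sign columns).
C = [a * b for a, b in zip(A, B)]

sign = {-1: "-", 1: "+"}
for a, b, c in zip(A, B, C):
    print(sign[a], sign[b], sign[c])
print("C:", [sign[c] for c in C])  # C: ['+', '-', '-', '+']

# Cross-check: position of each row in the 2^3 standard-order table.
runs = [1 + (a == 1) + 2 * (b == 1) + 4 * (c == 1) for a, b, c in zip(A, B, C)]
print(sorted(runs))  # [2, 3, 5, 8]
```

The generator reproduces exactly the four runs (2, 3, 5, and 8) used in the water treatment example.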
Screening vs. Optimization Strategies
- Screening:
- The goal is to identify which factors are significant among many possibilities.
- In this phase, reduced knowledge and lower prediction accuracy are acceptable. It is okay if interaction estimates are not perfectly known.
- Optimization:
- Once the significant factors are known, more specific and accurate information is required.
- This phase requires better resolution of main effects and their interactions to find the system's optimum point.
- The George Box Rule of Thumb:
- Approximately 25% of the experimental effort and budget should be invested in the initial experimental designs.
- This ensures that time and resources remain available later to investigate the details and fine-tune models after the most important factors have been identified.
Conclusion and Future Directions
- Reducing work by half results in a loss of specific accuracy, but a "smart" selection of experiments (half-fraction) provides a built-in backup strategy through the collapse property.
- The next session will focus on the formal technical terminology and the mechanics involved in creating these half-fractions systematically.