Introduction to Fractional Factorial Designs and Half-Fractions

Core Objectives and the Concept of Fractional Designs

  • The primary focus of this module is experimental efficiency: learning how to perform significantly less work while obtaining approximately the same amount of information as a full factorial study.
  • This approach relies on a combination of educated guessing and specific underlying assumptions about the nature of systems and factor interactions.
  • Full Factorial Complexity Recap:
      - In a system with $k$ factors where each factor has two levels (low and high), the number of experiments required for a full factorial design is $2^k$.
      - As the number of factors ($k$) increases, the number of experiments grows exponentially, which often makes full factorial designs prohibitive in practical settings.
  • The Key Insight: It is not necessary to run all experiments in a full factorial design. One can run a subset of the experiments, provided one is willing to accept certain trade-offs in information quality.

Prohibitive Nature of Full Factorial Designs

  • Scalability Issues:
      - A 2-factor system ($k=2$) requires $2^2 = 4$ experiments. This allows the estimation of 4 parameters: the intercept, two main effects, and one two-factor interaction.
      - A 3-factor system ($k=3$) requires $2^3 = 8$ experiments, allowing for the estimation of 8 parameters.
      - A 4-factor system ($k=4$) requires $2^4 = 16$ experiments, allowing for 16 parameters.
      - In real-world industrial or research systems, there are often 6, 7, or more factors, leading to a massive volume of experiments that are both time-prohibitive and cost-prohibitive.
  • Automation and High-Throughput Limits: Even for systems that can be highly automated—such as DNA sequencing, computer simulations, or software-based experiments—running $2^k$ configurations remains inefficient and often unnecessary.
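
The growth in run count described above can be sketched in a few lines of Python. This is only an illustration: the function name `full_factorial` and the -1/+1 coding of the low and high levels are assumptions made here, not part of the original notes.

```python
from itertools import product

def full_factorial(k):
    """Return all 2**k runs of a two-level design in standard order.

    Each run is a tuple of -1/+1 levels; the first factor changes fastest.
    """
    # product() varies the last position fastest, so reverse each tuple
    # to make the first factor the fastest-changing one.
    return [tuple(reversed(levels)) for levels in product((-1, 1), repeat=k)]

# Run count (and number of estimable parameters) for 2 to 7 factors.
for k in range(2, 8):
    runs = full_factorial(k)
    print(f"k={k}: {len(runs)} runs, up to {2**k} parameters")
```

Each added factor doubles the number of rows in the table, which is the doubling that makes 6 or 7 factors expensive.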

Assumptions and Higher-Order Interactions

  • Coefficient Sparsity: There is very little practical utility in estimating all $2^k$ coefficients. Many of these coefficients represent higher-order interactions.
  • The Saliency of Interactions:
      - Higher-order interactions (three-factor interactions and above) are frequently non-existent in real physical systems.
      - Even if they do exist, their coefficients are usually so small they are practically zero.
      - Three-factor interactions: Seldom seen or significant in actual systems.
      - Four-factor interactions and higher: Almost certainly do not exist in practice.
  • The Sacrifice of Accuracy: While higher-order terms help to fine-tune model predictions, their inclusion comes at a high cost. In many practical situations, losing the prediction accuracy provided by these terms is acceptable to save time and budget.

Strategic Selection of Experimental Subsets (The Half-Fraction)

  • Scenario: Given a budget/time limit for only 4 experiments in a 3-factor system ($k=3$, which normally requires 8 runs).
  • Ineffective Selections:
      - Running only the "front face" of a design cube (experiments where factor C is only at the low level) is a poor choice. This provides no data on the high level of C, leaving its effect unknown.
      - Selecting the middle four rows of a standard order table is an improvement but still not the optimal strategy.
  • The "Best Choice" Design (The Half-Fraction):
      - In a cube visualization of a 3-factor system, this involves selecting four alternating corners of the cube, so that no two selected points share an edge (either the four "open circles" or the four "closed circles").
  • The Collapse Property:
      - This specific set of 4 experiments is chosen because of its robustness to insignificant factors.
      - If Factor A is found to be non-significant (via a Pareto plot), it implies that its level (minus or plus) does not affect the outcome.
      - By "collapsing" the minus and plus layers of the cube along the A-axis, the remaining 4 experiments form a complete full factorial design for the remaining two factors (B and C).
      - This same property applies if B or C is found to be non-significant; the design collapses into a full $2^2$ factorial for the relevant variables.
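
The collapse property can be checked directly. The sketch below assumes levels coded as -1/+1 and picks the half-fraction whose runs satisfy C = A*B (one of the two alternating-corner sets). The helper name `collapses_to_full_factorial` is hypothetical, introduced only for this illustration. The "front face" selection is included for contrast.

```python
from itertools import product

# Full 2^3 design in standard order (A changes fastest); levels coded -1/+1.
full = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]

# Assumed half-fraction: the four runs where C equals the product A*B.
half_fraction = [run for run in full if run[2] == run[0] * run[1]]
# Poor choice from the text: only the runs with C at its low level.
front_face = [run for run in full if run[2] == -1]

def collapses_to_full_factorial(runs, dropped):
    """True if ignoring factor index `dropped` leaves every combination of
    the other two factors exactly once (a full 2^2 design)."""
    projected = sorted(
        tuple(v for i, v in enumerate(run) if i != dropped) for run in runs
    )
    return projected == sorted(product((-1, 1), repeat=2))

for name, runs in (("half-fraction", half_fraction), ("front face", front_face)):
    for idx, factor in enumerate("ABC"):
        print(name, "drop", factor, "->", collapses_to_full_factorial(runs, idx))
```

Under these assumptions the half-fraction collapses to a full two-factor design whichever factor is dropped, while the front face does so only when C is dropped, which is the one factor it carries no information about.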

Case Study: Water Treatment Example and Cost Analysis

  • Economic Context:
      - In this example, each single experiment is valued at approximately \$10,000.
      - A full factorial (8 runs) costs \$80,000.
      - By running only a half-fraction (4 runs), the cost is reduced to \$40,000, representing a \$40,000 savings.
  • Experimental Selection: The software is used to select runs 2, 3, 5, and 8 from the original standard order table.
  • Software Analysis and "NA" Results:
      - When analyzing only 4 experiments, the software can only estimate 4 coefficients.
      - A model attempting to include all interactions will result in "NA" (not available) for certain terms because there is insufficient data to calculate them.
  • Comparison of Models (Full vs. Fractional):
      - Intercept and Main Effect A: Numerically similar in both models.
      - Main Effect C: Numerically similar.
      - Main Effect B: Showed a substantial difference in the fractional model compared to the full model. This discrepancy indicates the risk of information loss when using a reduced design.
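
Why only 4 coefficients can be estimated from 4 runs can be seen from the rank of the model matrix. The following sketch uses no experimental data, only the design itself. The column ordering (intercept, A, B, C, AB, AC, BC, ABC) and the helper names `model_matrix` and `rank` are assumptions chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

def model_matrix(runs):
    """Columns: intercept, A, B, C, AB, AC, BC, ABC for each (a, b, c) run."""
    return [[1, a, b, c, a * b, a * c, b * c, a * b * c] for a, b, c in runs]

def rank(matrix):
    """Matrix rank by Gaussian elimination with exact fractions."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue  # no new independent direction in this column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col] / rows[r][col]
                rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == len(rows):
            break
    return r

# Full 2^3 design in standard order, and runs 2, 3, 5, 8 taken from it.
full = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
half = [full[i - 1] for i in (2, 3, 5, 8)]

print(rank(model_matrix(full)))  # 8: all eight coefficients are estimable
print(rank(model_matrix(half)))  # 4: only four independent columns remain
```

With rank 4, a model that requests all eight terms leaves four of them without an estimate, which is consistent with the "NA" entries described above.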

Mathematical Construction of the Half-Fraction

  • The $2^{k-1}$ Rule: A half-fraction of a $2^k$ design is denoted as $2^{k-1}$.
  • Calculation for 3 Factors:
      - $2^{3-1} = 2^2 = 4$ experiments.
  • Procedure for Generating the Table:
      1. Write out the standard order table for the first two factors (A and B) as if it were a full factorial for 2 factors.
      2. Generate the third factor (C) using the generator rule: $C = A \times B$.
      3. This produces the following levels for C: $[+, -, -, +]$ (the product of the signs in columns A and B).
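
The procedure above can be sketched as follows, assuming the low and high levels are coded as -1 and +1 so that the generator is an ordinary product. The variable names are illustrative only. The last step looks up where each generated run sits in the full 3-factor standard order table.

```python
from itertools import product

# Step 1: full 2^2 factorial for A and B in standard order (A changes fastest).
base = [(a, b) for b, a in product((-1, 1), repeat=2)]

# Step 2: generate the C column from the rule C = A * B.
half_fraction = [(a, b, a * b) for a, b in base]

# Locate each run in the full 2^3 standard order table (1-indexed run numbers).
full = [(a, b, c) for c, b, a in product((-1, 1), repeat=3)]
run_numbers = sorted(full.index(run) + 1 for run in half_fraction)

# Step 3: print the table with +/- signs, then the matching run numbers.
sign = {-1: "-", 1: "+"}
print("A B C")
for a, b, c in half_fraction:
    print(sign[a], sign[b], sign[c])
print(run_numbers)  # [2, 3, 5, 8]
```

The generated runs are the same runs 2, 3, 5, and 8 selected in the water treatment example.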

Screening vs. Optimization Strategies

  • Screening:
      - The goal is to identify which factors are significant among many possibilities.
      - In this phase, reduced knowledge and lower prediction accuracy are acceptable. It is okay if interaction estimates are not perfectly known.
  • Optimization:
      - Once the significant factors are known, more specific and accurate information is required.
      - This phase requires better resolution of main effects and their interactions to find the system's optimum point.
  • The George Box Rule of Thumb:
      - Approximately 25% of the experimental effort and budget should be invested in the initial experimental designs.
      - This ensures that time and resources remain available later to investigate the details and fine-tune models after the most important factors have been identified.

Conclusion and Future Directions

  • Reducing work by half results in a loss of specific accuracy, but a "smart" selection of experiments (half-fraction) provides a built-in backup strategy through the collapse property.
  • The next session will focus on the formal technical terminology and the mechanics involved in creating these half-fractions systematically.