What are adversarial examples?
Minimally perturbed versions of legitimate inputs that cause machine learning models to misclassify them.
• Pixel-level manipulations
• Invisible to human observers
• Cause model misclassification
What is the classic panda-gibbon example?
A correctly classified panda image can be transformed to be classified as a gibbon with 99.3% confidence.
• Appears identical to humans
• Shows severity of vulnerability
• Demonstrates imperceptible perturbations
What is the typical perturbation constraint for adversarial examples on ImageNet?
L-infinity norm ≤ 16/255
• Approximately 6% of pixel value range
• Imperceptible to human vision
• Bounded perturbations
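A minimal sketch of how such a bound is typically enforced (assuming a NumPy image with pixel values in [0, 1]; the epsilon value mirrors the 16/255 budget above and is not a prescription from the paper):

import numpy as np

def project_linf(x_adv, x_orig, eps=16/255):
    """Project an adversarial image back into the L-infinity ball around the original."""
    # Clip the perturbation to [-eps, eps], then keep pixels in the valid [0, 1] range.
    delta = np.clip(x_adv - x_orig, -eps, eps)
    return np.clip(x_orig + delta, 0.0, 1.0)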
Name three real-world domains where adversarial vulnerabilities raise critical concerns.
• Hospitals
• Financial institutions
• Autonomous vehicles
What physical-world adversarial attack was demonstrated on Tesla?
Printed stickers on stop signs that successfully fooled Tesla's vision system.
• Real-world demonstration
• Not just theoretical
• Shows practical vulnerability
What is the difference between white-box and black-box attacks?
White-box: Adversary has complete knowledge of target model architecture and parameters
Black-box: Adversary has no access to model internals
What is transferability in adversarial machine learning?
The property where adversarial examples crafted to fool one model often fool other models as well.
• Enables black-box attacks
• Key discovery in adversarial ML
• Allows two-step attack strategy
What is the two-step attack strategy enabled by transferability?
Step 1: Create adversarial examples on a surrogate model you control
Step 2: Launch the same adversarial examples against the black-box target model
What transferability rate did the paper achieve with a 4-model ensemble?
Over 95% transferability
• Remarkably high success rate
• Key finding of the paper
• Against adversarially trained models
Why can't defenders simply update their models to avoid adversarial attacks?
When new models are trained on similar data with similar architecture, transferability often persists.
• Model updates don't break transferability
• Significant challenge for defenders
• Vulnerability persists across updates
What was Gap #1 in prior adversarial research?
Limited scope: only approximately 6 augmentation types were tested.
• Hundreds exist in ML literature
• Massive unexplored space
• Prior work tested tiny fraction
What was Gap #2 in prior adversarial research?
Composition method: previous work used only serial composition.
• Sequential/nested application
• No exploration of alternatives
• Limiting approach
What are the three problems with serial composition?
Problem 1: Exponential explosion in samples
Problem 2: Order matters, creating combinatorial complexity
Problem 3: Image distortion and unrecognizable results
How many combinations result from 3 augmentations with 5 variants each using serial composition?
125 combinations
• 5³ = 125
• Exponential explosion
• Computationally prohibitive
What are the two revolutionary research questions posed by this paper?
Q1: What if we systematically test 46 augmentations instead of 6?
Q2: What if we compose them in parallel rather than serially?
How many augmentation techniques did the paper systematically evaluate?
46 different augmentation techniques
• Many never used for attacks before
• Comprehensive study
• Spans 7 categories
What are the 7 categories of augmentations tested?
Color-space transformations
Random deletion methods
Kernel filters
Mixing methods
Style transfer augmentations
Meta-learning-inspired augmentations
Spatial transformations
Give examples of color-space transformations.
• Grayscale (GS)
• Color Jitter (CJ) - adjusts hue, contrast, saturation, brightness
• Simplest augmentations
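For illustration, common library equivalents of these two augmentations (torchvision transforms; the parameter values are illustrative, not the paper's settings):

from torchvision import transforms

grayscale = transforms.Grayscale(num_output_channels=3)          # GS: keep 3 channels for the model
color_jitter = transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                       saturation=0.4, hue=0.1)  # CJ: hue/contrast/saturation/brightness jitter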
What is CutOut and what category does it belong to?
Category: Random deletion
Function: Randomly masks rectangular regions
Purpose: Forces holistic feature focus and robustness to occlusions
What are kernel filters in the context of augmentations?
Classical image processing techniques including:
• Sharpen
• Blur
• Edge enhancement
What is CutMix?
A mixing method that replaces a region within one image with a region from another image.
• Category: Mixing images
• Combines multiple images
What style was used for Neural Transfer augmentation?
Picasso's 1907 self-portrait style
• Applied using generative model
• Preserves semantics, changes style
• Extreme augmentation
What is AutoAugment?
A meta-learning augmentation where a pre-trained controller selects appropriate augmentation methods from a predefined set.
• Algorithm chooses augmentation
• Category: Meta-learning inspired
What insight does the success of extreme augmentations (like Picasso style) reveal?
Deep neural networks rely on surprisingly low-level features:
• Edge information
• Rough color distributions
• Texture patterns
Stylization preserves these features
What base algorithm does the paper's framework build upon?
MI-FGSM: Momentum Iterative Fast Gradient Sign Method
• Proven iterative attack
• Incorporates momentum
• Stable optimization
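For reference, the standard MI-FGSM update can be written as follows (the usual formulation from the literature, not quoted from the paper):

g_{t+1} = \mu \, g_t + \frac{\nabla_x J(x_t, y)}{\lVert \nabla_x J(x_t, y) \rVert_1},
\qquad
x_{t+1} = \mathrm{Clip}_{x,\epsilon}\!\left( x_t + \alpha \cdot \mathrm{sign}(g_{t+1}) \right)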
What are the 5 steps of Algorithm 1 per iteration?
Step 1: Augment - create m augmented versions
Step 2: Calculate gradients for all m versions
Step 3: Average gradients
Step 4: Add momentum for stability
Step 5: Update image in computed direction
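A minimal PyTorch-style sketch of one such iteration, under stated assumptions: `model`, `loss_fn`, and the `augmentations` list are placeholders (the augmentations are assumed to be differentiable tensor ops), and the hyperparameters mirror the cards in this deck rather than the paper's code.

import torch

def attack_step(x_adv, x_orig, y, model, loss_fn, augmentations, g_prev,
                eps=16/255, alpha=(16/255)/10, mu=1.0):
    """One iteration: augment in parallel, average gradients, add momentum, take a sign step."""
    x_adv = x_adv.clone().detach().requires_grad_(True)
    grads = []
    for aug in augmentations:                      # Step 1: create m augmented versions
        loss = loss_fn(model(aug(x_adv)), y)       # Step 2: gradient through each version
        grads.append(torch.autograd.grad(loss, x_adv)[0])
    g_avg = torch.stack(grads).mean(dim=0)         # Step 3: average the gradients
    g = mu * g_prev + g_avg / (g_avg.abs().mean() + 1e-12)   # Step 4: momentum (L1-style normalization)
    x_new = x_adv.detach() + alpha * g.sign()      # Step 5: step in the computed direction
    delta = torch.clamp(x_new - x_orig, -eps, eps)            # stay inside the epsilon ball
    return torch.clamp(x_orig + delta, 0, 1), g

On the first iteration g_prev would simply be a zero tensor of the image's shape; repeating this step T times gives the full attack.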
How many iterations does the algorithm typically run?
T = 10 iterations typically
• Standard practice
• Balances effectiveness and speed
What is the "master key" perturbation concept?
By averaging gradients across multiple augmented versions, the attack finds a perturbation that works against:
• Original image
• Grayscale version
• Cropped version
• All augmented versions simultaneously
What are the two purposes of momentum in the algorithm?
Purpose 1: Helps optimization escape poor local minima (like a ball rolling over bumps)
Purpose 2: Reduces effect of noisy gradients from individual augmentations
What is the step size (alpha) parameter?
Alpha = epsilon / T
• Uses full perturbation budget
• T is number of iterations
• Balances invisibility with effectiveness
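Worked example (using the ImageNet budget ε = 16/255 from the earlier card): α = ε/T = (16/255)/10 ≈ 0.0063, i.e. roughly 1.6/255 per iteration.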
What epsilon value is used for CIFAR-10?
Epsilon = 0.02 to 0.04
• For smaller images
• Lower than ImageNet
• Adjusted for image size
How does parallel composition work?
Apply all augmentations independently to the original image, then aggregate results.
• m samples per iteration (not thousands)
• Each image stays recognizable
• No exponential explosion
Compare sample counts: serial vs parallel for 5 augmentations with 5 parameters.
Serial: 3,125 samples
Parallel: 25 samples
Reduction: Over 2 orders of magnitude
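A quick sanity check of those counts (assuming one sample per augmentation-parameter setting):

n_augs, n_params = 5, 5
serial = n_params ** n_augs    # nested application: every setting of every stage -> 3125
parallel = n_augs * n_params   # independent application: 5 augmentations x 5 settings -> 25
print(serial, parallel, serial / parallel)   # 3125 25 125.0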
What is the potential concern with parallel composition?
Missing interactions between augmentations.
• Lose compound effects like rotate-then-crop
• However, benefits vastly outweigh this loss
What are the three benefits of parallel composition?
Benefit 1: Can use many more augmentations
Benefit 2: Better gradient signals from clean samples
Benefit 3: Forces true generalization (must work against all simultaneously)
What improvement did parallel composition achieve over serial?
3.4× improvement
• Single largest factor in effectiveness
• Experimentally confirmed
• Parallel > serial
How many combination experiments were conducted?
16,215 distinct experiments
• 1,035 two-way combinations (C(46,2))
• 15,180 three-way combinations (C(46,3))
• Each on 1,000 images
• Days of GPU time
What was Key Finding #1 from manual combination experiments?
Parallel beats serial by 3.4×
• Single most important factor
• Confirmed from exhaustive search
What was Key Finding #2 from manual combination experiments?
Color-space augmentations dominate results.
• Grayscale and Color Jitter appear in ALL top-10 combinations
• Surprising: simplest augmentations most effective
What was Key Finding #3 from manual combination experiments?
Deletion operations prove crucial.
• Random erasing forces robustness to occlusions
• Improves transferability
What was Key Finding #4 from manual combination experiments?
Monotonic trend observed: more augmentations consistently yield better transfer rates.
• Consistent pattern
• Explored in detail later
What pattern was observed in top-performing augmentation combinations?
Successful combinations mix augmentation categories rather than drawing from a single type.
• Example: GS + Deletion + Kernel
• Example: CJ + Deletion + Spatial
• Diversity important
Why do color-space transformations help dramatically? (Reason 1)
Forces shape/texture robustness.
• DNNs over-rely on color for classification
• Learn shortcuts (green things = plants)
• Grayscale forces shape/texture focus
Why do color-space transformations help dramatically? (Reason 2)
Find fundamental features.
• Simple operations, don't distort structure
• Create distribution shifts
• Features persist across color spaces
• More transferable between models
Why is exhaustive search infeasible for 46 augmentations?
• 4-way combinations: 163K possibilities
• 5-way combinations: Over 1M possibilities
• Cannot test all computationally
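These counts are straightforward to verify with Python's math.comb:

from math import comb
print(comb(46, 2), comb(46, 3), comb(46, 4), comb(46, 5))
# 1035 15180 163185 1370754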
What optimization technique was used for large combination spaces?
Genetic search
• Well-suited for large, discrete search spaces
• Simulates biological evolution
• Finds near-optimal solutions efficiently
What serves as the fitness function in genetic search?
Transferability of each augmentation combination
• Evaluated on attack success
• Guides selection of "parents"
What are the 6 steps of the genetic algorithm?
Step 1: Initial population (random combinations)
Step 2: Evaluate each (fitness = transferability)
Step 3: Selection (fittest act as "parents")
Step 4: Crossover (combine parent combinations)
Step 5: Mutation (randomly add/remove augmentations)
Step 6: Repeat until convergence
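A compact sketch of this loop: the fitness callable, population size, and binary encoding are placeholders rather than the paper's implementation, and the probabilities mirror the hyperparameter card later in this deck.

import random

def genetic_search(n_augs=46, pop_size=20, n_gen=10,
                   p_cross=0.6, p_mutate=0.1, p_aug=0.55, fitness=None):
    """Evolve binary masks over the augmentations; fitness = measured transferability."""
    # Step 1: random initial population (each bit = include that augmentation or not)
    pop = [[random.random() < p_aug for _ in range(n_augs)] for _ in range(pop_size)]
    for _ in range(n_gen):
        scored = sorted(pop, key=fitness, reverse=True)    # Step 2: evaluate transferability
        parents = scored[:pop_size // 2]                   # Step 3: fittest act as parents
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            if random.random() < p_cross:                  # Step 4: single-point crossover
                cut = random.randrange(1, n_augs)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            child = [(not bit) if random.random() < p_mutate else bit
                     for bit in child]                      # Step 5: mutation flips bits
            children.append(child)
        pop = parents + children                            # Step 6: repeat until convergence
    return max(pop, key=fitness)

In practice each fitness call is expensive (it runs the attack and measures transfer success), which is why small populations and few generations are used.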
What are the three variants developed?
ULTCOMBBASE: 11 augmentations (best from manual search)
ULTCOMBGEN5: Genetic 5-way search
ULTCOMBGEN: Full genetic search (all 46 augmentations)
How many augmentations did ULTCOMBGEN discover?
33 augmentations
• Highest transferability achieved
• This is the champion method
• Complete list in appendix
How was genetic search validated?
Ran on a small search space where exhaustive results were available.
• n_gen = 2, population = 20
• Achieved >99% of optimal performance
• Explored only tiny fraction of space
What hyperparameters were used for full genetic search?
• p_cross = 60% (crossover probability)
• p_mutate = 10% (mutation probability)
• p_aug = 55% (initial inclusion probability)
What were the 4 surrogate models used in experiments?
• ResNet-18
• ResNet-50
• ResNet-101
• DenseNet-121
How many target models were tested?
7 held-out target models
• On ImageNet dataset
• Using 1,000 test images
What transfer rate does MI-FGSM baseline achieve?
Approximately 60% transfer rate
• No augmentation
• Basic baseline
What transfer rate does DI-FGSM achieve?
Approximately 70% transfer rate
• Simple augmentation
• Better than baseline
What was the previous state-of-the-art transfer rate?
Approximately 75% transfer rate
• Before this paper
• On normally trained models
What transfer rate did ULTCOMBBASE achieve?
84.9% average transfer rate
• 11 augmentations
• Already strong improvement
What transfer rate did ULTCOMBGEN5 achieve?
88.3% average transfer rate
• Genetic 5-way search
• Very high performance
What transfer rate did ULTCOMBGEN achieve?
88.6% average transfer rate
• 33 augmentations
• Best performance on normally trained models
What is the improvement of ULTCOMBGEN over MI-FGSM baseline?
+18.6% improvement
• Also +13.6% over prior best methods
• Consistent across all 7 targets
Why do attacks work better against dissimilar architectures like VGG and Inception?
By attacking across 30+ augmentations, perturbations exploit fundamental, low-level perceptual features rather than architecture-specific quirks.
• Edges, textures, color patterns
• Universal features transfer better
What do VGG and ResNet both rely on despite different architectures?
Similar early-layer features:
• Edges
• Textures
• Color patterns
What are adversarially trained models?
Models specifically designed to resist adversarial attacks.
• Shown adversarial examples during training
• Learn to classify them correctly
• Industry best practice defense
What transfer rate did previous attacks achieve against adversarially trained models?
Most attacks: Less than 20% transfer (fail catastrophically)
Prior SOTA: 60-70% transfer (considered quite good)
What transfer rate did ULTCOMBBASE achieve against adversarially trained models?
78.9% average transfer rate
• Already exceeds prior best
• Strong performance
What transfer rate did ULTCOMBGEN achieve against adversarially trained models?
91.8% average transfer rate
• Dramatically better than prior work
• Against hardened defenses
What transfer rate did ensemble attacks (4 surrogates) achieve?
95.7% transfer rate
• "Best" defenses fail >95% of time
• Severe implication for defense
What is the statistical significance of the results?
p < 0.01 by paired t-test
• Across all surrogate-target pairs
• Not due to chance
• Highly significant
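A paired t-test of this kind can be reproduced with, for example, SciPy (the per-target transfer rates below are hypothetical placeholders, not the paper's numbers):

from scipy.stats import ttest_rel

baseline = [0.58, 0.61, 0.63, 0.59, 0.62, 0.60, 0.57]   # baseline transfer rate per target
ultcomb  = [0.87, 0.89, 0.91, 0.86, 0.90, 0.88, 0.85]   # attack transfer rate per target
t_stat, p_value = ttest_rel(ultcomb, baseline)
print(p_value)   # the paper reports p < 0.01 across all surrogate-target pairs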
What are the implications for adversarial training?
Widely considered most effective defense, but proves inadequate against systematic augmentation-based attacks.
• Questions DNN deployment in critical systems
• Need additional defensive layers
• Not standalone solution
What two commercial systems were tested?
• Google Cloud Vision
• Clarifai
Both are production systems used by thousands of companies
What transfer rate did ULTCOMBGEN achieve against Google Cloud Vision?
82.1% fooling rate
• Real-world system
• Processing actual user data
What transfer rate did ULTCOMBGEN achieve against Clarifai?
75.4% fooling rate
• Baseline methods: only 40-50%
• Substantial improvement
What does testing on commercial systems demonstrate?
Attacks work not just on academic benchmarks but on real systems.
• Current commercial robustness insufficient
• Against sophisticated adversaries
What responsible disclosure practices were followed?
• Informed companies before publication
• No specific exploitable vulnerabilities revealed
• Goal: improve security, not enable attacks
• Ethical research practices
Why are companies vulnerable to adversarial attacks?
Trade-offs and costs:
• Balance accuracy vs. robustness
• Robust models: 5-10% lower accuracy on benign data
• Adversarial training computationally expensive
• Robust architectures expensive
What monotonic relationship was discovered?
Between number of augmentations and transferability.
• Each additional augmentation adds ~1-2% transfer rate
• No plateau observed (even at 33)
• Near-linear improvement
What are the only two augmentations that decreased performance?
• Neural Style Transfer (sometimes worse)
• Sharpening (sometimes worse)
All others: more = better or equal
What are the three implications of the monotonic relationship?
Implication 1: Design principle - when in doubt, add more
Implication 2: Diversity matters more than quality of individual augmentations
Implication 3: May have no ceiling - can likely push beyond 33
Why not use 100 augmentations? (3 reasons)
Reason 1: Computational cost (33 augs = 165 samples/iteration, 100 = 500)
Reason 2: Augmentation availability (exhausted practical augs in literature)
Reason 3: Diminishing absolute gains (smaller improvements)
What was Ablation Finding #1?
Composition method matters most.
• Parallel: 88.6% transfer
• Serial: 26.1% transfer
• 3.4× improvement
• Single biggest factor
Why does serial composition fail?
Benign accuracy drops severely:
• Parallel: 82.4%
• Serial: 21%
Severe image distortion means adversarial directions don't generalize
What was Ablation Finding #2?
Number of augmentations matters monotonically.
• 5 augs: ~70% transfer
• 11 augs: ~84% transfer
• 33 augs: ~88% transfer
What was Ablation Finding #3?
Diversity across categories matters.
• Mixed categories: 88.6% transfer
• Single category: 65-75% only
Need: Spatial + Color-space + Deletion
What was Ablation Finding #4?
Optimization details matter less than composition.
• MI-FGSM vs. others: <3% difference
• Momentum helps but not critical
• Framework is flexible
What was Ablation Finding #5?
Some individual augmentations are critical:
• Removing Grayscale: -5.2%
• Removing Deletion: -4.8%
• Removing Spatial: -3.1%
Others more substitutable
What is ULTCOMBBASE as a practical alternative?
11 augmentations (faster than ULTCOMBGEN)
• ~85% transfer vs. ~89% for ULTCOMBGEN
• Speed-accuracy trade-off
• Acceptable for many purposes
What is the key theoretical insight for why augmentations work?
Gradient smoothing.
• Surrogate gradients are noisy and model-specific
• Overfitting to noisy gradients causes poor transfer
• Need gradients that generalize
What is the mechanism of gradient smoothing?
• Each augmentation produces different gradient
• Average m gradients from m augmented versions
• Smooths out model-specific noise
• Smoother gradients transfer better
What mathematical technique was used to prove gradient smoothing?
Adapted from randomized smoothing.
• Augmenting with Gaussian noise bounds the Lipschitz constant of the smoothed loss
• This bounds the gradient's maximum rate of change
• Larger sigma yields smoother gradients
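In randomized-smoothing terms the idea can be stated as follows (a standard formulation, not the paper's exact notation): averaging gradients over perturbed copies is a Monte Carlo estimate of the gradient of a smoothed loss,

J_\sigma(x) = \mathbb{E}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[ J(x + \delta) \big],
\qquad
\nabla_x J_\sigma(x) = \mathbb{E}_{\delta \sim \mathcal{N}(0,\sigma^2 I)}\big[ \nabla_x J(x + \delta) \big]

and convolving with a Gaussian limits how fast \nabla_x J_\sigma can change, with larger \sigma giving a smoother surface.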
What empirical evidence supports gradient smoothing?
Measured cosine similarity of consecutive gradients:
• MI-FGSM: variance = 0.82
• ULTCOMBGEN: variance = 0.31
• 62% reduction in variance
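A sketch of how such a measurement can be made (the list of per-iteration gradient tensors is assumed; this mirrors the metric described above, not the paper's exact script):

import torch

def consecutive_cosine(grads):
    """Cosine similarity between gradients of consecutive attack iterations."""
    sims = [torch.nn.functional.cosine_similarity(g1.flatten(), g2.flatten(), dim=0)
            for g1, g2 in zip(grads[:-1], grads[1:])]
    sims = torch.stack(sims)
    return sims.mean().item(), sims.var().item()   # lower variance -> smoother trajectory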
What do gradient heat maps show?
ULTCOMB produces more uniform gradients.
• Focus on robust, transferable features
• Not model-specific artifacts
• Visual confirmation of smoothing
What critical finding shows smoothness is not the only mechanism?
Pure Gaussian noise achieves:
• Highest gradient smoothness
• But slightly lower transferability than ULTCOMBBASE
Smoothness = primary mechanism but NOT the only one
What are speculated additional factors beyond smoothness?
• Semantic preservation
• Frequency characteristics
• Diverse feature targeting
Identifying these = important future work
What is Limitation #1 of the approach?
Speed trade-off.
• ULTCOMBGEN: 40× slower than simple attacks
• Creates 165 samples/iteration
• ~40 seconds per image
What mitigations exist for the speed limitation?
• 30-sample version: 5.5 seconds (~90% performance)
• Offline attacks: success rate matters more than speed
• Implementation not optimized: can be faster
• Not fundamental limitation
What is Limitation #2 of the approach?
Theoretical incompleteness.
• Smoothness explains most but not all effectiveness
• Evidence: Gaussian noise has max smoothness but not max transferability
• Other unknown factors contribute
What is Future Work Direction #1?
Identify additional mechanisms beyond smoothness.
• What do augmentations preserve/enhance?
• Complete the theory
• Deeper investigation needed
What is Future Work Direction #2?
Design better defenses for composition attacks.
• Make augmentation ineffective
• Detection systems
• Adaptive adversarial training
• Use parallel-composed augmentations in training
What is Future Work Direction #3?
Optimize implementation for speed.
• Current: research-quality code
• Engineering effort can reduce runtime
• Practical improvement possible