Ultimate Combo Research Paper


1
New cards

What are adversarial examples?

Minimally perturbed versions of legitimate inputs that cause machine learning models to misclassify them.
• Pixel-level manipulations
• Invisible to human observers
• Cause model misclassification

2
New cards

What is the classic panda-gibbon example?

A correctly classified panda image can be transformed to be classified as a gibbon with 99.3% confidence.
• Appears identical to humans
• Shows severity of vulnerability
• Demonstrates imperceptible perturbations

3
New cards

What is the typical perturbation constraint for adversarial examples on ImageNet?

L-infinity norm ≤ 16/255
• Approximately 6% of pixel value range
• Imperceptible to human vision
• Bounded perturbations
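The L-infinity constraint above can be enforced with a simple projection step. A minimal NumPy sketch (the function name `project_linf` is ours, not the paper's):

```python
import numpy as np

EPS = 16 / 255  # the L-infinity budget quoted for ImageNet (~6% of pixel range)

def project_linf(x_adv, x_orig, eps=EPS):
    """Clip a perturbed image back into the L-infinity ball of radius eps
    around the original, then into the valid pixel range [0, 1]."""
    x_adv = np.clip(x_adv, x_orig - eps, x_orig + eps)
    return np.clip(x_adv, 0.0, 1.0)

rng = np.random.default_rng(0)
x = rng.random((3, 8, 8))  # toy "image" with pixels in [0, 1]
x_proj = project_linf(x + rng.uniform(-0.1, 0.1, x.shape), x)
```

After projection, no pixel differs from the original by more than 16/255.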

4
New cards

Name three real-world domains where adversarial vulnerabilities raise critical concerns.

• Hospitals
• Financial institutions
• Autonomous vehicles

5
New cards

What physical-world adversarial attack was demonstrated on Tesla?

Printed stickers on stop signs that successfully fooled Tesla's vision system.
• Real-world demonstration
• Not just theoretical
• Shows practical vulnerability

6
New cards

What is the difference between white-box and black-box attacks?

White-box: Adversary has complete knowledge of target model architecture and parameters
Black-box: Adversary has no access to model internals

7
New cards

What is transferability in adversarial machine learning?

The property where adversarial examples crafted to fool one model often fool other models as well.
• Enables black-box attacks
• Key discovery in adversarial ML
• Allows two-step attack strategy

8
New cards

What is the two-step attack strategy enabled by transferability?

Step 1: Create adversarial examples on a surrogate model you control
Step 2: Launch same adversarial examples against black-box target model

9
New cards

What transferability rate did the paper achieve with a 4-model ensemble?

Over 95% transferability
• Remarkably high success rate
• Key finding of the paper
• Against adversarially trained models

10
New cards

Why can't defenders simply update their models to avoid adversarial attacks?

When new models are trained on similar data with similar architecture, transferability often persists.
• Model updates don't break transferability
• Significant challenge for defenders
• Vulnerability persists across updates

11
New cards

What was Gap #1 in prior adversarial research?

Limited scope: only approximately 6 augmentation types were tested.
• Hundreds exist in ML literature
• Massive unexplored space
• Prior work tested tiny fraction

12
New cards

What was Gap #2 in prior adversarial research?

Composition method: previous work used only serial composition.
• Sequential/nested application
• No exploration of alternatives
• Limiting approach

13
New cards

What are the three problems with serial composition?

Problem 1: Exponential explosion in samples
Problem 2: Order matters, creating combinatorial complexity
Problem 3: Image distortion and unrecognizable results

14
New cards

How many combinations result from 3 augmentations with 5 variants each using serial composition?

125 combinations
• 5³ = 125
• Exponential explosion
• Computationally prohibitive

15
New cards

What are the two revolutionary research questions posed by this paper?

Q1: What if we systematically test 46 augmentations instead of 6?
Q2: What if we compose them in parallel rather than serially?

16
New cards

How many augmentation techniques did the paper systematically evaluate?

46 different augmentation techniques
• Many never used for attacks before
• Comprehensive study
• Spans 7 categories

17
New cards

What are the 7 categories of augmentations tested?

1. Color-space transformations
2. Random deletion methods
3. Kernel filters
4. Mixing methods
5. Style transfer augmentations
6. Meta-learning-inspired augmentations
7. Spatial transformations

18
New cards

Give examples of color-space transformations.

• Grayscale (GS)
• Color Jitter (CJ) - adjusts hue, contrast, saturation, brightness
• Simplest augmentations

19
New cards

What is CutOut and what category does it belong to?

Category: Random deletion
Function: Randomly masks rectangular regions
Purpose: Forces holistic feature focus and robustness to occlusions

20
New cards

What are kernel filters in the context of augmentations?

Classical image processing techniques including:
• Sharpen
• Blur
• Edge enhancement

21
New cards

What is CutMix?

A mixing method that replaces a region within one image with a region from another image.
• Category: Mixing images
• Combines multiple images

22
New cards

What style was used for Neural Transfer augmentation?

Picasso's 1907 self-portrait style
• Applied using generative model
• Preserves semantics, changes style
• Extreme augmentation

23
New cards

What is AutoAugment?

A meta-learning augmentation where a pre-trained controller selects appropriate augmentation methods from a predefined set.
• Algorithm chooses augmentation
• Category: Meta-learning inspired

24
New cards

What insight does the success of extreme augmentations (like Picasso style) reveal?

Deep neural networks rely on surprisingly low-level features:
• Edge information
• Rough color distributions
• Texture patterns
Stylization preserves these features

25
New cards

What base algorithm does the paper's framework build upon?

MI-FGSM: Momentum Iterative Fast Gradient Sign Method
• Proven iterative attack
• Incorporates momentum
• Stable optimization

26
New cards

What are the 5 steps of Algorithm 1 per iteration?

Step 1: Augment - create m augmented versions
Step 2: Calculate gradients for all m versions
Step 3: Average gradients
Step 4: Add momentum for stability
Step 5: Update image in computed direction
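The five steps above can be sketched in NumPy. This is an illustrative reconstruction, not the paper's code: `grad_fn` stands in for backprop through the surrogate model, and the L1 normalization inside the momentum update follows standard MI-FGSM practice.

```python
import numpy as np

def ultcomb_step_sketch(grad_fn, x_adv, g_prev, augmentations, alpha, mu=1.0):
    """One iteration in the spirit of Algorithm 1.
    grad_fn(x) returns the loss gradient w.r.t. the input; `augmentations`
    is a list of callables applied in parallel to the current image."""
    # Steps 1-2: augment in parallel and compute a gradient per copy
    grads = [grad_fn(aug(x_adv)) for aug in augmentations]
    g_avg = np.mean(grads, axis=0)                            # Step 3: average
    g = mu * g_prev + g_avg / (np.abs(g_avg).mean() + 1e-12)  # Step 4: momentum
    return x_adv + alpha * np.sign(g), g                      # Step 5: signed update

# Toy demo: quadratic loss ||x||^2 (gradient 2x) with two trivial "augmentations"
x0 = np.array([1.0, -2.0])
x1, g = ultcomb_step_sketch(lambda x: 2 * x, x0, np.zeros_like(x0),
                            [lambda t: t, lambda t: 0.5 * t], alpha=0.1)
```

Each pixel moves by at most `alpha` per iteration, which is what keeps the total perturbation inside the epsilon budget over T steps.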

27
New cards

How many iterations does the algorithm typically run?

T = 10 iterations typically
• Standard practice
• Balances effectiveness and speed

28
New cards

What is the "master key" perturbation concept?

By averaging gradients across multiple augmented versions, the attack finds a perturbation that works against:
• Original image
• Grayscale version
• Cropped version
• All augmented versions simultaneously

29
New cards

What are the two purposes of momentum in the algorithm?

Purpose 1: Helps optimization escape poor local minima (like a ball rolling over bumps)
Purpose 2: Reduces effect of noisy gradients from individual augmentations

30
New cards

What is the step size (alpha) parameter?

Alpha = epsilon / T
• Uses full perturbation budget
• T is number of iterations
• Balances invisibility with effectiveness
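Plugging in the ImageNet budget and typical iteration count gives a concrete step size (variable names are ours):

```python
eps_imagenet = 16 / 255   # L-infinity budget from the earlier card
T = 10                    # typical iteration count
alpha = eps_imagenet / T  # per-iteration step size, ~0.0063
```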

31
New cards

What epsilon value is used for CIFAR-10?

Epsilon = 0.02 to 0.04
• For smaller images
• Lower than ImageNet
• Adjusted for image size

32
New cards

How does parallel composition work?

Apply all augmentations independently to the original image, then aggregate results.
• m samples per iteration (not thousands)
• Each image stays recognizable
• No exponential explosion

33
New cards

Compare sample counts: serial vs parallel for 5 augmentations with 5 parameters.

Serial: 3,125 samples
Parallel: 25 samples
Reduction: Over 2 orders of magnitude
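The sample counts follow directly from the combinatorics: serial composition multiplies variant choices, while parallel composition only sums them. A quick check (assuming the card's setup of 5 augmentations with 5 variants each):

```python
from itertools import product

n_augs, n_variants = 5, 5
# Serial/nested: every augmentation picks one of its variants, multiplicatively
serial = sum(1 for _ in product(range(n_variants), repeat=n_augs))
# Parallel: each variant of each augmentation applied independently to the original
parallel = n_augs * n_variants
```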

34
New cards

What is the potential concern with parallel composition?

Missing interactions between augmentations.
• Lose compound effects like rotate-then-crop
• However, benefits vastly outweigh this loss

35
New cards

What are the three benefits of parallel composition?

Benefit 1: Can use many more augmentations
Benefit 2: Better gradient signals from clean samples
Benefit 3: Forces true generalization (must work against all simultaneously)

36
New cards

What improvement did parallel composition achieve over serial?

3.4× improvement
• Single largest factor in effectiveness
• Experimentally confirmed
• Parallel > serial

37
New cards

How many combination experiments were conducted?

16,215 distinct experiments
• 1,035 two-way combinations (C(46,2))
• 15,180 three-way combinations (C(46,3))
• Each on 1,000 images
• Days of GPU time
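The experiment counts are the binomial coefficients over the 46 augmentations, which `math.comb` confirms:

```python
from math import comb

two_way = comb(46, 2)    # 1,035 pairwise combinations
three_way = comb(46, 3)  # 15,180 three-way combinations
total = two_way + three_way
```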

38
New cards

What was Key Finding #1 from manual combination experiments?

Parallel beats serial by 3.4×
• Single most important factor
• Confirmed from exhaustive search

39
New cards

What was Key Finding #2 from manual combination experiments?

Color-space augmentations dominate results.
• Grayscale and Color Jitter appear in ALL top-10 combinations
• Surprising: simplest augmentations most effective

40
New cards

What was Key Finding #3 from manual combination experiments?

Deletion operations prove crucial.
• Random erasing forces robustness to occlusions
• Improves transferability

41
New cards

What was Key Finding #4 from manual combination experiments?

Monotonic trend observed: more augmentations consistently yield better transfer rates.
• Consistent pattern
• Explored in detail later

42
New cards

What pattern was observed in top-performing augmentation combinations?

Successful combinations mix augmentation categories rather than drawing from a single type.
• Example: GS + Deletion + Kernel
• Example: CJ + Deletion + Spatial
• Diversity important

43
New cards

Why do color-space transformations help dramatically? (Reason 1)

Forces shape/texture robustness.
• DNNs over-rely on color for classification
• Learn shortcuts (green things = plants)
• Grayscale forces shape/texture focus

44
New cards

Why do color-space transformations help dramatically? (Reason 2)

Find fundamental features.
• Simple operations, don't distort structure
• Create distribution shifts
• Features persist across color spaces
• More transferable between models

45
New cards

Why is exhaustive search infeasible for 46 augmentations?

• 4-way combinations: 163,185 possibilities (C(46,4))
• 5-way combinations: 1,370,754 possibilities (C(46,5))
• Cannot test all computationally
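These counts are straightforward to verify with `math.comb`:

```python
from math import comb

four_way = comb(46, 4)   # ~163K possibilities
five_way = comb(46, 5)   # over 1M possibilities
```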

46
New cards

What optimization technique was used for large combination spaces?

Genetic search
• Well-suited for large, discrete search spaces
• Simulates biological evolution
• Finds near-optimal solutions efficiently

47
New cards

What serves as the fitness function in genetic search?

Transferability of each augmentation combination
• Evaluated on attack success
• Guides selection of "parents"

48
New cards

What are the 6 steps of the genetic algorithm?

Step 1: Initial population (random combinations)
Step 2: Evaluate each (fitness = transferability)
Step 3: Selection (fittest act as "parents")
Step 4: Crossover (combine parent combinations)
Step 5: Mutation (randomly add/remove augmentations)
Step 6: Repeat until convergence
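The six steps can be sketched as a toy subset-search over the 46 augmentations. This is our own minimal reconstruction, not the paper's implementation: a candidate is a boolean mask, and `fitness` stands in for measured transferability (which the real pipeline evaluates by running the attack).

```python
import random

def genetic_search(fitness, n_items=46, pop_size=20, n_gen=2,
                   p_cross=0.6, p_mut=0.1, p_aug=0.55, seed=0):
    """Toy genetic search over augmentation subsets (hedged sketch)."""
    rng = random.Random(seed)
    # Step 1: random initial population; each aug included with prob p_aug
    pop = [[rng.random() < p_aug for _ in range(n_items)] for _ in range(pop_size)]
    for _ in range(n_gen):
        # Steps 2-3: evaluate fitness and keep the fitter half as "parents"
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            # Step 4: gene-wise crossover between two parents
            child = [x if rng.random() < p_cross else y for x, y in zip(a, b)]
            # Step 5: mutation flips inclusion with probability p_mut
            child = [not g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = parents + children  # Step 6: next generation, repeat
    return max(pop, key=fitness)

# Toy fitness: subset size (echoing the paper's monotonic finding)
best = genetic_search(sum)
```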

49
New cards

What are the three variants developed?

ULTCOMBBASE: 11 augmentations (best from manual search)
ULTCOMBGEN5: Genetic 5-way search
ULTCOMBGEN: Full genetic search (all 46 augmentations)

50
New cards

How many augmentations did ULTCOMBGEN discover?

33 augmentations
• Highest transferability achieved
• This is the champion method
• Complete list in appendix

51
New cards

How was genetic search validated?

Ran on small search space with exhaustive results available.
• n_gen = 2, population = 20
• Achieved >99% of optimal performance
• Explored only tiny fraction of space

52
New cards

What hyperparameters were used for full genetic search?

• p_cross = 60% (crossover probability)
• p_mutate = 10% (mutation probability)
• p_aug = 55% (initial inclusion probability)

53
New cards

What were the 4 surrogate models used in experiments?

• ResNet-18
• ResNet-50
• ResNet-101
• DenseNet-121

54
New cards

How many target models were tested?

7 held-out target models
• On ImageNet dataset
• Using 1,000 test images

55
New cards

What transfer rate does MI-FGSM baseline achieve?

Approximately 60% transfer rate
• No augmentation
• Basic baseline

56
New cards

What transfer rate does DI-FGSM achieve?

Approximately 70% transfer rate
• Simple augmentation
• Better than baseline

57
New cards

What was the previous state-of-the-art transfer rate?

Approximately 75% transfer rate
• Before this paper
• On normally trained models

58
New cards

What transfer rate did ULTCOMBBASE achieve?

84.9% average transfer rate
• 11 augmentations
• Already strong improvement

59
New cards

What transfer rate did ULTCOMBGEN5 achieve?

88.3% average transfer rate
• Genetic 5-way search
• Very high performance

60
New cards

What transfer rate did ULTCOMBGEN achieve?

88.6% average transfer rate
• 33 augmentations
• Best performance on normally trained models

61
New cards

What is the improvement of ULTCOMBGEN over MI-FGSM baseline?

+18.6% improvement
• Also +13.6% over prior best methods
• Consistent across all 7 targets

62
New cards

Why do attacks work better against dissimilar architectures like VGG and Inception?

By attacking across 30+ augmentations, perturbations exploit fundamental, low-level perceptual features rather than architecture-specific quirks.
• Edges, textures, color patterns
• Universal features transfer better

63
New cards

What do VGG and ResNet both rely on despite different architectures?

Similar early-layer features:
• Edges
• Textures
• Color patterns

64
New cards

What are adversarially trained models?

Models specifically designed to resist adversarial attacks.
• Shown adversarial examples during training
• Learn to classify them correctly
• Industry best practice defense

65
New cards

What transfer rate did previous attacks achieve against adversarially trained models?

Most attacks: Less than 20% transfer (fail catastrophically)
Prior SOTA: 60-70% transfer (considered quite good)

66
New cards

What transfer rate did ULTCOMBBASE achieve against adversarially trained models?

78.9% average transfer rate
• Already exceeds prior best
• Strong performance

67
New cards

What transfer rate did ULTCOMBGEN achieve against adversarially trained models?

91.8% average transfer rate
• Dramatically better than prior work
• Against hardened defenses

68
New cards

What transfer rate did ensemble attacks (4 surrogates) achieve?

95.7% transfer rate
• "Best" defenses fail >95% of time
• Severe implication for defense

69
New cards

What is the statistical significance of the results?

p < 0.01 by paired t-test
• Across all surrogate-target pairs
• Not due to chance
• Highly significant

70
New cards

What are the implications for adversarial training?

Widely considered most effective defense, but proves inadequate against systematic augmentation-based attacks.
• Questions DNN deployment in critical systems
• Need additional defensive layers
• Not standalone solution

71
New cards

What two commercial systems were tested?

• Google Cloud Vision
• Clarifai
Both are production systems used by thousands of companies

72
New cards

What transfer rate did ULTCOMBGEN achieve against Google Cloud Vision?

82.1% fooling rate
• Real-world system
• Processing actual user data

73
New cards

What transfer rate did ULTCOMBGEN achieve against Clarifai?

75.4% fooling rate
• Baseline methods: only 40-50%
• Substantial improvement

74
New cards

What does testing on commercial systems demonstrate?

Attacks work not just on academic benchmarks but on real systems.
• Current commercial robustness insufficient
• Against sophisticated adversaries

75
New cards

What responsible disclosure practices were followed?

• Informed companies before publication
• No specific exploitable vulnerabilities revealed
• Goal: improve security, not enable attacks
• Ethical research practices

76
New cards

Why are companies vulnerable to adversarial attacks?

Trade-offs and costs:
• Balance accuracy vs. robustness
• Robust models: 5-10% lower accuracy on benign data
• Adversarial training computationally expensive
• Robust architectures expensive

77
New cards

What monotonic relationship was discovered?

Between number of augmentations and transferability.
• Each additional augmentation adds ~1-2% transfer rate
• No plateau observed (even at 33)
• Near-linear improvement

78
New cards

What are the only two augmentations that decreased performance?

• Neural Style Transfer (sometimes worse)
• Sharpening (sometimes worse)
All others: more = better or equal

79
New cards

What are the three implications of the monotonic relationship?

Implication 1: Design principle - when in doubt, add more
Implication 2: Diversity matters more than quality of individual augmentations
Implication 3: May have no ceiling - can likely push beyond 33

80
New cards

Why not use 100 augmentations? (3 reasons)

Reason 1: Computational cost (33 augs = 165 samples/iteration, 100 = 500)
Reason 2: Augmentation availability (exhausted practical augs in literature)
Reason 3: Diminishing absolute gains (smaller improvements)

81
New cards

What was Ablation Finding #1?

Composition method matters most.
• Parallel: 88.6% transfer
• Serial: 26.1% transfer
• 3.4× improvement
• Single biggest factor

82
New cards

Why does serial composition fail?

Benign accuracy drops severely:
• Parallel: 82.4%
• Serial: 21%
Severe image distortion means adversarial directions don't generalize

83
New cards

What was Ablation Finding #2?

Number of augmentations matters monotonically.
• 5 augs: ~70% transfer
• 11 augs: ~84% transfer
• 33 augs: ~88% transfer

84
New cards

What was Ablation Finding #3?

Diversity across categories matters.
• Mixed categories: 88.6% transfer
• Single category: 65-75% only
Need: Spatial + Color-space + Deletion

85
New cards

What was Ablation Finding #4?

Optimization details matter less than composition.
• MI-FGSM vs. others: <3% difference
• Momentum helps but not critical
• Framework is flexible

86
New cards

What was Ablation Finding #5?

Some individual augmentations are critical:
• Removing Grayscale: -5.2%
• Removing Deletion: -4.8%
• Removing Spatial: -3.1%
Others more substitutable

87
New cards

What is ULTCOMBBASE as a practical alternative?

11 augmentations (faster than ULTCOMBGEN)
• ~85% transfer vs. ~89% for ULTCOMBGEN
• Speed-accuracy trade-off
• Acceptable for many purposes

88
New cards

What is the key theoretical insight for why augmentations work?

Gradient smoothing.
• Surrogate gradients are noisy and model-specific
• Overfitting to noisy gradients causes poor transfer
• Need gradients that generalize

89
New cards

What is the mechanism of gradient smoothing?

• Each augmentation produces different gradient
• Average m gradients from m augmented versions
• Smooths out model-specific noise
• Smoother gradients transfer better

90
New cards

What mathematical technique was used to prove gradient smoothing?

Adapted from randomized smoothing.
• Augmenting with Gaussian noise bounds the Lipschitz constant
• Bounds the gradient's maximum rate of change
• Larger sigma yields smoother gradients

91
New cards

What empirical evidence supports gradient smoothing?

Measured cosine similarity of consecutive gradients:
• MI-FGSM: variance = 0.82
• ULTCOMBGEN: variance = 0.31
• 62% reduction in variance
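The variance figures above would come from tracking this similarity across consecutive iterations. A minimal sketch of the metric itself (function name is ours):

```python
import numpy as np

def cosine_similarity(g1, g2):
    """Cosine similarity between two flattened gradient tensors."""
    g1, g2 = g1.ravel(), g2.ravel()
    return float(g1 @ g2 / (np.linalg.norm(g1) * np.linalg.norm(g2)))

# Toy gradients at 45 degrees: similarity = 1/sqrt(2) ~ 0.7071
sim = cosine_similarity(np.array([1.0, 0.0]), np.array([1.0, 1.0]))
```

Smoother attacks produce consecutive gradients with consistently high similarity, i.e. low variance in this quantity across iterations.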

92
New cards

What do gradient heat maps show?

ULTCOMB produces more uniform gradients.
• Focus on robust, transferable features
• Not model-specific artifacts
• Visual confirmation of smoothing

93
New cards

What critical finding shows smoothness is not the only mechanism?

Pure Gaussian noise achieves:
• Highest gradient smoothness
• But slightly lower transferability than ULTCOMBBASE
Smoothness = primary mechanism but NOT the only one

94
New cards

What are speculated additional factors beyond smoothness?

• Semantic preservation
• Frequency characteristics
• Diverse feature targeting
Identifying these = important future work

95
New cards

What is Limitation #1 of the approach?

Speed trade-off.
• ULTCOMBGEN: 40× slower than simple attacks
• Creates 165 samples/iteration
• ~40 seconds per image

96
New cards

What mitigations exist for the speed limitation?

• 30-sample version: 5.5 seconds (~90% performance)
• Offline attacks: success rate matters more than speed
• Implementation not optimized: can be faster
• Not fundamental limitation

97
New cards

What is Limitation #2 of the approach?

Theoretical incompleteness.
• Smoothness explains most but not all effectiveness
• Evidence: Gaussian noise has max smoothness but not max transferability
• Other unknown factors contribute

98
New cards

What is Future Work Direction #1?

Identify additional mechanisms beyond smoothness.
• What do augmentations preserve/enhance?
• Complete the theory
• Deeper investigation needed

99
New cards

What is Future Work Direction #2?

Design better defenses for composition attacks.
• Make augmentation ineffective
• Detection systems
• Adaptive adversarial training
• Use parallel-composed augmentations in training

100
New cards

What is Future Work Direction #3?

Optimize implementation for speed.
• Current: research-quality code
• Engineering effort can reduce runtime
• Practical improvement possible
