7 - Common network architecture and Transfer Learning

Last updated 3:47 PM on 5/19/26

63 Terms

1

What is object classification?

Predicting a single class label for the entire image.

2

What is single‑object localisation?

Predicting one object’s class and its bounding box.

3

What is object detection?

Predicting multiple objects, each with a class label and bounding box

4

What are the key features of AlexNet?

·        First deep CNN to win the ImageNet challenge (ILSVRC 2012)

·        ReLU activations

·        Dropout

·        GPU training

·        5 conv layers + 3 FC layers

5

What characterises VGG networks?

·        Very deep (16–19 layers)

·        Only 3×3 convolutions

·        Simple, uniform architecture

·        Large number of parameters

6

What characterises GoogLeNet?

·        Uses Inception modules

·        Parallel 1×1, 3×3, 5×5 convs + pooling

·        1×1 convs for dimensionality reduction

·        Far fewer parameters than VGG

7

What characterises ResNet?

·        Uses residual (skip) connections

·        Enables very deep networks (up to 152 layers on ImageNet)

·        Eases optimisation: skip connections give gradients a direct path and address the degradation problem

8

What does the Inception module do?

Processes input at multiple scales using parallel convs and pooling.

9

What is the structure of an Inception module?

·        1×1 conv

·        3×3 conv

·        5×5 conv

·        Max pooling

Outputs concatenated along the channel dimension.

10

Why does the Inception module work well?

·        Multi‑scale feature extraction

·        1×1 conv reduces computation

·        Efficient and expressive

11

What is a residual module?

A block where the input is added to the output: y = F(x) + x

12

Why use residual connections?

·        Mitigate vanishing gradients (gradients can flow through the identity path)

·        Make optimisation easier

·        Enable very deep networks

13

What does a residual block learn?

The residual (difference) rather than the full mapping.

14

What is the Fast R‑CNN pipeline?

1.        Compute shared conv feature map

2.        Region proposals (Selective Search)

3.        RoI Pooling

4.        FC layers

5.        Outputs: class scores + bounding box regression
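As a rough illustration of step 3, here is a minimal NumPy sketch of RoI max pooling for a single region. It assumes the proposal has already been projected onto feature-map coordinates as inclusive integer corners (x1, y1, x2, y2); the function name roi_max_pool and the 2×2 default output size are illustrative choices (Fast R-CNN with VGG16 pools to 7×7). The point it shows: regions of any size come out with the same fixed shape, so they can all share the same FC layers.

```python
import numpy as np

def roi_max_pool(features, roi, output_size=(2, 2)):
    """Max-pool one region of interest to a fixed spatial size (hypothetical helper).

    features: (C, H, W) shared conv feature map.
    roi: (x1, y1, x2, y2) inclusive corners in feature-map coordinates.
    output_size: (pooled_h, pooled_w), independent of the RoI's size.
    """
    c = features.shape[0]
    x1, y1, x2, y2 = roi
    region = features[:, y1:y2 + 1, x1:x2 + 1]
    rh, rw = region.shape[1:]
    ph, pw = output_size
    out = np.empty((c, ph, pw), dtype=features.dtype)
    for i in range(ph):
        # Floor the start and ceil the end so every bin is non-empty
        # and the bins together cover the whole region.
        hs, he = int(np.floor(i * rh / ph)), int(np.ceil((i + 1) * rh / ph))
        for j in range(pw):
            ws, we = int(np.floor(j * rw / pw)), int(np.ceil((j + 1) * rw / pw))
            out[:, i, j] = region[:, hs:he, ws:we].max(axis=(1, 2))
    return out

# A 1-channel 4x4 map holding 0..15; two RoIs of different sizes.
feats = np.arange(16, dtype=float).reshape(1, 4, 4)
pooled_full = roi_max_pool(feats, (0, 0, 3, 3))   # [[5, 7], [13, 15]]
pooled_small = roi_max_pool(feats, (1, 1, 3, 2))  # [[6, 7], [10, 11]]
```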

15
16

Why is Fast R‑CNN faster than R‑CNN?

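The CNN runs once on the whole image and the resulting feature map is shared by all regions, instead of running the CNN separately on every region proposal as R‑CNN does.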

17

Why use 1×1 convolutions in Inception?

Dimensionality reduction → fewer parameters → cheaper computation

18

What problem do skip connections solve?

Vanishing gradients and the degradation problem in very deep networks

19

Which architecture uses multi‑scale processing?

GoogLeNet (Inception)

20

Which architecture emphasises depth?

VGG and ResNet

21

Which architecture emphasises width/multi‑branching?

GoogLeNet

22

Top-1 Accuracy

The model is correct only if its highest-probability class matches the ground truth.

23

Top-5 Accuracy

The model is correct if the ground-truth class is among its five highest-probability predictions.
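Top-1 and top-5 accuracy differ only in how many of the highest-scoring classes are checked. A minimal NumPy sketch, assuming scores arrive as an (N, num_classes) array; the function name and the three-image example are hypothetical. The last line shows the top-5 error from the next cards.

```python
import numpy as np

def top_k_accuracy(probs, labels, k):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    # Column indices of the k largest scores in each row (their order does not matter).
    top_k = np.argsort(probs, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

# Hypothetical scores for 3 images over 6 classes, with their true labels.
probs = [
    [0.05, 0.10, 0.50, 0.15, 0.10, 0.10],  # label 2: top-1 hit
    [0.30, 0.25, 0.20, 0.10, 0.10, 0.05],  # label 1: top-1 miss, top-5 hit
    [0.40, 0.20, 0.15, 0.10, 0.10, 0.05],  # label 5: ranked last, top-5 miss
]
labels = [2, 1, 5]
top1 = top_k_accuracy(probs, labels, k=1)  # 1/3
top5 = top_k_accuracy(probs, labels, k=5)  # 2/3
top5_error = 1.0 - top5                    # 1/3
```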

24

What is the ImageNet dataset?

A large‑scale dataset with ~1.2M training images and 1000 classes, used for benchmarking and pretraining.

25

Why is ImageNet widely used?

It enables transfer learning and provides a standard benchmark for comparing models.

26

What is Top‑5 error?

Top‑5 error = 1 − Top‑5 accuracy

27

What is global average pooling?

A layer that averages each feature map into a single value, often replacing the large fully‑connected layers before the classifier.
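A minimal NumPy sketch of GAP and of why it saves parameters. The sizes (a 7×7×512 final feature map, 1000 classes) are hypothetical but VGG-like; the comparison counts the weights of a single linear classifier placed after flattening versus after GAP.

```python
import numpy as np

def global_average_pool(feature_maps):
    """(N, C, H, W) -> (N, C): average each channel's H x W map to a single value."""
    return feature_maps.mean(axis=(2, 3))

# Hypothetical sizes: batch of 2, a 7x7 map with 512 channels, 1000 output classes.
n, c, h, w, num_classes = 2, 512, 7, 7, 1000
x = np.random.default_rng(0).standard_normal((n, c, h, w))
pooled = global_average_pool(x)  # shape (2, 512); GAP itself has no weights

# Weights + biases of a single linear classifier placed after each option.
fc_on_flattened = c * h * w * num_classes + num_classes  # 25,088 inputs
fc_after_gap = c * num_classes + num_classes             # 512 inputs
```

Here the flattened version needs roughly 49× more classifier weights, one set per spatial position.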

28

Why is GAP used in CNNs?

Reduces parameters (it has no weights of its own), reduces overfitting, and simplifies the architecture

29

What are auxiliary classifiers in GoogLeNet?

Extra classifiers attached to intermediate layers to help gradient flow and regularise training

30

Are auxiliary classifiers used during inference?

No; they are discarded at inference and only the main classifier is used.

31

What is naive Inception?

Parallel 1×1, 3×3, 5×5 convs + pooling, without dimensionality reduction

32

Why is naive Inception inefficient?

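3×3 and 5×5 convolutions run on the full input channel depth, and the pooling branch passes every input channel through, so computation and output depth grow quickly as modules are stacked.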

33

How does reduced Inception improve efficiency?

Uses 1×1 convolutions to reduce channel depth before expensive convolutions.

34

Why do 1×1 convolutions reduce cost?

They reduce the number of input channels seen by the following 3×3 or 5×5 convolution, lowering the number of multiplications.

35

Why are 5×5 convolutions expensive?

Cost scales with K²; for the same input and output channels, a 5×5 convolution needs 25× the multiplications of a 1×1 convolution (and about 2.8× those of a 3×3).
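A quick multiplication count makes the K² scaling concrete. The formula assumes a stride-1 convolution with "same" padding and ignores biases; the layer sizes (28×28 map, 192 input channels, 32 filters) are hypothetical.

```python
def conv_multiplications(h, w, c_in, c_out, k):
    """Multiplications for a stride-1, 'same'-padded k x k convolution.

    Each of the h * w * c_out outputs is a dot product over a k * k * c_in window.
    """
    return h * w * c_out * k * k * c_in

# Hypothetical layer: 28x28 map, 192 input channels, 32 output channels.
costs = {k: conv_multiplications(28, 28, 192, 32, k) for k in (1, 3, 5)}
# costs[1] = 4,816,896; costs[3] = 9x that; costs[5] = 25x that = 120,422,400
```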

36

What is the Fréchet Inception Distance?

A metric that compares real and generated images by measuring the distance between their feature distributions.

37

What does FID evaluate?

Image quality and diversity in generative models.

38

How is FID computed?

Extract features using an Inception network, then compute the Fréchet distance between real and generated feature distributions.

39

How FID works

·        Pass real and generated images through Inception‑v3

·        Extract 2048‑dimensional features (final global‑average‑pooling layer)

·        Fit a Gaussian (mean and covariance) to each set of features and compute the Fréchet distance between the two

·        Lower FID = better quality and diversity
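A NumPy sketch of the distance step, FID = ||μ_r − μ_g||² + Tr(Σ_r + Σ_g − 2(Σ_r Σ_g)^{1/2}), taking already-extracted feature arrays as input (the Inception‑v3 feature extraction is not shown). Common implementations compute the matrix square root with scipy.linalg.sqrtm; this sketch instead uses the equivalent symmetric form Tr((Σ_r^{1/2} Σ_g Σ_r^{1/2})^{1/2}), which has the same eigenvalues, so it needs only NumPy. Function names are hypothetical.

```python
import numpy as np

def _sqrtm_psd(m):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(m)
    vals = np.clip(vals, 0.0, None)  # drop tiny negative round-off
    return (vecs * np.sqrt(vals)) @ vecs.T

def frechet_distance(feats_real, feats_gen):
    """FID-style distance between two (num_images, dim) feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    s_r = np.cov(feats_real, rowvar=False)
    s_g = np.cov(feats_gen, rowvar=False)
    # Tr((S_r S_g)^{1/2}) == Tr((S_r^{1/2} S_g S_r^{1/2})^{1/2}); the inner matrix
    # is symmetric PSD, so its eigenvalues can be taken with eigvalsh.
    root_r = _sqrtm_psd(s_r)
    inner = root_r @ s_g @ root_r
    tr_covmean = np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)).sum()
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(s_r) + np.trace(s_g) - 2.0 * tr_covmean)
```

Identical feature sets give a distance of (numerically) zero, and shifting every feature by a constant adds only the squared mean difference.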

40

Why FID is used

·        Captures realism

·        Captures diversity

·        Better than pixel‑wise metrics

41

How to design a deep learning method

-            Preprocessing

-            Architecture

-            Training details

42

Image localisation

-            A prediction counts as correct if its IoU with the ground-truth box exceeds a threshold

-            The threshold is usually 0.5

43

Mean average precision (mAP)

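Mean over all classes of the average precision (AP), where AP is the area under the precision-recall curve; a detection is a true positive only if its class label is correct and its bounding box passes the IoU threshold.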
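The following NumPy sketch computes AP for one class from ranked detections, then averages per-class APs into mAP. Assumptions: each detection has already been marked as a true or false positive (correct label, IoU above the threshold, each ground-truth box matched at most once), num_gt is the number of ground-truth boxes of that class, and precision is interpolated over all recall points as in later PASCAL VOC evaluations; other benchmarks such as COCO differ in details. Function names are hypothetical.

```python
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """AP for one class: area under the (interpolated) precision-recall curve."""
    order = np.argsort(-np.asarray(scores, dtype=float))  # highest confidence first
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Pad the curve, then make precision non-increasing from right to left.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas wherever recall increases.
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))

def mean_average_precision(ap_per_class):
    """mAP: the plain mean of the per-class AP values."""
    return float(np.mean(ap_per_class))

# Hypothetical detections for one class with 2 ground-truth objects:
# ranked TP, FP, TP -> AP = 0.5 * 1.0 + 0.5 * (2/3) = 5/6.
ap = average_precision([0.9, 0.8, 0.7], [1, 0, 1], num_gt=2)
```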

44

Intersection over union

Area of overlap divided by area of union (the total area covered by both boxes)
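A minimal sketch of IoU for two axis-aligned boxes, assuming continuous (x1, y1, x2, y2) corner coordinates with x2 > x1 and y2 > y1. Some benchmarks use inclusive integer pixel coordinates and add 1 to widths and heights, which this ignores.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    # Corners of the intersection rectangle (may be empty).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter  # overlap counted once
    return inter / union

# Two 2x2 boxes overlapping in a 1x1 square: IoU = 1 / (4 + 4 - 1) = 1/7.
example = iou((0, 0, 2, 2), (1, 1, 3, 3))
```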

45

Precision

TP / (TP+FP)

46

Recall

TP / (TP+FN)
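With hypothetical detection counts plugged in (8 true positives, 2 false positives, 4 missed objects), the two formulas give:

```python
def precision(tp, fp):
    """Fraction of predicted positives that are correct."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of actual positives that were found."""
    return tp / (tp + fn)

# Hypothetical counts: 8 correct detections, 2 false alarms, 4 missed objects.
p = precision(tp=8, fp=2)  # 0.8
r = recall(tp=8, fn=4)     # 8 / 12, about 0.667
```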

47

What does a lower FID score indicate?

More realistic and diverse generated images.

48

Why deeper networks failed before ResNet

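·        Vanishing/exploding gradients (largely addressed by normalised initialisation and batch normalisation)

·        Degradation problem: deeper plain networks had higher training error than shallower ones

·        Not caused by overfitting (training error rises too); it is an optimisation difficulty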

49

Why can’t a normal conv+ReLU block learn identity?

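ReLU outputs zero for negative inputs, so a block ending in ReLU cannot reproduce x where x < 0; more generally, stacked nonlinear layers find the identity mapping hard to learn.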

50

What is the key idea of ResNet?

Learn the residual F(x) = H(x) - x instead of H(x) directly

Then output H(x) = F(x) + x
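A toy NumPy comparison of a residual block and a plain block, with small dense matrices standing in for conv layers (an assumption for brevity; the helper names are hypothetical). With all weights at zero the residual block already outputs x, whereas the plain block outputs zeros; even identity weights cannot make the plain block pass negative values through its final ReLU. The original ResNet also applies a ReLU after the addition; it is left out here to match y = F(x) + x.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def residual_block(x, w1, w2):
    """y = F(x) + x, where F is two weight layers with a ReLU between them."""
    f = w2 @ relu(w1 @ x)  # the residual F(x)
    return f + x           # skip connection adds the input back

def plain_block(x, w1, w2):
    """The same layers without the skip connection, ending in a ReLU."""
    return relu(w2 @ relu(w1 @ x))

x = np.array([1.5, -2.0, 0.5])
zeros, eye = np.zeros((3, 3)), np.eye(3)
identity_via_residual = residual_block(x, zeros, zeros)  # equals x exactly
plain_with_zeros = plain_block(x, zeros, zeros)          # all zeros
plain_with_identity = plain_block(x, eye, eye)           # [1.5, 0.0, 0.5]
```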

51

Auxiliary classifier purpose (GoogLeNet)

·        Added to Inception 4a and 4d

·        Weighted at 0.3 in the loss

·        Provide extra gradient signal

·        Reduce vanishing gradients

·        Act as regularisers

·        Removed at inference
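Written as a formula, the auxiliary-loss weighting in this card amounts to the following training objective (a sketch; the loss names are labels chosen here, and both auxiliary terms are dropped at inference):

```latex
\mathcal{L}_{\text{total}} = \mathcal{L}_{\text{main}} + 0.3\,\bigl(\mathcal{L}_{\text{aux},4a} + \mathcal{L}_{\text{aux},4d}\bigr)
```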

52

Naive Inception

·        Parallel branches: 1×1, 3×3, 5×5 convs + pooling

·        All padded to preserve spatial size

·        Outputs concatenated

·        MaxPool uses 3×3, stride 1, padding 1
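A minimal NumPy sketch of the naive module described in this card: each branch keeps the H×W size (padding k//2 for a k×k conv, and -inf padding for the 3×3, stride-1 max pool so padding never wins), and the branch outputs are concatenated along the channel axis. Assumptions: one image in (C, H, W) layout, random weights, no biases or ReLUs (GoogLeNet applies ReLU after each conv), and NumPy 1.20 or later for sliding_window_view. Helper names are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv_same(x, weight):
    """Stride-1 'same' convolution (cross-correlation, as in DL frameworks).

    x: (C_in, H, W); weight: (C_out, C_in, k, k) with k odd.
    """
    k = weight.shape[-1]
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))                # zero padding
    windows = sliding_window_view(xp, (k, k), axis=(1, 2))  # (C_in, H, W, k, k)
    return np.einsum("chwij,ocij->ohw", windows, weight)

def max_pool_3x3_same(x):
    """3x3 max pooling, stride 1, padding 1."""
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)), constant_values=-np.inf)
    return sliding_window_view(xp, (3, 3), axis=(1, 2)).max(axis=(-2, -1))

def naive_inception(x, w1, w3, w5):
    """Parallel 1x1, 3x3, 5x5 convs and max pooling, concatenated on channels."""
    branches = [conv_same(x, w1), conv_same(x, w3), conv_same(x, w5), max_pool_3x3_same(x)]
    return np.concatenate(branches, axis=0)

# Hypothetical sizes: 4 input channels on a 6x6 map; branch widths 2, 3 and 1.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))
w1 = rng.standard_normal((2, 4, 1, 1))
w3 = rng.standard_normal((3, 4, 3, 3))
w5 = rng.standard_normal((1, 4, 5, 5))
out = naive_inception(x, w1, w3, w5)  # shape (2 + 3 + 1 + 4, 6, 6) = (10, 6, 6)
```

The output depth is the sum of the conv branch widths plus every input channel from the pooling branch, which is why stacking naive modules makes the depth grow.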

53

Reduced Inception

·        Insert 1×1 conv before 3×3 and 5×5 convs

·        Reduces channel depth → reduces computation

·        Makes Inception efficient
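To put numbers on the saving, here is a back-of-the-envelope count for the 5×5 branch alone. The sizes (28×28 input with 192 channels, 32 output filters, a 16-channel 1×1 bottleneck) are hypothetical and only illustrate the effect; biases are ignored and all convolutions are stride 1 with "same" padding.

```python
def conv_multiplications(h, w, c_in, c_out, k):
    """Multiplications for a stride-1, 'same'-padded k x k convolution."""
    return h * w * c_out * k * k * c_in

h = w = 28
c_in, c_out, c_reduce = 192, 32, 16  # hypothetical channel sizes

# Naive: the 5x5 conv sees all 192 input channels.
naive = conv_multiplications(h, w, c_in, c_out, 5)
# Reduced: a 1x1 conv first shrinks 192 channels to 16, then the 5x5 conv runs.
reduced = (conv_multiplications(h, w, c_in, c_reduce, 1)
           + conv_multiplications(h, w, c_reduce, c_out, 5))
# naive = 120,422,400; reduced = 2,408,448 + 10,035,200 = 12,443,648 (about 10x fewer)
```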

54

How to fine-tune a pre-trained model

-            Freeze the weights of the intermediate layers and train only the last layer

-            Train all layers

-            Hybrid: train the last layer first, then unfreeze the intermediate layers and train them too
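The three options can be sketched in PyTorch by toggling requires_grad and handing only the trainable parameters to the optimiser. This is a minimal sketch under stated assumptions: the tiny backbone in make_model is a hypothetical stand-in for a real pretrained network (in practice, e.g. torchvision's resnet18 with weights=ResNet18_Weights.IMAGENET1K_V1 and its fc layer replaced), and the learning rates are illustrative, not prescribed by these notes.

```python
import torch
from torch import nn

def make_model(num_classes: int) -> nn.Sequential:
    """Hypothetical stand-in: a tiny 'pretrained' backbone plus a new classifier head."""
    backbone = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )
    head = nn.Linear(32, num_classes)  # new last layer sized for the target task
    return nn.Sequential(backbone, head)

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

def configure(model: nn.Sequential, strategy: str, stage: int = 1) -> torch.optim.Optimizer:
    """Freeze/unfreeze layers for one strategy and build an optimiser over what trains."""
    backbone, head = model[0], model[1]
    set_trainable(head, True)  # the new last layer is always trained
    if strategy == "freeze":    # train the last layer only
        set_trainable(backbone, False)
        lr = 1e-2
    elif strategy == "full":    # fine-tune every layer
        set_trainable(backbone, True)
        lr = 1e-3
    elif strategy == "hybrid":  # stage 1: last layer only; stage 2: unfreeze the rest
        set_trainable(backbone, stage >= 2)
        lr = 1e-2 if stage == 1 else 1e-3  # smaller steps once pretrained layers move
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.SGD(params, lr=lr, momentum=0.9)
```

For the hybrid route, one would train with configure(model, "hybrid", stage=1) first, then call it again with stage=2 and keep training.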

55

When to freeze all layers and train only the last one

·        You have very little data

·        Your new task is similar to ImageNet

56

When to train all layers

·        You have lots of data

·        Your task is very different from ImageNet

·        You can afford longer training

57

When to use the hybrid approach

·        You have a moderate amount of data

·        Your task is somewhat different from ImageNet

58

What is transfer learning?

Using a model pretrained on a large dataset (e.g., ImageNet) and adapting it to a new task with limited data

59

Why ImageNet models transfer well

Early layers learn general features (edges, textures), which are useful for many tasks

60

Why use Transfer Learning?

·        Faster training

·        Better performance with small datasets

·        Pretrained models learn general features (edges, textures, shapes)

·        Reduces overfitting

61

When transfer learning is most useful

When the target dataset is small or expensive to label

62

Residual module diagram

63

Inception vs naïve Inception