Robot Vision Test 1

47 Terms

1

Correct parameter-tuning protocol

Split data into train, validation, and test sets.

Tune hyperparameters on the validation set, and use the test set only once at the very end to report the final score.
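
A minimal NumPy sketch of this protocol on synthetic data; the 70/15/15 ratios and array names are illustrative, and the actual model fitting is only indicated by comments.

```python
import numpy as np

# Illustrative 70/15/15 split of a synthetic dataset.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))
y = rng.integers(0, 10, size=1000)

idx = rng.permutation(len(X))
n_train, n_val = int(0.70 * len(X)), int(0.15 * len(X))
X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]
X_val, y_val = X[idx[n_train:n_train + n_val]], y[idx[n_train:n_train + n_val]]
X_test, y_test = X[idx[n_train + n_val:]], y[idx[n_train + n_val:]]

# 1) Fit candidate models on (X_train, y_train).
# 2) Choose hyperparameters by comparing their scores on (X_val, y_val).
# 3) Evaluate the single chosen model on (X_test, y_test) exactly once,
#    and report that number as the final score.
print(len(X_train), len(X_val), len(X_test))   # 700 150 150
```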

2

Softmax loss

A function that measures how bad a model’s predictions are in a multi-class classification problem by comparing predicted probabilities to the actual class

3

Purpose of regularization in deep learning

To prevent the model from memorizing the training data (overfitting), helping it perform better on new, unseen data

4

Difference between L1 and L2 regularization

L1 regularization pushes some model weights to become exactly zero, effectively selecting important features

L2 regularization forces weights to be small but rarely exactly zero

5

Common regularization techniques (besides L1/L2)

Dropout (randomly turns off neurons during training)

Batch Normalization (stabilizes and speeds up training)
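
A short sketch of both layers in PyTorch (PyTorch itself is an assumption here, not something the cards specify); the layer sizes and dropout rate are arbitrary.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(128, 64),
    nn.BatchNorm1d(64),   # standardizes activations -> stabler, faster training
    nn.ReLU(),
    nn.Dropout(p=0.5),    # randomly zeroes 50% of activations during training
    nn.Linear(64, 10),
)

x = torch.randn(32, 128)
model.train()             # dropout active, BatchNorm uses batch statistics
train_out = model(x)
model.eval()              # dropout off, BatchNorm uses running statistics
eval_out = model(x)
print(train_out.shape, eval_out.shape)   # torch.Size([32, 10]) twice
```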

6

k-NN vs a linear classifier

Compared to a linear classifier, k-NN requires no training time, but it is slower at making predictions and uses more memory because it must store the entire training dataset

7

Numeric vs Analytic gradients

Numeric gradient is easy to implement but is slow and an approximation

Analytic gradient is fast and exact but it’s easier to make coding mistakes
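
A small NumPy sketch comparing the two on f(w) = Σ wᵢ², whose analytic gradient is 2w; the step size h is a conventional choice.

```python
import numpy as np

def f(w):
    return np.sum(w ** 2)

def numeric_gradient(f, w, h=1e-5):
    """Slow, approximate: one centered finite difference per dimension."""
    grad = np.zeros_like(w)
    for i in range(w.size):
        step = np.zeros_like(w)
        step[i] = h
        grad[i] = (f(w + step) - f(w - step)) / (2 * h)
    return grad

w = np.random.default_rng(0).normal(size=5)
analytic = 2 * w                    # fast and exact, but derived by hand
numeric = numeric_gradient(f, w)    # easy to write, slow and approximate
print(np.max(np.abs(analytic - numeric)))   # tiny difference, e.g. ~1e-10
```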

8

Learning-rate schedules

A pre-defined plan for how the learning rate changes during training, such as gradually decreasing it over time to improve convergence

9

Normalization Layers (BatchNorm, LayerNorm, InstanceNorm)

Techniques to standardize the inputs to a layer, which helps to speed up and stabilize the training of deep neural networks

10

Residual connections (ResNet)

A “shortcut” that skips some layers, allowing the model to easily learn to do nothing if a layer is not useful, which helps in training very deep networks

11

Problem with naive weight initialization

Initializing all weights to very small/large random values can cause the signals (gradients) to shrink/explode, making the network very difficult to train

12

Limitation of Xavier initialization

It doesn’t work well for networks that use ReLU activations: because ReLU zeroes out half of its inputs, Xavier-scaled activations shrink layer by layer in deep networks, which can leave neurons effectively dead (always outputting 0)

13

Best initialization for ReLU-based networks

Kaiming initialization

Because it scales the initial weights (roughly by √(2 / fan_in)) to compensate for ReLU zeroing out half of its inputs, keeping activation variance stable across layers
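
A minimal PyTorch sketch (PyTorch assumed) using the built-in Kaiming initializer on a single linear layer; the layer sizes are arbitrary.

```python
import torch.nn as nn

layer = nn.Linear(512, 256)
nn.init.kaiming_normal_(layer.weight, nonlinearity="relu")
nn.init.zeros_(layer.bias)

# The weight standard deviation is scaled to sqrt(2 / fan_in) to compensate
# for ReLU zeroing out roughly half of its inputs.
print(layer.weight.std().item(), (2 / 512) ** 0.5)   # both roughly 0.0625
```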

14

Hinge loss

A loss function used for training classifiers, which aims to ensure that correct predictions are made with a confident margin

15

Effective receptive field of three 3×3 conv layers

The same as a single 7×7 convolution layer

16

Why stack small 3×3 convolutions?

It uses fewer parameters than a single large kernel (like 7×7) and allows for more non-linear activation functions, making the network more powerful and efficient 

17

Role of a loss function

To measure how far off the model’s predictions are from the correct answers, guiding the model on how to adjust its weights

18

Purpose of a non-linear activation function

It allows the neural network to learn complex patterns and relationships in the data that a simple linear model cannot

19

Benefit of pooling layer in CNNs

It reduces the size of the feature maps, which makes the computation faster and helps the network become more robust to the exact position of objects in an image

20

Two common types of pooling

Max Pooling (takes maximum value in a window)

Average Pooling (takes average value)
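
A toy PyTorch example (PyTorch assumed) showing both pooling types on a single 4×4 feature map with a 2×2 window.

```python
import torch
import torch.nn as nn

x = torch.arange(16, dtype=torch.float32).reshape(1, 1, 4, 4)   # (N, C, H, W)

max_pool = nn.MaxPool2d(kernel_size=2, stride=2)   # max of each 2x2 window
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)   # mean of each 2x2 window

print(max_pool(x).shape)   # torch.Size([1, 1, 2, 2]) -- feature map halved
print(max_pool(x))         # values [[ 5.,  7.], [13., 15.]]
print(avg_pool(x))         # values [[ 2.5,  4.5], [10.5, 12.5]]
```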

21

Final layer in a classification CNN

A fully connected layer is typically added at the end to take the high-level features learned by the CNN and use them to make the final classification

22

Weight update rule

An algorithm, like gradient descent, that adjusts the weights of the network in the direction that reduces the loss function

23

Effect of a large learning rate

It can cause the optimization to overshoot the ideal solution and bounce around, possibly preventing the model from converging

24

Vanishing gradients

A problem in very deep networks where the gradient becomes extremely small, causing the weights in the early layers to stop updating, effectively halting learning

25

How to lessen vanishing gradients

Use architectures like ResNet with residual connections, employ proper weight initialization (like Kaiming), and use normalization layers (like BatchNorm)

26

Softmax Loss/Cross Entropy Loss

$L_i = -\log\left(\dfrac{e^{s_{y_i}}}{\sum_j e^{s_j}}\right)$
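
A minimal NumPy implementation of this loss (with the usual max-subtraction for numerical stability), averaged over the batch; the toy scores are illustrative.

```python
import numpy as np

def softmax_loss(scores, y):
    """L_i = -log(e^{s_{y_i}} / sum_j e^{s_j}), averaged over N examples."""
    shifted = scores - scores.max(axis=1, keepdims=True)    # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(y)), y].mean()

scores = np.array([[3.2, 5.1, -1.7]])          # one example, three classes
print(softmax_loss(scores, np.array([0])))     # ~2.04 (true class scored low)
```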

27

Hinge Loss (Multiclass SVM), single example

$L_i = \sum_{j \neq y_i} \max(0,\ s_j - s_{y_i} + 1)$

28

Hinge Loss Average

$L = \dfrac{1}{N} \sum_{i=1}^{N} L_i$
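
A NumPy sketch of the per-example hinge loss from the previous card, averaged over N as above; the toy scores are illustrative.

```python
import numpy as np

def svm_hinge_loss(scores, y, margin=1.0):
    """L = (1/N) * sum_i sum_{j != y_i} max(0, s_j - s_{y_i} + margin)."""
    N = scores.shape[0]
    correct = scores[np.arange(N), y][:, None]      # s_{y_i} for each row
    margins = np.maximum(0.0, scores - correct + margin)
    margins[np.arange(N), y] = 0.0                  # drop the j == y_i term
    return margins.sum() / N

scores = np.array([[3.2, 5.1, -1.7]])              # one example, three classes
print(svm_hinge_loss(scores, np.array([0])))       # 2.9 = max(0, 5.1 - 3.2 + 1) + 0
```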

29

Total Squared Error

E_total = Σ ½ (target - output)²

30

Convolution Layer Parameters (weights)

K × K × C_in × C_out

K: kernel size

C_in: input channels

C_out: output channels (number of filters)

31

FC Layer Params

(Input size × output size) weights + output size bias terms

32

Sequential Stacking of Convs (ex, three 3×3 layers)

If all layers map C → C, the total number of weights is:

3×(3×3×C×C)

33

ResNet (Residual Block Relation)

y = F(x) + x

output = transformation of the input + the original input
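
A sketch of a simplified residual block in PyTorch (PyTorch assumed), where F(x) is two 3×3 convolutions with BatchNorm; real ResNet blocks also handle changes in channel count and stride, which is omitted here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """y = F(x) + x, with F(x) = BN(conv(ReLU(BN(conv(x)))))."""

    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))   # F(x): transformation of the input
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)                  # add back the original input (shortcut)

x = torch.randn(2, 64, 32, 32)
print(ResidualBlock(64)(x).shape)               # torch.Size([2, 64, 32, 32])
```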

34

ReLU Activation

a = max(0, z)

z: pre-activation input

35

Weight Update Rule (Gradient Descent)

W_new = W_old - η ∂E/∂W

η: learning rate
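
A tiny NumPy illustration of this update on the toy loss E(W) = ½‖W‖², whose gradient ∂E/∂W is simply W; the learning rate and number of steps are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 3))
eta = 0.1                        # learning rate

for step in range(100):
    grad_E = W                   # dE/dW for E(W) = 0.5 * ||W||^2
    W = W - eta * grad_E         # W_new = W_old - eta * dE/dW

print(np.abs(W).max())           # weights have shrunk toward the minimum at W = 0
```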

36

Effective Receptive Field

For a stack of three 3×3 convolutions with stride 1, the effective receptive field is 7×7

(for L stacked layers, the receptive field is 1 + L(K - 1), where K = 3 for 3×3 kernels)

37

L1 vs L2 regularization specifics

L1: $\text{loss} + \lambda \sum_{i=1}^{n} |W_i|$

L2: $\text{loss} + \lambda \sum_{i=1}^{n} W_i^2$
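
A small NumPy sketch that adds each penalty to a given data loss; the example weights and λ value are illustrative.

```python
import numpy as np

def regularized_loss(data_loss, W, lam, kind="l2"):
    """loss + lam * sum|W_i| (L1)  or  loss + lam * sum W_i^2 (L2)."""
    if kind == "l1":
        return data_loss + lam * np.sum(np.abs(W))
    return data_loss + lam * np.sum(W ** 2)

W = np.array([0.5, -0.25, 0.0, 1.0])
print(regularized_loss(1.0, W, lam=0.1, kind="l1"))   # 1.0 + 0.1 * 1.75   = 1.175
print(regularized_loss(1.0, W, lam=0.1, kind="l2"))   # 1.0 + 0.1 * 1.3125 = 1.13125
```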

38

Correct statements about learning-rate schedules

Warm-up increases LR linearly from a small value at the start

Exponential decay multiplies the LR by a fixed factor every epoch

Time-based decay reduces LR gradually as num epochs increases
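
Toy Python implementations of the three schedules above; the base learning rate, warm-up length, and decay constants are illustrative choices, not values from the cards.

```python
base_lr = 0.1

def warmup(epoch, warmup_epochs=5):
    """Linearly increase the LR from a small value up to base_lr."""
    return base_lr * min(1.0, (epoch + 1) / warmup_epochs)

def exponential_decay(epoch, factor=0.9):
    """Multiply the LR by a fixed factor every epoch."""
    return base_lr * factor ** epoch

def time_based_decay(epoch, k=0.1):
    """Reduce the LR gradually as the number of epochs increases."""
    return base_lr / (1.0 + k * epoch)

for epoch in [0, 1, 5, 20]:
    print(epoch, warmup(epoch), exponential_decay(epoch), time_based_decay(epoch))
```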

39

Why normalization layers are needed

They reduce internal covariate shift (the changing distribution of layer inputs during training) and allow the use of higher learning rates

40

BatchNorm2D Axes

Computes statistics over

N (batch), H (height), and W (width) axes

Normalizes across

channels C for the entire batch

41

LayerNorm Axes

Computes statistics over

C (channel), H (height), W (width)

Normalizes

within a single sample

42

InstanceNorm Axes

Computes statistics over

H (height) and W (width) axes

Normalizes

within a single sample and single channel
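
A short PyTorch sketch (PyTorch assumed) applying all three layers to the same (N, C, H, W) tensor to make the axes concrete; affine parameters are disabled only to keep the example minimal.

```python
import torch
import torch.nn as nn

x = torch.randn(8, 16, 32, 32)                       # N=8, C=16, H=W=32

batch_norm = nn.BatchNorm2d(16, affine=False)        # stats over N, H, W (per channel)
layer_norm = nn.LayerNorm([16, 32, 32],
                          elementwise_affine=False)  # stats over C, H, W (per sample)
inst_norm = nn.InstanceNorm2d(16, affine=False)      # stats over H, W (per sample, per channel)

for name, layer in [("batch", batch_norm), ("layer", layer_norm), ("instance", inst_norm)]:
    print(name, layer(x).shape)                      # shape unchanged: [8, 16, 32, 32]
```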

43

Effective receptive field of single 7×7 conv

7×7

44

Parameter Count: Three 3×3 stack (C to C)

3×(3×3×C×C) = 27C²

45

Parameter Count: Single 7×7 conv (C to C)

1×(7×7×C×C) = 49C²
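
A quick PyTorch check (PyTorch assumed) of the two counts for C = 64, with biases disabled so that only the convolution weights are counted.

```python
import torch.nn as nn

C = 64
three_3x3 = nn.Sequential(*[nn.Conv2d(C, C, kernel_size=3, bias=False) for _ in range(3)])
single_7x7 = nn.Conv2d(C, C, kernel_size=7, bias=False)

n_three = sum(p.numel() for p in three_3x3.parameters())
n_single = sum(p.numel() for p in single_7x7.parameters())
print(n_three, 27 * C ** 2)     # 110592 110592
print(n_single, 49 * C ** 2)    # 200704 200704
```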

46

Usefulness of a pooling layer

Reduces feature map size

Lowers computational cost and memory

Provides a degree of translation invariance 

47

Cause of Vanishing Gradients

Repeated multiplication of small gradients through many layers during backpropagation