Machine Learning - Chapter 10

66 Terms

1
What is an artificial neural network (ANN)?
An ANN is a computational model inspired by the human brain, made up of layers of interconnected nodes (neurons) that can learn from data.
2
What is a neuron in an ANN?
A simple computational unit that receives inputs, multiplies them by weights, adds a bias, and applies an activation function to produce an output.
3
What is the formula for a single artificial neuron's output?
Output = activation(w · x + b), that is, the activation function applied to the weighted sum of the inputs plus the bias.
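
A minimal NumPy sketch of this formula, using made-up input, weight, and bias values and ReLU as the activation:

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])   # inputs
    w = np.array([0.4, 0.3, -0.2])   # learned weights (example values)
    b = 0.1                          # learned bias (example value)

    z = np.dot(w, x) + b             # weighted sum of inputs + bias
    output = max(0.0, z)             # ReLU activation: max(0, z)
    print(output)                    # z = -0.4, so ReLU outputs 0.0
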
4
What is an activation function?
A function that introduces nonlinearity into the network, allowing it to learn complex patterns.
5
Give examples of common activation functions.
Sigmoid, tanh, ReLU, and softmax.
6
What does ReLU stand for?
Rectified Linear Unit.
7
Why is ReLU popular?
It helps networks train faster and reduces the vanishing gradient problem compared to sigmoid or tanh.
8
What is the vanishing gradient problem?
When gradients become too small during backpropagation, preventing deep networks from learning effectively.
9
How does the Sigmoid activation function work?

How the graph looks:

It's an "S"-shaped curve that starts near 0, rises smoothly, and levels off near 1.

How to understand it:

It smoothly turns numbers into values between 0 and 1, like a dimmer that adjusts brightness instead of just on or off.

Output:

Always between 0 and 1. Useful for binary classification and representing probabilities.

10
When should you use the Sigmoid activation function?
It's mostly used in the output layer of binary classification problems, where you need to predict a probability between 0 and 1. It's rarely used in hidden layers today because it can cause vanishing gradients.
11
How does the tanh activation function work?

How the graph looks:

It also looks like an "S", but centered at zero, going from -1 to +1.

How to understand it:

It stretches numbers so negatives become closer to -1 and positives closer to +1, balancing the data around zero.

Output:

Between -1 and 1. Often used in hidden layers to keep activations centered.

12
When should you use the tanh activation function?
It's often used in hidden layers because it keeps data centered around zero, helping the network learn faster than Sigmoid. However, it can still suffer from the vanishing gradient problem for large input values.
13
How does the ReLU activation function work?

How the graph looks:

Flat at 0 for negative inputs, then a straight line increasing for positive inputs.

How to understand it:

It's like a door that stays closed (0) for negatives and opens linearly for positives.

Output:

0 for values below zero, and the same as the input for values above zero. Common in hidden layers for speed and simplicity.

14
When should you use the ReLU activation function?
It's the most common choice for hidden layers in deep neural networks because it avoids vanishing gradients for positive values and is computationally efficient. But it can cause "dead neurons": if a neuron's weighted input stays negative, its output and gradient are stuck at 0 and it stops learning.
15
How does the Softmax activation function work?

How the graph looks:

Softmax has no single curve: it maps a whole vector of raw scores to a set of probabilities that sum to 1, showing which class is most likely.

How to understand it:

It turns raw scores into probabilities — higher numbers mean higher chances for that class.

Output:

A vector of probabilities adding up to 1. Used in output layers for multi-class classification.

16
When should you use the Softmax activation function?
It's typically used in the output layer of multi-class classification problems, where you need to predict one class out of many. Each neuron's output represents the probability of a specific class.
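
For reference, a compact NumPy sketch of the four activations described in the cards above; the test vector is arbitrary:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

    def tanh(z):
        return np.tanh(z)                 # squashes into (-1, 1), zero-centered

    def relu(z):
        return np.maximum(0.0, z)         # 0 for negatives, identity for positives

    def softmax(z):
        e = np.exp(z - np.max(z))         # subtract max for numerical stability
        return e / e.sum()                # probabilities that sum to 1

    z = np.array([-2.0, 0.0, 3.0])
    print(sigmoid(z))   # [0.119 0.5   0.953]
    print(tanh(z))      # [-0.964  0.     0.995]
    print(relu(z))      # [0. 0. 3.]
    print(softmax(z))   # [0.006 0.047 0.947], sums to 1
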
17
What is a layer in a neural network?
A collection of neurons that process inputs together and pass their outputs to the next layer.
18
What are the three main types of layers in an ANN?
Input layer, hidden layers, and output layer.
19
What is a dense (fully connected) layer?
A layer where every neuron is connected to every neuron in the previous layer.
20
What is the role of weights in an ANN?
Weights determine the strength of connections between neurons — they are learned during training.
21
What is the bias term in a neuron?
A constant value added to the weighted sum before applying the activation function, allowing flexibility in the output.
22
What is the main purpose of training a neural network?
To find the optimal weights and biases that minimize the prediction error on training data.
23
What is forward propagation?
The process of passing input data through the network to compute predictions.
24
What is backpropagation?
An algorithm used to compute gradients of the loss function with respect to weights and biases, allowing the model to learn.
25
What is the loss function in neural networks?
A function that measures how far the model's predictions are from the true target values.
26
Give examples of loss functions.
Mean Squared Error (MSE) for regression, Binary Cross-Entropy for binary classification, and Categorical Cross-Entropy for multi-class classification.
27
What is gradient descent?
An optimization algorithm that updates weights step-by-step in the direction that minimizes the loss.
28
What does the learning rate control?
How big the steps are during gradient descent updates — too high can diverge, too low can be too slow.
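
A minimal sketch of gradient descent on a toy one-dimensional loss, L(w) = (w - 3)^2 with gradient 2(w - 3); the learning rate and starting point are arbitrary illustration values:

    w = 0.0                # initial weight
    learning_rate = 0.1    # step size: too high diverges, too low crawls

    for step in range(50):
        grad = 2.0 * (w - 3.0)          # gradient of the loss at the current w
        w = w - learning_rate * grad    # step in the direction that lowers the loss

    print(w)  # converges toward 3.0, the minimum of the loss
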
29
What is stochastic gradient descent (SGD)?
A variant of gradient descent that updates weights using one (or a few) training examples at a time, improving efficiency.
30
What is an epoch in training?
One full pass through the entire training dataset.
31
What is batch size?
The number of training examples processed before updating the model's weights.
32
What is overfitting in neural networks?
When a model learns noise and specific details from training data, performing poorly on new data.
33
What is underfitting in neural networks?
When a model is too simple to capture the underlying patterns in the data.
34
What are some common regularization techniques?
L1/L2 regularization, dropout, early stopping, and data augmentation.
35
What is dropout?
A regularization method that randomly disables some neurons during training to prevent overfitting.
36
What does early stopping do?
It stops training when validation performance stops improving, preventing overfitting.
37
What is batch normalization?
A technique that normalizes activations in each layer to stabilize and speed up training.
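
A sketch of how dropout, batch normalization, and early stopping appear in Keras; the layer sizes, dropout rate, and patience are illustrative choices, and X_train/y_train/X_val/y_val stand in for your data:

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.BatchNormalization(),   # normalize the layer's activations
        keras.layers.Dropout(0.2),           # randomly disable 20% of neurons during training
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

    # Stop when validation loss has not improved for 5 epochs and keep the best weights.
    early_stop = keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
    # model.fit(X_train, y_train, validation_data=(X_val, y_val),
    #           epochs=100, callbacks=[early_stop])
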
38
What is the main Keras class used to build models?
The Sequential class for simple linear stacks of layers.
39
How do you compile a Keras model?
By defining the loss function, optimizer, and metrics using model.compile().
40
How do you train a Keras model?
By calling model.fit() with training data, epochs, and batch size.
41
How do you evaluate a Keras model?
By calling model.evaluate() on test or validation data.
42
How do you make predictions with a Keras model?
By using model.predict() with new input data.
43
What does Dense(10, activation="relu") mean?
It creates a dense layer with 10 neurons, each using the ReLU activation function.
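
Putting the last several cards together, a minimal end-to-end sketch; the random data, layer sizes, epochs, and batch size are placeholders:

    import numpy as np
    from tensorflow import keras

    # Placeholder data: 100 samples, 8 features, binary labels.
    X = np.random.rand(100, 8).astype("float32")
    y = np.random.randint(0, 2, size=100)

    # Build: a simple linear stack of layers.
    model = keras.Sequential([
        keras.layers.Input(shape=(8,)),
        keras.layers.Dense(10, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    # Compile: loss function, optimizer, and metrics.
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

    # Train, evaluate, predict.
    model.fit(X, y, epochs=5, batch_size=16)
    loss, acc = model.evaluate(X, y)
    probs = model.predict(X[:3])   # probabilities between 0 and 1
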
44
What is the difference between training and inference?
Training updates model weights, while inference uses the trained weights to make predictions.
45
Why is scaling input data important for neural networks?
Because features on different scales can cause unstable gradients and slow training.
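
A sketch of scaling features before training, here using scikit-learn's StandardScaler (one common choice, not the only one); the small arrays are placeholders, and the key point is to fit the scaler on training data only:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
    X_test = np.array([[1.5, 250.0]])

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # mean 0, std 1 per feature
    X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
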
46
What is the purpose of the output layer?
It produces the final prediction — often using activation functions like sigmoid or softmax depending on the task.
47
What is the softmax function used for?
It converts raw output values into probabilities that sum to 1 for multi-class classification.
48
What is the role of the optimizer?
It updates the network's weights based on the gradients computed during backpropagation.
49
Name a few optimizers available in Keras.
SGD, Adam, RMSprop, Adagrad, and Adadelta.
50
Why is the Adam optimizer popular?
Because it adapts learning rates for each parameter, combining the benefits of AdaGrad and RMSprop.
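
A sketch of passing an optimizer object to model.compile() instead of a string name, which lets you set hyperparameters such as the learning rate (1e-3 is Keras's default for Adam):

    from tensorflow import keras

    optimizer = keras.optimizers.Adam(learning_rate=1e-3)
    # model.compile(loss="sparse_categorical_crossentropy",
    #               optimizer=optimizer, metrics=["accuracy"])
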
51
What is the difference between MLP and CNN?
An MLP (Multilayer Perceptron) uses fully connected layers on flat feature vectors; a CNN (Convolutional Neural Network) uses convolutional layers that exploit spatial structure, making it well suited to image data.
52
What is a perceptron?
The simplest kind of artificial neuron — a linear classifier that outputs 0 or 1 based on a threshold.
53
Why can't a simple perceptron solve all problems?
Because it can only learn linear decision boundaries; it cannot solve problems that are not linearly separable, such as XOR.
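
A sketch of the perceptron decision rule in NumPy. The hand-picked weights below implement logical AND, which is linearly separable; no single choice of w and b can reproduce XOR:

    import numpy as np

    def perceptron(x, w, b):
        return 1 if np.dot(w, x) + b > 0 else 0   # threshold at 0

    w, b = np.array([1.0, 1.0]), -1.5             # hand-picked weights for AND
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(np.array(x), w, b))   # 0, 0, 0, 1: AND, not XOR
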
54
What enables neural networks to learn nonlinear functions?
The use of multiple layers and nonlinear activation functions.
55
What is a hidden layer?
A layer that lies between the input and output layers, enabling the network to learn intermediate representations.
56
What is a deep neural network?
A neural network with two or more hidden layers.
57
What is the main advantage of deep networks?
They can automatically learn hierarchical features from raw data.
58
What is weight initialization?
Setting the initial values of the model's weights before training begins.
59
Why is proper weight initialization important?
Because bad initialization can cause slow convergence or vanishing/exploding gradients.
60
What is He initialization?
A weight initialization method designed for ReLU activations that helps maintain signal variance across layers.
61
What is Xavier (Glorot) initialization?
A method designed to keep the variance of activations constant across layers for tanh and sigmoid activations.
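
A sketch of selecting these initializers per layer in Keras; "glorot_uniform" is the Keras default for Dense layers, while "he_normal" is the usual pairing with ReLU:

    from tensorflow import keras

    layer_relu = keras.layers.Dense(64, activation="relu",
                                    kernel_initializer="he_normal")
    layer_tanh = keras.layers.Dense(64, activation="tanh",
                                    kernel_initializer="glorot_uniform")
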
62
What does "training loss decreasing but validation loss increasing" mean?
The model is overfitting: it keeps improving on the training data while getting worse on unseen data.
63
Why should you shuffle your data before training?
To prevent the model from learning spurious patterns from the order of the data.
64
What is the role of callbacks in Keras?
They allow you to execute custom code at specific training stages, like saving checkpoints or stopping early.
65
Give examples of useful Keras callbacks.
ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, and TensorBoard.
66
What is TensorBoard?
A visualization tool for tracking training metrics like loss and accuracy over time.
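
A sketch combining the callbacks from the last few cards, including TensorBoard logging; the file paths, patience values, and the commented-out model and data are placeholders:

    from tensorflow import keras

    callbacks = [
        keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
        keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5),
        keras.callbacks.TensorBoard(log_dir="./logs"),  # then run: tensorboard --logdir ./logs
    ]
    # model.fit(X_train, y_train, validation_data=(X_val, y_val),
    #           epochs=100, callbacks=callbacks)
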