Machine Learning - Chapter 10

66 Terms

1
What is an artificial neural network (ANN)?
An ANN is a computational model inspired by the human brain, made up of layers of interconnected nodes (neurons) that can learn from data.
2
What is a neuron in an ANN?
A simple computational unit that receives inputs, multiplies them by weights, adds a bias, and applies an activation function to produce an output.
3
What is the formula for a single artificial neuron's output?
Output = activation(w · x + b), that is, the activation function applied to the weighted sum of the inputs plus the bias.
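
A minimal NumPy sketch of this formula, using made-up input, weight, and bias values and ReLU as the activation:

    import numpy as np

    x = np.array([0.5, -1.0, 2.0])   # inputs
    w = np.array([0.4, 0.3, -0.2])   # learned weights (example values)
    b = 0.1                          # learned bias (example value)

    z = np.dot(w, x) + b             # weighted sum of inputs + bias
    output = max(0.0, z)             # ReLU activation: max(0, z)
    print(output)                    # z = -0.4, so ReLU outputs 0.0
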
4
What is an activation function?
A function that introduces nonlinearity into the network, allowing it to learn complex patterns.
5
Give examples of common activation functions.
Sigmoid, tanh, ReLU, and softmax.
6
What does ReLU stand for?
Rectified Linear Unit.
7
Why is ReLU popular?
It helps networks train faster and reduces the vanishing gradient problem compared to sigmoid or tanh.
8
What is the vanishing gradient problem?
When gradients become too small during backpropagation, preventing deep networks from learning effectively.
9
How does the Sigmoid activation function work?

How the graph looks:

It's an "S"-shaped curve that starts near 0, rises smoothly, and levels off near 1.

How to understand it:

It smoothly turns numbers into values between 0 and 1, like a dimmer that adjusts brightness instead of just on or off.

Output:

Always between 0 and 1. Useful for binary classification and representing probabilities.

10
When should you use the Sigmoid activation function?
It's mostly used in the output layer of binary classification problems, where you need to predict a probability between 0 and 1. It's rarely used in hidden layers today because it can cause vanishing gradients.
11
How does the tanh activation function work?

How the graph looks:

It also looks like an "S", but centered at zero, going from -1 to +1.

How to understand it:

It stretches numbers so negatives become closer to -1 and positives closer to +1, balancing the data around zero.

Output:

Between -1 and 1. Often used in hidden layers to keep activations centered.

12
When should you use the tanh activation function?
It's often used in hidden layers because it keeps data centered around zero, helping the network learn faster than Sigmoid. However, it can still suffer from the vanishing gradient problem for large input values.
13
How does the ReLU activation function work?

How the graph looks:

Flat at 0 for negative inputs, then a straight line increasing for positive inputs.

How to understand it:

It's like a door that stays closed (0) for negatives and opens linearly for positives.

Output:

0 for values below zero, and the same as the input for values above zero. Common in hidden layers for speed and simplicity.

14
When should you use the ReLU activation function?
It's the most common choice for hidden layers in deep neural networks because it avoids vanishing gradients for positive values and is computationally efficient. But it can cause "dead neurons": if a neuron's weighted input stays negative, its output and gradient are stuck at 0 and it stops learning.
15
How does the Softmax activation function work?

How the graph looks:

Softmax has no single curve: it maps a whole vector of raw scores to a set of probabilities that sum to 1, showing which class is most likely.

How to understand it:

It turns raw scores into probabilities — higher numbers mean higher chances for that class.

Output:

A vector of probabilities adding up to 1. Used in output layers for multi-class classification.

16
When should you use the Softmax activation function?
It's typically used in the output layer of multi-class classification problems, where you need to predict one class out of many. Each neuron's output represents the probability of a specific class.
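
For reference, a compact NumPy sketch of the four activations described in the cards above; the test vector is arbitrary:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))   # squashes values into (0, 1)

    def tanh(z):
        return np.tanh(z)                 # squashes into (-1, 1), zero-centered

    def relu(z):
        return np.maximum(0.0, z)         # 0 for negatives, identity for positives

    def softmax(z):
        e = np.exp(z - np.max(z))         # subtract max for numerical stability
        return e / e.sum()                # probabilities that sum to 1

    z = np.array([-2.0, 0.0, 3.0])
    print(sigmoid(z))   # [0.119 0.5   0.953]
    print(tanh(z))      # [-0.964  0.     0.995]
    print(relu(z))      # [0. 0. 3.]
    print(softmax(z))   # [0.006 0.047 0.947], sums to 1
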
17
What is a layer in a neural network?
A collection of neurons that process inputs together and pass their outputs to the next layer.
18
What are the three main types of layers in an ANN?
Input layer, hidden layers, and output layer.
19
What is a dense (fully connected) layer?
A layer where every neuron is connected to every neuron in the previous layer.
20
What is the role of weights in an ANN?
Weights determine the strength of connections between neurons — they are learned during training.
21
What is the bias term in a neuron?
A constant value added to the weighted sum before applying the activation function, allowing flexibility in the output.
22
What is the main purpose of training a neural network?
To find the optimal weights and biases that minimize the prediction error on training data.
23
What is forward propagation?
The process of passing input data through the network to compute predictions.
24
What is backpropagation?
An algorithm used to compute gradients of the loss function with respect to weights and biases, allowing the model to learn.
25
What is the loss function in neural networks?
A function that measures how far the model's predictions are from the true target values.
26
Give examples of loss functions.
Mean Squared Error (MSE) for regression, Binary Cross-Entropy for binary classification, and Categorical Cross-Entropy for multi-class classification.
27
What is gradient descent?
An optimization algorithm that updates weights step-by-step in the direction that minimizes the loss.
28
What does the learning rate control?
How big the steps are during gradient descent updates — too high can diverge, too low can be too slow.
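
A minimal sketch of gradient descent on a toy one-dimensional loss, L(w) = (w - 3)^2 with gradient 2(w - 3); the learning rate and starting point are arbitrary illustration values:

    w = 0.0                # initial weight
    learning_rate = 0.1    # step size: too high diverges, too low crawls

    for step in range(50):
        grad = 2.0 * (w - 3.0)          # gradient of the loss at the current w
        w = w - learning_rate * grad    # step in the direction that lowers the loss

    print(w)  # converges toward 3.0, the minimum of the loss
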
29
What is stochastic gradient descent (SGD)?
A variant of gradient descent that updates weights using one (or a few) training examples at a time, improving efficiency.
30
What is an epoch in training?
One full pass through the entire training dataset.
31
What is batch size?
The number of training examples processed before updating the model's weights.
32
What is overfitting in neural networks?
When a model learns noise and specific details from training data, performing poorly on new data.
33
What is underfitting in neural networks?
When a model is too simple to capture the underlying patterns in the data.
34
What are some common regularization techniques?
L1/L2 regularization, dropout, early stopping, and data augmentation.
35
What is dropout?
A regularization method that randomly disables some neurons during training to prevent overfitting.
36
What does early stopping do?
It stops training when validation performance stops improving, preventing overfitting.
37
What is batch normalization?
A technique that normalizes activations in each layer to stabilize and speed up training.
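
A sketch of how dropout, batch normalization, and early stopping appear in Keras; the layer sizes, dropout rate, and patience are illustrative choices, and X_train/y_train/X_val/y_val stand in for your data:

    from tensorflow import keras

    model = keras.Sequential([
        keras.layers.Input(shape=(20,)),
        keras.layers.Dense(64, activation="relu"),
        keras.layers.BatchNormalization(),   # normalize the layer's activations
        keras.layers.Dropout(0.2),           # randomly disable 20% of neurons during training
        keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

    # Stop when validation loss has not improved for 5 epochs and keep the best weights.
    early_stop = keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
    # model.fit(X_train, y_train, validation_data=(X_val, y_val),
    #           epochs=100, callbacks=[early_stop])
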
38
What is the main Keras class used to build models?
The Sequential class for simple linear stacks of layers.
39
How do you compile a Keras model?
By defining the loss function, optimizer, and metrics using model.compile().
40
How do you train a Keras model?
By calling model.fit() with training data, epochs, and batch size.
41
How do you evaluate a Keras model?
By calling model.evaluate() on test or validation data.
42
How do you make predictions with a Keras model?
By using model.predict() with new input data.
43
What does Dense(10, activation="relu") mean?
It creates a dense layer with 10 neurons, each using the ReLU activation function.
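
Putting the last several cards together, a minimal end-to-end sketch; the random data, layer sizes, epochs, and batch size are placeholders:

    import numpy as np
    from tensorflow import keras

    # Placeholder data: 100 samples, 8 features, binary labels.
    X = np.random.rand(100, 8).astype("float32")
    y = np.random.randint(0, 2, size=100)

    # Build: a simple linear stack of layers.
    model = keras.Sequential([
        keras.layers.Input(shape=(8,)),
        keras.layers.Dense(10, activation="relu"),
        keras.layers.Dense(1, activation="sigmoid"),
    ])

    # Compile: loss function, optimizer, and metrics.
    model.compile(loss="binary_crossentropy", optimizer="adam", metrics=["accuracy"])

    # Train, evaluate, predict.
    model.fit(X, y, epochs=5, batch_size=16)
    loss, acc = model.evaluate(X, y)
    probs = model.predict(X[:3])   # probabilities between 0 and 1
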
44
What is the difference between training and inference?
Training updates model weights, while inference uses the trained weights to make predictions.
45
Why is scaling input data important for neural networks?
Because features on different scales can cause unstable gradients and slow training.
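
A sketch of scaling features before training, here using scikit-learn's StandardScaler (one common choice, not the only one); the small arrays are placeholders, and the key point is to fit the scaler on training data only:

    import numpy as np
    from sklearn.preprocessing import StandardScaler

    X_train = np.array([[1.0, 200.0], [2.0, 300.0], [3.0, 400.0]])
    X_test = np.array([[1.5, 250.0]])

    scaler = StandardScaler()
    X_train_scaled = scaler.fit_transform(X_train)  # mean 0, std 1 per feature
    X_test_scaled = scaler.transform(X_test)        # reuse the training statistics
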
46
What is the purpose of the output layer?
It produces the final prediction — often using activation functions like sigmoid or softmax depending on the task.
47
What is the softmax function used for?
It converts raw output values into probabilities that sum to 1 for multi-class classification.
48
What is the role of the optimizer?
It updates the network's weights based on the gradients computed during backpropagation.
49
Name a few optimizers available in Keras.
SGD, Adam, RMSprop, Adagrad, and Adadelta.
50
Why is the Adam optimizer popular?
Because it adapts learning rates for each parameter, combining the benefits of AdaGrad and RMSprop.
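
A sketch of passing an optimizer object to model.compile() instead of a string name, which lets you set hyperparameters such as the learning rate (1e-3 is Keras's default for Adam):

    from tensorflow import keras

    optimizer = keras.optimizers.Adam(learning_rate=1e-3)
    # model.compile(loss="sparse_categorical_crossentropy",
    #               optimizer=optimizer, metrics=["accuracy"])
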
51
What is the difference between MLP and CNN?
An MLP (Multilayer Perceptron) uses fully connected layers on flat feature vectors; a CNN (Convolutional Neural Network) uses convolutional layers that exploit spatial structure, making it well suited to image data.
52
What is a perceptron?
The simplest kind of artificial neuron — a linear classifier that outputs 0 or 1 based on a threshold.
53
Why can't a simple perceptron solve all problems?
Because it can only learn linear decision boundaries; it cannot solve problems that are not linearly separable, such as XOR.
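
A sketch of the perceptron decision rule in NumPy. The hand-picked weights below implement logical AND, which is linearly separable; no single choice of w and b can reproduce XOR:

    import numpy as np

    def perceptron(x, w, b):
        return 1 if np.dot(w, x) + b > 0 else 0   # threshold at 0

    w, b = np.array([1.0, 1.0]), -1.5             # hand-picked weights for AND
    for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(x, perceptron(np.array(x), w, b))   # 0, 0, 0, 1: AND, not XOR
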
54
What enables neural networks to learn nonlinear functions?
The use of multiple layers and nonlinear activation functions.
55
What is a hidden layer?
A layer that lies between the input and output layers, enabling the network to learn intermediate representations.
56
What is a deep neural network?
A neural network with two or more hidden layers.
57
What is the main advantage of deep networks?
They can automatically learn hierarchical features from raw data.
58
What is weight initialization?
Setting the initial values of the model's weights before training begins.
59
Why is proper weight initialization important?
Because bad initialization can cause slow convergence or vanishing/exploding gradients.
60
What is He initialization?
A weight initialization method designed for ReLU activations that helps maintain signal variance across layers.
61
What is Xavier (Glorot) initialization?
A method designed to keep the variance of activations constant across layers for tanh and sigmoid activations.
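
A sketch of selecting these initializers per layer in Keras; "glorot_uniform" is the Keras default for Dense layers, while "he_normal" is the usual pairing with ReLU:

    from tensorflow import keras

    layer_relu = keras.layers.Dense(64, activation="relu",
                                    kernel_initializer="he_normal")
    layer_tanh = keras.layers.Dense(64, activation="tanh",
                                    kernel_initializer="glorot_uniform")
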
62
What does "training loss decreasing but validation loss increasing" mean?
The model is overfitting: it keeps improving on the training data while getting worse on unseen data.
63
Why should you shuffle your data before training?
To prevent the model from learning spurious patterns from the order of the data.
64
What is the role of callbacks in Keras?
They allow you to execute custom code at specific training stages, like saving checkpoints or stopping early.
65
Give examples of useful Keras callbacks.
ModelCheckpoint, EarlyStopping, ReduceLROnPlateau, and TensorBoard.
66
What is TensorBoard?
A visualization tool for tracking training metrics like loss and accuracy over time.
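
A sketch combining the callbacks from the last few cards, including TensorBoard logging; the file paths, patience values, and the commented-out model and data are placeholders:

    from tensorflow import keras

    callbacks = [
        keras.callbacks.ModelCheckpoint("best_model.keras", save_best_only=True),
        keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True),
        keras.callbacks.ReduceLROnPlateau(factor=0.5, patience=5),
        keras.callbacks.TensorBoard(log_dir="./logs"),  # then run: tensorboard --logdir ./logs
    ]
    # model.fit(X_train, y_train, validation_data=(X_val, y_val),
    #           epochs=100, callbacks=callbacks)
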