What is a multilayer perceptron?
A "plain vanilla" neural network — the foundational type of network before specializations like CNNs or LSTMs
What is a Convolutional Neural Network good for?
Image recognition
What is a Long Short-Term Memory network good for?
Speech recognition
What is a neuron?
A function that takes inputs, computes a weighted sum plus bias, passes it through an activation function, and outputs a number
What are hidden layers?
All layers between the input and output layer
Why are layers useful?
They break the problem into smaller, more manageable pieces
What is a weight?
A value representing the strength of the connection between two neurons
What is a bias?
A value that shifts the threshold for when a neuron activates
What does a large negative bias mean for a neuron?
The neuron needs a strong input signal to fire — it is biased toward inactivity
What does a large positive bias mean for a neuron?
The neuron fires easily even with weak inputs
What does a neuron compute before the activation function?
(input1 × weight1) + (input2 × weight2) + … + bias
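A minimal sketch of this pre-activation computation, using made-up input, weight, and bias values purely for illustration:

```python
# Hypothetical two-input neuron; all values are illustrative.
inputs = [0.5, -1.0]
weights = [0.8, 0.2]
bias = 0.1

# z = (input1 * weight1) + (input2 * weight2) + bias
z = sum(i * w for i, w in zip(inputs, weights)) + bias
print(z)  # ≈ 0.3 (0.4 - 0.2 + 0.1)
```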
What is learning in a neural network?
Finding the right weights and biases
What does an activation function do?
Squishes the raw weighted sum into a usable range and adds nonlinearity
What happens if you stack layers with no activation function?
Any chain of linear operations collapses into a single linear operation — the network behaves like one layer and can only learn straight-line relationships
Why can a network without activation functions not learn XOR?
XOR is not a straight-line relationship — without nonlinearity the network can only learn linear patterns
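The collapse of stacked linear layers can be checked numerically. This sketch uses two 1-D "layers" with arbitrary weights and biases and shows they compose into a single linear function:

```python
# Two linear "layers" with no activation: y = w2*(w1*x + b1) + b2.
# This collapses to y = (w2*w1)*x + (w2*b1 + b2): one linear layer.
w1, b1 = 2.0, 1.0   # illustrative values
w2, b2 = 3.0, -0.5

x = 4.0
two_layer = w2 * (w1 * x + b1) + b2
one_layer = (w2 * w1) * x + (w2 * b1 + b2)
print(two_layer == one_layer)  # True
```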
What is the sigmoid function?
σ(x) = 1 / (1 + e^−x) — squishes any number into the range (0, 1)
What does sigmoid output for a very large positive number?
Close to 1
What does sigmoid output for a very large negative number?
Close to 0
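The three sigmoid cards above can be verified with a direct translation of the formula (the sample inputs here are arbitrary):

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^-x): squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5
print(sigmoid(20))   # large positive input -> close to 1
print(sigmoid(-20))  # large negative input -> close to 0
```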
What is ReLU?
Rectified Linear Unit — ReLU(a) = max(0, a). Returns 0 for negative inputs, returns the input itself for positive inputs
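ReLU is short enough to write out in one line; the test inputs are illustrative:

```python
def relu(a):
    # ReLU(a) = max(0, a): zero for negatives, identity for positives
    return max(0.0, a)

print(relu(-3.2))  # 0.0
print(relu(5.0))   # 5.0
```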
Why did ReLU replace sigmoid in most modern networks?
Sigmoid squashes gradients toward zero at the extremes, which stalls learning in deep networks — ReLU does not have that problem
What is the cost function?
A function that measures how wrong the network is by summing the squared differences between the network's outputs and the correct answers
Why do we square the differences in the cost function?
To make all errors positive and to punish large errors harder than small ones
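A sketch of the sum-of-squared-differences cost described above, with made-up outputs and targets:

```python
def cost(outputs, targets):
    # Sum the squared differences between the network's outputs
    # and the correct answers; squaring makes every error positive
    # and punishes large errors harder than small ones.
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

print(cost([0.9, 0.1], [1.0, 0.0]))  # ≈ 0.02 (two small errors)
print(cost([0.0, 1.0], [1.0, 0.0]))  # 2.0 (two large errors)
```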
Why not use absolute value instead of squaring in the cost function?
Absolute value has a corner at zero that makes the gradient math messier
What does a small cost value mean?
The network's outputs were close to the correct answers
What does a large cost value mean?
The network's outputs were far from the correct answers
What does the average cost across all training examples tell you?
How good or bad the network is overall
What is gradient descent?
The process of repeatedly nudging weights and biases by some multiple of the negative gradient to minimize the cost function
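A toy 1-D version of gradient descent, assuming a hand-picked learning rate and a simple quadratic cost whose gradient is known in closed form:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly nudge x by a multiple (lr) of the negative gradient.
    # The step is lr * slope, so steep slopes give big steps and
    # shallow slopes (near the minimum) give small ones.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize cost(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=10.0)
print(round(minimum, 4))  # converges near 3.0
```

The same loop also illustrates the step-size card below: because the update is proportional to the slope, steps shrink automatically as the minimum is approached.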
What is backpropagation?
The algorithm for computing the gradient of the cost function efficiently across all weights and biases
Why are weights and biases initialized randomly?
There is no correct starting point — random initialization lets gradient descent find its way from wherever it starts
What is a local minimum in the context of gradient descent?
A point where the cost is lower than nearby points but not necessarily the lowest possible cost overall
Why are step sizes made proportional to the slope in gradient descent?
To avoid overshooting the minimum — a steep slope means you are far from the bottom so larger steps are safe, a shallow slope means you are close so smaller steps are needed