What is a multilayer perceptron?
A "plain vanilla" neural network — the foundational type of network before specializations like CNNs or LSTMs
What is a Convolutional Neural Network good for?
Image recognition
What is a Long Short-Term Memory network good for?
Speech recognition
What is a neuron?
A function that takes inputs, computes a weighted sum plus bias, passes it through an activation function, and outputs a number
What are hidden layers?
All layers between the input and output layer
Why are layers useful?
They break the problem into smaller, more manageable pieces
What is a weight?
A value representing the strength of the connection between two neurons
What is a bias?
A value that shifts the threshold for when a neuron activates
What does a large negative bias mean for a neuron?
The neuron needs a strong input signal to fire — it is biased toward inactivity
What does a large positive bias mean for a neuron?
The neuron fires easily even with weak inputs
What does a neuron compute before the activation function?
(input1 × weight1) + (input2 × weight2) + … + bias
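A minimal sketch of this pre-activation computation, using made-up input, weight, and bias values purely for illustration:

```python
# Hypothetical two-input neuron; all values are illustrative.
inputs = [0.5, -1.0]
weights = [0.8, 0.2]
bias = 0.1

# z = (input1 * weight1) + (input2 * weight2) + bias
z = sum(i * w for i, w in zip(inputs, weights)) + bias
print(z)  # ≈ 0.3 (0.4 - 0.2 + 0.1)
```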
What is learning in a neural network?
Finding the right weights and biases
What does an activation function do?
Squishes the raw weighted sum into a usable range and adds nonlinearity
What happens if you stack layers with no activation function?
Any chain of linear operations collapses into a single linear operation — the network behaves like one layer and can only learn straight-line relationships
Why can a network without activation functions not learn XOR?
XOR is not a straight-line relationship — without nonlinearity the network can only learn linear patterns
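The collapse of stacked linear layers can be checked numerically. This sketch uses two 1-D "layers" with arbitrary weights and biases and shows they compose into a single linear function:

```python
# Two linear "layers" with no activation: y = w2*(w1*x + b1) + b2.
# This collapses to y = (w2*w1)*x + (w2*b1 + b2): one linear layer.
w1, b1 = 2.0, 1.0   # illustrative values
w2, b2 = 3.0, -0.5

x = 4.0
two_layer = w2 * (w1 * x + b1) + b2
one_layer = (w2 * w1) * x + (w2 * b1 + b2)
print(two_layer == one_layer)  # True
```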
What is the sigmoid function?
σ(x) = 1 / (1 + e^−x) — squishes any number into the range (0, 1)
What does sigmoid output for a very large positive number?
Close to 1
What does sigmoid output for a very large negative number?
Close to 0
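The three sigmoid cards above can be verified with a direct translation of the formula (the sample inputs here are arbitrary):

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^-x): squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))    # 0.5
print(sigmoid(20))   # large positive input -> close to 1
print(sigmoid(-20))  # large negative input -> close to 0
```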
What is ReLU?
Rectified Linear Unit — ReLU(a) = max(0, a). Returns 0 for negative inputs, returns the input itself for positive inputs
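ReLU is short enough to write out in one line; the test inputs are illustrative:

```python
def relu(a):
    # ReLU(a) = max(0, a): zero for negatives, identity for positives
    return max(0.0, a)

print(relu(-3.2))  # 0.0
print(relu(5.0))   # 5.0
```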
Why did ReLU replace sigmoid in most modern networks?
Sigmoid squashes gradients toward zero at the extremes, which stalls learning in deep networks — ReLU does not have that problem
What is the cost function?
A function that measures how wrong the network is by summing the squared differences between the network's outputs and the correct answers
Why do we square the differences in the cost function?
To make all errors positive and to punish large errors harder than small ones
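A sketch of the sum-of-squared-differences cost described above, with made-up outputs and targets:

```python
def cost(outputs, targets):
    # Sum the squared differences between the network's outputs
    # and the correct answers; squaring makes every error positive
    # and punishes large errors harder than small ones.
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

print(cost([0.9, 0.1], [1.0, 0.0]))  # ≈ 0.02 (two small errors)
print(cost([0.0, 1.0], [1.0, 0.0]))  # 2.0 (two large errors)
```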
Why not use absolute value instead of squaring in the cost function?
Absolute value has a corner at zero that makes the gradient math messier
What does a small cost value mean?
The network's outputs were close to the correct answers
What does a large cost value mean?
The network's outputs were far from the correct answers
What does the average cost across all training examples tell you?
How good or bad the network is overall
What is gradient descent?
The process of repeatedly nudging weights and biases by some multiple of the negative gradient to minimize the cost function
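A toy 1-D version of gradient descent, assuming a hand-picked learning rate and a simple quadratic cost whose gradient is known in closed form:

```python
def gradient_descent(grad, x0, lr=0.1, steps=100):
    # Repeatedly nudge x by a multiple (lr) of the negative gradient.
    # The step is lr * slope, so steep slopes give big steps and
    # shallow slopes (near the minimum) give small ones.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Minimize cost(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
minimum = gradient_descent(lambda x: 2 * (x - 3), x0=10.0)
print(round(minimum, 4))  # converges near 3.0
```

The same loop also illustrates the step-size card below: because the update is proportional to the slope, steps shrink automatically as the minimum is approached.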
What is backpropagation?
The algorithm for computing the gradient of the cost function efficiently across all weights and biases
Why are weights and biases initialized randomly?
There is no correct starting point — random initialization lets gradient descent find its way from wherever it starts
What is a local minimum in the context of gradient descent?
A point where the cost is lower than nearby points but not necessarily the lowest possible cost overall
Why are step sizes made proportional to the slope in gradient descent?
To avoid overshooting the minimum — a steep slope means you are far from the bottom so larger steps are safe, a shallow slope means you are close so smaller steps are needed