Neural Networks chap 1 & 2

Last updated 7:42 PM on 4/30/26
31 Terms

1

What is a multilayer perceptron?

A "plain vanilla" neural network — the foundational type of network before specializations like CNNs or LSTMs

2

What is a Convolutional Neural Network good for?

Image recognition

3

What is a Long Short-Term Memory network good for?

Speech recognition

4

What is a neuron?

A function that takes inputs, computes a weighted sum plus bias, passes it through an activation function, and outputs a number
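The definition above can be sketched in a few lines of Python (the sigmoid activation and the example inputs, weights, and bias are illustrative choices, not values from the cards):

```python
import math

def neuron(inputs, weights, bias):
    # Weighted sum of inputs plus bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # Activation function (sigmoid here) squishes z into the range (0, 1)
    return 1 / (1 + math.exp(-z))

# Two inputs with example weights and bias; the output is a single number
out = neuron([0.5, 0.8], [0.9, -0.3], 0.1)
```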

5

What are hidden layers?

All layers between the input and output layer

6

Why are layers useful?

They break the problem into smaller, more manageable pieces

7

What is a weight?

A value representing the strength of the connection between two neurons

8

What is a bias?

A value that shifts the threshold for when a neuron activates

9

What does a large negative bias mean for a neuron?

The neuron needs a strong input signal to fire — it is biased toward inactivity

10

What does a large positive bias mean for a neuron?

The neuron fires easily even with weak inputs

11

What does a neuron compute before the activation function?

(input1 × weight1) + (input2 × weight2) + … + bias

12

What is learning in a neural network?

Finding the right weights and biases

13

What does an activation function do?

Squishes the raw weighted sum into a usable range and adds nonlinearity

14

What happens if you stack layers with no activation function?

Any chain of linear operations collapses into a single linear operation — the network behaves like one layer and can only learn straight-line relationships
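This collapse can be verified numerically: two stacked linear layers with random example weights compute exactly the same function as one combined linear layer (a sketch using NumPy; the shapes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
# Two "layers" with no activation function: y = W2 @ (W1 @ x + b1) + b2
W1, b1 = rng.standard_normal((3, 2)), rng.standard_normal(3)
W2, b2 = rng.standard_normal((1, 3)), rng.standard_normal(1)

x = rng.standard_normal(2)
two_layer = W2 @ (W1 @ x + b1) + b2

# The same map collapses into a single linear layer: y = W @ x + b
W = W2 @ W1
b = W2 @ b1 + b2
one_layer = W @ x + b
# two_layer and one_layer agree for every input x
```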

15

Why can a network without activation functions not learn XOR?

XOR is not a straight-line relationship — without nonlinearity the network can only learn linear patterns

16

What is the sigmoid function?

σ(x) = 1 / (1 + e^−x) — squishes any number into the range (0, 1)

17

What does sigmoid output for a very large positive number?

Close to 1

18

What does sigmoid output for a very large negative number?

Close to 0
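The three sigmoid cards above can be checked directly in Python:

```python
import math

def sigmoid(x):
    # sigma(x) = 1 / (1 + e^-x): maps any real number into (0, 1)
    return 1 / (1 + math.exp(-x))

sigmoid(0)    # exactly 0.5, the midpoint
sigmoid(10)   # close to 1 for a large positive input
sigmoid(-10)  # close to 0 for a large negative input
```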

19

What is ReLU?

Rectified Linear Unit — ReLU(a) = max(0, a). Returns 0 for negative inputs, returns the input itself for positive inputs
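ReLU is short enough to write out in full:

```python
def relu(a):
    # ReLU(a) = max(0, a): zero for negative inputs, identity for positive
    return max(0.0, a)

relu(-3.0)  # 0.0
relu(2.5)   # 2.5
```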

20

Why did ReLU replace sigmoid in most modern networks?

Sigmoid squashes gradients toward zero at the extremes, which kills learning in deep networks — ReLU does not have that problem
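The squashing can be made concrete with the sigmoid derivative, sigma'(x) = sigma(x)(1 - sigma(x)), which peaks at 0.25 and collapses toward zero for large inputs (a sketch; the input values are arbitrary):

```python
import math

def sigmoid(x):
    return 1 / (1 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of sigmoid: sigma'(x) = sigma(x) * (1 - sigma(x))
    s = sigmoid(x)
    return s * (1 - s)

sigmoid_grad(0)   # 0.25, the maximum possible gradient
sigmoid_grad(10)  # tiny: the gradient has nearly vanished
```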

21

What is the cost function?

A function that measures how wrong the network is by summing the squared differences between the network's outputs and the correct answers
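A minimal sum-of-squared-differences cost in Python (the output and target vectors are made-up example values):

```python
def cost(outputs, targets):
    # Sum of squared differences between network outputs and correct answers
    return sum((o - t) ** 2 for o, t in zip(outputs, targets))

good = cost([0.9, 0.1], [1.0, 0.0])  # small cost: outputs close to targets
bad = cost([0.1, 0.9], [1.0, 0.0])   # large cost: outputs far from targets
```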

22

Why do we square the differences in the cost function?

To make all errors positive and to punish large errors harder than small ones

23

Why not use absolute value instead of squaring in the cost function?

Absolute value has a corner at zero that makes the gradient math messier

24

What does a small cost value mean?

The network's outputs were close to the correct answers

25

What does a large cost value mean?

The network's outputs were far from the correct answers

26

What does the average cost across all training examples tell you?

How good or bad the network is overall

27

What is gradient descent?

The process of repeatedly nudging weights and biases by some multiple of the negative gradient to minimize the cost function
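A one-dimensional sketch of the idea, minimizing a toy cost C(w) = (w - 3)^2 whose gradient is 2(w - 3) (the learning rate and step count are arbitrary example values):

```python
def gradient_descent(grad, w, lr=0.1, steps=100):
    # Repeatedly nudge w by a multiple (lr) of the negative gradient
    for _ in range(steps):
        w -= lr * grad(w)  # the step size is proportional to the slope
    return w

# Minimize C(w) = (w - 3)^2; its gradient is 2 * (w - 3)
w = gradient_descent(lambda w: 2 * (w - 3), w=0.0)
# w converges toward the minimum at w = 3
```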

28

What is backpropagation?

The algorithm for computing the gradient of the cost function efficiently across all weights and biases

29

Why are weights and biases initialized randomly?

There is no correct starting point — random initialization lets gradient descent find its way from wherever it starts

30

What is a local minimum in the context of gradient descent?

A point where the cost is lower than nearby points but not necessarily the lowest possible cost overall

31

Why are step sizes made proportional to the slope in gradient descent?

To avoid overshooting the minimum — a steep slope means you are far from the bottom, so larger steps are safe; a shallow slope means you are close, so smaller steps are needed