Neural Networks

16 Terms

1

Perceptron: Core Function

The Perceptron is a linear binary classifier. Its prediction for input x is: ŷ = sign(wᵀx + b), where w is the weight vector, b is the bias, and the sign function outputs +1 if the argument > 0, else -1. It learns by adjusting w and b when it misclassifies a training example (x, y).

2

Perceptron Learning Rule (Weight Update)

When the perceptron makes a mistake (ŷ ≠ y), it updates: w ← w + η y x and b ← b + η y. η is the learning rate. This update moves the decision boundary to make the current example x more likely to be correctly classified as y.
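
The prediction rule from the previous card and this update rule can be sketched together in NumPy; the AND-gate data, learning rate, and epoch count below are illustrative choices, not part of the card:

```python
import numpy as np

def perceptron_train(X, y, eta=1.0, epochs=20):
    """Train a perceptron: update w and b only on misclassified examples."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            # Prediction: sign(w.x + b), treating 0 as -1
            y_hat = 1 if w @ xi + b > 0 else -1
            if y_hat != yi:          # mistake: apply the update rule
                w += eta * yi * xi   # w <- w + eta * y * x
                b += eta * yi        # b <- b + eta * y
    return w, b

# AND gate with +1/-1 labels is linearly separable
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w, b = perceptron_train(X, y)
preds = [1 if w @ xi + b > 0 else -1 for xi in X]
```

On linearly separable data like this, the perceptron convergence theorem guarantees the loop stops making mistakes after finitely many updates.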

3

Perceptron Limitation: Linear Separability

A fundamental limitation: The perceptron can only learn patterns that are linearly separable. This means there must exist a single straight line (or hyperplane) that perfectly separates the two classes in the input space. It fails on non-linear problems like XOR.

4

Dealing with Non-Linear Data: Feature Transformation

To handle non-linear data with linear models, we apply non-linear transformations to the input features. Example: transforming inputs (x₁, x₂) to include x₁², x₂², and x₁x₂ can make a circular boundary become linear in the new, higher-dimensional space. This "warps" the space so a linear classifier can work.
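
A small NumPy sketch of the idea, applied to XOR; the product feature x₁x₂ and the hand-picked separating hyperplane are illustrative, not the only choice:

```python
import numpy as np

# XOR with +1/-1 labels: not linearly separable in (x1, x2)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

# Lift to (x1, x2, x1*x2): the product feature makes XOR separable
phi = np.column_stack([X, X[:, 0] * X[:, 1]])

# One separating hyperplane in the lifted space (hand-picked for illustration)
w, b = np.array([1.0, 1.0, -2.0]), -0.5
preds = np.sign(phi @ w + b)
```

In the original 2D space no single line classifies all four XOR points, but in the lifted 3D space this plane does.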

5

Biological Neuron vs. Artificial Neuron

Biological Neuron: Dendrites receive signals, the cell body sums them, and the axon fires (all-or-nothing) if a threshold is exceeded. An Artificial Neuron models this as: output = f(b + Σᵢ xᵢwᵢ), where the activation function f (e.g., sigmoid, ReLU) introduces non-linearity, mimicking the firing threshold and signal transformation.
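
A minimal sketch of one artificial neuron with a sigmoid activation; the weights and input are arbitrary illustrative values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    # output = f(b + sum_i x_i * w_i), with sigmoid as the activation f
    return sigmoid(b + np.dot(w, x))

out = neuron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.0)
```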

6

Activation Functions: Purpose

Activation Functions (e.g., Sigmoid, Tanh, ReLU) introduce non-linearity into neural networks. Without them, multiple layers would collapse into a single linear transformation, losing the ability to approximate complex, non-linear functions. They determine a neuron's output given its weighted input sum.

7

ReLU (Rectified Linear Unit) Function

ReLU is defined as f(z) = max(0, z). It outputs 0 for negative inputs and passes positive inputs unchanged. Key advantages: Computationally simple (no exponentials), mitigates the vanishing gradient problem for positive inputs, and promotes sparse activations (many zero outputs).
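
The definition translates directly to NumPy; the sample inputs below are illustrative:

```python
import numpy as np

def relu(z):
    # f(z) = max(0, z), elementwise; no exponentials involved
    return np.maximum(0, z)

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
out = relu(z)   # negatives clipped to 0, positives passed through unchanged
```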

8

Multi-Layer Perceptron (MLP) Structure

An MLP consists of: 1) Input Layer (passes data). 2) One or more Hidden Layers, each containing artificial neurons that apply a weighted sum and activation function. 3) Output Layer that produces final predictions (e.g., via softmax for classification). Data flows forward from input to output.

9

Why Multiple Layers (Deep Networks)

Multiple layers enable hierarchical feature learning. Early layers learn simple features (e.g., edges). Subsequent layers combine these into more complex, abstract representations (e.g., shapes, objects). Stacking non-linear transformations allows the network to model highly complex, non-linear decision boundaries.

10

Forward Pass in an MLP

The Forward Pass computes predictions: For layer l, h⁽ˡ⁾ = f(W⁽ˡ⁾h⁽ˡ⁻¹⁾ + b⁽ˡ⁾), where h⁽ˡ⁾ is the layer's activation vector, W⁽ˡ⁾ is its weight matrix, b⁽ˡ⁾ is its bias vector, and f is the activation function (e.g., ReLU). The input is h⁽⁰⁾. The final output ŷ is produced by the output layer (e.g., softmax).
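
The forward pass can be sketched as a chain of these layer computations; the layer sizes, ReLU hidden activation, and random weights below are illustrative:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def forward(x, params):
    """h(l) = f(W(l) h(l-1) + b(l)); the input x plays the role of h(0)."""
    W1, b1, W2, b2 = params
    h1 = relu(W1 @ x + b1)           # hidden layer
    return softmax(W2 @ h1 + b2)     # output layer

rng = np.random.default_rng(0)
params = (rng.normal(size=(4, 3)), np.zeros(4),   # 3 inputs -> 4 hidden
          rng.normal(size=(2, 4)), np.zeros(2))   # 4 hidden -> 2 outputs
y_hat = forward(np.array([1.0, -1.0, 0.5]), params)
```

Because the output layer applies softmax, ŷ is a valid probability vector.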

11

Loss Function (Cross-Entropy for Classification)

For classification, a common loss is Cross-Entropy Loss: L = −Σᵢ yᵢ log(ŷᵢ), where y is the true one-hot label vector and ŷ is the predicted probability vector (from softmax). It measures the dissimilarity between the true and predicted distributions; lower loss indicates better predictions.
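
A small sketch of the formula; the probability vectors are made-up examples, and the small epsilon guarding log(0) is a common implementation detail:

```python
import numpy as np

def cross_entropy(y, y_hat, eps=1e-12):
    # L = -sum_i y_i * log(y_hat_i); eps guards against log(0)
    return -np.sum(y * np.log(y_hat + eps))

y = np.array([0.0, 1.0, 0.0])              # true one-hot label
good = cross_entropy(y, np.array([0.1, 0.8, 0.1]))  # confident and correct
bad = cross_entropy(y, np.array([0.8, 0.1, 0.1]))   # confident and wrong
```

With a one-hot y, the sum reduces to −log of the probability assigned to the true class, so the confidently wrong prediction is penalized much more heavily.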

12

Backpropagation: Core Idea

Backpropagation is the algorithm for efficiently computing the gradient of the loss function with respect to every weight and bias in the network. It works by applying the chain rule of calculus backwards from the output layer to the input layer, propagating error gradients layer by layer.

13

Gradient Descent Weight Update

After backpropagation computes the gradient ∂L/∂w for a parameter, Gradient Descent updates it: w ← w - η (∂L/∂w). The learning rate η controls the step size. The negative sign moves the parameter in the direction that decreases the loss. This is done for all parameters (weights and biases).
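
Backpropagation plus this update rule, sketched on the simplest possible case: a single sigmoid neuron with cross-entropy loss, where the chain rule collapses to dL/dw = (ŷ − y)·x. The data, learning rate, and step count are illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([1.0, 2.0])   # single training example
y = 1.0                    # its true label
w, b, eta = np.zeros(2), 0.0, 0.5

def loss(w, b):
    y_hat = sigmoid(w @ x + b)
    return -(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

for _ in range(100):
    y_hat = sigmoid(w @ x + b)
    grad_w = (y_hat - y) * x   # dL/dw via the chain rule
    grad_b = (y_hat - y)       # dL/db
    w -= eta * grad_w          # w <- w - eta * dL/dw
    b -= eta * grad_b          # b <- b - eta * dL/db
```

Each step moves against the gradient, so the loss shrinks from its initial −log(0.5) ≈ 0.693 toward zero.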

14

Convolutional Neural Networks (CNNs): Purpose

CNNs are specialized for grid-like data (images, audio, time-series). They use convolutional layers with learnable filters that scan the input to detect local patterns (e.g., edges, textures). Key features: parameter sharing (same filter across locations) and spatial hierarchy (simple to complex features).
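
A minimal sketch of the core operation, a single learnable filter slid over a 2D input (most deep-learning libraries implement convolution layers as this cross-correlation); the tiny image and hand-picked edge filter are illustrative:

```python
import numpy as np

def conv2d(img, kernel):
    """Valid cross-correlation: one shared kernel scans every location."""
    kh, kw = kernel.shape
    H, W = img.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

# A vertical-edge filter responds where intensity changes left-to-right
img = np.array([[0, 0, 1, 1],
                [0, 0, 1, 1],
                [0, 0, 1, 1]], dtype=float)
kernel = np.array([[-1.0, 1.0]])
edges = conv2d(img, kernel)
```

The same two kernel weights are reused at every position (parameter sharing), and the output peaks exactly at the dark-to-light edge.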

15

Recurrent Neural Networks (RNNs): Purpose

RNNs are designed for sequential data (text, speech, time series). They have a hidden state that acts as a "memory" of previous inputs in the sequence. At each time step, the RNN combines the current input and the previous hidden state to produce an output and update the hidden state, allowing it to model temporal dependencies.
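
One recurrence step can be sketched as below; the tanh update, the layer sizes, and the random weights are illustrative choices:

```python
import numpy as np

def rnn_step(x_t, h_prev, Wxh, Whh, bh):
    # New hidden state mixes the current input with the previous "memory"
    return np.tanh(Wxh @ x_t + Whh @ h_prev + bh)

rng = np.random.default_rng(0)
Wxh = rng.normal(scale=0.1, size=(3, 2))   # input -> hidden weights
Whh = rng.normal(scale=0.1, size=(3, 3))   # hidden -> hidden (recurrent)
bh = np.zeros(3)

h = np.zeros(3)                            # initial hidden state
for x_t in [np.array([1.0, 0.0]), np.array([0.0, 1.0])]:
    h = rnn_step(x_t, h, Wxh, Whh, bh)     # same weights at every time step
```

Note the same weight matrices are applied at every time step; only the hidden state h changes as the sequence is consumed.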

16

Training Initialization: Random Weights

Weights and biases are initialized with small random values (e.g., drawn from a Gaussian distribution). This is crucial because initializing all parameters to the same value (like zero) would make every neuron in a layer compute the same output and receive the same gradient, so they would all learn identical features; random initialization breaks this symmetry and lets neurons specialize.