What is clustering?
Unsupervised grouping of data points based on similarity.
Why is clustering unsupervised?
Labels are unknown; structure must be discovered.
Key challenge in clustering
Clusters may not match meaningful categories.
Visual cluster identification
Look for dense groups, gaps separating them, and cluster shape.
Intuition behind k-means
Use centroids and iteratively improve assignments.
Why use centroids?
Centroids summarize the center of a group.
Four steps of k-means
Initialize → Assign → Update → Repeat.
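The four steps can be sketched as a minimal NumPy implementation (a hypothetical helper for illustration, not a production routine):

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialize: pick k distinct data points as starting centroids
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign: each point joins its nearest centroid (Euclidean)
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        # Update: move each centroid to the mean of its assigned points
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centroids[j] for j in range(k)])
        # Repeat until the centroids stop moving
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids
```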
Distance effect in k-means
Points join nearest centroid; boundaries depend on distance metric.
Manhattan vs Euclidean distance
Manhattan gives diamond-shaped equidistance contours; Euclidean gives circular ones.
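For the same pair of points the two metrics disagree, which is why the cluster boundaries change shape:

```python
import numpy as np

a, b = np.array([0.0, 0.0]), np.array([3.0, 4.0])
euclidean = float(np.linalg.norm(a - b))   # sqrt(3² + 4²) = 5.0
manhattan = float(np.abs(a - b).sum())     # |3| + |4| = 7.0
```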
Choosing k—Elbow
Pick k where SSE reduction slows.
Choosing k—Silhouette
Compute the average silhouette score for several candidate values of k; pick the k with the highest score.
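A hand-rolled silhouette score (a simplified sketch; in practice a library such as scikit-learn would be used, and every cluster is assumed to have at least two points) shows a good clustering scoring higher than a bad one:

```python
import numpy as np

def silhouette(X, labels):
    # For each point: a = mean distance to its own cluster,
    # b = mean distance to the nearest other cluster,
    # s = (b - a) / max(a, b); return the average over all points.
    D = np.linalg.norm(X[:, None] - X[None], axis=2)  # pairwise distances
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False
        a = D[i][same].mean()
        b = min(D[i][labels == c].mean()
                for c in set(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))
```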
Choosing k—Domain knowledge
Choose based on real-world expectations.
Poor k-means cases
Non-spherical clusters, varying sizes, outliers.
Maximal margin classifier
Hyperplane maximizing margin to nearest points.
Support vectors
Points that define the margin and boundary.
Meaning of margin
Distance from boundary to closest points.
Why maximize margin?
Better generalization, less overfitting.
Hard-margin SVM objective
Minimize ½||w||² subject to y_i(w·x_i + b) ≥ 1 for all i.
Purpose of constraints
Enforce correct classification with separation.
Soft-margin SVM
Allows violations with slack variables.
When soft-margin needed
Noisy or overlapping data.
Kernel trick
Implicit high-dimensional mapping via kernels.
Why kernels help
Enable nonlinear boundaries with linear models.
Kernel comparison
Linear = simple, Polynomial = interactions, RBF = complex local.
Dual vs primal
Dual supports kernelization.
SVM decision function
sign(Σ α_i y_i K(x_i, x) + b).
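The decision function above, sketched with an RBF kernel and made-up multipliers (the support vectors, α, y, and b here are illustrative values, not a trained model):

```python
import numpy as np

def rbf(x, z, gamma=1.0):
    # RBF kernel: K(x, z) = exp(-gamma * ||x - z||²)
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(z)) ** 2))

def decision(x, svs, alphas, ys, b, kernel=rbf):
    # sign(Σ α_i y_i K(x_i, x) + b): only support vectors enter the sum
    s = sum(a * y * kernel(sv, x) for a, y, sv in zip(alphas, ys, svs))
    return 1 if s + b >= 0 else -1
```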
Neural network definition
Layered function approximator with neurons.
NN structure
Neurons, layers, weights.
NN data flow
Input → weighted sum → activation → output.
Forward propagation
Computing predictions layer by layer.
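Layer-by-layer prediction, as a minimal sketch with hand-set weights (the layer sizes and values are illustrative):

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

def forward(x, layers):
    # layers: list of (W, b) pairs; each hidden layer computes
    # relu(W @ a + b), and the last layer returns the raw weighted sum
    a = x
    for i, (W, b) in enumerate(layers):
        z = W @ a + b
        a = relu(z) if i < len(layers) - 1 else z
    return a
```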
Loss function purpose
Quantify prediction error.
Purpose of hidden layers
Model nonlinear patterns.
What are weights?
Trainable parameters shaping the model.
Activation function
Nonlinear mapping enabling complexity.
Common activations
ReLU, Sigmoid.
Input neurons with 3 features
Three neurons required.
What is training?
Forward + backward passes updating weights.
Why deeper networks help
More hierarchical representations.
Chain rule purpose
Differentiate composite functions.
Outer vs inner function
Outer wraps inner in composite expressions.
First chain rule step
Differentiate outer at inner.
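A worked example of that step, checked against a numerical derivative (the function and constants are illustrative):

```python
# f(x) = (3x + 1)², a composite: outer u -> u², inner x -> 3x + 1
def f(x):
    return (3 * x + 1) ** 2

def df(x):
    # Step 1: differentiate the outer at the inner -> 2 * (3x + 1)
    # Step 2: multiply by the inner derivative     -> * 3
    return 2 * (3 * x + 1) * 3
```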
Chain rule in NN
Propagates dependency through layers.
Backpropagation meaning
Gradient computation backward through layers.
Backward pass
Compute gradients for all weights.
How weights know what to change
Gradients show contribution to error.
Chain rule importance
Essential for deep gradient flow.
Updating weight steps
Compute error → backprop → gradient → update.
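The four steps for a single linear neuron with squared loss (a toy sketch; the learning rate and data are made up):

```python
def update(w, b, x, y_true, lr=0.1):
    y_pred = w * x + b                   # forward pass
    loss = 0.5 * (y_pred - y_true) ** 2  # compute error
    d_pred = y_pred - y_true             # backprop: dL/dy_pred
    dw = d_pred * x                      # gradient via chain rule: dL/dw
    db = d_pred                          # dL/db
    return w - lr * dw, b - lr * db, loss  # update step
```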
Convolution definition
Sliding filter computing weighted sums.
How CNNs learn
Filters adapt to edges, textures, shapes.
Main CNN layers
Conv, ReLU, Pooling, Fully-connected.
Purpose of max pooling
Reduce size while keeping key activations.
Pooling effect
Reduces spatial dimensions.
How RNN works
Uses hidden state for sequence memory.
Hidden state importance
Captures temporal dependencies.
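One recurrent step: the new hidden state folds the current input into the previous state, which is how memory is carried forward (the weights here are arbitrary illustrative values):

```python
import numpy as np

def rnn_step(h, x, Wh, Wx, b):
    # h carries memory of earlier inputs; tanh keeps it bounded
    return np.tanh(Wh @ h + Wx @ x + b)
```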
Vanishing/exploding gradients
Gradients shrink or blow up across time.
Why LSTM/GRU help
Gates control memory retention.
CNN vs RNN on sequences
CNN = local features; RNN = temporal flow.
Model strengths—CNN vs RNN
CNN for spatial; RNN for sequence.
Convolution output formula
((W−F+2P)/S) + 1.
AlexNet conv output
For AlexNet's first conv layer (227×227 input, 11×11 filters, stride 4, no padding): (227 − 11)/4 + 1 = 55.
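The formula from the previous card as a helper, checked on the AlexNet first-layer numbers (227×227 input, 11×11 filter, stride 4, padding 0):

```python
def conv_output(W, F, P=0, S=1):
    # ((W - F + 2P) / S) + 1; W = input width, F = filter size,
    # P = padding, S = stride (assumes the division is exact)
    return (W - F + 2 * P) // S + 1
```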
Number of conv units
Filters × output width × height.
Conv parameters with sharing
Per filter: filter height × width × input channels weights plus one bias; multiply by the number of filters.
FF parameter explosion
Flattening produces huge parameter count.
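Comparing the two parameter counts side by side (AlexNet-like numbers, used purely for illustration):

```python
def conv_params(fh, fw, cin, nfilters):
    # Weight sharing: each filter is fh × fw × cin weights plus one bias,
    # reused at every spatial position
    return fh * fw * cin * nfilters + nfilters

def dense_params(n_in, n_out):
    # Fully connected: every input wired to every output, plus biases
    return n_in * n_out + n_out
```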
Max pooling effect
Downsamples by choosing max value.
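2×2 max pooling with stride 2, as a minimal NumPy sketch (assumes even height and width):

```python
import numpy as np

def max_pool_2x2(x):
    # Downsample a 2D map by taking the max of each 2x2 window (stride 2)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
```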
Kernel feature types
Edges, diagonals, sharpening.
Gradient flow effect
Gradients reach all layers.
Backprop layer relation
dL/dW = error × input activation.
Why backward pass needed
Forward predicts; backward updates.
Full CNN architecture example
Conv → Conv → Pool → Dense → Dense → Output.
Why stack conv layers
Deeper abstractions.
Purpose of downsampling
Reduce compute and focus on patterns.
Most fundamental NN idea
Composition of linear + nonlinear functions.
Essence of backprop
Backward gradient flow updates weights.
CNN vs FF
CNN uses shared filters; FF uses all connections.
SVM vs NN
Margin maximization vs error minimization.
Hard vs soft margin
Perfect separation vs tolerance.
Common k-means mistake
Using it on non-spherical clusters.