CSCI 4521 Machine Learning Quiz 4

Last updated 1:46 PM on 4/9/26
56 Terms

1
New cards

Sigmoid Function

f(x) = 1 / (1 + e^(-x)). Maps any real number to a value between 0 and 1 (S-shaped curve). Used in logistic regression for binary classification.
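A minimal pure-Python sketch of the definition above (PyTorch's `torch.sigmoid` computes the same thing elementwise on tensors):

```python
import math

def sigmoid(x: float) -> float:
    """Logistic sigmoid: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0.0))   # 0.5 — the midpoint of the S-curve
print(sigmoid(4.0))   # ≈ 0.982 — large positive inputs saturate toward 1
print(sigmoid(-4.0))  # ≈ 0.018 — large negative inputs saturate toward 0
```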

2
New cards

Softmax Function

Exponentiates each logit and normalizes so all outputs sum to 1.0, producing a valid probability distribution over classes. Formula: softmax(z_i) = e^(z_i) / Σe^(z_j)
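The formula, sketched in pure Python (library implementations such as `torch.softmax` additionally subtract the max logit for numerical stability):

```python
import math

def softmax(logits):
    """Exponentiate each logit, then normalize so the outputs sum to 1."""
    exps = [math.exp(z) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
print(probs)       # larger logits get larger probabilities
print(sum(probs))  # 1.0 — a valid probability distribution
```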

3
New cards

Relationship between logs and exponents

ln(e^x) = x and e^(ln(x)) = x. Logs convert multiplication to addition, avoiding underflow with small probabilities.
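A quick demonstration of why working in log space matters: multiplying many small probabilities underflows to 0.0 in floating point, while the equivalent sum of logs stays representable.

```python
import math

# Product of 1000 probabilities of 1e-4 underflows to exactly 0.0 ...
p = 1.0
for _ in range(1000):
    p *= 1e-4
print(p)  # 0.0 — underflow

# ... but the log of that product is just a sum of logs, which is fine.
log_p = sum(math.log(1e-4) for _ in range(1000))
print(log_p)  # ≈ -9210.34
```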

4
New cards

Key ln values to memorize

ln(1) = 0, ln(2) ≈ 0.693, ln(5) ≈ 1.609, e^0 = 1

5
New cards

MNIST dataset size

60,000 training images and 10,000 test images. Each image is a 28×28 grayscale pixel grid.

6
New cards

PyTorch training loop (4 steps in order)

1) optimizer.zero_grad() — clear old gradients. 2) prediction = model(x); loss = loss_fn(prediction, y) — forward pass and loss computation. 3) loss.backward() — compute gradients. 4) optimizer.step() — update parameters.
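The four steps in context, as a minimal sketch (assumes PyTorch is installed; the toy problem — learning y = 2x with a one-unit linear model and MSE loss — is illustrative, not from the course):

```python
import torch
import torch.nn as nn

# Tiny synthetic regression problem: learn y = 2x.
x = torch.randn(64, 1)
y = 2.0 * x

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(50):
    optimizer.zero_grad()          # 1) clear old gradients
    prediction = model(x)          # 2) forward pass ...
    loss = loss_fn(prediction, y)  #    ... and loss computation
    loss.backward()                # 3) compute gradients
    optimizer.step()               # 4) update parameters
```

After 50 epochs the loss should be near zero, since the model can represent the target function exactly.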

7
New cards

optimizer.zero_grad()

Clears old gradients so they don't accumulate across batches. Without it, gradients from previous batches pile up and the loss can explode or diverge.

8
New cards

loss.backward()

Computes the gradients of the loss with respect to all model parameters (backpropagation).

9
New cards

optimizer.step()

Updates model parameters using the gradients computed by loss.backward().

10
New cards

torch.utils.data.Dataset

Base class for custom datasets. Must implement __init__ (set up the data), __len__ (number of entries), and __getitem__ (return the feature and label at an index).
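The same three-method protocol, sketched without importing torch (a real dataset would subclass `torch.utils.data.Dataset`, which expects exactly these methods; the toy class and data here are illustrative):

```python
class ToyDataset:
    def __init__(self, features, labels):
        # set up (or load) the data
        self.features = features
        self.labels = labels

    def __len__(self):
        # number of entries in the dataset
        return len(self.features)

    def __getitem__(self, idx):
        # return the (feature, label) pair at the given index
        return self.features[idx], self.labels[idx]

ds = ToyDataset([[0.1, 0.2], [0.3, 0.4]], [0, 1])
print(len(ds))  # 2
print(ds[1])    # ([0.3, 0.4], 1)
```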

11
New cards

nn.Module

Base class for all PyTorch neural networks. Tracks weights automatically. Requires __init__() to set up layers and forward() to process input and return a prediction.

12
New cards

nn.Linear

Implements a fully connected (affine) layer: matrix multiplication plus a learned bias. E.g., nn.Linear(784, 10) maps a flattened 28×28 image to 10 output classes.

13
New cards

torch.optim.SGD

Standard stochastic gradient descent optimizer.

14
New cards

torch.optim.Adam

Optimizer that combines momentum with adaptive per-parameter learning rates. Generally better than vanilla SGD.

15
New cards

Logits vs. Probabilities

Models output raw scores called logits. Sigmoid (binary) or Softmax (multi-class) converts these to probabilities.

16
New cards

One-Hot Encoding

Represents classes as binary vectors (e.g., class 2 of 4 = [0,1,0,0]). Prevents the model from assuming numeric ordering between categories. Works naturally with Softmax and Cross-Entropy Loss.

17
New cards

Negative Log Likelihood (NLL)

NLL = -ln(p_true). Measures how well predicted probability matches the true label. Produces very large gradients when p_true is near 0, driving faster correction. Minimizing NLL = maximizing likelihood.
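The formula sketched directly, showing how the loss blows up as p_true approaches 0:

```python
import math

def nll(p_true: float) -> float:
    """Negative log likelihood of the predicted probability of the true class."""
    return -math.log(p_true)

print(nll(0.5))   # ≈ 0.693 — mediocre prediction
print(nll(0.99))  # ≈ 0.010 — confident and correct: tiny loss
print(nll(0.01))  # ≈ 4.605 — confident and wrong: large loss
```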

18
New cards

nn.CrossEntropyLoss

Combines Softmax + NLL into one PyTorch function. Takes raw logits as input — do NOT apply softmax yourself first.
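A small check of the "raw logits in" behavior (assumes PyTorch is installed): `nn.CrossEntropyLoss` applied to logits matches a manual softmax followed by negative log likelihood.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[1.0, 2.0, 3.0]])  # raw scores — no softmax applied
target = torch.tensor([2])                # true class index

loss = nn.CrossEntropyLoss()(logits, target)

# Same result by hand: softmax, then -ln of the true class's probability.
manual = -torch.log(torch.softmax(logits, dim=1)[0, 2])

print(loss.item())  # ≈ 0.408
```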

19
New cards

nn.BCELoss

Binary Cross Entropy Loss. Expects probabilities as input (apply sigmoid first; nn.BCEWithLogitsLoss takes raw logits). Used for binary classification and for multi-label classification where items can belong to multiple classes simultaneously.

20
New cards

Epoch

One full pass over the entire training dataset.

21
New cards

Batch

A subset of the dataset processed simultaneously in one forward/backward pass.

22
New cards

Mini-Batch SGD update formula

Total parameter updates = (dataset_size / batch_size) × num_epochs
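The formula as a one-line helper, checked against the deck's worked example (5000 images, batch size 100, 10 epochs):

```python
def total_updates(dataset_size: int, batch_size: int, num_epochs: int) -> int:
    """Parameter updates = batches per epoch × number of epochs."""
    return (dataset_size // batch_size) * num_epochs

print(total_updates(5000, 100, 10))  # 500
```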

23
New cards

Full-Batch Gradient Descent

When batch size equals the entire dataset, the model uses all data for each update. This is full-batch gradient descent.

24
New cards

Momentum

Adds a fraction of the previous update step to the current gradient. Accelerates convergence, reduces oscillation/noise, and helps escape small local minima.

25
New cards

Adam Optimizer

Improves standard SGD by combining momentum with adaptive per-parameter learning rates.

26
New cards

Backtracking Line Search

Starts with a large step size and iteratively shrinks it until a sufficient-decrease criterion is met, taking the largest acceptable step to keep convergence fast.

27
New cards

Armijo Condition (1st Wolfe Condition)

Ensures the chosen step size provides "sufficient decrease" in the objective function.

28
New cards

Logistic Regression

Uses sigmoid to map logits to probabilities for binary classification. Keeps predictions bounded between 0 and 1. SKLearn: SGDClassifier(loss='log_loss')

29
New cards

Support Vector Machine (SVM / SVC)

Finds the optimal hyperplane that maximizes the margin of separation between two classes. Can use the kernel trick for non-linear data. SKLearn: SVC(kernel='linear', probability=True)

30
New cards

Decision Trees

Non-linear models that split data based on sequential thresholds. Intuitive but prone to overfitting without pruning or depth limits. SKLearn: DecisionTreeClassifier(max_depth=10)

31
New cards

Random Forests

Ensemble method that builds many decision trees independently and in parallel (bagging), then combines predictions via majority voting. Reduces variance and overfitting.

32
New cards

Boosting (e.g., XGBoost)

Ensemble method that builds trees sequentially, where each new tree focuses on correcting errors made by previous trees. Extremely accurate but sensitive to outliers.

33
New cards

Random Forests vs. Boosting

Random Forests: trees built independently in parallel. Boosting: trees built sequentially, each correcting previous errors.

34
New cards

GMMs as Generative Models

Can synthesize new images/data by sampling from a learned probability distribution of pixels.

35
New cards

PCA before GMMs

PCA(0.99, whiten=True) keeps enough components for 99% of variance and normalizes variance in all directions. Reduces dimensionality before fitting GMM.

36
New cards

EM Algorithm — E-step (Expectation)

Compute the probability that each data point belongs to each Gaussian component (soft assignments).

37
New cards

EM Algorithm — M-step (Maximization)

Update Gaussian parameters (means, covariances) using the soft assignments from the E-step.

38
New cards

Image Quantization / Posterization with GMM

Fit GMM (e.g., K=8) to all pixels, replace each pixel with its cluster mean. Reduces number of colors = image compression. The artistic effect is called posterization.

39
New cards

ROC Curve

Plots True Positive Rate vs. False Positive Rate. Can be misleading on imbalanced datasets.

40
New cards

Precision-Recall (PR) Curve

Plots Precision vs. Recall. More sensitive and informative than ROC on imbalanced datasets.

41
New cards

ROC vs PR on imbalanced data

ROC can be misleading because high true negatives hide low precision. PR curves are preferred for imbalanced datasets.

42
New cards

Confusion Matrix

Visual grid showing which classes are predicted correctly and which are confused for one another.

43
New cards

cross_validate()

SKLearn function that evaluates multiple scoring metrics at once, e.g., scoring=['accuracy', 'roc_auc', 'f1'].

44
New cards

L1/L2 Regularization

Directly penalize large weight values inside the loss function during training.

45
New cards

AIC / BIC

Measure goodness-of-fit with a built-in penalty for model complexity (number of parameters). Lower scores = better. Evaluate models after training, unlike L1/L2 which constrain during training.

46
New cards

Oversampling

Duplicates or generates synthetic minority class examples (e.g., SMOTE). Risk: overfitting if same examples repeated too often; uses more memory; slower training.

47
New cards

Undersampling

Randomly deletes samples from the majority class. Trains faster but loses information.

48
New cards

Argmax for predictions

torch.argmax(logits) returns the index of the largest value = predicted class. E.g., [-1.2, 0.5, 4.2, 2.1] → predicted class = 2.
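What `torch.argmax` computes, sketched in pure Python on the card's example:

```python
def argmax(values):
    """Index of the largest value — the predicted class."""
    return max(range(len(values)), key=lambda i: values[i])

print(argmax([-1.2, 0.5, 4.2, 2.1]))  # 2
```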

49
New cards

Abstract Expressionism

Spontaneous, intuitive creation using bold brushstrokes and large canvases (Pollock, Rothko). Has a historical connection to the CIA.

50
New cards

Image pixel structure

Images are structured grids of pixel intensity values. Color = RGB (3 channels), grayscale = 1 channel.

51
New cards

Uncompressed image formats

.bmp, .ppm — store every pixel; large file sizes.

52
New cards

Lossless compression formats

.png, .tiff, .gif — compressed losslessly (e.g., DEFLATE, which combines LZ77 with Huffman coding, for .png; LZW for .gif), so the original pixels are perfectly reconstructible.

53
New cards

Lossy compression formats

.jpg — smaller files, but some quality is permanently lost through quantization of frequency (DCT) coefficients.

54
New cards

Softmax+NLL example: z=[ln2,ln3,ln5], true=C

e^z = (2,3,5), sum=10. Probs: A=0.2, B=0.3, C=0.5. NLL = -ln(0.5) = ln(2) ≈ 0.693
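The worked example reproduced numerically:

```python
import math

# Logits z = [ln 2, ln 3, ln 5], true class = C (index 2).
z = [math.log(2), math.log(3), math.log(5)]
exps = [math.exp(v) for v in z]        # (2, 3, 5), sum = 10
probs = [e / sum(exps) for e in exps]  # [0.2, 0.3, 0.5]
nll = -math.log(probs[2])              # -ln(0.5) = ln(2)
print(round(nll, 3))  # 0.693
```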

55
New cards

Softmax+NLL example: z=[ln2,0,ln7], true=A

e^z = (2,1,7), sum=10. Probs: A=0.2, B=0.1, C=0.7. NLL = -ln(0.2) = ln(5) ≈ 1.609

56
New cards

SGD calc: 5000 images, batch=100, 10 epochs

Batches/epoch = 50. Total updates = 50 × 10 = 500.