2 - Binary Classification

0.0(0)

Studied by 0 people

Call Kai

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Knowt Play

Card Sorting

1/38

There's no tags or description

Looks like no tags are added yet.

Last updated 6:38 PM on 5/18/26

Name	Mastery	Learn	Test	Matching	Spaced	Call with Kai

No analytics yet

Send a link to your students to track their progress

39 Terms

New cards

SGD Advantages

Uses less memory than full‑batch GD
Faster updates
Helps escape local minima
Works well for large datasets

New cards

SGD Disadvantages

Noisy updates
Loss curve fluctuates
Requires tuning learning rate

New cards

SGD Assumptions

Loss function is differentiable
Mini‑batches are representative of the dataset

New cards

Mini-batch GD advantages

More stable than pure SGD
Faster than full‑batch GD
Efficient on GPUs

New cards

Mini-batch GD disadvantages

Still noisy
Batch size must be chosen carefully

New cards

Mini-batch GD assumptions

Data is shuffled
Batches are independent

New cards

Dataloader advantages

Automatically batches data
Shuffles data
Handles parallel loading
Makes SGD efficient

New cards

Dataloader disadvantages

Requires correct transforms
Can bottleneck if num_workers is too low

New cards

Dataloader assumptions

Dataset implements getitem and len

New cards

Logistic Regression advantages

Simple
Fast
Outputs probabilities
Works well for linearly separable data

New cards

Logistic Regression disadvantages

Only models linear boundaries
Struggles with complex patterns
Sensitive to outliers

New cards

Logistic Regression assumptions

Classes are linearly separable
Input features are meaningful
Uses sigmoid → outputs in (0,1)

New cards

Sigmoid Activation advantages

Smooth
Outputs probabilities
Differentiable

New cards

Sigmoid Activation disadvantages

Saturates (vanishing gradients)
Not ideal for deep networks

New cards

Sigmoid Activation assumptions

Output should be a probability

New cards

BCE Advantages

Proper loss for binary classification
Strong gradient signal
Works well with sigmoid

New cards

BCE Disadvantages

Sensitive to extreme predictions
Requires correct label encoding (0/1)

New cards

BCE Assumptions

Labels are 0 or 1
Output is a probability

New cards

What is Stochastic Gradient Descent?

An optimisation method that updates parameters using the gradient from a single sample or mini‑batch

New cards

Why use SGD?

It saves memory, is faster, and helps escape local minima

New cards

What is mini-batch GD?

Computing gradients on a small random subset of the data

New cards

What does DataLoader do?

Creates batches, shuffles data, and loads it efficiently

New cards

What does shuffle=True mean?

Data is randomly shuffled each epoch

New cards

What does drop_last=True do?

Drops the last incomplete batch

New cards

What is the shape of a batch?

imgs: (batch_size, 3, 32, 32)

labels: (batch_size,)

New cards

Why remap CIFAR‑10 to 0/1?

To convert a 10‑class problem into a binary problem

New cards

Why not use label=0 and label=1 directly?

Because CIFAR‑10 labels are integers 0–9, not booleans

New cards

What is logistic regression?

A linear model whose output is passed through a sigmoid to produce a probability. Formula where w and b are learnable parameters:

New cards

What does sigmoid do?

Maps any real number to a value between 0 and 1

New cards

Why use sigmoid?

To interpret output as a probability

New cards

Why flatten images?

Logistic regression expects a vector, not a 3D image

New cards

What does BCE measure?

The difference between predicted probability and true label.

New cards

Why use BCE?

It is the correct loss for binary classification

New cards

What are the steps of a training loop?

1. Forward pass

2. Compute loss

3. Backward pass

4. Update parameters

Zero gradients

New cards

Why use nn.Module?

To structure models cleanly and use built‑in optimizers

New cards

What does forward() do?

Defines how the model computes outputs

New cards

What is nn.Parameter?

A tensor that PyTorch treats as a learnable parameter

New cards

Why threshold at 0.5?

Sigmoid outputs probabilities; ≥0.5 means class 1

New cards

How is accuracy computed?

Correct predictions / total samples