2 - Binary Classification

Last updated 6:38 PM on 5/18/26

39 Terms

1

SGD Advantages

  • Uses less memory than full‑batch GD

  • Faster, more frequent updates (one per sample or mini‑batch)

  • Gradient noise can help escape shallow local minima and saddle points

  • Works well for large datasets

2

SGD Disadvantages

  • Noisy updates

  • Loss curve fluctuates

  • Requires careful tuning of the learning rate

3

SGD Assumptions

  • Loss function is differentiable

  • Mini‑batches are representative of the dataset

4

Mini-batch GD advantages

  • More stable than pure SGD

  • Faster than full‑batch GD

  • Efficient on GPUs

5

Mini-batch GD disadvantages

  • Still noisy

  • Batch size must be chosen carefully

6

Mini-batch GD assumptions

  • Data is shuffled

  • Batches are independent

7

DataLoader advantages

  • Automatically batches data

  • Shuffles data (with shuffle=True)

  • Handles parallel loading (num_workers)

  • Makes mini‑batch SGD efficient

8

DataLoader disadvantages

  • Requires correct transforms on the underlying Dataset (e.g. converting images to tensors)

  • Can bottleneck if num_workers is too low

9

DataLoader assumptions

The Dataset implements __getitem__ and __len__
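A minimal sketch, assuming PyTorch, of a map-style Dataset that a DataLoader can consume. RandomImageDataset and its random data are hypothetical stand-ins for a real dataset such as CIFAR‑10; the point is only that __len__ and __getitem__ are all the DataLoader needs.

```python
import torch
from torch.utils.data import Dataset, DataLoader


class RandomImageDataset(Dataset):
    """Hypothetical dataset: n random 3x32x32 'images' with random 0/1 labels."""

    def __init__(self, n=10, seed=0):
        g = torch.Generator().manual_seed(seed)
        self.imgs = torch.rand(n, 3, 32, 32, generator=g)
        self.labels = torch.randint(0, 2, (n,), generator=g)

    def __len__(self):
        # DataLoader uses this to know how many samples (and batches) there are
        return len(self.labels)

    def __getitem__(self, i):
        # DataLoader calls this per index, then collates the samples into a batch
        return self.imgs[i], self.labels[i]


ds = RandomImageDataset(n=10)
loader = DataLoader(ds, batch_size=4, shuffle=True)
imgs, labels = next(iter(loader))
print(len(ds), imgs.shape, labels.shape)
```

With 10 samples and batch_size=4, the loader yields batches of 4, 4 and 2.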

10

Logistic Regression advantages

  • Simple

  • Fast

  • Outputs probabilities

  • Works well for linearly separable data

11

Logistic Regression disadvantages

  • Only models linear decision boundaries

  • Struggles with complex patterns

  • Sensitive to outliers

12

Logistic Regression assumptions

  • Classes are (approximately) linearly separable

  • Input features are meaningful

  • Uses sigmoid → outputs in (0,1)

13

Sigmoid Activation advantages

  • Smooth

  • Outputs probabilities

  • Differentiable

14

Sigmoid Activation disadvantages

  • Saturates for large |z|, so gradients vanish

  • Not ideal as a hidden-layer activation in deep networks
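Saturation can be checked numerically. A minimal pure-Python sketch: the derivative σ'(z) = σ(z)(1 − σ(z)) peaks at 0.25 when z = 0 and shrinks toward 0 as |z| grows, which is what "vanishing gradients" refers to. The function names are illustrative.

```python
import math


def sigmoid(z):
    # Numerically stable form: exp is only ever called on a non-positive number
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def sigmoid_grad(z):
    # sigma'(z) = sigma(z) * (1 - sigma(z)); its maximum is 0.25 at z = 0
    s = sigmoid(z)
    return s * (1.0 - s)


for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  sigmoid={sigmoid(z):.5f}  grad={sigmoid_grad(z):.6f}")
```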

15

Sigmoid Activation assumptions

Output should be a probability

16

BCE Advantages

  • Proper loss for binary classification

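  • Strong gradient signal, especially when predictions are confidently wrong

  • Pairs naturally with sigmoid (the gradient with respect to the logit is $\hat{y} - y$)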

17

BCE Disadvantages

  • Sensitive to extreme predictions: a confident wrong prediction (probability near 0 or 1) gives a very large loss

  • Requires correct label encoding (0/1)

18

BCE Assumptions

  • Labels are 0 or 1

  • Output is a probability

19

What is Stochastic Gradient Descent?

An optimisation method that updates parameters using the gradient from a single sample or mini‑batch
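A minimal pure-Python sketch of per-sample SGD, assuming a toy 1‑D linear model y ≈ w·x with squared-error loss (not the logistic-regression setup used later). The function sgd_fit and the data are hypothetical; the key point is that w is updated after every single sample.

```python
import random


def sgd_fit(xs, ys, lr=0.01, epochs=50, seed=0):
    """Per-sample SGD for a 1-D linear model y ~ w * x with squared-error loss."""
    rng = random.Random(seed)
    w = 0.0
    idx = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(idx)                          # new random sample order each epoch
        for i in idx:
            grad = 2 * (w * xs[i] - ys[i]) * xs[i]  # d/dw of (w*x - y)^2 for ONE sample
            w -= lr * grad                          # update immediately after each sample
    return w


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]   # generated by w = 2
print(round(sgd_fit(xs, ys), 3))
```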

20

Why use SGD?

It saves memory, is faster, and helps escape local minima

21

What is mini-batch GD?

Gradient descent where each update's gradient is computed on a small random subset (mini‑batch) of the data
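A minimal pure-Python sketch of mini-batch GD on the same kind of toy model (y ≈ w·x, squared-error loss), assuming the per-batch gradient is the mean of the per-sample gradients. The helpers minibatches and minibatch_gd_fit are hypothetical; minibatches also mimics the effect of a drop_last flag.

```python
import random


def minibatches(n, batch_size, rng, drop_last=False):
    """Yield lists of shuffled indices, batch_size at a time."""
    idx = list(range(n))
    rng.shuffle(idx)                      # fresh random order on every call (i.e. every epoch)
    for start in range(0, n, batch_size):
        batch = idx[start:start + batch_size]
        if drop_last and len(batch) < batch_size:
            break                         # skip the final incomplete batch
        yield batch


def minibatch_gd_fit(xs, ys, batch_size=2, lr=0.01, epochs=100, seed=0):
    """Mini-batch GD for y ~ w * x: gradient averaged over each batch."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(epochs):
        for batch in minibatches(len(xs), batch_size, rng):
            grad = sum(2 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
            w -= lr * grad
    return w


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
print(round(minibatch_gd_fit(xs, ys), 3))
print([len(b) for b in minibatches(10, 4, random.Random(0))])                  # [4, 4, 2]
print([len(b) for b in minibatches(10, 4, random.Random(0), drop_last=True)])  # [4, 4]
```

Averaging over a batch smooths the noise of single-sample updates while still updating many times per epoch.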

22

What does DataLoader do?

Creates batches, shuffles data, and loads it efficiently

23

What does shuffle=True mean?

Data is randomly shuffled each epoch

24

What does drop_last=True do?

Drops the last batch if it is smaller than batch_size (i.e. when the dataset size isn't divisible by batch_size)

25

What is the shape of a batch?

imgs: (batch_size, 3, 32, 32)

labels: (batch_size,)
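A minimal sketch, assuming PyTorch, that prints these shapes using TensorDataset with hypothetical all-zero CIFAR-sized images, and shows how drop_last changes the final batch.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical data: 10 blank CIFAR-sized images with alternating 0/1 labels
imgs_all = torch.zeros(10, 3, 32, 32)
labels_all = torch.arange(10) % 2
ds = TensorDataset(imgs_all, labels_all)

for drop_last in (False, True):
    loader = DataLoader(ds, batch_size=4, shuffle=True, drop_last=drop_last)
    for imgs, labels in loader:
        # imgs: (batch_size, 3, 32, 32), labels: (batch_size,)
        print(drop_last, tuple(imgs.shape), tuple(labels.shape))
```

Without drop_last the last batch has 2 samples; with drop_last=True it is discarded. shuffle=True changes which samples land in each batch, not the batch sizes.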

26

Why remap CIFAR‑10 to 0/1?

To convert a 10‑class problem into a binary problem

27

Why not use label=0 and label=1 directly?

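Because CIFAR‑10 labels are integers 0–9, and the two chosen classes are generally not already labelled 0 and 1, so they must be remapped to the 0/1 targets BCE expects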
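A minimal sketch of the remapping, assuming two classes are kept. The source does not say which classes; cat = 3 and dog = 5 are the standard CIFAR‑10 indices and are used here only as an example. to_binary is a hypothetical helper and the images are placeholder strings. A one-vs-rest variant would instead map every other class to 0.

```python
def to_binary(samples, pos_class=5, neg_class=3):
    """Keep only two CIFAR-10 classes and remap their labels to 1 (pos) / 0 (neg)."""
    out = []
    for img, label in samples:
        if label == pos_class:
            out.append((img, 1))
        elif label == neg_class:
            out.append((img, 0))
        # samples from any other class are dropped
    return out


samples = [("img_a", 3), ("img_b", 5), ("img_c", 0), ("img_d", 5)]  # (image, CIFAR-10 label)
print(to_binary(samples))  # [('img_a', 0), ('img_b', 1), ('img_d', 1)]
```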

28

What is logistic regression?

A linear model whose output is passed through a sigmoid to produce a probability: $\hat{y} = \sigma(w^\top x + b)$, where w and b are learnable parameters.
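A minimal pure-Python sketch of the logistic-regression prediction for one feature vector, computing the linear part w·x + b and squashing it with a sigmoid. The weights are hypothetical values, not learned from any real data.

```python
import math


def predict_proba(x, w, b):
    """Logistic regression for one feature vector: p = sigmoid(w . x + b)."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b     # linear part (the logit)
    e = math.exp(-abs(z))                             # exp of a non-positive number: no overflow
    return 1.0 / (1.0 + e) if z >= 0 else e / (1.0 + e)


w, b = [0.5, -1.0], 0.25        # hypothetical "learned" parameters
print(predict_proba([2.0, 1.0], w, b))   # z = 1.0 - 1.0 + 0.25 = 0.25
```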

29

What does sigmoid do?

Maps any real number to a value strictly between 0 and 1: $\sigma(z) = 1/(1 + e^{-z})$

30

Why use sigmoid?

To interpret output as a probability

31

Why flatten images?

Logistic regression expects a 1D feature vector per sample, not a 3D image, so each (3, 32, 32) image is flattened into 3 × 32 × 32 = 3072 values
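A minimal sketch, assuming PyTorch and a hypothetical random batch, of flattening every image while keeping the batch dimension.

```python
import torch

imgs = torch.rand(8, 3, 32, 32)          # a hypothetical batch of 8 CIFAR-sized images
flat = imgs.flatten(start_dim=1)          # keep the batch dim, merge (3, 32, 32) into 3072
same = imgs.view(imgs.size(0), -1)        # equivalent reshape for a contiguous tensor
print(flat.shape)                          # torch.Size([8, 3072])
```

start_dim=1 matters: flattening from dimension 0 would merge the whole batch into one long vector.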

32

What does BCE measure?

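How far the predicted probability $\hat{y}$ is from the true label $y \in \{0, 1\}$, measured as the negative log-likelihood $-[\,y \log \hat{y} + (1 - y)\log(1 - \hat{y})\,]$.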
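A minimal pure-Python sketch of mean BCE over a batch. The eps clamp is an illustrative choice to avoid log(0); PyTorch's nn.BCELoss guards against this in its own way, so exact values for extreme inputs may differ.

```python
import math


def bce(probs, labels, eps=1e-7):
    """Mean binary cross-entropy of predicted probabilities against 0/1 labels."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)   # clamp so log(0) is never evaluated
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(probs)


print(bce([0.9], [1]))   # confident and right: small loss (-log 0.9)
print(bce([0.1], [1]))   # confident and wrong: large loss (-log 0.1)
```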

33

Why use BCE?

It is the standard loss for binary classification: the negative log-likelihood of the 0/1 labels under the predicted probabilities

34

What are the steps of a training loop?

1. Forward pass

2. Compute loss

3. Backward pass

4. Update parameters

5. Zero gradients (so they don't accumulate into the next step)
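A minimal sketch of these steps, assuming PyTorch and hypothetical, linearly separable 2‑D toy data in place of flattened images. The model is logistic regression written as nn.Linear followed by nn.Sigmoid and trained with nn.BCELoss; nn.BCEWithLogitsLoss on raw logits is a more numerically stable alternative.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Hypothetical, linearly separable toy data standing in for flattened images
X = torch.randn(200, 2)
y = (X[:, 0] + X[:, 1] > 0).float()
loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)

model = nn.Sequential(nn.Linear(2, 1), nn.Sigmoid())   # logistic regression
loss_fn = nn.BCELoss()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(50):
    for xb, yb in loader:
        probs = model(xb).squeeze(1)    # 1. forward pass -> (batch,) probabilities
        loss = loss_fn(probs, yb)       # 2. compute loss
        loss.backward()                 # 3. backward pass (fills .grad)
        opt.step()                      # 4. update parameters
        opt.zero_grad()                 # then zero gradients so they don't accumulate

with torch.no_grad():
    acc = ((model(X).squeeze(1) >= 0.5).float() == y).float().mean().item()
print(f"final loss {loss.item():.3f}, train accuracy {acc:.2f}")
```

Zeroing at the end of each iteration (as here) or at the start works equally well, as long as it happens before the next backward pass.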

35

Why use nn.Module?

To structure models cleanly: it registers parameters automatically, so model.parameters() can be passed straight to built‑in optimizers

36

What does forward() do?

Defines how the model computes outputs from inputs; it runs when you call model(x)

37

What is nn.Parameter?

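A tensor that PyTorch treats as a learnable parameter: when assigned as an attribute of an nn.Module it is registered automatically, and it has requires_grad=True by default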
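A minimal sketch, assuming PyTorch, tying together nn.Module, forward() and nn.Parameter for logistic regression on flattened 3×32×32 images. LogisticRegression is a hypothetical class name, and zero initialisation is chosen only so the output is easy to check.

```python
import torch
from torch import nn


class LogisticRegression(nn.Module):
    """Hypothetical logistic-regression module for flattened 3x32x32 images."""

    def __init__(self, in_features):
        super().__init__()
        # Wrapping tensors in nn.Parameter registers them as learnable parameters
        self.w = nn.Parameter(torch.zeros(in_features))
        self.b = nn.Parameter(torch.zeros(()))

    def forward(self, x):
        x = x.flatten(start_dim=1)                  # (B, 3, 32, 32) -> (B, 3072)
        return torch.sigmoid(x @ self.w + self.b)   # (B,) probabilities


model = LogisticRegression(3 * 32 * 32)
print([name for name, _ in model.named_parameters()])   # ['w', 'b']
probs = model(torch.zeros(4, 3, 32, 32))                # calling model(x) runs forward()
opt = torch.optim.SGD(model.parameters(), lr=0.01)      # optimizer finds w and b automatically
```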

38

Why threshold at 0.5?

Sigmoid outputs probabilities, so a prediction ≥ 0.5 is assigned class 1 and anything below 0.5 class 0

39

How is accuracy computed?

Correct predictions / total samples
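A minimal pure-Python sketch combining the 0.5 threshold with the accuracy formula. accuracy is a hypothetical helper and the probabilities are made-up example values.

```python
def accuracy(probs, labels, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the 0/1 label."""
    preds = [1 if p >= threshold else 0 for p in probs]   # probability -> class 0/1
    correct = sum(int(p == y) for p, y in zip(preds, labels))
    return correct / len(labels)


print(accuracy([0.9, 0.2, 0.6, 0.4], [1, 0, 0, 0]))  # 0.75 (the third sample is wrong)
```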