Model Overfitting, Model Selection and Nearest Neighbor Classification

Last updated 1:01 AM on 3/23/26
48 Terms

1
New cards

(T/F) Model overfitting shows poor generalization performance

True

2
New cards

Steady ________ but the __________ increases as the tree size increases

training error, testing error

3
New cards

Even if a model achieves the lowest error on the training set, it memorizes the _______

noise or outliers

4
New cards

What causes the training error to stay steady while the testing error increases with tree size?

The limited training size and high model complexity

5
New cards

Pruning provides _________ and __________ tree depth

early stopping (it stops expanding the tree), limits

6
New cards

What is the goal with the model selection use of a validation set?

Estimate a model's generalization error using “out-of-sample” data

7
New cards

A better indication of real-world performance requires carefully balancing the data split to ensure both ___________ and ___________

robust model training, reliable error evaluation

8
New cards

Evaluates the model on a ____________ validation set that is ____________ from the training process

separate, excluded

9
New cards

What are the pruning stopping conditions (Pre-pruning)?

  • stop if all instances belong to the same class

  • stop if all attribute values are the same

  • stop if the number of instances is less than some user-specified threshold

  • stop if the class distribution of instances is independent of the available features (e.g. GINI or INFORMATION GAIN)

  • stop if the estimated generalization error falls below a certain threshold

10
New cards

Name a post pruning procedure

Subtree replacement

11
New cards

With subtree replacement, you trim the nodes of a decision tree in a _______________. If _____________ error improves after trimming, replace _________ by a __________ .

bottom-up fashion, generalization, sub-tree, leaf node

12
New cards

The class label of a leaf node is determined from the majority class of instances in the ______________

sub-tree

13
New cards

Pros of a decision tree

  • versatile

  • extremely fast at classifying unknown records

  • relatively inexpensive to construct

  • robust to noise (especially when methods to avoid overfitting are employed)

  • can easily handle redundant attributes

  • can easily handle irrelevant attributes

14
New cards

Cons of a decision tree

  • interacting attributes: attributes that are able to distinguish between classes when used together

    • But individually, they provide little to no information

  • due to the greedy nature of the splitting criteria in decision trees, such attributes could be passed over in favor of other attributes that are not as useful

  • Large decision trees are hard to interpret

  • Tree pruning is needed to tackle overfitting

15
New cards

Occam’s Razor

given two models of similar generalization errors, one should prefer the simpler model over the more complex model

16
New cards

A complex model has a greater chance of…

being fitted accidentally

17
New cards

Model Evaluation

estimates the performance of classifiers on previously unseen data

18
New cards

Hold-out

reserving k% for training and 100 - k% for testing

19
New cards

Cross Validation (K-Fold)

is a type of repeated hold out that partitions the data into k disjoint subsets, training with k - 1 partitions and testing with the remaining one
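The k disjoint partitions can be sketched in a few lines of pure Python; the round-robin fold assignment and toy data here are illustrative assumptions, not a prescribed implementation:

```python
# Minimal k-fold cross-validation split sketch (hypothetical data).
def k_fold_splits(data, k):
    """Partition data into k disjoint folds; yield (train, test) pairs."""
    folds = [data[i::k] for i in range(k)]  # round-robin fold assignment
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
for train, test in k_fold_splits(data, 5):
    # Each fold is held out exactly once; together they cover all the data.
    assert len(test) == 2 and len(train) == 8
    assert sorted(train + test) == data
```

Each record is used for testing exactly once and for training k − 1 times.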

20
New cards

Model overfitting

A model memorizes training data but performs poorly on unseen data

21
New cards

Class Imbalance

Lots of classification problems where the classes are skewed (more records from one class than another) causing models to bias towards the majority

22
New cards

Evaluation measures such as accuracy are not well suited for …

imbalanced classes

23
New cards

______________ can lead trivial models to fail to detect rare classes that are often more interesting:

  • frauds

  • intrusions

  • defects

class imbalance

24
New cards

Oversampling

replicating instances from minority labels

25
New cards

Downsampling

is when the frequency of the majority class is reduced to match the frequency of the minority class

26
New cards

Oversampling and Downsampling do NOT…

reflect the real distribution of data and may lead to poor generalization
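A naive sketch of both resampling strategies on a hypothetical 90/10 skew (the label lists are invented for illustration):

```python
import random

random.seed(0)

majority = [("rec", 0)] * 90  # 90 majority-class records (label 0)
minority = [("rec", 1)] * 10  # 10 minority-class records (label 1)

# Oversampling: replicate minority instances until the classes match.
oversampled = majority + [random.choice(minority) for _ in range(len(majority))]

# Downsampling: reduce the majority class to the minority's frequency.
downsampled = random.sample(majority, len(minority)) + minority

assert sum(1 for _, y in oversampled if y == 1) == 90  # now balanced
assert len(downsampled) == 20                          # 10 + 10, balanced
```

Note the card's caveat: the resulting class distribution no longer reflects the real one.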

27
New cards

Precision

the fraction of positive examples predicted correctly by the model from all positive predictions

28
New cards

True Positive Rate (sensitivity)

the fraction of positive examples predicted correctly by the model from all the positive examples

29
New cards

True Negative Rate (specificity)

the fraction of negative examples predicted correctly by the model
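The three measures above follow directly from the confusion-matrix counts; a small sketch with made-up counts:

```python
# Precision, TPR (sensitivity), TNR (specificity) from confusion counts.
def metrics(tp, fp, tn, fn):
    precision = tp / (tp + fp)  # correct positives / all positive predictions
    tpr = tp / (tp + fn)        # correct positives / all actual positives
    tnr = tn / (tn + fp)        # correct negatives / all actual negatives
    return precision, tpr, tnr

p, tpr, tnr = metrics(tp=40, fp=10, tn=30, fn=20)
assert (p, tpr, tnr) == (0.8, 2 / 3, 0.75)
```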

30
New cards

ROC (Receiver Operating Characteristics)

is a graphical approach for displaying the trade-off between detection rate and false alarm rate plotting TPR against FPR

31
New cards

To draw a ROC curve, classifier must produce ____________ output

continuous-valued

32
New cards

Nearest Neighbor classification is mainly used when all attribute values are _______________ although they can be modified to deal with _______________

continuous, categorical attributes

33
New cards

Nearest Neighbor Classification

estimates the classification of an unseen instance using the classification of the instance or instances that are the closest to it (lazy learner)
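A minimal k-NN sketch showing the lazy-learner behavior: no training phase, just distance plus majority vote (the toy points and labels are assumptions):

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label) pairs; query: tuple of floats."""
    # No training phase: just rank stored instances by Euclidean distance.
    neighbors = sorted(train, key=lambda pl: math.dist(pl[0], query))[:k]
    # Take the most commonly occurring class among the k closest.
    return Counter(label for _, label in neighbors).most_common(1)[0][0]

train = [((0.0, 0.0), "A"), ((0.1, 0.2), "A"),
         ((5.0, 5.0), "B"), ((5.1, 4.9), "B")]
assert knn_predict(train, (0.2, 0.1), k=3) == "A"
```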

34
New cards

Pros of KNN

  • simple and intuitive

  • no training phase

  • versatile

  • adaptable to multi-class problems

35
New cards

Cons of KNN

  • computationally expensive

  • sensitive to feature scaling

  • choice of K and distance metric

  • K-NN can struggle with imbalanced datasets

36
New cards

If k is too small, it can be …

sensitive to noise points

37
New cards

If k is too large…

the neighborhood may include points from other classes

38
New cards

A major problem when using the Euclidean distance formula (and many other distance measures) is that the __________ frequently swamp the ____________

large values, smaller ones
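A common fix is to rescale each attribute before computing distances; a min-max scaling sketch with invented age/income rows, where income would otherwise swamp age:

```python
# Min-max scaling: rescale each attribute to [0, 1] so large-valued
# attributes (e.g. income) don't swamp small-valued ones (e.g. age).
def min_max_scale(rows):
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) for v, l, h in zip(row, lo, hi))
            for row in rows]

rows = [(25, 30_000), (35, 90_000), (45, 60_000)]  # (age, income)
scaled = min_max_scale(rows)
assert scaled[0] == (0.0, 0.0) and scaled[1] == (0.5, 1.0)
```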

39
New cards

What proximity measure is the best for documents?

cosine similarity
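Cosine similarity compares vector directions rather than magnitudes, which is why it suits document term-count vectors; a sketch with hypothetical counts:

```python
import math

# Cosine similarity between two term-frequency vectors.
def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

d1 = (3, 0, 1)  # hypothetical term counts for one document
d2 = (6, 0, 2)  # same direction, twice the length -> similarity 1
assert abs(cosine_sim(d1, d2) - 1.0) < 1e-9
```

Doubling every count in a document leaves its direction, and thus its similarity, unchanged.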

40
New cards

Class weighting is crucial in critical systems like:

spam filtering and cancer diagnosis

41
New cards

A nearest neighbor classifier represents each example as a _________ in a d-dimensional space where d is the number of attributes

data point

42
New cards

Given a test instance, we compute its proximity to the _____________ according to one of the proximity measures

training instances

43
New cards

Find the k training instances that are ________ to the unseen instance. Take the ___________ classification for these k instances.

closest, most commonly occurring

44
New cards

Outputs are used to ____ test records, from the most likely positive class record to the least likely positive class record

rank

45
New cards

By using different thresholds on this value, we can create _________________ of the classifier with TPR/FPR tradeoffs

different variations

46
New cards

Many classifiers produce only ___________________

discrete outputs (i.e., predicted class)

47
New cards

How do you construct an ROC curve?

  • Use a classifier that produces a continuous-valued score for each instance

  • The more likely it is for the instance to be in the + class, the higher the score

  • Sort the instances in decreasing order according to the score

  • Apply a threshold at each unique value of the score

  • Count the number of TP, FP, TN, FN at each threshold
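The steps above can be sketched directly: sort by decreasing score, sweep a threshold through the instances, and record (FPR, TPR) at each step (tie handling is simplified; the scores and labels are invented):

```python
# ROC curve construction sketch: sort by score, threshold at each value.
def roc_points(scores, labels):
    """scores: continuous classifier outputs; labels: 1 = positive, 0 = negative."""
    pairs = sorted(zip(scores, labels), reverse=True)  # decreasing score
    P = sum(labels)
    N = len(labels) - P
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, y in pairs:  # lowering the threshold past each instance
        if y == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))  # (FPR, TPR) at this threshold
    return points

pts = roc_points([0.9, 0.8, 0.6, 0.3], [1, 1, 0, 0])
assert pts[-1] == (1.0, 1.0) and (0.0, 1.0) in pts  # perfect ranking hits (0, 1)
```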

48
New cards

No model consistently ___________ the other

outperforms
