Model Overfitting, Model Selection and Nearest Neighbor Classification

48 Terms

New cards

(T/F) Model overfitting shows poor generalization performance

True

New cards

As the tree size increases, the ________ stays steady (or keeps falling) while the __________ increases

training error, testing error

New cards

When a model is pushed toward the lowest possible error on the training set, it memorizes the _______

noise or outliers

New cards

What causes the training error to stay steady while the testing error increases with the tree size?

The limited training size and high model complexity

New cards

Pruning provides _________ and __________ tree depth

early stopping (it stops expanding the tree), limits

New cards

What is the goal of using a validation set for model selection?

To estimate a model's generalization error using “out-of-sample” data

New cards

A validation set gives a better indication of real-world performance when the data split is carefully balanced to ensure both ___________ and ___________

robust model training, reliable error evaluation

New cards

Evaluates the model on a ____________ validation set that is ____________ from the training process

separate, excluded

New cards

What are the pruning stopping conditions (Pre-pruning)?

stop if all instances belong to the same class
stop if all attribute values are the same
stop if the number of instances is less than some user-specified threshold
stop if the class distribution of instances is independent of the available features, i.e. no split improves an impurity measure (e.g. Gini or information gain)
stop if the estimated generalization error falls below a certain threshold
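
A minimal sketch of the first three stopping conditions above, assuming each instance is a tuple of attribute values. The function name `should_stop` and the default `min_instances=5` are hypothetical choices for illustration; the impurity and generalization-error checks are left out.

```python
def should_stop(rows, labels, min_instances=5):
    """Return True if a node should not be split further (pre-pruning).

    rows: list of attribute tuples, labels: list of class labels.
    min_instances is a hypothetical user-specified threshold.
    """
    if len(set(labels)) <= 1:       # all instances belong to the same class
        return True
    if len(set(rows)) <= 1:         # all attribute values are the same
        return True
    if len(rows) < min_instances:   # too few instances to split reliably
        return True
    return False
```

A tree builder would call this check before searching for the best split at each node.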

New cards

Name a post-pruning procedure

Subtree replacement

New cards

With subtree replacement, you trim the nodes of a decision tree in a _______________. If _____________ error improves after trimming, replace the _________ by a __________.

bottom-up fashion, generalization, sub-tree, leaf node

New cards

The class label of a leaf node is determined from the majority class of instances in the ______________

sub-tree
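
A rough sketch of subtree replacement under some loud assumptions: the tree is a nested dict (leaves hold a "label", internal nodes hold "attr", "thr", "left", "right"), and error on a held-out validation set stands in for the generalization error. The dict layout, the helper names, and breaking ties in favor of the leaf are all illustrative choices, not a standard API.

```python
from collections import Counter

def predict(node, x):
    # Leaves are dicts with a "label" key; internal nodes test x[attr] <= thr.
    while "label" not in node:
        node = node["left"] if x[node["attr"]] <= node["thr"] else node["right"]
    return node["label"]

def errors(node, data):
    return sum(predict(node, x) != y for x, y in data)

def prune(node, train, val):
    """Bottom-up subtree replacement using validation error as the estimate."""
    if "label" in node or not train:
        return node
    left = lambda d: [(x, y) for x, y in d if x[node["attr"]] <= node["thr"]]
    right = lambda d: [(x, y) for x, y in d if x[node["attr"]] > node["thr"]]
    # Prune the children first so trimming proceeds bottom-up.
    node = dict(node,
                left=prune(node["left"], left(train), left(val)),
                right=prune(node["right"], right(train), right(val)))
    # Candidate leaf: majority class of the training instances in this sub-tree.
    leaf = {"label": Counter(y for _, y in train).most_common(1)[0][0]}
    # Assumption: on a tie, prefer the simpler leaf (Occam's razor).
    return leaf if errors(leaf, val) <= errors(node, val) else node

# Hypothetical tree whose right sub-tree has memorized one noisy point (9 -> "a").
tree = {"attr": 0, "thr": 5, "left": {"label": "a"},
        "right": {"attr": 0, "thr": 8,
                  "left": {"label": "b"}, "right": {"label": "a"}}}
train = [((1,), "a"), ((2,), "a"), ((6,), "b"), ((7,), "b"), ((9,), "a")]
val = [((3,), "a"), ((6,), "b"), ((9,), "b")]
pruned = prune(tree, train, val)
```

Here the right sub-tree is replaced by a single leaf labeled "b", while the root split is kept because replacing it would raise the validation error.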

New cards

Pros of a decision tree

versatile
extremely fast at classifying unknown records
relatively inexpensive to construct
robust to noise (especially when methods to avoid overfitting are employed)
can easily handle redundant attributes
can easily handle irrelevant attributes

New cards

Cons of a decision tree

struggles with interacting attributes: attributes that distinguish between classes when used together but individually provide little to no information
- due to the greedy nature of the splitting criteria in decision trees, such attributes can be passed over in favor of other attributes that are not as useful
large decision trees are hard to interpret
tree pruning is needed to tackle overfitting

New cards

Occam’s Razor

given two models of similar generalization errors, one should prefer the simpler model over the more complex model

New cards

A complex model has a greater chance of…

being fitted accidentally

New cards

Model Evaluation

estimates the performance of classifiers on previously unseen data

New cards

Hold-out

reserving k% of the data for training and (100 - k)% for testing
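
A small sketch of a hold-out split, assuming the records fit in a list and that shuffling first is acceptable. The name `holdout_split`, the 70% default, and the fixed seed are illustrative assumptions.

```python
import random

def holdout_split(records, train_pct=70, seed=0):
    """Shuffle, then reserve train_pct% for training and the rest for testing."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_pct // 100
    return shuffled[:cut], shuffled[cut:]

train_part, test_part = holdout_split(list(range(10)), train_pct=70)
```

With 10 records and k = 70, this yields 7 training records and 3 test records.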

New cards

Cross Validation (K-Fold)

is a type of repeated hold-out that partitions the data into k disjoint subsets, training on k - 1 partitions and testing on the remaining one, rotating until each partition has served as the test set once
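
One possible way to generate the k train/test partitions, sketched over record indices only. The name `k_fold_indices` and the interleaved fold assignment are assumptions; in practice the data would usually be shuffled first.

```python
def k_fold_indices(n, k):
    """Yield (train_idx, test_idx) pairs for k disjoint folds over n records."""
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved, disjoint
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx

splits = list(k_fold_indices(10, 5))
```

Each record lands in exactly one test fold, so every record is used for testing once and for training k - 1 times.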

New cards

Model overfitting

A model memorizes training data but performs poorly on unseen data

New cards

Class Imbalance

Many classification problems have skewed classes (more records from one class than another), which biases models toward the majority class

New cards

Evaluation measures such as accuracy are not well suited for …

imbalanced classes
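
A quick worked example of why accuracy misleads here, using made-up counts (10 rare-class records out of 1000) and a trivial model that always predicts the majority class.

```python
labels = [1] * 10 + [0] * 990        # 1 = rare class (hypothetical counts)
predictions = [0] * len(labels)      # trivial model: always predict the majority

accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
detected = sum(p == 1 and y == 1 for p, y in zip(predictions, labels))

print(accuracy)  # 0.99
print(detected)  # 0
```

The model scores 99% accuracy while detecting none of the rare-class records.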

New cards

With ______________, a model can favor trivial predictions and fail to detect rare classes, which are often the more interesting ones:

frauds
intrusions
defects

class imbalance

New cards

Oversampling

replicating instances from minority labels

New cards

Downsampling

is when the frequency of the majority class is reduced to match the frequency of the minority class
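
Both resampling strategies can be sketched with one helper, assuming the data is a list of (record, label) pairs. The name `resample`, the `mode` flag, and random selection with a fixed seed are illustrative assumptions.

```python
import random
from collections import defaultdict

def resample(data, mode, seed=0):
    """data: list of (x, label). mode: 'over' or 'down'."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for x, y in data:
        by_class[y].append((x, y))
    sizes = [len(group) for group in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    out = []
    for group in by_class.values():
        if len(group) >= target:
            out += rng.sample(group, target)  # downsample (or keep all)
        else:
            # oversample: replicate randomly chosen instances of this class
            out += group + rng.choices(group, k=target - len(group))
    return out

data = [(i, "neg") for i in range(8)] + [(i, "pos") for i in range(2)]
over = resample(data, "over")
down = resample(data, "down")
```

With 8 "neg" and 2 "pos" records, oversampling gives 8 of each and downsampling gives 2 of each.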

New cards

Oversampling and Downsampling do NOT…

reflect the real distribution of data and may lead to poor generalization

New cards

Precision

the fraction of positive examples predicted correctly by the model from all positive predictions

New cards

True Positive Rate (sensitivity)

the fraction of positive examples predicted correctly by the model from all the positive examples

New cards

True Negative Rate (specificity)

the fraction of negative examples predicted correctly by the model from all the negative examples
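
The three measures above, computed from confusion-matrix counts. The function name `rates` and the example counts are made up for illustration.

```python
def rates(tp, fp, tn, fn):
    """Precision, TPR (sensitivity) and TNR (specificity) from raw counts."""
    return {
        "precision": tp / (tp + fp),  # correct positives / all positive predictions
        "tpr": tp / (tp + fn),        # correct positives / all actual positives
        "tnr": tn / (tn + fp),        # correct negatives / all actual negatives
    }

r = rates(tp=40, fp=10, tn=940, fn=10)
```

For these counts precision and TPR are both 0.8, and TNR is 940/950.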

New cards

ROC (Receiver Operating Characteristic)

is a graphical approach for displaying the trade-off between detection rate and false alarm rate, plotting TPR (y-axis) against FPR (x-axis)

New cards

To draw an ROC curve, the classifier must produce ____________ output

continuous-valued

New cards

Nearest Neighbor classification is mainly used when all attribute values are _______________, although it can be modified to deal with _______________

continuous, categorical attributes

New cards

Nearest Neighbor Classification

estimates the classification of an unseen instance using the classification of the instance or instances that are the closest to it (lazy learner)

New cards

Pros of KNN

simple and intuitive
no training phase
versatile
adaptable to multi-class problems

New cards

Cons of KNN

computationally expensive
sensitive to feature scaling
sensitive to the choice of k and distance metric
K-NN can struggle with imbalanced datasets

New cards

If k is too small, it can be …

sensitive to noise points

New cards

If k is too large…

the neighborhood may include points from other classes

New cards

A major problem when using the Euclidean distance formula (and many other distance measures) is that the __________ frequently swamp the ____________

large values, smaller ones
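
A small illustration of the swamping effect, using hypothetical (age, income) records. Min-max scaling is one common remedy and is an assumption here; the card does not name a specific normalization.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def min_max_scale(rows):
    """Rescale every attribute to [0, 1] so no attribute dominates by magnitude."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [tuple((v - l) / (h - l) if h > l else 0.0
                  for v, l, h in zip(row, lo, hi)) for row in rows]

# Hypothetical (age, income) records.
rows = [(30, 50000), (32, 50500), (60, 50000), (40, 90000)]
scaled = min_max_scale(rows)

raw_ab, raw_ac = euclidean(rows[0], rows[1]), euclidean(rows[0], rows[2])
scaled_ab, scaled_ac = euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2])
```

On the raw values the 500-unit income gap makes the 32-year-old look farther from the 30-year-old than the 60-year-old is; after scaling the ordering flips.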

New cards

What proximity measure is the best for documents?

cosine similarity
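
A minimal sketch of cosine similarity on term-count vectors. The example documents are invented, and returning 0.0 for an all-zero vector is a convention chosen here.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

doc1 = [3, 2, 0, 5]    # hypothetical term counts
doc2 = [6, 4, 0, 10]   # same proportions, twice as long
doc3 = [0, 0, 4, 0]    # shares no terms with doc1

sim_12 = cosine_similarity(doc1, doc2)
sim_13 = cosine_similarity(doc1, doc3)
```

Because it measures the angle between vectors rather than their length, a document and a longer document with the same word proportions score close to 1, while documents with no shared terms score 0.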

New cards

Class weighting is crucial in critical systems like:

spam filtering and cancer diagnosis

New cards

A nearest neighbor classifier represents each example as a _________ in a d-dimensional space where d is the number of attributes

data point

New cards

Given a test instance, we compute its proximity to the _____________ according to one of the proximity measures

training instances

New cards

Find the k training instances that are ________ to the unseen instance. Take the ___________ classification for these k instances.

closest, most commonly occurring
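
The procedure on the last few cards can be sketched as follows, assuming continuous attributes, Euclidean distance, and a plain majority vote. The name `knn_predict` and the toy data are illustrative.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """train: list of (point, label). Returns the majority label of the k nearest."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train = [((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"), ((8, 8), "b"), ((9, 8), "b")]

near_a = knn_predict(train, (2, 2), k=3)
near_b = knn_predict(train, (8, 9), k=3)
too_large_k = knn_predict(train, (8, 9), k=5)
```

The query at (8, 9) is classified "b" with k = 3, but with k = 5 the neighborhood takes in all three "a" points and the vote flips to "a", which shows the risk of a k that is too large.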

New cards

Outputs are used to ____ test records, from the most likely positive class record to the least likely positive class record

rank

New cards

By using different thresholds on this value, we can create _________________ of the classifier with TPR/FPR tradeoffs

different variations

New cards

Many classifiers produce only ___________________

discrete outputs (i.e., predicted class)

New cards

How do you construct an ROC curve?

• Use a classifier that produces a continuous-valued score for each instance
• The more likely it is for the instance to be in the + class, the higher the score
• Sort the instances in decreasing order according to the score
• Apply a threshold at each unique value of the score
• Count the number of TP, FP, TN, FN at each threshold
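
The steps above can be sketched as a short function that turns the counts at each threshold into (FPR, TPR) points. It assumes labels are 1 for the + class and 0 otherwise, that both classes are present, and that an instance is predicted + when its score is at or above the threshold. The name `roc_points` and the example scores are made up.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per unique score threshold, starting at (0, 0)."""
    pos = sum(labels)                 # assumes at least one positive
    neg = len(labels) - pos           # assumes at least one negative
    points = [(0.0, 0.0)]
    for thr in sorted(set(scores), reverse=True):   # decreasing score order
        tp = sum(s >= thr and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= thr and y == 0 for s, y in zip(scores, labels))
        points.append((fp / neg, tp / pos))         # FPR = FP/(FP+TN), TPR = TP/(TP+FN)
    return points

curve = roc_points([0.9, 0.8, 0.6, 0.4], [1, 0, 1, 0])
# curve == [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```

Plotting these points with FPR on the x-axis and TPR on the y-axis gives the ROC curve.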

New cards

No model consistently ___________ the other

outperforms