4.3 Nearest Neighbor Classification

0.0(0)

Studied by 0 people

Call Kai

Learn

Practice Test

Spaced Repetition

Match

Flashcards

Knowt Play

Card Sorting

1/23

There's no tags or description

Looks like no tags are added yet.

Last updated 6:09 PM on 6/22/26

Name	Mastery	Learn	Test	Matching	Spaced	Call with Kai

No analytics yet

Send a link to your students to track their progress

24 Terms

New cards

Why is the k-Nearest Neighbor algorithm considered a "Lazy Learner"?

Because no explicit model is learned and the classifier uses the training samples directly.

New cards

What does "learning by analogy" mean in the k-NN algorithm?

It means finding the most similar training samples (the "decision set") and classifying the new sample based on the majority of those samples.

New cards

What is used to determine the similarity between samples in the k-NN algorithm?

A distance function.

New cards

What does the hyperparameter 'k' represent in the k-nearest Neighbour algorithm?

It represents how many similar samples are being searched for.

New cards

What is the "decision set" in a k-Nearest Neighbor classifier?

The group of the most similar training samples found to classify a new sample.

New cards

What is the problem with choosing a "too small" value for the hyperparameter k in the k-NN algorithm?

It does not generalize enough, leading to a high sensitivity to outliers.

New cards

What happens when you choose a "too large" value for the hyperparameter k in the k-NN algorithm?

It generalizes too much, causing many objects from other clusters or classes to be included in the decision set.

New cards

What range for the hyperparameter k typically yields the highest classification quality?

An average k, often 1 << k < 10.

New cards

In a k-NN visual representation, what happens to the decision set boundary as k increases (e.g., from k=1 to k=17)?

The boundary expands outward to include more neighboring data points, increasing the likelihood of capturing points from different classes.

New cards

What is the first step in hyperparameter selection for a parameter like k?

Try different values for the hyperparameter (e.g., k = 1, 2, 3...).

New cards

How do you evaluate the effectiveness of each tested hyperparameter value?

By determining the classification error for each value.

New cards

What are two common methods used to determine the classification error for a specific hyperparameter?

1) Evaluating it on a validation set, or 2) Using m-fold cross-validation.

New cards

After calculating the errors for various hyperparameter values, which one is selected for the final model?

The value that yields the best classification results (the highest accuracy or lowest error).

New cards

When looking at a graph of "Number of Neighbors (k) vs Accuracy", what is the primary goal?

To identify the peak of the graph, which represents the optimal k value that maximizes classification accuracy.

New cards

What is the default decision rule in a k-NN classifier?

Select the majority class of the decision set.

New cards

How does a weighted decision rule differ from the default rule in k-NN?

Instead of a simple majority vote, it weighs the training samples in the decision set based on specific criteria.

New cards

What are three common criteria used to weight samples in a k-NN decision set?

By distance to the sample, by class distribution, or by a cost function.

New cards

Why might you weight k-NN samples by class distribution?

To account for highly unequal distributions, preventing a rare class from being automatically outvoted by a dominant majority class.

New cards

When is it appropriate to use a cost function to weight a k-NN decision rule?

When predicting one class incorrectly (e.g., predicting A when it's actually B) is considered "worse" or more costly than the reverse.

New cards

What is a major advantage of the k-NN algorithm regarding its general performance?

It typically achieves high classification accuracy in many applications.

New cards

Can the k-NN algorithm be used for continuous targets, or is it strictly for classification?

Variants of k-NN can also be used for the prediction of numerical values (regression).

New cards

Why is "incrementality" considered an advantage of the k-NN algorithm?

Because it is a lazy learner, the model does not need to be adapted or retrained when new training data is added.

New cards

Why is the k-NN algorithm considered inefficient during the actual classification phase?

Because it must calculate the distance against all existing training samples to classify just one new sample.

New cards

What is a major disadvantage of k-NN regarding interpretability and insights?

It returns no explicit knowledge or structural rules about the classes.