Classification

Classification is a form of regression

classification has a response variable that is a category

Machine learning algorithm

  • mathematical model

  • calculated based on sample data

    • training data

  • makes predictions or decisions without being explicitly programmed to perform the task

A nearest neighbor classifier categorizes an unknown point by using the majority classified around it

pythagorean theorem to find the distance between two points

to find the k nearest neighbors

-find the distance between the example and each example in the training set

Augment the training data table with a column containing all the distances

sort the augmented table in increasing order of the distances

Take the top k rows of the sorted table