* Labeled observations: Each observation is a tuple (x,y) of feature vector x and output label y which are related according to an unknown function f(x) = y
* During training: Learn the relationship between x and y i.e., find a function (or model) ℎ(x) that best fits the observations
* Goal: Learned model accurately predicts the output label of a previously unseen, test feature input (generalization)
* Labels : ‘Teachers’ during training, and ‘validator’ of results during testing