3 Types of machine learning
Unsupervised learning - Detect patterns without predefined outcomes
Supervised learning - Make best predictions of output given input
Reinforcement learning - Machine learns from trial and error
Unsupervised learning
Learns from data without labelled outcomes (unlabelled data)
Algorithm discovers hidden patterns / structures independently
Goal: Understand data structure, group observations, reduce dimensionality
2 main approaches for unsupervised learning
Clustering - group similar observations
Dimensionality reduction - Reduce the nr. of variables while keeping as much information as possible (sketched below)
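A minimal sketch of dimensionality reduction with scikit-learn's PCA; the data and the choice of 2 components are illustrative assumptions, not from the notes:

```python
# Sketch: PCA for dimensionality reduction (made-up data)
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 obs., 5 variables (made-up)

pca = PCA(n_components=2)              # reduce 5 variables to 2 components
X_reduced = pca.fit_transform(X)

# Share of total variance kept by the 2 components
print(pca.explained_variance_ratio_.sum())
```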
K-means clustering
Observations belonging to the same group should share similar characteristics
=> group so that the distance between obs. within one group is the smallest possible and between groups the largest possible
K = nr. of clusters to estimate (hyperparameter)
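A minimal K-means sketch with scikit-learn; the data is made up and K = 3 is an assumption:

```python
# Sketch: K-means clustering (made-up data, K = 3 assumed)
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))          # 150 obs., 2 features (made-up)

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)  # K = 3 clusters
labels = kmeans.fit_predict(X)         # cluster assignment per observation
print(kmeans.cluster_centers_)         # one centroid per cluster
```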
Methods for choosing K
Elbow technique - find K with the best within-cluster compactness
Silhouette technique - considers both within- and between-cluster distance; silhouette coefficient calculated for all obs. at all levels of K
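A sketch of both techniques on the same made-up data; scikit-learn's `inertia_` is the within-cluster sum of squares (the elbow quantity), and `silhouette_score` averages the silhouette coefficient over all obs.:

```python
# Sketch: choosing K via elbow (inertia) and silhouette (made-up data)
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = rng.normal(size=(150, 2))

for k in range(2, 7):                  # try several levels of K
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Elbow: look for the K where inertia stops dropping sharply
    # Silhouette: higher average coefficient = better-separated clusters
    print(k, km.inertia_, silhouette_score(X, km.labels_))
```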
Supervised Learning
Machine given X and corresponding Y to train (labelled dataset)
Adjust parameters to make best prediction for output when given input
Goal: Predict outputs beyond sample data
Underfitting vs overfitting
Underfit - Model cannot capture underlying structure of data
Overfit - Model corresponds too exactly to particular set of data
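One hedged way to see the contrast: compare train vs. test accuracy for a very shallow and a very deep decision tree; the dataset and depths are illustrative assumptions:

```python
# Sketch: underfitting vs overfitting via tree depth (made-up data)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for depth in (1, None):  # depth 1 underfits; unlimited depth tends to overfit
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0).fit(X_tr, y_tr)
    print(depth, tree.score(X_tr, y_tr), tree.score(X_te, y_te))
```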
K-Nearest Neighbours (KNN)
Predicts the label of new data from the most common label among its K nearest neighbours
Closeness between the unlabelled point and its neighbours is measured by distance
K = nr. of neighbours to consider
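A minimal KNN sketch with scikit-learn; the labelled data is made up and K = 5 is an assumption:

```python
# Sketch: K-Nearest Neighbours classification (made-up data, K = 5 assumed)
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=200, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)   # K = 5 nearest neighbours
knn.fit(X_tr, y_tr)                         # "training" = storing labelled points
print(knn.predict(X_te[:3]))                # label by majority vote of 5 neighbours
```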
Confusion matrix
Table comparing predicted labels against true labels
Accuracy, sensitivity and specificity derived from confusion matrix
Parts in confusion matrix
True positive - Obs. correctly predicted to belong to a class
True negative - Obs. correctly predicted to not belong to a class
False positive - Obs. incorrectly predicted to belong to a class
False negative - Obs. incorrectly predicted to not belong to a class
Accuracy
Proportion of obs. correctly labelled by algorithm
Percentage of predictions that are correct
= (TP + TN) / all predictions
No information rate - Accuracy achieved by assigning all obs. to the largest group
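A small sketch computing accuracy and the no information rate; the label vectors are hypothetical:

```python
# Sketch: accuracy and no information rate (hypothetical labels)
import numpy as np

y_true = np.array([1, 1, 1, 1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0, 1, 1, 0, 1, 1])

accuracy = (y_true == y_pred).mean()          # (TP + TN) / all predictions
# No information rate: always predict the largest group
nir = max((y_true == 1).mean(), (y_true == 0).mean())
print(accuracy, nir)
```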
Sensitivity
Proportion of obs. correctly predicted to belong to a category
Out of all that are group A, how many were correctly classified as A
= TP / (TP + FN)
Specificity
Proportion of obs. correctly predicted to not belong to a category
Out of all that are not A, how many were correctly classified as not A
= TN / (TN + FP)
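A sketch deriving all three metrics from scikit-learn's confusion matrix; with binary labels, `confusion_matrix` lists true classes as rows, so the cells unpack as TN, FP, FN, TP:

```python
# Sketch: accuracy, sensitivity, specificity from a confusion matrix (hypothetical labels)
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 1]
y_pred = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print((tp + tn) / (tp + tn + fp + fn))  # accuracy
print(tp / (tp + fn))                   # sensitivity
print(tn / (tn + fp))                   # specificity
```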
3 assumptions of KNN
KNN measures distances between points and neighbours (input needs to be continuous + numerical and same unit / standardised)
KNN assumes obs. with similar characteristics belong to the same group
KNN measures similarity through a spherical distance around each data point using all features equally
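Because KNN is distance-based, features are typically standardised first; a sketch using a scikit-learn pipeline (the data is made up):

```python
# Sketch: standardising features before KNN (made-up data)
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=200, random_state=0)

# StandardScaler puts every feature on mean 0 / std 1, so no single
# feature dominates the (spherical) distance calculation
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.score(X, y))
```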
Decision tree
Way of splitting data into purer subsets
Pure class
Subset containing only data from same group
Root node
First node in the tree; contains all the training data
Leaf node
Splitting attempts stop here; should contain pure classes
Internal nodes
Intermediary nodes between the root and the leaves
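A sketch that fits a small tree and prints its structure, making root, internal, and leaf nodes visible; the data is made up:

```python
# Sketch: fitting a small decision tree and inspecting its nodes (made-up data)
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = make_classification(n_samples=200, n_features=4, random_state=0)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# First printed split = root node; nested splits = internal nodes;
# "class: ..." lines = leaf nodes
print(export_text(tree))
```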
Gini impurity
Way of finding the best split
Measures how mixed the classes are in a node
= 1 - Σ (p_i)², where p_i = share of class i in the node
Gini = 0 for a pure node (perfect split)
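A tiny sketch of the Gini formula above, applied to hypothetical node contents:

```python
# Sketch: Gini impurity of a node, G = 1 - sum(p_i^2) (hypothetical nodes)
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["A", "A", "A", "A"]))      # 0.0 -> pure node
print(gini(["A", "A", "B", "B"]))      # 0.5 -> maximally mixed (two classes)
```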