Machine Learning

19 Terms

1

3 Types of machine learning

Unsupervised learning - Detect patterns without predefined outcomes

Supervised learning - Make the best prediction of the output given an input

Reinforcement learning - Machine learns from trial and error

2

Unsupervised learning

Learn from data without labeled outcomes (unlabelled data)

Algorithm discovers hidden patterns / structures independently

Goal: Understand data structure, group observations, reduce dimensionality

3

2 main approaches for unsupervised learning

Clustering - group similar observations

Dimensionality reduction - Reduce the nr. of variables while keeping as much information as possible

4

K-means clustering

Observations belonging to the same group should share the same characteristics

=> group so that the distance between obs. within a group is as small as possible and the distance between groups is as large as possible

K = nr. of clusters to estimate (hyperparameter)

Methods for choosing K:
Elbow technique - find the K with the best within-cluster compactness
Silhouette technique - considers both within- and between-cluster distance; the silhouette coefficient is calculated for all obs. at each level of K
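The assign-then-update loop behind K-means can be sketched in plain Python (a toy sketch only; in practice a library such as scikit-learn's KMeans with k-means++ initialisation would be used — here the initial centroids are passed in explicitly for illustration):

```python
def kmeans(points, init_centroids, iters=20):
    """Toy K-means on 2-D points: alternate assignment and update steps."""
    centroids = list(init_centroids)
    k = len(centroids)
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid
        clusters = [[] for _ in range(k)]
        for x, y in points:
            i = min(range(k),
                    key=lambda c: (x - centroids[c][0]) ** 2
                                  + (y - centroids[c][1]) ** 2)
            clusters[i].append((x, y))
        # Update step: move each centroid to the mean of its cluster
        for i, cl in enumerate(clusters):
            if cl:
                centroids[i] = (sum(x for x, _ in cl) / len(cl),
                                sum(y for _, y in cl) / len(cl))
    return centroids, clusters

# Two well-separated blobs; one starting centroid in each recovers them
pts = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centroids, clusters = kmeans(pts, init_centroids=[pts[0], pts[3]])
```

With K = 2 and well-chosen starting centroids the loop converges to one cluster per blob; a poor initialisation can get stuck in a local optimum, which is why the elbow and silhouette techniques are run over several values of K.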

5

Supervised Learning

Machine given X and corresponding Y to train (labelled dataset)

Adjusts its parameters to make the best prediction of the output for a given input

Goal: Predict outputs beyond sample data

6

Underfitting vs overfitting

Underfit - Model cannot capture underlying structure of data

Overfit - Model corresponds too closely to a particular dataset and generalises poorly to new data

7

K-Nearest Neighbours (KNN)

Predicts labels of new data based on common labels of its K nearest neighbours

Decides based on the distance between the unlabelled point and its neighbours

K = nr. of neighbours to consider
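A minimal KNN classifier in plain Python (a sketch: Euclidean distance via `math.dist`, majority vote among the K nearest training points; the training data and labels are made up for illustration):

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    """Classify `query` by majority vote among its k nearest training points.
    train: list of ((x1, x2), label) pairs; distance is Euclidean."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
print(knn_predict(train, (2, 2), k=3))  # → A
```

Because the vote is purely distance-based, the features must be numerical and on the same scale (standardised), as the assumptions card below notes.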

8

Confusion matrix

Table comparing predicted labels against true labels

Accuracy, sensitivity and specificity are derived from the confusion matrix

9

Parts in confusion matrix

True positive - Obs. correctly predicted to belong to a class

True negative - Obs. correctly predicted to not belong to a class

False positive - Obs. incorrectly predicted to belong to a class

False negative - Obs. incorrectly predicted to not belong to a class

10

Accuracy

Proportion of obs. correctly labelled by algorithm

Percentage of predictions that are correct

= (TP + TN) / all predictions

No information rate - Accuracy achieved by assigning all obs. to the largest group

11

Sensitivity

Proportion of obs. correctly predicted to belong to a category

Out of all that are group A, how many were correctly classified as A

= TP / (TP + FN)

12

Specificity

Proportion of obs. correctly predicted to not belong to a category

Out of all that are not A, how many were correctly classified as not A

= TN / (TN + FP)
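All three metrics follow directly from the four confusion-matrix counts (a plain-Python sketch; the label names and example data are invented for illustration):

```python
def confusion_metrics(y_true, y_pred, positive="A"):
    """Accuracy, sensitivity and specificity from true vs predicted labels."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    accuracy    = (tp + tn) / len(y_true)   # (TP + TN) / all predictions
    sensitivity = tp / (tp + fn)            # TP / (TP + FN)
    specificity = tn / (tn + fp)            # TN / (TN + FP)
    return accuracy, sensitivity, specificity

y_true = ["A", "A", "A", "A", "B", "B", "B", "B"]
y_pred = ["A", "A", "A", "B", "B", "B", "A", "B"]
acc, sens, spec = confusion_metrics(y_true, y_pred)  # each → 0.75
```

Here one A is missed (a false negative) and one B is wrongly called A (a false positive), so all three metrics come out at 6/8, 3/4 and 3/4 respectively.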

13

3 assumptions of KNN

KNN measures distances between points and neighbours (input needs to be continuous + numerical and same unit / standardised)

KNN assumes obs. with similar characteristics belong to the same group

KNN measures similarity through a spherical distance around each data point using all features equally

14

Decision tree

Way of splitting data into purer subsets

15

Pure class

Subset containing only data from same group

16

Root node

First node in the tree; contains all the training data

17

Leaf node

Splitting stops here; a leaf should ideally contain a pure class

18

Internal nodes

Intermediate nodes between the root and the leaf nodes

19

GINI Impurity

Way of finding best split

Measures how mixed classes are in a node

Perfectly pure node = 0
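Gini impurity is 1 minus the sum of squared class proportions in the node — a short sketch:

```python
def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions.
    0 for a pure node; 0.5 for two evenly mixed classes."""
    n = len(labels)
    return 1 - sum((labels.count(c) / n) ** 2 for c in set(labels))

pure  = gini_impurity(["A", "A", "A"])       # → 0.0 (pure node)
mixed = gini_impurity(["A", "A", "B", "B"])  # → 0.5 (evenly mixed)
```

A decision tree picks the split whose child nodes have the lowest (size-weighted) Gini impurity, driving each branch towards pure leaves.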