Machine Learning basics

42 Terms

1

Where do machine learning algorithms learn from?

From patterns in data

2

What is meant by examples, and what is each one called?

Examples are the training data; each one is called a sample

3

What is a sample characterised by?

One or more features

4

What is a feature?

An input variable given to a model

5

What do the columns of a dataset represent?

The features (each row is one sample)

6

What is supervised learning?

Training a model on labelled data: learning to predict with an answer key

7

In supervised learning, what do you give the machine?

Examples together with their correct answers

8

What is unsupervised learning?

Training a model on data that has no labels

9

In unsupervised learning, what do you give the model?

Only examples, with no correct answers or labels
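A minimal sketch of what each kind of learning is given, assuming a made-up toy dataset (the feature meanings and values are purely illustrative):

```python
# Hypothetical toy data: each inner list is one sample's feature vector
# (here: [hours_studied, hours_slept]); the meanings are illustrative only.
samples = [
    [2.0, 8.0],
    [5.0, 6.0],
    [9.0, 7.0],
]

# Supervised learning: every sample comes with a known correct answer.
exam_scores = [45.0, 62.0, 88.0]
supervised_data = list(zip(samples, exam_scores))  # (features, answer) pairs

# Unsupervised learning: the model receives the samples only.
unsupervised_data = samples

for features, answer in supervised_data:
    print(features, "->", answer)
```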

10

Give examples of supervised learning.

Regression and classification

11

What is regression?

Supervised learning where the known outputs are continuous; these outputs are called target values. We are predicting a number

12

What is classification?

Supervised learning where the known outputs are categorical; these outputs are called labels. We are picking a label from categories we already know

13

Give examples of unsupervised learning.

Clustering and dimensionality reduction

14

What is clustering?

Grouping samples with similar feature vectors into clusters, when we do not know the categories in advance

15

What is dimensionality reduction?

Reducing the number of features while keeping the important information

16

What is a sample?

One data point (one row in a dataset)

17

What is a feature vector?

The list of all the feature values of one sample
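The row and column picture can be sketched with a plain list of lists. The feature names and values below are hypothetical, chosen only to illustrate the layout:

```python
# Hypothetical dataset: 3 samples (rows) x 2 features (columns).
feature_names = ["height_cm", "weight_kg"]
dataset = [
    [170.0, 65.0],  # sample 0
    [182.0, 80.0],  # sample 1
    [158.0, 52.0],  # sample 2
]

n_samples = len(dataset)      # N rows, so N feature vectors
n_features = len(dataset[0])  # one column per feature

feature_vector = dataset[1]                  # all feature values of one sample
height_column = [row[0] for row in dataset]  # one feature across all samples

print(n_samples, n_features)
print(feature_vector)
print(height_column)
```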

18

The ML task is to learn a model f that returns what?

The predicted output ŷ (y-hat)

19

If we have N training samples, how many feature vectors do we have?

N (one feature vector per sample)

20

What is underfitting?

The model has not learned enough and does not capture the pattern in the data

21

What are the signs of underfitting?

Poor performance on both training and test data; the model is too simple (too few parameters)

22

What is overfitting?

The model learns too much, including noise and irrelevant details

23

What are the signs of overfitting?

Good performance on training data but poor performance on test data; the model is too complex

24

What is a "just right" fit?

A model complex enough to capture the real patterns but simple enough to ignore the noise

25

What are the signs of a "just right" fit?

Good performance on both training and test data; a balanced model
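The signs of underfitting, overfitting and a balanced fit can be turned into a rough check on the training and test scores. This is only a sketch: the function name and both thresholds (`good`, `max_gap`) are assumptions for illustration, not standard values.

```python
def diagnose_fit(train_score, test_score, good=0.8, max_gap=0.1):
    """Rough label for a model from its training and test scores.

    `good` and `max_gap` are illustrative thresholds, not standard values.
    """
    if train_score < good:
        # Poor even on the data it was trained on: the model is too simple.
        return "underfitting"
    if train_score - test_score > max_gap:
        # Good on training data but much worse on unseen data.
        return "overfitting"
    return "just right"


print(diagnose_fit(0.45, 0.40))  # underfitting
print(diagnose_fit(0.99, 0.55))  # overfitting
print(diagnose_fit(0.91, 0.89))  # just right
```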

26

What is polynomial regression?

Regression that fits a curve (a polynomial in x) to the data. It is still linear regression in the weights w

27

Give the equation for polynomial regression.

$\hat{y} = w_0 + w_1 x + w_2 x^2 + \dots + w_M x^M$, where the $w_j$ are the weights and M is the polynomial degree
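As a sketch, assuming the usual single-input form ŷ = w_0 + w_1 x + ... + w_M x^M, a prediction can be computed from a list of weights. The function name and the example weights are hypothetical:

```python
def predict_polynomial(weights, x):
    """Return y_hat = w_0 + w_1*x + ... + w_M*x**M for weights [w_0, ..., w_M]."""
    y_hat = 0.0
    # Horner's rule: process the weights from the highest power down.
    for w in reversed(weights):
        y_hat = y_hat * x + w
    return y_hat


weights = [1.0, 2.0, 3.0]  # hypothetical weights, so M = len(weights) - 1 = 2
print(predict_polynomial(weights, 2.0))  # 1 + 2*2 + 3*4 = 17.0
```

The degree M is fixed by how many weights are supplied, which is why it has to be chosen before the weights are learned.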

28

What does the fit look like if M = 1, if M = 2, and if M = 20?

M = 1: straight line, underfitting. M = 2: curved line, just right. M = 20: wildly wiggling line, overfitting

29

What is a hyperparameter? Give an example.

A setting that we choose before training rather than one learned from the data, e.g. the polynomial degree M

30

What is bias?

Error from the model being too simple; high bias means the model makes systematic mistakes

31

What is variance?

Error from the model being too sensitive to the training data; high variance means it has memorised noise

32

What is noise?

Randomness in the data
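Bias, variance and noise are often combined in the standard decomposition of expected squared error. The notation here is an assumption, not taken from the cards: the data follow y = g(x) + ε with zero-mean noise of variance σ², ŷ is the model's prediction at x, and the expectation is over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{y})^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{y}] - g(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{noise}}
```

A model that is too simple makes the first term large, a model that is too sensitive makes the second term large, and no model can remove the third.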

33

What is the training set?

The data the model learns from

34

What is R squared, and when is it perfect?

A measure of how well a regression model fits the data; it is perfect when it equals 1

35

Give the equation for R squared.

[image: equation for R squared]
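The card's equation is an image. A sketch under the assumption that it is the standard coefficient of determination, R² = 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the targets from their mean (the function and variable names are illustrative):

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Assumes y_true is not constant (otherwise SS_tot is zero).
    """
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


y_true = [1.0, 2.0, 3.0, 4.0]
print(r_squared(y_true, y_true))                # perfect predictions give 1.0
print(r_squared(y_true, [2.5, 2.5, 2.5, 2.5]))  # always predicting the mean gives 0.0
```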
36

What happens to the training R squared when we increase M, and what does this mean?

It increases (it never goes down), so we cannot detect overfitting using training R squared alone

37

What is the validation set?

Data not used for training; it is used to choose the best degree M

38

Explain R squared on a validation set.

Unlike training R squared, it peaks at the best degree (M = 2 here) and decreases after that

So it can be used to identify the optimal hyperparameters
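A sketch of choosing M on a validation set, under stated assumptions: the data are synthetic (a quadratic plus small noise), the fit uses the normal equations solved by Gaussian elimination, and every helper name is illustrative. Training R squared should never fall as the degree grows, while validation R squared is what selects the degree.

```python
import random


def fit_polynomial(xs, ys, degree):
    """Least-squares weights [w_0, ..., w_M] via the normal equations."""
    n = degree + 1
    # Augmented system (X^T X) w = X^T y, where X[i][j] = x_i ** j.
    a = [[sum(x ** (r + c) for x in xs) for c in range(n)]
         + [sum(y * x ** r for x, y in zip(xs, ys))] for r in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    w = [0.0] * n
    for r in reversed(range(n)):
        w[r] = (a[r][n] - sum(a[r][c] * w[c] for c in range(r + 1, n))) / a[r][r]
    return w


def predict(w, x):
    return sum(wj * x ** j for j, wj in enumerate(w))


def r_squared(ys, preds):
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Synthetic data (an assumption for illustration): a quadratic plus small noise.
rng = random.Random(0)
xs = [i / 20 - 1.0 for i in range(41)]  # 41 points in [-1, 1]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 + rng.gauss(0.0, 0.1) for x in xs]

# Every fourth point is held out for validation; the rest is used for training.
val_idx = set(range(2, 41, 4))
x_train = [x for i, x in enumerate(xs) if i not in val_idx]
y_train = [y for i, y in enumerate(ys) if i not in val_idx]
x_val = [x for i, x in enumerate(xs) if i in val_idx]
y_val = [y for i, y in enumerate(ys) if i in val_idx]

train_r2, val_r2 = {}, {}
for degree in range(1, 6):
    w = fit_polynomial(x_train, y_train, degree)
    train_r2[degree] = r_squared(y_train, [predict(w, x) for x in x_train])
    val_r2[degree] = r_squared(y_val, [predict(w, x) for x in x_val])
    print(degree, round(train_r2[degree], 4), round(val_r2[degree], 4))

# The hyperparameter is chosen by validation score, not training score.
best_degree = max(val_r2, key=val_r2.get)
print("degree chosen on the validation set:", best_degree)
```

The normal equations keep the sketch dependency-free; they become numerically unreliable for large degrees, so a library least-squares routine would be the usual choice in practice.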

39

Explain the test set.

Data not used during training or validation

It is used once, at the end, to measure final performance

40

Explain R squared on a test set.

It evaluates the final model's performance on data it has never seen

41

When is cross-validation used, and what does it do?

When the dataset is small, a single split into training, validation and test sets wastes valuable data. Cross-validation reuses the data to create many validation sets

42

What are the steps for cross-validation?

Set aside a test set

Divide the rest of the data into k equal parts called folds

If k = 5, we have 5 folds

Perform 5 runs: in each run, use 4 folds for training and 1 fold for validation

Rotate the validation fold, so a different fold is used for validation in each run

Take the average of the 5 validation R squared scores

Choose the hyperparameter with the best average score

Combine all 5 folds into 1 training set and retrain on it with the best hyperparameter to create a strong model

Evaluate on the test set
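The fold rotation and averaging in the steps above can be sketched as follows. This assumes the samples are already shuffled and the test set has already been set aside; `score_fn` is a hypothetical callback standing in for "train on these folds, return the validation R squared".

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, val_indices) for each of the k runs."""
    indices = list(range(n_samples))
    # Fold sizes differ by at most one when n_samples is not divisible by k.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]                  # this run's validation fold
        train = indices[:start] + indices[start + size:]   # the other k - 1 folds
        yield train, val
        start += size


def cross_val_score(n_samples, k, score_fn):
    """Average the validation score over the k runs.

    `score_fn(train_indices, val_indices)` is a hypothetical callback that
    trains on the training folds and returns the validation score.
    """
    scores = [score_fn(train, val) for train, val in k_fold_splits(n_samples, k)]
    return sum(scores) / len(scores)


# 10 samples remain after the test set is set aside; k = 5 gives 5 runs.
for train, val in k_fold_splits(10, 5):
    print("train:", train, "validation:", val)
```

To pick a hyperparameter, `cross_val_score` would be called once per candidate value and the candidate with the best average kept.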
