machine learning fundamentals

19 Terms

1

ml

trad programs: define algo/logic to compute output

ml: learn model/logic from data

ml = study of algos that improve their performance P at some task T with experience E

  • well-defined learning task = given by <P, T, E>

2

why use ML? + applications

-human expertise doesn’t exist (navigating Mars)

-humans can't explain their expertise (speech recognition)

-models must be customised (personalised medicine)

-models = based on huge amts of data (genomics)

applications:

recognising patterns

generating patterns

recognising anomalies

prediction

3

data sets + features

data set

= a collection of data that developers work with to meet their goals. In a dataset, rows represent the data points + columns represent the features of the data set

features of a dataset = its most critical aspect: based on the features of each available data point, models can be deployed to predict the output for any new data point that may be added to the data set
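For concreteness, a minimal NumPy sketch of this convention (the values are made up):

```python
import numpy as np

# hypothetical toy dataset: each row = one data point, each column = one feature
X = np.array([
    [5.1, 3.5, 1.4],   # data point 1
    [4.9, 3.0, 1.4],   # data point 2
    [6.2, 2.9, 4.3],   # data point 3
])

n_points, n_features = X.shape
print(n_points, n_features)  # 3 data points, 3 features each
```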

4

data

feature scaling

= rescale data to a common scale, e.g. a fixed range [0,1]

  • normalisation = rescale data x using the mean 𝜇 + the s.d. 𝜎 of the data

= (x − 𝜇) / 𝜎

  • min-max scaling: (x − x_min) / (x_max − x_min)
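A minimal NumPy sketch of both rescalings on a made-up feature column:

```python
import numpy as np

# hypothetical 1-D feature column
x = np.array([2.0, 4.0, 6.0, 8.0])

# normalisation: (x - mean) / std
z = (x - x.mean()) / x.std()

# min-max scaling to [0, 1]: (x - x_min) / (x_max - x_min)
mm = (x - x.min()) / (x.max() - x.min())

print(z)   # mean ≈ 0, s.d. ≈ 1
print(mm)  # [0.  0.333...  0.666...  1.]
```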

types of data:

numerical (quantitative): continuous, discrete

categorical (qualitative): ordinal, nominal

5

the task T

tasks = usually described in terms of how the machine learning system should process an example 𝑥 ∈ Rⁿ, where each entry xᵢ = a feature

classification: learn f: Rⁿ → {1,…,k}

  • y=f(x): assigns input to the category with output y

  • example: object recognition

regression: learn f: Rⁿ → R

  • example: weather prediction, real estate price prediction
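As an illustration (the decision rules below are invented for the example), the two task types differ only in their output space:

```python
import numpy as np

# classification: f maps R^n to a category in {1, ..., k}
# (hypothetical rule: category 1 if the first feature is positive, else 2)
def classify(x: np.ndarray) -> int:
    return 1 if x[0] > 0 else 2

# regression: f maps R^n to a real value
# (hypothetical linear rule with made-up weights)
def regress(x: np.ndarray) -> float:
    w = np.array([0.5, -1.0, 2.0])
    return float(w @ x)

x = np.array([1.0, 2.0, 3.0])
print(classify(x))  # a category label
print(regress(x))   # a real number: 0.5 - 2.0 + 6.0 = 4.5
```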

6

the experience E

supervised learning:

  • experience is a labelled dataset (or datapoints)

  • each data point has a label or target

unsupervised learning:

  • experience = unlabelled data set

  • clustering, learning probability distributions, denoising etc.

reinforcement learning:

  • experience is the interaction with an environment

7

types of learning

supervised learning

given (x₁, y₁), (x₂, y₂), …, (xₙ, yₙ)

learn a function f(x) to predict y given x

  • y is real-valued == regression

  • y is categorical == classification

8

the performance measure P

accuracy = proportion of examples for which the model produces the correct output

error rate: proportion of examples for which the model produces an incorrect output

loss function: quantifies the difference between the predicted outputs of an ml algo + the actual target vals

generalisation: ability to perform well on previously unobserved data, e.g. evaluated using a test set
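A minimal sketch of the accuracy + error rate measures above, on made-up test-set labels:

```python
import numpy as np

# hypothetical predictions vs true labels on a test set
y_true = np.array([1, 0, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 0])

accuracy = np.mean(y_pred == y_true)  # proportion of correct outputs
error_rate = 1.0 - accuracy           # proportion of incorrect outputs

print(accuracy, error_rate)  # 0.8 0.2
```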

9

learning process

X = input space, Y = output space

  • given samples {(xᵢ, yᵢ)}, i = 1,…,n, and a loss function L

  • find a hypothesis h ∈ 𝐻 that minimises ∑ᵢ₌₁,…,ₙ 𝐿(h(𝒙ᵢ), 𝑦ᵢ)

0-1 loss: 𝐿(h(𝒙), 𝑦) = 1 if h(x) ≠ 𝑦, otherwise 𝐿(h(𝒙), 𝑦) = 0

L2 loss: 𝐿(h(𝒙), 𝑦) = (h(x) - y)²

hinge loss: 𝐿(h(𝒙), 𝑦) = max{0, 1 − y·h(x)}

exponential loss: 𝐿(h(𝒙), 𝑦) = e^(−y·h(x))
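A direct translation of the four losses into Python, assuming the usual conventions (for the hinge + exponential losses, h(x) is a real-valued score and y ∈ {−1, +1}):

```python
import numpy as np

# 0-1 loss: h(x) is the predicted label itself
def zero_one_loss(h_x, y):
    return 1.0 if h_x != y else 0.0

# L2 loss: h(x) is a real-valued prediction (regression)
def l2_loss(h_x, y):
    return (h_x - y) ** 2

# hinge + exponential losses: h(x) is a real-valued score, y in {-1, +1}
def hinge_loss(h_x, y):
    return max(0.0, 1.0 - y * h_x)

def exponential_loss(h_x, y):
    return float(np.exp(-y * h_x))

# made-up score/label pair for the margin-based losses
print(hinge_loss(0.3, 1), exponential_loss(0.3, 1))  # 0.7 and e^(-0.3)
```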

10

no free lunch theorem

argues that, w.o. substantive info abt the modelling problem, there's no single model that'll always do better than any other model

goal of ml research isn't to seek a universal learning algorithm or the absolute best learning algorithm

the theorem underscores that every algorithm relies on certain assumptions abt data + success of algorithm depends on how well these assumptions align w true nature of the problem

since no algo is universally superior, it's crucial to eval + compare diff algorithms on the specific dataset at hand

11

splitting the data set

training set = subset of data used to train a machine learning model

test set = the subset of data used to evaluate the performance of a trained ml model on unseen examples, simulating real-world data

validation set = intermediary subset of data used during the model development process to fine-tune hyperparameters

independent + identically distributed assumptions:

  • examples in each data set = independent from each other

  • training + testing set = identically distributed i.e. drawn from the same prob distribution as each other
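A minimal sketch of one common convention, a 70/15/15 split on made-up data (the ratios are a typical choice, not fixed):

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical dataset: 100 points, 3 features, binary labels
X = rng.normal(size=(100, 3))
y = rng.integers(0, 2, size=100)

# shuffle, then take 70% train / 15% validation / 15% test
idx = rng.permutation(len(X))
train, val, test = np.split(idx, [70, 85])

X_train, y_train = X[train], y[train]
X_val, y_val = X[val], y[val]      # for tuning hyperparameters
X_test, y_test = X[test], y[test]  # touched only for the final evaluation
```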

12

underfitting + overfitting

underfitting occurs when the model = too simple to capture the underlying patterns in the training data, leading to poor performance on both training + test sets

  • model isn’t complex enough to learn the relationships within the data

overfitting = model learns the training data too well, incl. its noise + random fluctuations, leading to poor performance on new, unseen data

  • model = overly complex + memorises the training set instead of learning the underlying patterns
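A quick illustration with polynomial fits on made-up noisy quadratic data: training error typically keeps falling as the degree grows, while the too-simple and too-complex fits both do worse on held-out data:

```python
import numpy as np

rng = np.random.default_rng(1)

# made-up noisy quadratic data, split into train + test halves
x = rng.uniform(-1, 1, 40)
y = x**2 + rng.normal(scale=0.1, size=x.shape)
x_tr, y_tr, x_te, y_te = x[:20], y[:20], x[20:], y[20:]

# degree 0 underfits, degree 2 is about right, degree 9 tends to overfit
for degree in (0, 2, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    train_mse = np.mean((np.polyval(coeffs, x_tr) - y_tr) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_te) - y_te) ** 2)
    print(degree, round(train_mse, 4), round(test_mse, 4))
```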

13

overfitting

a hypothesis in ml is the model’s presumption regarding the connection between input features + the output 

consider hypothesis h and its

  • error rate over training data: error_train(h)

  • true error rate over all data: error_true(h)

hypothesis h overfits the training data if there's an alternative hypothesis h' such that:

  • error_train(h) < error_train(h')

  • error_true(h) > error_true(h')

14

resolving under + overfitting

underfitting:

  • ↑ model complexity

  • using diff ml algorithm

  • ↑ amt of training data

  • ensemble methods to combine multiple models to create better outputs

  • feature engineering for creating new model features from the existing ones that may be ↑ relevant to the learning task

overfitting:

  • cross validation: technique for evaluating ml models by training several ML models on subsets of the available input data + evaluating them on another subset of the data

  • regularisation: technique where a penalty term = added to the loss function, discouraging the model from assigning too much importance to individual features (see the ridge sketch after this list)

  • early stopping: stopping training when a monitored metric has stopped improving

  • bagging: learning multiple models in parallel + applying majority voting to choose the final candidate model
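As one concrete instance of regularisation, a ridge-regression sketch on made-up data, where the penalty term λ‖w‖² discourages large weights:

```python
import numpy as np

rng = np.random.default_rng(2)

# made-up regression data with a few irrelevant features
X = rng.normal(size=(30, 5))
y = X @ np.array([1.0, 0.0, -2.0, 0.0, 0.5]) + rng.normal(scale=0.1, size=30)

# ridge regression: add the penalty lam * ||w||^2 to the squared loss;
# the closed form is w = (X^T X + lam * I)^(-1) X^T y
for lam in (0.0, 1.0, 10.0):
    w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
    print(lam, np.round(w, 2))  # larger lam shrinks the weights toward 0
```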

15

cross validation

k-fold cross validation:

  • divide data into k folds

  • train on k−1 folds, use the held-out fold to measure error

  • repeat k times: use avg error to measure generalisation accuracy

  • statistically valid + gives good accuracy estimates

leave one out cross validation (LOOCV):

  • k-fold cross validation with k = N, where N = no of data points

  • quite accurate but expensive as need to build N models
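A minimal sketch of k-fold CV, where fit + error are caller-supplied stand-ins (hypothetical, not a fixed API); LOOCV is the special case k = N:

```python
import numpy as np

def k_fold_error(X, y, fit, error, k=5):
    # k-fold CV: train on k-1 folds, measure error on the held-out fold
    folds = np.array_split(np.random.default_rng(0).permutation(len(X)), k)
    errs = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = fit(X[train_idx], y[train_idx])
        errs.append(error(model, X[test_idx], y[test_idx]))
    return np.mean(errs)  # avg error estimates generalisation accuracy

# usage with a trivial "predict the training mean" model on made-up data
rng = np.random.default_rng(3)
X, y = rng.normal(size=(50, 2)), rng.normal(size=50)
fit = lambda X, y: y.mean()
error = lambda model, X, y: np.mean((y - model) ** 2)
print(k_fold_error(X, y, fit, error, k=5))  # LOOCV would be k=len(X)
```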

16

parametric learning

parametric learning algos make strong assumptions abt the form of the mapping function between the input features + the output

  • e.g. logistic regression, linear regression, perceptron, naive bayes, neural network

benefits of such models:

  • easier to understand + interpret results

  • v fast to learn from data

  • don’t require as much training data

  • can work well even if they don't fit the data perfectly

but, by pre-emptively choosing a functional form, these methods = highly constrained to that specified form

17

non parametric learning

non parametric learning algos don't make assumptions about the form of the mapping function between the input features + output

  • for example, SVM, k-NN, k-means, decision tree

benefits include:

  • being capable of fitting a large no of functional forms

  • making no assumptions abt the underlying function, which

  • can result in higher-performance models for prediction

but: requires a lot more training data, takes longer to train + is prone to overfitting

18

classification

predictive modelling problem where a class label = predicted for a given example of input data

types of classification problems:

  • binary classification

  • multi-class classification

  • multi-label classification

  • imbalanced classification

19

classification applications

  • Medical diagnosis

    ◦ Features: age, gender, history, symptoms, test results.

    ◦ Label: disease.

  • Email spam detection

    ◦ Features: sender-domain, length, images, keywords.

    ◦ Label: spam or not-spam.

  • Credit card fraud detection

    ◦ Features: user, location, item, price.

    ◦ Label: fraud or legitimate.