ELEC 825 Week 1


Description and Tags

basics of ML with a focus on linear classifiers, Naive Bayes, and SVM


27 Terms

1

n-fold cross validation

  • solves the problem of noisy estimates of predictive performance when there isn’t enough data for a separate validation set

1) split training data into n folds

2) in each run, train on n-1 folds and validate on the remaining 1 fold

3) repeat (2) n times, using each fold as the validation fold exactly once

4) calculate the average performance on the n validation folds

5) select the model that achieves the best average performance
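As a concrete illustration, the sketch below runs this procedure with scikit-learn's KFold and a few candidate SVMs (the dataset, the candidate models, and the library are all assumptions made for the example):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import KFold
from sklearn.svm import SVC

# Toy data and a few candidate models to select among (illustrative choices).
X, y = make_classification(n_samples=200, random_state=0)
candidates = {"C=0.1": SVC(C=0.1), "C=1": SVC(C=1.0), "C=10": SVC(C=10.0)}

n = 5
kf = KFold(n_splits=n, shuffle=True, random_state=0)  # 1) split into n folds

best_name, best_score = None, -np.inf
for name, model in candidates.items():
    scores = []
    for train_idx, val_idx in kf.split(X):
        model.fit(X[train_idx], y[train_idx])               # 2) train on n-1 folds
        scores.append(model.score(X[val_idx], y[val_idx]))  #    validate on the held-out fold
    avg = np.mean(scores)                                   # 3)+4) repeat n times, average
    if avg > best_score:                                    # 5) keep the best average
        best_name, best_score = name, avg

print(best_name, best_score)
```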

2

what are the 3 main phases in solving ML problems?

1) training: learning the models.

2) validation: selecting models.

3) testing: evaluating the final model obtained from the two phases above
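A minimal sketch of how data is commonly carved up for these three phases, assuming scikit-learn and hypothetical 60/20/20 proportions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=100, random_state=0)

# Hold out 20% for testing, then 25% of the remainder (20% overall) for validation.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=0)

# 1) train on (X_train, y_train)
# 2) select among models using (X_val, y_val)
# 3) report the chosen model's performance once on (X_test, y_test)
```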

3

support vectors

the training examples closest to the separating hyperplane
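As a concrete picture (assuming scikit-learn is available), a fitted SVM exposes these examples directly:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Two separable blobs (toy data); fit a linear SVM.
X, y = make_blobs(n_samples=40, centers=2, random_state=0)
clf = SVC(kernel="linear").fit(X, y)

print(clf.support_vectors_)  # the training examples closest to the hyperplane
```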

4

margin

distance between the support vectors of the two classes, i.e., the width of the gap around the separating hyperplane

5

support vector machine (SVM)

maximum-margin classifier that maps data to a higher-dimensional space (implicitly, via a kernel)

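A brief sketch of the "implicit mapping via a kernel" idea, assuming scikit-learn and a toy dataset: swapping the kernel changes the effective feature space without ever computing the mapping explicitly.

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=100, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)  # implicit map to a higher-dimensional space

# The RBF kernel typically separates the circles; the linear one cannot.
print(linear.score(X, y), rbf.score(X, y))
```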
6

how do you reduce sensitivity to outliers?

by allowing some misclassification during training (a soft margin) → bias/variance tradeoff
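In scikit-learn's SVC, for instance (an assumption; other libraries expose the same knob), this tolerance is controlled by the penalty parameter C:

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Overlapping blobs so that some misclassification is unavoidable (toy data).
X, y = make_blobs(n_samples=50, centers=2, cluster_std=2.0, random_state=0)

# Small C tolerates training mistakes: wider margin, higher bias, lower variance.
soft = SVC(kernel="linear", C=0.01).fit(X, y)
# Large C punishes every mistake: narrower margin, lower bias, higher variance.
hard = SVC(kernel="linear", C=100.0).fit(X, y)

# The soft-margin model typically keeps more support vectors.
print(len(soft.support_vectors_), len(hard.support_vectors_))
```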

7

when is the margin largest?

when the hyperplane lies halfway between the closest samples of the two classes

8

how do you classify datasets that aren’t linearly separable?

represent the data in a richer (higher-dimensional) space which includes some fixed, possibly nonlinear functions of the original inputs/measurements
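A tiny numpy sketch with hypothetical 1-D data: adding the fixed nonlinear feature x² makes the classes linearly separable in the richer space.

```python
import numpy as np

# Hypothetical 1-D data: class 1 lies at the extremes, class 0 near zero,
# so no single threshold on x separates them.
x = np.array([-2.0, -1.5, 0.0, 0.5, 1.5, 2.0])
y = np.array([1, 1, 0, 0, 1, 1])

# Fixed, nonlinear mapping to a richer 2-D space: phi(x) = (x, x^2).
phi = np.stack([x, x**2], axis=1)

# In the new space, the line x^2 = 1 separates the classes perfectly.
print((phi[:, 1] > 1.0).astype(int))  # -> [1 1 0 0 1 1], matching y
```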

9

linear classifier

find the line (or hyperplane) which can “best” (under some criterion/objective) separate two classes:

y(x) = w^T x + b
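A minimal numpy sketch of this decision rule, with hypothetical (not learned) weights:

```python
import numpy as np

# Hypothetical parameters for a 2-D linear classifier (illustrative values).
w = np.array([1.0, -2.0])  # weight vector (normal to the hyperplane)
b = 0.5                    # bias/offset

def predict(x):
    """Classify by the sign of y(x) = w^T x + b."""
    return 1 if w @ x + b > 0 else -1

print(predict(np.array([3.0, 1.0])))  # 1*3 - 2*1 + 0.5 = 1.5 > 0 -> class 1
print(predict(np.array([0.0, 2.0])))  # 0 - 4 + 0.5 = -3.5 < 0  -> class -1
```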

10

what are the differences between regression and classification?

1) regression predicts a number, while classification predicts a category

2) regression has infinitely many possible outputs (a continuous range), while classification has only a small, finite set of possible outputs

11

machine learning

the computer receives inputs and their expected outputs and uses them to "write" (learn) a program for solving the problem

12

when do we need machine learning?

when a) human expertise does not exist, b) models are based on huge amounts of data, c) humans can't explain their expertise, d) models must be customized

13

supervised learning

given training samples and corresponding desired outputs (labels), predict outputs on future inputs

14

what are some examples of supervised learning?

classification, regression, time series prediction

15

unsupervised learning

  • given training samples without desired outputs, automatically discover representations, features, structures, etc.

  • learn a function f(x) to explore the hidden structure of x

16

what are some examples of unsupervised learning?

clustering, low-dimensional manifold learning

17

reinforcement learning

given sequences of states and actions with (delayed) scalar rewards/punishments, output a policy

18

what are some examples of what a learning algorithm can do?

recognizing patterns, generating patterns, recognizing anomalies, prediction

19

regression

given {(x_i, y_i)} where x_i ∈ R^D (the set of all vectors with D real components) and y_i is a real value, learn a function f(x) to predict y given x

20

classification

given {(x_i, y_i)} where x_i ∈ R^D (the set of all vectors with D real components) and y_i is categorical, learn a function f(x) to predict y given x

21

representation learning

  • a form of unsupervised learning that learns a good representation for the inputs, which is then used for subsequent supervised or reinforcement learning

  • compact and low-dimensional representation for inputs

22

outlier detection

a form of unsupervised learning that detects highly unusual cases from the given data

23

rule learning

a form of unsupervised learning that learns rules from the data

24

feature space

multidimensional space where each dimension represents a different feature or characteristic of data

25

feature normalization

process of transforming numerical features to a common scale, often [0, 1], so that features with large ranges do not dominate the others in machine learning models
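A short numpy sketch of min-max normalization, one common choice (standardization to zero mean and unit variance is another):

```python
import numpy as np

# Two features on very different scales (toy values).
X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Min-max normalization: rescale each feature (column) to [0, 1].
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_norm)  # both columns now span [0, 1]
```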

26

what are the steps to Naive Bayes Theorem for discrete features?

1) sort data cases into bins according to Ck

2) compute class prior p(Ck) using frequencies

3) for each class, estimate the distribution of the ith variable/feature: p(x_i | C_k)

4) compute the posterior: p(C_k | x) ∝ p(C_k) ∏_i p(x_i | C_k)
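A compact numpy sketch of these four steps on a hypothetical discrete dataset (no smoothing; in practice Laplace smoothing is added to avoid zero probabilities):

```python
import numpy as np

# Hypothetical discrete dataset: rows are feature vectors, y holds class labels.
X = np.array([[0, 1], [1, 1], [0, 0], [1, 0], [0, 1]])
y = np.array([0, 0, 1, 1, 0])
classes = np.unique(y)

# Steps 1-2: bin the data cases by class; class priors p(C_k) from frequencies.
priors = {c: np.mean(y == c) for c in classes}

# Step 3: per-class, per-feature value frequencies, i.e. p(x_i | C_k).
likelihoods = {
    c: [{v: np.mean(X[y == c][:, i] == v) for v in np.unique(X[:, i])}
        for i in range(X.shape[1])]
    for c in classes
}

def posterior(x):
    # Step 4: p(C_k | x) ∝ p(C_k) * prod_i p(x_i | C_k), then normalize.
    scores = {c: priors[c] * np.prod([likelihoods[c][i].get(xi, 0.0)
                                      for i, xi in enumerate(x)])
              for c in classes}
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}

print(posterior([0, 1]))  # class 0 dominates for this input
```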

27

how do you deal with continuous features if using Naive Bayes Theorem?

discretize the features OR fit a known distribution (e.g., a Gaussian) to each feature
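For the "fit a known distribution" route, a minimal sketch assuming scikit-learn's GaussianNB, which fits one Gaussian per feature per class:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Toy continuous feature: class 0 clusters near 1, class 1 near 4.
X = np.array([[1.0], [1.2], [0.9], [3.8], [4.1], [4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = GaussianNB().fit(X, y)        # fits a Gaussian p(x_i | C_k) per class
print(clf.predict([[1.1], [3.9]]))  # -> [0 1]
```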