Machine Learning - Test 1


145 Terms

1
New cards

f: X → Y

Where X is the __

  • Domain (math)

  • Parameter Type (CS)

  • Input Type (engineering)

  • Key Type (map ADT)

2
New cards

Supervised vs Unsupervised

In supervised learning we provide both inputs and outputs (targets); in unsupervised learning we provide only inputs

3
New cards

Data

the information that the model uses to learn patterns and make predictions

4
New cards

f: X → Y

Where Y is the __

  • Codomain (math)

  • Return Type (cs)

  • Output Type (engineering)

  • Value Type (map ADT)

5
New cards

Model

The function/program/tool we want to make

6
New cards

Observations/Data Points

The values from the domain of a model (its inputs)

7
New cards

Targets

Values from the codomain of a model (its outputs)

8
New cards

Datasets

A collection of observations collated with targets; a model trained on it can predict targets for new observations

9
New cards

Dimensionality

number of features/attributes of data

10
New cards

Regression

  • predicting a continuous #

  • Target type is real #’s

  • Y = R

11
New cards

Classification

  • Y is a finite set

  • Target values are labels/classes/categories/tags

12
New cards

Binary Classification

  • Only 2 outcomes

  • |Y| = 2

  • {T, F}, {1, 0}, {1, -1}

13
New cards

Density estimation

  • Target type is [0,1]

  • (every value in between 0 and 1)

14
New cards

Model Family

  • General form for a class of models

  • we choose a family, then fit its parameters to obtain a specific model

15
New cards

Model Family example

Linear family would be y = C0 + C1x

16
New cards

Parameters (weights/coefficients)

constants used to specify a model in a model family

17
New cards

Parameters example

In model family : y = C0 + C1x

C0 = 5, C1 = 2 are the parameters

18
New cards

Model example

In model family : y = C0 + C1x

y = 5 + 1.75x

19
New cards

Hyperparameters

variables to specify options in a training alg

20
New cards

Training/learning/fitting

Finding parameters to specify a model

21
New cards

Error/Loss Function

  • Function chosen to evaluate the model

  • lower values = better

22
New cards

Training Set

The portion of the data used for training; the rest is held out for testing

23
New cards

Generalization

How well does the model do on data it wasn’t trained on

24
New cards

Overfitting

Model fits the training data too closely and fails to generalize to new data

25
New cards

A data point in the training set or test set

observation

26
New cards

An input value to the model

Observation

27
New cards

An output value of the model

Target

28
New cards

A “correct answer” to a data point in the training set or test set

Target

29
New cards

An R^D vector

Observation

30
New cards

When the function we're looking for is R^D->{-1, 1}.

Binary Classification

31
New cards

When the function we're looking for is R^D -> R.

Regression

32
New cards

When the function we're looking for is R^D -> Y for some finite set Y.

Classification

33
New cards

When the function we're looking for is R^D -> [0, 1]

Density Estimate

34
New cards

When we want to model a real-valued function.

Regression

35
New cards

When we want to model a probability distribution.

Density estimate

36
New cards

When we want to associate each observation with one of a finite set of labels or categories.

Classification

37
New cards

Variable

contains all values that measure the same attribute across units

38
New cards

Observation

contains all values measured on the same unit (row)

39
New cards

Attribute

A column/instance variable

40
New cards

Feature

value in a column, value of an instance variable

41
New cards

Value

element in the table, measurement, datum, feature

42
New cards

Tidy data

  • each var forms a column

  • each observation forms a row

  • each type of observational units forms a table

43
New cards

Feature Selection

The process of transforming the dataset by keeping only the most informative features

44
New cards

Curse of dimensionality

The phenomenon of the difficulty of training accurate models increasing with the dimensionality of the data

45
New cards

N

# of observations

46
New cards

D

# of attributes (dimensionality)

47
New cards

General - X

data set as NxD matrix

48
New cards

General - y

N length vector of target values

49
New cards

General - n

used as an index into the data set, for example X_n and y_n

50
New cards

General - i and j

used as indices into features, for example x_i where x = X_n

51
New cards

What is the idea behind K nearest neighbors?

With classification, items from the same class are likely to be near one another → when classifying a new item, look at the classes of its k nearest neighbors and take the majority

52
New cards

KNN algorithm

  • compute all distances between new data point and X

  • sort by the distances calculated

  • take the closest k distances

    • use array bag to tally their classes

  • Return class w/ highest tally
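The steps above can be sketched with NumPy (a minimal illustration; the dataset `X`, labels `y`, query point, and `k` are all made up):

```python
import numpy as np
from collections import Counter

def knn_classify(X, y, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbors."""
    # compute all distances between the new data point and X
    dists = np.linalg.norm(X - query, axis=1)
    # sort by distance and take the closest k
    nearest = np.argsort(dists)[:k]
    # tally the neighbors' classes and return the class with the highest tally
    return Counter(y[i] for i in nearest).most_common(1)[0][0]

X = np.array([[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])
print(knn_classify(X, y, np.array([0.95, 0.9]), k=3))  # → 1
```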

53
New cards

KNN - Cost

Depends on N and D:

Distance computations - O(ND)

Sort distances - O(N lg N)

total: O(ND + N lg N); for fixed D this simplifies to O(N lg N)

54
New cards

What are the hyperparameters of KNN?

  • The number of neighbors

  • The distance metric

55
New cards

KNN - when k = 1

  • The model perfectly memorizes the data.

  • Highly sensitive to noise (one wrong neighbor ruins the classification).

  • Very low bias, high variance → Overfitting.

56
New cards

KNN - when k = N

  • Every query is classified based on the majority class in the entire dataset.

  • Ignores local structure → Underfitting.

57
New cards

Why is KNN non-parametric

It does not assume a fixed form (parameters) for the decision boundary. It memorizes training data and bases predictions on local neighborhoods.

58
New cards

Metric or Distance Function

Any function d between two vectors where:

  1. d(x,y) = d(y,x)

  2. d(x,z) <= d(x,y) + d(y,z)

  3. d(x,y) = 0 iff x=y

59
New cards

What are Norms?

  • they measure vector magnitude (size)

  • the norm of the difference of two vectors gives the distance between them

60
New cards

Euclidean Distance

  • L2 Norm

  • for general continuous data

  • square root of the sum of squared values

61
New cards

Manhattan/City-block Distance

  • L1 norm

  • For general continuous data

  • Sum of absolute values

62
New cards

Norm

a mathematical function that measures the size or magnitude of a vector in a given space
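A quick NumPy illustration of the two norms above used as distances (the vectors `a` and `b` are made up):

```python
import numpy as np

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
diff = a - b  # the norm of a difference is a distance

# L2 (Euclidean): square root of the sum of squared values
l2 = np.sqrt(np.sum(diff ** 2))
# L1 (Manhattan): sum of absolute values
l1 = np.sum(np.abs(diff))

print(l2, l1)  # → 5.0 7.0
```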

63
New cards

KNN properties

  • it’s instance-based

  • lazy: most work happens at classification time rather than training time

  • it’s non-parametric: no fixed set of parameters is learned

64
New cards

R or C: Attribute

Column

65
New cards

R or C: Covariate

Column

66
New cards

R or C: Data Point

Row

67
New cards

R or C: Sample

Row

68
New cards

R or C: Feature

Column

69
New cards

R or C: Observation

Row

70
New cards

R or C: Variable

Column

71
New cards

Main problem of Curse of Dimensionality

greater the dimensionality → sparser the data is within the vector space

72
New cards

Regression general form

Given data X (N observations in D dimensions) and N target values as ⃗y, find a function for predicting the values of new data points

73
New cards

Cost

The variable, function, or formula that we want to minimize in an optimization problem

74
New cards

Error

difference between the computed value and the correct value

75
New cards

Loss

  • measures how well the model performs

  • interprets error

76
New cards

Risk

how well a model performs on all possible data

  • difficult since we don’t have all possible data

  • expected val of loss function applied to arbitrary data

77
New cards

Empirical Risk Minimization (ERM)

General strategy of finding model that minimizes loss on the training data

78
New cards

linear regression model family

y(x) = θ0 + θ1x

79
New cards

How can we make lin reg model family into lin algebra form?

Refer to the params as vectors:

  • θ = [θ0, θ1]

  • x = [1, x]

and take the dot product: θ · x = θ0 + θ1x
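A minimal NumPy check of this vector form (the values θ = [5, 1.75] and x = 2 are made up):

```python
import numpy as np

theta = np.array([5.0, 1.75])  # [intercept, slope]
x = 2.0
x_vec = np.array([1.0, x])     # prepend a 1 so the intercept acts as a weight

# the dot product reproduces theta0 + theta1 * x
print(theta @ x_vec)  # → 8.5
```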

80
New cards

What is linear about linear regression?

  • output is weighted sum of inputs

  • no multiplication between features

  • forms a straight line or hyperplane

81
New cards

Simple Linear Regression - Loss Function

SSE - Sum of Squared Errors

(this is for a single input x)

  • θ is the slope and intercept

  • subtract the estimated y value from the actual
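A minimal sketch of the SSE loss for a single-input model (the data and θ values are invented for illustration):

```python
import numpy as np

def sse(theta, x, y):
    """Sum of squared errors for the line y_hat = theta[0] + theta[1] * x."""
    y_hat = theta[0] + theta[1] * x  # estimated y values
    return np.sum((y - y_hat) ** 2)  # subtract estimate from actual, square, sum

x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.0, 5.0])
print(sse(np.array([1.0, 2.0]), x, y))  # → 0.0 (perfect fit)
print(sse(np.array([0.0, 2.0]), x, y))  # → 3.0
```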

82
New cards

Multiple Linear Regression

  • this models multiple input features

  • θn determines how much impact its corresponding xn has

  • θ0 is still the intercept; we prepend a 1 to the x vector to match the length of the θ vector

83
New cards

Multiple Linear Regression - Loss Function

Same as simple linear regression but at a larger scale: subtract the predicted values from the actual y and use matrix multiplication to get the sum of squares

84
New cards

What is regularization?

This adds an extra term to a normal loss function that penalizes large values of the model parameters/weights (θ), making predictions more general → preventing overfitting

85
New cards

Ridge Regularization

  • essentially linear regression with an L2 penalty on the weights

  • α controls strength of penalization → larger α → smaller coefficients
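A sketch of the ridge closed form, θ = (XᵀX + αI)⁻¹Xᵀy (data is made up; note this simple version penalizes the intercept too, which libraries usually avoid):

```python
import numpy as np

# made-up design matrix (leading column of ones = intercept) and targets
X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

def ridge_fit(X, y, alpha):
    """Ridge closed form: theta = (X^T X + alpha * I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)

# larger alpha → stronger penalization → smaller coefficients
small = ridge_fit(X, y, alpha=0.01)
large = ridge_fit(X, y, alpha=10.0)
print(small, large)  # the slope shrinks as alpha grows
```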

86
New cards

Lasso Regularization

  • Uses the L1 penalty (scaled by α) instead to determine strength

  • performs Feature Selection: irrelevant features get zero coefficients

87
New cards

Closed Form Solution

Provides a direct formula for how to get the optimal parameters that minimizes loss without using things like gradient descent
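For ordinary linear regression the closed form is the normal equation, θ = (XᵀX)⁻¹Xᵀy; a sketch with made-up, noise-free data:

```python
import numpy as np

# made-up data generated from y = 5 + 1.75x with no noise
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 5.0 + 1.75 * x

# prepend a column of ones so theta[0] acts as the intercept
X = np.column_stack([np.ones_like(x), x])

# normal equation: solve (X^T X) theta = X^T y directly, no gradient descent
theta = np.linalg.solve(X.T @ X, X.T @ y)
print(theta)  # → [5.   1.75]
```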

88
New cards

Which versions of linear regression have a closed form solution?

  • linear regression

  • Ridge regression

  • not Lasso (the absolute value in the L1 penalty is non-differentiable)

89
New cards

What is Gradient Descent

Minimizes a function (typically a loss) by repeatedly adjusting the parameters in the direction of steepest descent; it can be used on functions that don’t have a closed form solution.

90
New cards

Gradient Descent General Outline

  1. Initialize parameters to random vector

  2. Compute loss with those parameters

  3. compute gradient → its negative gives the direction of steepest descent

  4. update to new parameters

  5. repeat until change is negligible or max iterations
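The outline above, applied to the squared-error loss of linear regression (the data, learning rate, and iteration cap are all made-up choices):

```python
import numpy as np

# made-up noise-free data from y = 5 + 1.75x
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 5.0 + 1.75 * x
X = np.column_stack([np.ones_like(x), x])  # leading 1s for the intercept

# 1. initialize parameters to a random vector
rng = np.random.default_rng(0)
theta = rng.normal(size=2)

lr = 0.05
for _ in range(5000):
    # 2. error of the current parameters
    err = X @ theta - y
    # 3. gradient of the mean squared error: (2/N) X^T (X theta - y)
    grad = 2 * X.T @ err / len(y)
    # 4. update: step opposite the gradient (direction of steepest descent)
    step = lr * grad
    theta = theta - step
    # 5. repeat until the change is negligible or max iterations
    if np.linalg.norm(step) < 1e-10:
        break

print(theta)  # close to [5.  1.75]
```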

91
New cards

Basis Functions

transform x inputs into a more flexible representation, e.g. allowing polynomial regression to be captured with linear regression machinery
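A sketch of a polynomial basis: expanding x into [1, x, x², ...] lets plain linear regression fit a quadratic (made-up, noise-free data):

```python
import numpy as np

def poly_basis(x, degree):
    """Map scalar inputs to the basis [1, x, x^2, ..., x^degree]."""
    return np.column_stack([x ** d for d in range(degree + 1)])

# made-up data from the quadratic y = 1 - 2x + 3x^2
x = np.array([-1.0, 0.0, 1.0, 2.0])
y = 1.0 - 2.0 * x + 3.0 * x ** 2

# ordinary linear regression on the transformed features recovers the curve
Phi = poly_basis(x, degree=2)
theta = np.linalg.solve(Phi.T @ Phi, Phi.T @ y)
print(theta)  # → [ 1. -2.  3.]
```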

92
New cards

Premise of adapting linear regression for classification

when we apply the sigmoid logistic function to lin reg → output between 0 and 1 → interpret it as a probability of being in a certain class

93
New cards
Logistic Function (sigmoid function)

σ(z) = 1 / (1 + e^(-z))

94
New cards
σ(θ · x) = 1 / (1 + e^(-θ · x))

Model family for logistic regression

95
New cards

Use of logistic function

  • maps a real-valued input to the range (0, 1) for classification purposes

  • smooth version of the step function

96
New cards

Logistic Function useful properties

It has a nice derivative to work with: σ′(z) = σ(z)(1 − σ(z))
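A numerical check of that derivative identity, σ′(z) = σ(z)(1 − σ(z)), using a central difference (the test point z = 0.7 is arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7
h = 1e-6
# central-difference estimate of the derivative at z
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
# the closed-form derivative: sigma(z) * (1 - sigma(z))
analytic = sigmoid(z) * (1 - sigmoid(z))
print(abs(numeric - analytic) < 1e-8)  # → True
```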

97
New cards
L(θ) = -(1/N) Σ_n [ y_n log(ŷ_n) + (1 - y_n) log(1 - ŷ_n) ]
  • mean log loss

  • loss function for logistic regression
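A sketch of the mean log loss (the labels and predicted probabilities are made up):

```python
import numpy as np

def mean_log_loss(y, y_hat):
    """Mean log loss: -(1/N) * sum(y*log(y_hat) + (1-y)*log(1-y_hat))."""
    return -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))

y = np.array([1, 0, 1])            # true labels (0 or 1)
y_hat = np.array([0.9, 0.1, 0.8])  # predicted probabilities
print(mean_log_loss(y, y_hat))     # small, since predictions are confident and correct
```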

98
New cards
the y_n and (1 - y_n) factors in the mean log loss

these will either be 0 or 1 and will determine which error to use

99
New cards
the log(ŷ_n) and log(1 - ŷ_n) terms in the mean log loss

This is how far off the estimate is

100
New cards
the 1/N factor in the mean log loss

We divide by N because we are getting the mean