f: X → Y
Where X is the __
Domain (math)
Parameter Type (CS)
Input Type (engineering)
Key Type (map ADT)
Supervised vs Unsupervised
In supervised learning we provide both inputs and outputs (targets), while unsupervised learning provides only inputs
Data
the information that the model uses to learn patterns and make predictions
f: X → Y
Where Y is the __
Codomain (math)
Return Type (CS)
Output Type (engineering)
Value Type (map ADT)
Model
The function/program/tool we want to make
Observations/Data Points
Values from the domain of a model
Targets
Values from the codomain of a model
Datasets
A collection of observations collated with targets; once a model is trained on it, it can predict targets for new observations
Dimensionality
number of features/attributes of data
Regression
predicting a continuous number
Target type is the real numbers
Y = R
Classification
Y is a finite set
Target values are labels/classes/categories/tags
Binary Classification
Only 2 outcomes
|Y| = 2
{T, F}, {1, 0}, {1, -1}
Density estimation
Target type is [0,1]
(every value in between 0 and 1)
Model Family
General form for a class of models
we choose a family, then fit parameters within it to get a specific model
Model Family example
Linear family would be y = C0 + C1x
Parameters (weights/coefficients)
constants used to specify a model in a model family
Parameters example
In model family : y = C0 + C1x
C0 = 5, C1 = 2 are the parameters
Model example
In model family : y = C0 + C1x
y = 5 + 1.75x
Hyperparameters
variables that specify options of a training algorithm (set before training, not learned)
Training/learning/fitting
Finding parameters to specify a model
Error/Loss Function
Function chosen to evaluate the model
lower values = better
Training Set
The portion of the data used for training (as opposed to the test set)
Generalization
How well the model does on data it wasn’t trained on
Overfitting
Model fits the training data too closely and fails to generalize to new data
A data point in the training set or test set
observation
An input value to the model
Observation
An output value of the model
Target
A “correct answer” to a data point in the training set or test set
Target
A vector in R^D
Observation
When the function we're looking for is R^D → {-1, 1}.
Binary Classification
When the function we're looking for is R^D → R.
Regression
When the function we're looking for is R^D → Y for some finite set Y.
Classification
When the function we're looking for is R^D → [0, 1].
Density Estimate
When we want to model a real-valued function.
Regression
When we want to model a probability distribution.
Density estimate
When we want to associate each observation with one of a finite set of labels or categories.
Classification
Variable
contains all values that measure the same attribute across units
Observation
contains all values measured on the same unit (row)
Attribute
A column/instance variable
Feature
value in a column, value of an instance variable
Value
element in the table, measurement, datum, feature
Tidy data
each var forms a column
each observation forms a row
each type of observational unit forms a table
Feature Selection
The process of transforming the dataset by keeping only the most informative features
Curse of dimensionality
The phenomenon of the difficulty of training accurate models increasing with the dimensionality of the data
N
# of observations
D
# of attributes (dimensionality)
General - X
data set as an N×D matrix
General - y
N length vector of target values
General - n
used as an index into the data set, for example X_n and y_n
General - i and j
used as indices into features, for example x_i where x = X_n
What is the idea behind K nearest neighbors?
In classification, items from the same class are most likely near one another → when classifying a new item, look at the classes of its k nearest neighbors to decide
KNN algorithm
compute all distances between the new data point and every row of X
sort by the distances calculated
take the k closest neighbors
use an array/bag to tally their classes
return the class with the highest tally
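A minimal sketch of these steps in Python (assuming NumPy arrays and Euclidean distance; knn_classify is an illustrative name):

```python
import numpy as np
from collections import Counter

def knn_classify(X, y, x_new, k):
    # compute all distances between the new point and every row of X
    dists = np.linalg.norm(X - x_new, axis=1)
    # sort by distance and take the k closest neighbors
    nearest = np.argsort(dists)[:k]
    # tally the classes of the k nearest neighbors
    tally = Counter(y[nearest].tolist())
    # return the class with the highest tally
    return tally.most_common(1)[0][0]
```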
KNN - Cost
Depends on N and D:
Distance computations - O(ND)
Sort distances - O(N lg N)
total: O(ND + N lg N) → O(N lg N) if D is treated as a constant
What are the hyperparameters of KNN?
The number of neighbors
The distance metric
KNN - when k = 1
The model perfectly memorizes the data.
Highly sensitive to noise (one wrong neighbor ruins the classification).
Very low bias, high variance → Overfitting.
KNN - when k = N
Every query is classified based on the majority class in the entire dataset.
Ignores local structure → Underfitting.
Why is KNN non-parametric
It does not assume a fixed form (parameters) for the decision boundary. It memorizes training data and bases predictions on local neighborhoods.
Metric or Distance Function
Any function d between two vectors where:
d(x,y) = d(y,x)
d(x,z) <= d(x,y) + d(y,z)
d(x,y) = 0 iff x=y
What are Norms?
they measure vector magnitude (size)
the norm of the difference between two vectors gives a distance between them
Euclidean Distance
L2 Norm
for general continuous data
square root of the sum of squared differences
Manhattan/City-block Distance
L1 norm
For general continuous data
Sum of absolute differences
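Written out (standard definitions for two D-dimensional vectors x and y):

```latex
d_2(x, y) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}
\qquad
d_1(x, y) = \sum_{i=1}^{D} \lvert x_i - y_i \rvert
```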
Norm
a mathematical function that measures the size or magnitude of a vector in a given space
KNN properties
it’s instance based
lazy: most of the work happens at classification time, not training time
it’s non-parametric: it does not learn a fixed set of parameters
R or C: Attribute
Column
R or C: Covariate
Column
R or C: Data Point
Row
R or C: Sample
Row
R or C: Feature
Column
R or C: Observation
Row
R or C: Variable
Column
Main problem of Curse of Dimensionality
the greater the dimensionality → the sparser the data within the vector space
Regression general form
Given data X (N observations in D dimensions) and N target values y, find a function for predicting the values of new data points
Cost
A variable, function, or formula that we want to minimize in an optimization problem
Error
difference between the computed value and the correct value
Loss
measures how well the model performs
interprets error
Risk
how well a model performs on all possible data
difficult to compute since we don’t have all possible data
the expected value of the loss function applied to arbitrary data
Empirical Risk Minimization (ERM)
General strategy of finding model that minimizes loss on the training data
linear regression model family
y(x) = θ0 + θ1x
How can we make lin reg model family into lin algebra form?
Refer to the parameters and the input as vectors:
θ = [θ0, θ1]
x = [1, x]
and take the dot product: y = θ · x
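For example, the earlier model y = 5 + 1.75x evaluated at x = 3 as a dot product (a quick sketch assuming NumPy):

```python
import numpy as np

theta = np.array([5.0, 1.75])  # [intercept θ0, slope θ1]
x_vec = np.array([1.0, 3.0])   # [1, x] for input x = 3
y_hat = theta @ x_vec          # 5 + 1.75 * 3 = 10.25
```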
What is linear about linear regression?
output is weighted sum of inputs
no multiplication between features
forms a straight line or hyperplane
Simple Linear Regression - Loss Function
SSE - Sum of Squared Errors
(this is for a single input x)
θ holds the intercept and slope
subtract the estimated y value from the actual, square it, and sum over all observations
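One standard way to write the SSE loss over N observations:

```latex
L(\theta) = \sum_{n=1}^{N} \left( y_n - (\theta_0 + \theta_1 x_n) \right)^2
```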
Multiple Linear Regression
this models multiple input features
θn determines how much impact it’s corresponding xn has
θ0 is the intercept still and we extend the x vector by one to match the length of the θ vector
Multiple Linear Regression - Loss Function
Same as simple linear regression but at a larger scale: subtract the predicted values Xθ from the actual y and use matrix multiplication to get the sum of squares
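In matrix form (standard notation):

```latex
L(\theta) = (y - X\theta)^{\top}(y - X\theta)
```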
What is regularization?
Adds an extra term to a normal loss function that penalizes large values of the model parameters/weights (θ), making predictions more general → preventing overfitting
Ridge Regularization
essentially linear regression with an L2 penalty on the weights
α controls strength of penalization → larger α → smaller coefficients
Lasso Regularization
Uses the L1 penalty instead, with α determining the strength
tends to drive the coefficients of irrelevant features to exactly zero, effectively performing feature selection
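Written out (standard forms, with α as the regularization strength):

```latex
L_{\text{ridge}}(\theta) = \lVert y - X\theta \rVert_2^2 + \alpha \lVert \theta \rVert_2^2
\qquad
L_{\text{lasso}}(\theta) = \lVert y - X\theta \rVert_2^2 + \alpha \lVert \theta \rVert_1
```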
Closed Form Solution
Provides a direct formula for how to get the optimal parameters that minimizes loss without using things like gradient descent
Which versions of linear regression have a closed form solution?
linear regression
Ridge regression
not Lasso (the absolute value penalty is non-differentiable)
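The standard closed-form solutions (assuming XᵀX is invertible for plain linear regression):

```latex
\theta^{*}_{\text{OLS}} = (X^{\top}X)^{-1} X^{\top} y
\qquad
\theta^{*}_{\text{ridge}} = (X^{\top}X + \alpha I)^{-1} X^{\top} y
```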
What is Gradient Descent
Minimizes a function (typically a loss) by repeatedly adjusting the parameters in the direction of steepest descent; can be used on functions that don’t have a closed form solution
Gradient Descent General Outline
Initialize parameters to random vector
Compute loss with those parameters
compute gradient → tells direction of steepest descent
update to new parameters
repeat until change is negligible or max iterations
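A minimal sketch of this outline in Python for a mean squared error loss (assuming NumPy; the learning rate lr, max_iters, and tol values are illustrative):

```python
import numpy as np

def gradient_descent(X, y, lr=0.01, max_iters=1000, tol=1e-6):
    theta = np.random.randn(X.shape[1])  # initialize parameters to a random vector
    for _ in range(max_iters):
        # gradient of the mean squared error loss: (2/N) X^T (X theta - y)
        grad = (2 / len(y)) * X.T @ (X @ theta - y)
        new_theta = theta - lr * grad    # step in the direction of steepest descent
        if np.linalg.norm(new_theta - theta) < tol:  # change is negligible
            return new_theta
        theta = new_theta
    return theta
```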
Basis Functions
transform the x inputs into a more flexible representation, e.g., polynomial features → allows polynomial regression to be captured by a linear model
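A quick sketch of one common choice, the polynomial basis (assuming NumPy; polynomial_basis is an illustrative name):

```python
import numpy as np

def polynomial_basis(x, degree):
    # map each scalar input to [1, x, x^2, ..., x^degree],
    # so a linear model over these columns fits a polynomial in x
    return np.vander(x, degree + 1, increasing=True)

# e.g., polynomial_basis(np.array([2.0]), 3) -> [[1., 2., 4., 8.]]
```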
Premise of adapting linear regression for classification
when we apply the sigmoid (logistic) function to the linear regression output → output is between 0 and 1 → interpret it as the probability of being in a certain class
Logistic Function (sigmoid function)
Model family for logistic regression
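The standard definition, applied to a linear model:

```latex
\sigma(z) = \frac{1}{1 + e^{-z}}
\qquad
p(y = 1 \mid x) = \sigma(\theta \cdot x)
```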
Use of logistic function
maps real val input to range (0,1) for classification purposes
smooth version of the step function
Logistic Function useful properties
It has a nice derivative to work with
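That derivative (a standard identity):

```latex
\sigma'(z) = \sigma(z)\,(1 - \sigma(z))
```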
mean log loss
loss function for logistic regression
the targets y_n will either be 0 or 1 and determine which error term applies
the log of the estimate measures how far off the estimate is
we divide by N because we are taking the mean
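Written out (standard form, where ŷ_n = σ(θ · x_n) is the model’s estimate):

```latex
L(\theta) = -\frac{1}{N} \sum_{n=1}^{N} \Big[\, y_n \log \hat{y}_n + (1 - y_n)\log(1 - \hat{y}_n) \,\Big]
```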