CI 460 Machine Learning


Description and Tags

ERAU


63 Terms

1

Irreducible Error

Inherent noise or variability in the data that cannot be reduced, no matter how good the model is.

2

Reducible Error

Error that can be reduced by improving the model

3

Prediction

Using our estimate of f, we can make accurate predictions of the response for new inputs.

4

Inference

Understanding how the output changes based on the input, rather than just predicting the output.

5

Parametric Methods

Assume a functional form for f with a fixed number of parameters. Simpler but less flexible. Examples: linear regression, logistic regression

6

Non-parametric methods

Learn directly from the data and adapt to complex patterns. More flexible, but need more training data. Examples: decision trees, random forests, KNN

7

MSE (Mean Squared Error)

Measure used to assess quality of fit/accuracy for a regression model. We always want to minimize MSE.

8

MSE

Average squared difference between actual and predicted values: MSE = (1/n) Σ (yi − ŷi)²

  • n - number of observations

  • yi - observed values

  • ŷi - predicted values

9

MSE example

  • True Y = [14, 7, 5, 9]

  • Predicted Y = [4, 8, 8, 10]

  • MSE = ( (14 − 4)² + (7 − 8)² + (5 − 8)² + (9 − 10)² ) / 4 = (100 + 1 + 9 + 1) / 4 = 27.75
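
A quick sanity check of this example in Python (assuming NumPy is available):

    import numpy as np

    y_true = np.array([14, 7, 5, 9])
    y_pred = np.array([4, 8, 8, 10])

    # MSE = mean of the squared residuals
    mse = np.mean((y_true - y_pred) ** 2)
    print(mse)  # 27.75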

10

Error Rate

Measures quality of fit for classification. Each prediction scores 1 if incorrect and 0 if correct, so the error rate is the fraction of incorrect classifications.

11

Loss Function

Measures how far the prediction is from the true outcome and assigns a numerical penalty to the incorrect predictions. Want to choose the weights that minimize the loss.

  • Examples: MSE, error rate

12

Bias

(Underfitting) Error from modeling a problem with too simple a model (e.g., using a linear model to fit a complex pattern).

13

Variance

(Overfitting) How much the estimate of f would change if fit to a different training data set. The more flexible the model, the higher the variance.

14

Bias Variance Trade off

The more flexible the model, the higher the variance and the lower the bias; test error is minimized where the two are balanced.

15

Bayes Error Rate

Irreducible error in classification. It is the lowest error rate that any classifier can achieve.

16

KNN (K Nearest Neighbors)

Non-parametric method for classification and regression; uses the closest training neighbors to decide the prediction.

  • Small K → high variance and low bias

  • Performs worse in higher dimensions and fits to the noise

  • Sensitive to feature scaling
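
A minimal NumPy sketch of the KNN idea on hypothetical toy data (majority vote over the k nearest points):

    import numpy as np
    from collections import Counter

    def knn_predict(X_train, y_train, x, k=3):
        # Euclidean distance from the query point to every training point
        dists = np.linalg.norm(X_train - x, axis=1)
        # Labels of the k nearest neighbors
        nearest = y_train[np.argsort(dists)[:k]]
        # Majority vote decides the class
        return Counter(nearest).most_common(1)[0][0]

    X_train = np.array([[1.0, 1.0], [1.2, 0.8], [8.0, 9.0], [9.0, 8.5]])
    y_train = np.array([0, 0, 1, 1])
    print(knn_predict(X_train, y_train, np.array([1.1, 0.9])))  # 0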

17

Scaling

  • Scaling - a preprocessing technique that makes sure no one feature dominates the distance calculation or the outcome

  • Algorithms that use gradient descent need scaling

18

Normalization

Min-max scaling to [0, 1]; used when the data does not follow a normal distribution

19

Standardization

Z-score scaling (subtract the mean, divide by the standard deviation); used when the data follows a normal distribution

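A small NumPy sketch contrasting the two rescalings on toy values:

    import numpy as np

    x = np.array([2.0, 4.0, 6.0, 10.0])

    # Normalization (min-max): rescales to the [0, 1] range
    x_norm = (x - x.min()) / (x.max() - x.min())

    # Standardization (z-score): mean 0, standard deviation 1
    x_std = (x - x.mean()) / x.std()

    print(x_norm)  # [0.   0.25 0.5  1.  ]
    print(x_std)
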
20

Peaking Phenomenon

As the number of features increases, performance improves up to an optimal point, then degrades as more and more features are added.

21

Curse of Dimensionality

When the dimensionality is so high that all points are roughly the same distance apart and the middle of the data space is empty. The number of features needs to be reduced.
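
A toy NumPy experiment (hypothetical sizes) showing the effect: as the dimension d grows, the nearest and farthest points become almost equally far away.

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 10, 100, 1000):
        X = rng.random((200, d))
        # Distances from the first point to all the others
        dists = np.linalg.norm(X[1:] - X[0], axis=1)
        print(d, dists.min() / dists.max())  # ratio approaches 1 as d grows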

22

Binary Classifier Evaluation

Confusion matrix of the four prediction outcomes: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN).

23

Accuracy

Fraction of all predictions that are correct: (TP + TN) / (TP + TN + FP + FN)

24

Sensitivity

True Positive Rate

TP / ( TP + FN )

25

Specificity

Proportion of actual negatives that are correctly predicted as negative

True Negative Rate

TN / ( TN + FP)

26

1 - Specificity

False Positive Rate

FP / ( FP + TN)
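
A short NumPy sketch tying the confusion-matrix metrics together, using hypothetical 0/1 labels:

    import numpy as np

    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
    y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))

    print("accuracy:   ", (tp + tn) / (tp + tn + fp + fn))
    print("sensitivity:", tp / (tp + fn))  # true positive rate
    print("specificity:", tn / (tn + fp))  # true negative rate
    print("fpr:        ", fp / (fp + tn))  # 1 - specificity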

27

Logit Output

Probability output: P( y = 1 | x )

28

OLS (Ordinary Least Squares)

Minimizes the squared differences between actual and predicted values

Uses MSE and the Residuals

29

Normal Equations

  • Closed form solution

  • Minimize MSE

  • Best for simple linear regression and smaller datasets; solves directly for the model parameters

30

Gradient Descent

  • Efficient for large datasets and high dimensions

  • scales well in machine learning

  • iterative optimization algorithm to minimize the loss function

31

Normal Equations

  • Closed form solution

  • efficient for smaller datasets with few features
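
A side-by-side sketch of the two approaches on synthetic data (hypothetical toy problem); both should recover roughly the same weights:

    import numpy as np

    rng = np.random.default_rng(0)
    X = np.c_[np.ones(50), rng.random(50)]            # design matrix with intercept
    y = 3.0 + 2.0 * X[:, 1] + rng.normal(0, 0.1, 50)  # true weights: [3, 2]

    # Normal equations: closed-form solution of the least squares problem
    w_closed = np.linalg.solve(X.T @ X, X.T @ y)

    # Gradient descent: iteratively step against the gradient of the MSE
    w = np.zeros(2)
    lr = 0.5
    for _ in range(5000):
        grad = 2 / len(y) * X.T @ (X @ w - y)
        w -= lr * grad

    print(w_closed, w)  # both approach [3, 2]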

32

Precision

Proportion of predicted positives that are actually positive: TP / ( TP + FP )

33

Information Theory

Study of the quantification, storage, and communication of digital information

  • Founded by Harry Nyquist, Ralph Hartley, Claude Shannon

34

Quantifying Information

  • Low probability event → high information (surprising)

  • High probability event → low information

35

Self Information

The measure of the information associated with the outcome of a random variable, (How much information is in a given variable)

  • 1 unit of information = 1 bit (when the log is base 2)

  • I(x) = -log(P(x))

36

Expected Value

The long-term average value of a random variable

37

Entropy

The measure of the average uncertainty associated with a given random variable

38

Self Information vs Entropy

  • Self Information → measures the information/uncertainty in an event

  • Entropy → the amount of uncertainty/information in a set of events
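
A minimal NumPy sketch of both quantities, using log base 2 so the unit is bits:

    import numpy as np

    def self_information(p):
        # I(x) = -log2(P(x)): rarer events carry more information
        return -np.log2(p)

    def entropy(probs):
        # H = -sum(p * log2(p)): average uncertainty over a set of outcomes
        probs = np.asarray(probs)
        return -np.sum(probs * np.log2(probs))

    print(self_information(0.5))  # 1.0 bit (fair coin flip)
    print(entropy([0.5, 0.5]))    # 1.0 bit
    print(entropy([0.9, 0.1]))    # ~0.47 bits (less uncertainty)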

39

Linear Regression

  • Models the relationship between the dependent and independent variables using a line of best fit

40

Gradient descent is efficient for

large datasets and high dimensions, scales well

41

Least squares fit in terms of linear regression

is used to minimize the squared differences between the predicted and actual values.

42

normal equations are good for

small datasets and small features, involve inverting a matrix

43

Regression sum of squares

SST = SSR + SSE

  • SSTotal (SST) - total squared distance of the observations from the mean of y

  • SSRegression (SSR) - distance from the regression line to the mean of y

  • SSResidual (SSE) - variance around the regression line that is not explained by the regression (this is what we are trying to minimize)

44

Coefficient of Determination (R²)

  • values range from 0 - 1

  • 0 - model does not explain any variability in the data

  • 1 - the model perfectly explains the variability in the data

  • R² = SSR/SST

  • The higher the R², the better the model fits the data
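
A short NumPy sketch of the decomposition, reusing the earlier toy response with hypothetical fitted values:

    import numpy as np

    y = np.array([14.0, 7.0, 5.0, 9.0])
    y_hat = np.array([12.0, 8.0, 6.0, 9.0])  # hypothetical fitted values

    sst = np.sum((y - y.mean()) ** 2)  # total variation
    sse = np.sum((y - y_hat) ** 2)     # unexplained (residual) variation
    ssr = sst - sse                    # explained variation

    print(ssr / sst)  # R^2, about 0.87 here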

45

Logistic Regression

for classification problems

instead of predicting y we predict P(Y=1) (aka yes or no)

46

Logistic regression gives an

S-shaped (sigmoid) curve

47

odds ratio

Measure that gives the change in the odds of an outcome, holding everything else constant

  • values close to 0 indicate very low or very high probabilities

48

Logit

The log-odds: logit(p) = ln( p / (1 − p) ). It maps a probability to the whole real line and is the inverse of the sigmoid.
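
A tiny NumPy sketch of the logit/sigmoid pair:

    import numpy as np

    def sigmoid(z):
        # Maps any real number to (0, 1); this is the S-shaped curve
        return 1.0 / (1.0 + np.exp(-z))

    def logit(p):
        # Log-odds: the inverse of the sigmoid
        return np.log(p / (1.0 - p))

    p = sigmoid(0.8)
    print(p)         # ~0.69
    print(logit(p))  # 0.8 (round trip)
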
49

ROC vs AUC

  • ROC curve - plots the true positive rate against the false positive rate across all classification thresholds

  • AUC (area under the ROC curve) - single-number score that summarizes the performance

50

Regression Trees

  • Regression problems

  • divide predictor space into distinct regions

51

Split regression trees by using

the split that gives the lowest MSE (regression) or entropy (classification) on the training data
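
A sketch of one greedy split search for a regression tree on hypothetical 1-D data; the chosen threshold minimizes the summed squared error of the two resulting regions:

    import numpy as np

    def best_split(x, y):
        # Try every midpoint between sorted x values and keep the
        # threshold whose two regions have the lowest total squared error
        order = np.argsort(x)
        x, y = x[order], y[order]
        best_t, best_sse = None, np.inf
        for i in range(1, len(x)):
            t = (x[i - 1] + x[i]) / 2
            left, right = y[x <= t], y[x > t]
            sse = np.sum((left - left.mean()) ** 2) + np.sum((right - right.mean()) ** 2)
            if sse < best_sse:
                best_t, best_sse = t, sse
        return best_t, best_sse

    x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
    y = np.array([5.0, 6.0, 5.5, 20.0, 21.0, 19.0])
    print(best_split(x, y))  # splits near x = 6.5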

52

Classification tree

  • predictions are categorical, not continuous (as in regression)

  • split by minimizing the error rate

53

we know how far back to prune a tree by

using cross validation to see which tree has the lowest error rate

54

CP

Complexity parameter; used to control the complexity/accuracy trade-off when pruning

55

Pros vs Cons of decision trees

  • Pros - trees are easy to explain, can be plotted graphically, and can be used for both classification and regression

  • Cons - don’t have the same prediction accuracy as more complicated models

56

Bagging

  • Bootstrap aggregating

    • resampling of the observed dataset by random sampling with replacement from the original dataset

    • averaging the resulting models reduces variance

    • gives lots of training datasets

    • less interpretable than a single tree

57

In random forests variable importance is computed

Through the mean decrease in Gini impurity

58

Mean Decrease Gini vs Entropy

  • Both used to evaluate quality of split

  • Gini - based on the probability of misclassification

  • Entropy - based on the amount of information needed to identify the class of an element

59

Random Forests

  • Same process as bagging but de-correlates the trees

  • Done by taking a random sample of predictors every time a split is considered

60

When random forests are built using m = p

This amounts to being a bagged tree
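
A hedged scikit-learn sketch (assuming sklearn is installed; synthetic data) connecting bagging, random forests, and variable importance:

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=300, n_features=8, random_state=0)

    # Random forest: a random subset of predictors at each split de-correlates the trees
    rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0).fit(X, y)

    # With max_features=None (all p predictors), this reduces to bagged trees
    bagged = RandomForestClassifier(n_estimators=100, max_features=None, random_state=0).fit(X, y)

    # Variable importance via mean decrease in Gini impurity
    print(rf.feature_importances_)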

61

Validation Set Approach

  • Split data into training and testing (validation)

  • build multiple models and pick the one with the lowest validation error rate

  • can be highly variable

62

Leave-one-out cross validation

  • split data of size n into

    • training - n − 1 observations

    • test (validation) - 1 observation

  • validate and find the MSE, repeating for each observation

  • has less bias and less variability (no randomness in the splits), but is computationally intensive

63

K-fold cross validation

  • divide the dataset into k parts (folds)

  • hold one part out, train on the rest, and test on the held-out part

  • repeat with a different held-out part each time

  • average the k different MSEs

  • more accurate, and balances the bias/variance trade-off for error estimates
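
A from-scratch NumPy sketch of k-fold cross validation; the fit/predict hooks are hypothetical, with a trivial mean-only model standing in:

    import numpy as np

    def kfold_mse(X, y, fit, predict, k=5):
        # Shuffle the indices, split into k folds, average the held-out MSEs
        idx = np.random.default_rng(0).permutation(len(y))
        folds = np.array_split(idx, k)
        mses = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            model = fit(X[train], y[train])
            mses.append(np.mean((y[test] - predict(model, X[test])) ** 2))
        return np.mean(mses)

    # Hypothetical usage: "fit" just memorizes the training mean
    fit = lambda X, y: y.mean()
    predict = lambda m, X: np.full(len(X), m)
    X = np.arange(20.0).reshape(-1, 1)
    y = 2 * X[:, 0] + np.random.default_rng(1).normal(0, 1, 20)
    print(kfold_mse(X, y, fit, predict))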