Everything until deep learning
Probability
Study of uncertainty and randomness, used to model and analyze uncertainty in data.
A form of regularization
Ridge regression
Rows on a confusion matrix
Correspond to what is predicted
Columns on a confusion matrix
Correspond to the known truth
The sensitivity metric equation
Sensitivity = True Positives / (True Positives + False Negatives)
The specificity metric equation
Specificity = True Negatives / (True Negatives + False Positives)
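A minimal Python sketch of these two equations (the tp/fn/tn/fp counts are hypothetical, chosen to match the examples below):

    def sensitivity(tp, fn):
        # Fraction of actual positives correctly identified
        return tp / (tp + fn)

    def specificity(tn, fp):
        # Fraction of actual negatives correctly identified
        return tn / (tn + fp)

    # Hypothetical counts read off a confusion matrix
    print(sensitivity(tp=81, fn=19))  # 0.81
    print(specificity(tn=85, fp=15))  # 0.85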
If sensitivity = 0.81, what does it mean?
It tells us that 81% of the people with heart disease were correctly identified by the logistic regression model
If specificity = 0.85, what does it mean?
It means that 85% of the people without heart disease were correctly identified
When a confusion matrix has more than 2 rows, how do we calculate the sensitivity?
For each category, we divide its true positives by the true positives plus the sum of its false negatives (summing the false negatives across all the other categories)
What is the function of specificity and sensitivity?
They help us decide which machine learning method would be best for our data
If correctly identifying positives is the most important thing, which one should I choose? Sensitivity or specificity?
Sensitivity
If correctly identifying negatives is the most important thing, which one should I choose? Sensitivity or specificity?
Specificity
ROC
Receiver Operating Characteristic
ROC function
To provide a simple way to summarize all the information, instead of making several confusion matrices
The y axis, in ROC, is the same thing as
Sensitivity
The x axis, in ROC, is the same thing as
1 - Specificity (the false positive rate)
True positive rate =
Sensitivity
False positive rate =
1 - Specificity
In other words, ROC allows us to
Set the right threshold
When the true positive rate equals the false positive rate,
the point lies on the ROC plot's diagonal line, where the model classifies no better than random guessing
The ROC summarizes…
All of the confusion matrices that each threshold produced
AUC
Area under the curve
AUC function
To compare one ROC curve to another
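A short sketch of how a ROC curve and its AUC can be computed with scikit-learn, assuming it is installed; the labels and scores are made up for illustration:

    from sklearn.metrics import roc_curve, roc_auc_score

    # Hypothetical true labels and predicted probabilities
    y_true = [0, 0, 1, 1, 0, 1, 1, 0]
    y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3]

    # roc_curve sweeps the thresholds: one (FPR, TPR) point per threshold,
    # instead of building a separate confusion matrix for each one
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)

    # AUC summarizes the whole curve in a single number for comparing methods
    print(roc_auc_score(y_true, y_scores))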
Precision equation
Precision = True Positives / (True Positives + False Positives)
Precision
the proportion of positive results that were correctly classified
Precision is not affected by imbalance because
It does not include the number of true negatives
Example of when imbalance occurs
When studying a rare disease: the study will contain many more people without the disease than with it
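A tiny sketch showing why imbalance does not distort precision (counts hypothetical): true negatives never enter the equation, so adding more disease-free people changes nothing.

    def precision(tp, fp):
        # Precision = TP / (TP + FP); TN does not appear anywhere
        return tp / (tp + fp)

    # Same precision whether the study has 100 or 100,000 true negatives
    print(precision(tp=90, fp=10))  # 0.9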
ROC Curves make it easy to
Identify the best threshold for making a decision
AUC makes it easy to
Decide which categorization method is better
Entropy can also be used to
Build classification trees
Entropy is also the basis of
Mutual Information
Mutual Information
Quantifies the relationship between 2 things
Entropy is also the basis of
Relative entropy (the Kullback-Leibler distance) and cross entropy
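A minimal numpy sketch of relative entropy and cross entropy for two hypothetical discrete distributions p and q:

    import numpy as np

    p = np.array([0.5, 0.25, 0.25])  # hypothetical "true" distribution
    q = np.array([0.4, 0.4, 0.2])    # hypothetical approximating distribution

    kl = np.sum(p * np.log2(p / q))  # relative entropy D(p || q)
    cross = -np.sum(p * np.log2(q))  # cross entropy H(p, q)
    h = -np.sum(p * np.log2(p))      # plain entropy H(p)

    print(kl, cross, h)  # note that cross = h + kl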
Entropy is used to
quantify similarities and differences
If the probability is low, the surprise is
high
If the probability is high, the surprise is
low
The entropy of the result of X is
The expected surprise every time we observe X
Entropy IS
The expected value of the surprise
We can rewrite entropy using
The sigma notation
Equation for surprise
Surprise = log(1 / p(x)) = -log(p(x))
Equation for entropy
Entropy = Σ p(x) · log(1 / p(x)) = -Σ p(x) · log(p(x))
Surprise
Is the log of the inverse of the probability
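A small Python sketch of both equations; the biased-coin probabilities are hypothetical:

    import numpy as np

    def surprise(p):
        # Surprise = log2(1 / p): low probability means high surprise
        return np.log2(1 / p)

    def entropy(probs):
        # Entropy = expected surprise = sum of p(x) * log2(1 / p(x))
        probs = np.asarray(probs)
        return np.sum(probs * np.log2(1 / probs))

    print(surprise(0.9))        # small surprise for a likely outcome
    print(surprise(0.1))        # large surprise for an unlikely outcome
    print(entropy([0.9, 0.1]))  # about 0.47 bits for a biased coin
    print(entropy([0.5, 0.5]))  # 1 bit: a fair coin is maximally uncertain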
R² (R squared) does not work for
Binary data (yes or no)
R squared works for
Continuous data
Mutual information is
A numeric value that gives us a sense of how closely related two variables are
Equation for mutual information
MI(X, Y) = Σ over x and y of p(x, y) · log( p(x, y) / (p(x) · p(y)) )
Joint probabilities
The probability of two things occurring at the same time
Marginal probabilities
The counterpart of joint probabilities: the probability of one thing occurring on its own, found by summing the joint probabilities over the other variable
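A numpy sketch tying these cards together: a (hypothetical) joint probability table gives the marginals by summing over the other variable, and both feed the mutual information equation.

    import numpy as np

    # Hypothetical joint probabilities p(x, y) for two binary variables
    joint = np.array([[0.4, 0.1],
                      [0.1, 0.4]])

    px = joint.sum(axis=1)  # marginal p(x): sum the joint over y
    py = joint.sum(axis=0)  # marginal p(y): sum the joint over x

    # MI = sum over x, y of p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
    mi = np.sum(joint * np.log2(joint / np.outer(px, py)))
    print(mi)  # positive here, because the two variables are clearly related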
Least squares =
Linear regression
squaring ensures
That each term is positive
Sum of Squared Residuals
How well the line fits the data
Sum of Squared Residuals function
The residuals are the differences between the real data and the line, and we sum the squares of these values
The sum of squared residuals must be
as low as possible
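A minimal sketch of the sum of squared residuals for one candidate line; the data and the line's parameters are made up:

    import numpy as np

    def sum_of_squared_residuals(y, y_pred):
        # Residual = real value minus the line's value; squaring keeps each term positive
        residuals = np.asarray(y) - np.asarray(y_pred)
        return np.sum(residuals ** 2)

    # Hypothetical data and a candidate line y = 0.5 + 1.2 * x
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([1.8, 2.9, 4.2, 5.1])
    print(sum_of_squared_residuals(y, 0.5 + 1.2 * x))  # lower is better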
First step when working with bias and variance
Split the data in 2 sets, one for training and one for testing
How do we find the optimal rotation for the line?
We take the derivative of the sum of squared residuals; the derivative tells us the slope of the function at every point, and the optimal rotation is where that slope equals zero
Least squares final line
The final line is the one that minimizes the distance (the sum of squared residuals) between it and the real data
The first thing you do in linear regression
Use least squares to fit a line to the data
The second thing you do in linear regression
calculate r squared
The third thing you do in linear regression
calculate a p-value for R squared
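These three steps map directly onto scipy.stats.linregress, assuming SciPy is available; the x and y values are hypothetical:

    from scipy.stats import linregress

    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.8, 2.9, 4.2, 5.1, 6.3]

    fit = linregress(x, y)           # step 1: least squares fit
    print(fit.slope, fit.intercept)  # the fitted line's parameters
    print(fit.rvalue ** 2)           # step 2: R squared
    print(fit.pvalue)                # step 3: p-value for the fit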
Residual
The distance from the line to a data point
SS(Mean)
Sum of squares around the mean
SS(Fit)
Sum of squares around the least squares fit
Linear regression is also called:
Least squares
What is bias?
The inability of a machine learning method, like linear regression, to capture the true relationship
How do we calculate how the lines fit the training set?
By calculating the sum of squares: we measure how far the dots are from the line
How do we calculate how the lines fit the testing set?
The same way: by calculating the sum of squares between the testing data and the line
Overfit
When the line fits the training set well but does not fit the testing set well
Ideal algorithm
Low bias (it accurately captures the true relationship) and low variability (it produces consistent predictions across different datasets)
How does least squares determine the values for the equation's parameters?
It chooses the values that minimize the sum of squared residuals
Y = Y-intercept + slope × X
Linear regression
Y = Y-intercept + slope₁ × X + slope₂ × Z
Multiple regression
Equation for R² (R squared)
R² = (SS(mean) - SS(fit)) / SS(mean)
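The same equation as a few lines of Python, with hypothetical sums of squares:

    ss_mean = 32.0  # hypothetical sum of squares around the mean
    ss_fit = 8.0    # hypothetical sum of squares around the fitted line

    # R^2 = (SS(mean) - SS(fit)) / SS(mean): the fraction of the variation
    # around the mean that the fitted line explains
    r_squared = (ss_mean - ss_fit) / ss_mean
    print(r_squared)  # 0.75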
Goal of a t test
Compare means and see if they are significantly different from each other
Odds are NOT
Probabilities
Odds are
The ratio of something happening (e.g., the team winning) to something not happening (e.g., the team NOT winning)
Logit function
The log of the ratio of the probabilities, log(p / (1 - p)); it forms the basis of logistic regression
log(odds)
Log of the odds
Log(odds) use?
The log of the odds is useful for working with win/lose, yes/no, or true/false probabilities
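A tiny sketch of odds versus probability and the log(odds); the probability is made up:

    import numpy as np

    p = 0.8                  # hypothetical probability of the team winning
    odds = p / (1 - p)       # winning vs. NOT winning: 4.0, i.e. 4 to 1
    log_odds = np.log(odds)  # the log(odds); symmetric around 0
    print(odds, log_odds)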
Odds ratio
The ratio of two odds, e.g., the odds of having cancer with a mutated gene divided by the odds of having cancer without it
Relationship between the odds ratio and the log(odds ratio)
They indicate a relationship between two things (e.g., between a mutated gene and cancer), like whether or not having the mutated gene increases the odds of having cancer
Tests used to determine p values for log (odds ratio)
Fisher's exact test, the chi-squared test, and the Wald test
Large r squared implies…
A large effect
Machine Learning
Using data to predict something
Example of continuous data
Weight and age
Example of discrete data
Genotype and astrological sign
Which curve is better: the one with the maximum likelihood or the minimum?
Maximum likelihood
Type of regression used to assess which variables are useful for classifying samples
Logistic regression
Examples of GLMs (Generalized Linear Models)
Logistic regression and linear models
The slope indicates
the rate at which the probability of a particular event occurring changes as the independent variable changes.
Logit function
log(p / (1 - p))
If the coefficient estimate in logistic regression is negative, the odds are
Against; e.g., if you don't weigh anything, the odds are against you being obese
If the coefficient estimate is positive, that means that
For every unit of x gained, the log(odds) of y increases by the coefficient's value
In logistic regression, how do we use the z value to confirm that a coefficient is statistically significant?
It should be greater than 2 (e.g., 2.255), with a p-value less than 0.05 (e.g., 0.0241)
What's the difference between the coefficients used for linear models and logistic regression?
They work exactly the same way, except the logistic regression coefficients are in terms of log(odds)
In logistic regression, what is the scale of the coefficients?
Log(odds)
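A sketch of fitting a logistic regression with statsmodels, assuming it is installed; the weight/obesity data are invented. Its output contains exactly the pieces from the cards above: coefficients on the log(odds) scale, z values, and p-values.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: weight (x) and obese yes/no (y)
    weight = np.array([60, 65, 70, 75, 80, 85, 90, 95, 100, 105], dtype=float)
    obese = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

    X = sm.add_constant(weight)        # adds the intercept column
    result = sm.Logit(obese, X).fit()

    print(result.params)   # coefficients, in terms of log(odds)
    print(result.tvalues)  # z values (coefficient / standard error)
    print(result.pvalues)  # p-values for each coefficient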
How are lines fit in linear regression?
By using least squares: we measure the residuals (the distances between the data and the line) and then square them so that negative values do not cancel out positive values
Line with the smallest sum of squared residuals is
The best line
Line with the biggest sum of squared residuals is
The worst line