Everything until deep learning

1

**Probability**

Study of uncertainty and randomness, used to model and analyze uncertainty in data.

2

A form of regularization

Ridge regression

3

Rows on a confusion matrix

Correspond to what is predicted

4

Columns on a confusion matrix

Correspond to the known truth

5

The sensitivity metric equation

True positives divided by the sum of true positives and false negatives: TP / (TP + FN)

6

The specificity metric equation

True negatives divided by the sum of true negatives and false positives: TN / (TN + FP)
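
The two formulas above translate directly into code. A minimal Python sketch, using hypothetical counts (not from any real study) chosen so the results match the 0.81 and 0.85 examples in the following cards:

```python
def sensitivity(tp: int, fn: int) -> float:
    """Sensitivity (true positive rate) = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Specificity (true negative rate) = TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts, for illustration only.
tp, fn = 81, 19  # people with heart disease: correctly / incorrectly classified
tn, fp = 85, 15  # people without heart disease: correctly / incorrectly classified
print(sensitivity(tp, fn))  # 0.81
print(specificity(tn, fp))  # 0.85
```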

7

If sensitivity = 0.81, what does it mean?

Example: 81% of the people with heart disease were correctly identified by the logistic regression model

8

If specificity = 0.85, what does it mean?

It means that 85% of the people without heart disease were correctly identified

9

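When a confusion matrix has more than 2 rows, how do we calculate the sensitivity?

For each class, divide its true positives by the sum of its true positives and its false negatives. The false negatives are the other cells in that class's column (samples that truly belong to the class but were predicted as something else)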
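
A minimal Python sketch of per-class sensitivity, assuming the layout from the cards above (rows = predicted class, columns = known truth). The matrix values are hypothetical:

```python
def class_sensitivity(matrix, k):
    """Sensitivity for class k in a multi-class confusion matrix.

    Assumes rows = predicted class and columns = known truth.
    """
    tp = matrix[k][k]                        # predicted k and truly k
    fn = sum(row[k] for row in matrix) - tp  # truly k but predicted something else
    return tp / (tp + fn)


# Hypothetical 3-class confusion matrix.
cm = [
    [12, 3, 1],
    [2, 15, 4],
    [1, 2, 10],
]
for k in range(len(cm)):
    print(k, round(class_sensitivity(cm, k), 3))
```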

10

What are sensitivity and specificity used for?

They help us decide which machine learning method is best for our data

11

If correctly identifying positives is the most important thing, which one should I choose? Sensitivity or specificity?

Sensitivity

12

If correctly identifying negatives is the most important thing, which one should I choose? Sensitivity or specificity?

Specificity

13

ROC

Receiver Operating Characteristic (also called Receiver Operator Characteristic)

14

Purpose of the ROC curve

To summarize all of the information in a single graph, instead of making a separate confusion matrix for every threshold

15

The y axis, in ROC, is the same thing as

Sensitivity

16

The x axis, in ROC, is the same thing as

1 - specificity (the false positive rate)

17

True positive rate =

Sensitivity

18

False positive rate =

1 - specificity

19

In other words, the ROC curve helps us to

Choose the right classification threshold

20

The diagonal line on an ROC graph

Shows where the true positive rate equals the false positive rate (sensitivity = 1 - specificity); a method on this line does no better than random guessing

21

The ROC summarizes…

All of the confusion matrices that each threshold produced
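
To see what the ROC curve summarizes, here is a minimal Python sketch that builds one confusion matrix per threshold and keeps only its (false positive rate, true positive rate) pair. The scores, labels, thresholds and the rule "score >= threshold means positive" are assumptions for illustration:

```python
def roc_points(scores, labels, thresholds):
    """One (false positive rate, true positive rate) pair per threshold.

    labels: 1 = positive (e.g. has the disease), 0 = negative.
    A sample is classified as positive when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in thresholds:
        predicted = [s >= t for s in scores]
        tp = sum(1 for p, y in zip(predicted, labels) if p and y == 1)
        fp = sum(1 for p, y in zip(predicted, labels) if p and y == 0)
        # x = 1 - specificity (FPR), y = sensitivity (TPR)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical predicted probabilities and known labels.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(roc_points(scores, labels, [1.0, 0.75, 0.5, 0.0]))
# [(0.0, 0.0), (0.0, 0.5), (0.25, 0.75), (1.0, 1.0)]
```

Each threshold's confusion matrix collapses to a single point; joining the points draws the ROC curve.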

22

AUC

Area under the curve

23

Purpose of the AUC

To compare one ROC curve to another
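
One common way to compute the AUC is the trapezoidal rule over the ROC points. A minimal sketch, assuming the points are (false positive rate, true positive rate) pairs that include (0, 0) and (1, 1); the example points are hypothetical:

```python
def auc(points):
    """Area under an ROC curve using the trapezoidal rule."""
    pts = sorted(points)  # order by false positive rate
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2  # area of one trapezoid
    return area


print(auc([(0.0, 0.0), (0.0, 0.5), (0.25, 0.75), (1.0, 1.0)]))  # 0.8125
print(auc([(0.0, 0.0), (1.0, 1.0)]))                            # 0.5
```

A curve that follows the diagonal scores 0.5; the closer the AUC is to 1, the better the method separates the two classes.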

24

Precision equation

True positives / (true positives + false positives)
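
A minimal Python sketch of precision next to specificity, using hypothetical counts from an imbalanced rare-disease study, shows why leaving out true negatives matters:

```python
def precision(tp: int, fp: int) -> float:
    """Precision = TP / (TP + FP); true negatives do not appear."""
    return tp / (tp + fp)


def specificity(tn: int, fp: int) -> float:
    """Specificity = TN / (TN + FP); dominated by TN when negatives are plentiful."""
    return tn / (tn + fp)


# Hypothetical rare-disease study: 10 sick people (8 detected), 9,990 healthy people.
tp = 8
fp, tn = 90, 9900
print(round(specificity(tn, fp), 3))  # 0.991
print(round(precision(tp, fp), 3))    # 0.082
```

Specificity looks excellent because the many true negatives dominate it, while precision shows that most positive predictions are wrong.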

25

Precision

The proportion of positive predictions that were correctly classified

26

Precision is not affected by class imbalance because

It does not include the number of true negatives

27

Example when imbalance occurs

When studying a rare disease. In this case, the study will contain many more people without the disease than with the disease

28

ROC Curves make it easy to

Identify the best threshold for making a decision

29

AUC makes it easy to

Decide which classification method is better

30

Entropy can also be used to

Build classification trees

31

Entropy is also the basis of

Mutual Information

32

Mutual Information

Quantifies the relationship between 2 things

33

Entropy is also the basis of

Relative entropy (the Kullback-Leibler distance, or divergence) and cross entropy

34

Entropy is used to

quantify similarities and differences

35

If the probability is low, the surprise is

high

36

If the probability is high, the surprise is

low

37

The entropy of X is

The expected *surprise* each time we observe an outcome of X

38

Entropy IS

The expected value of the surprise

39

We can rewrite entropy using

The sigma notation

40

Equation for surprise

$\text{Surprise}(x) = \log\left(\frac{1}{p(x)}\right) = -\log p(x)$

41

Equation for entropy

$H(X) = \sum_{x} p(x) \log\left(\frac{1}{p(x)}\right) = -\sum_{x} p(x) \log p(x)$
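
Surprise and entropy as a minimal Python sketch. Base-2 logarithms are an assumption here (any base works; it only changes the units), and outcomes with probability 0 are skipped because they contribute nothing to the sum:

```python
import math


def surprise(p: float, base: float = 2) -> float:
    """Surprise of an outcome with probability p: log(1 / p)."""
    return math.log(1 / p, base)


def entropy(probs, base: float = 2) -> float:
    """Entropy = expected surprise = sum of p * log(1 / p)."""
    return sum(p * surprise(p, base) for p in probs if p > 0)


print(surprise(0.5))         # 1.0: a fair coin flip carries 1 bit of surprise
print(entropy([0.5, 0.5]))   # 1.0
print(entropy([0.9, 0.1]))   # about 0.469: a predictable coin is less surprising
```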

42

Entropy

The expected value of the log of the inverse of the probability, i.e. the average surprise

43

R squared does not work for

Binary data, such as yes/no outcomes

44

R squared works for

Continuous data

45

Mutual information is

A numeric value that gives us a sense of how closely related two variables are

46

Equation for mutual information

$I(X;Y) = \sum_{x}\sum_{y} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$
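
A minimal Python sketch of mutual information computed from a table of joint probabilities, with the marginal probabilities obtained by summing rows and columns. The base-2 log and the two example tables are assumptions for illustration:

```python
import math


def mutual_information(joint):
    """Mutual information in bits.

    joint[i][j] is the joint probability P(X = i, Y = j); entries sum to 1.
    """
    px = [sum(row) for row in joint]        # marginal probabilities of X
    py = [sum(col) for col in zip(*joint)]  # marginal probabilities of Y
    mi = 0.0
    for i, row in enumerate(joint):
        for j, pxy in enumerate(row):
            if pxy > 0:  # zero-probability cells contribute nothing
                mi += pxy * math.log2(pxy / (px[i] * py[j]))
    return mi


independent = [[0.25, 0.25], [0.25, 0.25]]
identical = [[0.5, 0.0], [0.0, 0.5]]
print(mutual_information(independent))  # 0.0
print(mutual_information(identical))    # 1.0
```

Independent variables give 0, while two fair binary variables that always agree share 1 bit.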

47

Joint probabilities

The probability of two things occurring at the same time

48

Marginal probabilities

The probability of one thing occurring on its own, as opposed to a joint probability

49

Least squares is used for

Fitting lines in linear regression

50

Squaring the residuals ensures

That each term is positive, so negative residuals do not cancel out positive ones

51

Sum of Squared Residuals

How well the line fits the data

52

How the sum of squared residuals is calculated

The residuals are the differences between the real data and the line, and we are summing the square of these values

53

The sum of squared residuals should be

as low as possible

54

First step when working with bias and variance

Split the data in 2 sets, one for training and one for testing

55

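How do we find the optimal rotation for the line?

We take the derivative of the sum of squared residuals with respect to the line's parameters. The derivative tells us the slope of the function at every point, and the optimal line is where that slope equals 0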
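
Setting the derivatives of the sum of squared residuals to 0 gives a closed-form answer for a straight line. A minimal Python sketch of that result, with hypothetical data:

```python
def least_squares_line(xs, ys):
    """Intercept and slope that minimize the sum of squared residuals.

    Setting the derivatives of the SSR (with respect to intercept and slope)
    to 0 and solving gives these closed-form expressions.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def ssr(xs, ys, intercept, slope):
    """Sum of squared residuals for the line y = intercept + slope * x."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
b0, b1 = least_squares_line(xs, ys)
print(round(b0, 3), round(b1, 3))                       # 0.05 1.99
print(ssr(xs, ys, b0, b1) < ssr(xs, ys, b0, b1 + 0.1))  # True
```

Nudging the slope away from the fitted value increases the sum of squared residuals, which is what "optimal" means here.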

56

Least squares final line

The line that minimizes the sum of the squared residuals between it and the real data

57

The first thing you do in linear regression

Use least squares to fit a line to the data

58

The second thing you do in linear regression

Calculate R squared

59

The third thing you do in linear regression

Calculate a p-value for R squared

60

Residual

The vertical distance from the line to a data point

61

SS(Mean)

Sum of squares around the mean

62

SS(Fit)

Sum of squares around the least squares fit

63

64

Linear regression is also called:

Least squares

65

What is Bias

The inability of a machine learning method, like linear regression, to capture the true relationship

66

How do we measure how well the lines fit the training set?

By calculating the sum of squares: we measure how far the dots are from the line

67

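How do we measure how well the lines fit the testing set?

By calculating the sum of squares again, this time measuring how far the testing-set dots are from the lines fitted on the training set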

68

Overfit

When the line fits the training set well but does not fit the testing set well
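
Overfitting is easy to reproduce. In this minimal Python sketch, a nearest-neighbour lookup stands in for a "squiggly" line that passes through every training point (a swapped-in model, chosen only because it is short), and it is compared with a least-squares line. The data are hypothetical, generated around an assumed true relationship y = x:

```python
def fit_line(xs, ys):
    """Least-squares intercept and slope."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - slope * mx, slope


def nearest_neighbour(train_x, train_y):
    """A 'squiggly' model that memorises the training points."""
    def predict(x):
        i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
        return train_y[i]
    return predict


def ssr(predict, xs, ys):
    """Sum of squared residuals of a prediction function on (xs, ys)."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical data: noisy training labels around y = x, clean testing labels.
train_x = [1, 2, 3, 4, 5, 6]
train_y = [1.2, 1.8, 3.3, 3.9, 5.4, 5.8]
test_x = [1.4, 2.6, 3.4, 4.6, 5.4]
test_y = [1.4, 2.6, 3.4, 4.6, 5.4]

b0, b1 = fit_line(train_x, train_y)


def line(x):
    return b0 + b1 * x


squiggle = nearest_neighbour(train_x, train_y)

print("training SSR (line, squiggle):", ssr(line, train_x, train_y), ssr(squiggle, train_x, train_y))
print("testing SSR (line, squiggle):", ssr(line, test_x, test_y), ssr(squiggle, test_x, test_y))
```

The squiggly model has zero error on the training set but a much larger sum of squares on the testing set, while the straight line does slightly worse on training and far better on testing.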

69

Ideal algorithm

Low bias (it accurately captures the true relationship) and low variability (it makes consistent predictions across different datasets)

70

Low variability

Producing consistent predictions across different datasets

71

What the least squares values for the equation parameters achieve

They minimize the sum of the squared residuals

72

y = y-intercept + slope × x

Linear regression

73

y = y-intercept + slope1 × x + slope2 × z

Multiple regression

74

Equation for R squared

$R^2 = \frac{SS(\text{mean}) - SS(\text{fit})}{SS(\text{mean})}$
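
A minimal Python sketch of the R squared formula, with SS(mean) and SS(fit) computed as defined in the cards above; the data and the two candidate lines are hypothetical:

```python
def r_squared(xs, ys, intercept, slope):
    """R squared = (SS(mean) - SS(fit)) / SS(mean)."""
    mean_y = sum(ys) / len(ys)
    ss_mean = sum((y - mean_y) ** 2 for y in ys)                              # around the mean
    ss_fit = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))  # around the line
    return (ss_mean - ss_fit) / ss_mean


xs = [1, 2, 3]
ys = [2, 4, 6]
print(r_squared(xs, ys, 0, 2))  # 1.0: the line passes through every point
print(r_squared(xs, ys, 4, 0))  # 0.0: the line is just the mean of y
```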

75

Goal of a t test

Compare means and see if they are significantly different from each other

76

Odds are NOT

Probabilities

77

ODDS are

The ratio of something happening (ex. *the team winning*) to something not happening (ex. *the team NOT winning*)

78

Logit function

The log of the ratio of the probabilities, log(p / (1 - p)); it forms the basis for logistic regression
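
Odds, log(odds) and the logit as a minimal Python sketch. The natural log and the 5-wins-out-of-8 example are assumptions for illustration:

```python
import math


def odds(p: float) -> float:
    """Odds = p / (1 - p): happening vs. not happening, not a probability."""
    return p / (1 - p)


def logit(p: float) -> float:
    """log(odds), using the natural log."""
    return math.log(odds(p))


def inverse_logit(z: float) -> float:
    """Turn a log(odds) value back into a probability."""
    return 1 / (1 + math.exp(-z))


# Hypothetical: a team wins 5 games out of 8.
p_win = 5 / 8
print(round(odds(p_win), 3))                      # 1.667, the same as 5 wins / 3 losses
print(logit(0.5))                                 # 0.0: even odds
print(math.isclose(logit(0.2), -logit(0.8)))      # True: log(odds) is symmetric around 0
```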

79

log(odds)

Log of the odds

80

What are log(odds) used for?

Log(odds) are useful for problems about the probability of win/lose, yes/no, or true/false outcomes

81

Odds ratio

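The ratio of two odds. Ex. *the odds of having cancer with the mutated gene divided by the odds of having cancer without it*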
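
A minimal Python sketch of the odds ratio and the log(odds ratio) from a 2x2 table. The table layout and the counts are hypothetical:

```python
import math


def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Odds ratio for a 2x2 table laid out as (hypothetical layout):

                      has cancer   no cancer
        mutated gene       a            b
        normal gene        c            d
    """
    return (a / b) / (c / d)


# Hypothetical counts, for illustration only.
ratio = odds_ratio(23, 117, 6, 210)
print(round(ratio, 2), round(math.log(ratio), 2))  # 6.88 1.93
# Equal odds in both rows: ratio 1, log(odds ratio) 0.
print(odds_ratio(10, 20, 5, 10), math.log(odds_ratio(10, 20, 5, 10)))  # 1.0 0.0
```

An odds ratio of 1 (log(odds ratio) of 0) means the gene makes no difference to the odds; values further from 1 indicate a stronger relationship.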

82

What the odds ratio and the log(odds ratio) tell us

They indicate a relationship between 2 things, ex. *whether having a mutated gene increases the odds of having cancer*

83

Tests used to determine p values for log (odds ratio)

Fisher's exact test, the chi-square test and the Wald test

84

Large r squared implies…

A large effect

85

Machine Learning

Using data to predict something

86

Example of continuous data

Weight and age

87

Example of discrete data

Genotype and astrological sign

88

Which curve is better: the one with the maximum likelihood or the minimum?

Maximum likelihood

89

Type of regression used to assess which variables are useful for classifying samples

Logistic regression

90

Examples of GLMs (Generalized Linear Models)

Logistic regression and linear models

91

In logistic regression, the slope indicates

How much the log(odds) of the event change for each one-unit increase in the independent variable

92

Logit function

$\text{logit}(p) = \log\left(\frac{p}{1-p}\right)$

93

If the coefficient *estimate* in logistic regression is negative, the odds are

Against (the log(odds) are below 0). Ex. *a negative intercept means that if you don't weigh anything, the odds are against you being obese*

94

If the coefficient *estimate* is positive, that means that

For every one-unit increase in x, the log(odds) of y increase by the value of the coefficient
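
A minimal Python sketch of how logistic regression coefficients act on the log(odds) scale. The intercept and slope values are hypothetical, not estimates from a real fit:

```python
import math

# Hypothetical estimates (e.g. obesity vs. weight); not from a real model.
intercept, slope = -3.48, 1.83


def log_odds(x: float) -> float:
    """Logistic regression prediction on the log(odds) scale."""
    return intercept + slope * x


def probability(z: float) -> float:
    """Convert a log(odds) value into a probability."""
    return 1 / (1 + math.exp(-z))


# Each one-unit increase in x raises log(odds) by the slope...
print(round(log_odds(3) - log_odds(2), 6))                       # 1.83
# ...which multiplies the odds by e**slope.
print(round(math.exp(log_odds(3)) / math.exp(log_odds(2)), 3))  # 6.234
# Negative intercept: at x = 0 the odds are against the event.
print(probability(log_odds(0)) < 0.5)                           # True
```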

95

In logistic regression, how do we use the *z value* to confirm that a coefficient is statistically significant?

Its absolute value is greater than about 2, ex. *2.255*, which corresponds to a p-value less than 0.05, ex. *0.0241*
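
The p-value for a z value can be computed from the standard normal distribution. A minimal Python sketch, assuming a two-sided test:

```python
import math


def two_sided_p_value(z: float) -> float:
    """Two-sided p-value P(|Z| > |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


print(round(two_sided_p_value(2.255), 4))  # 0.0241
print(round(two_sided_p_value(1.96), 3))   # 0.05
```

A z value of about 1.96 gives p = 0.05, which is where the "greater than about 2" rule of thumb comes from.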

96

What is the difference between the coefficients used for linear models and those used in logistic regression?

They are interpreted the same way, except that logistic regression coefficients are in terms of log(odds)

97

In logistic regression, what is the scale of the coefficients?

Log(odds)

98

How are lines fit in linear regression?

By using least squares: measuring the residuals (the distances between the data and the line), then squaring them so that negative values do not cancel out positive values

99

Line with the smallest sum of squared residuals is

The best line

100

Line with the biggest sum of squared residuals is

The worst line
