Everything until deep learning
Probability
Study of uncertainty and randomness, used to model and analyze uncertainty in data.
A form of regularization
Ridge regression
Rows on a confusion matrix
Correspond to what is predicted
Columns on a confusion matrix
Correspond to the known truth
The sensitivity metric equation
Sensitivity = True Positives / (True Positives + False Negatives)
The specificity metric equation
Specificity = True Negatives / (True Negatives + False Positives)
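A minimal Python sketch of these two equations (the tp/fn/tn/fp counts are hypothetical, chosen to match the examples below):

    def sensitivity(tp, fn):
        # Fraction of actual positives correctly identified
        return tp / (tp + fn)

    def specificity(tn, fp):
        # Fraction of actual negatives correctly identified
        return tn / (tn + fp)

    # Hypothetical counts read off a confusion matrix
    print(sensitivity(tp=81, fn=19))  # 0.81
    print(specificity(tn=85, fp=15))  # 0.85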
If sensitivity = 0.81, what does it mean?
It tells us that 81% of the people with heart disease were correctly identified by the logistic regression model
If specificity = 0.85, what does it mean?
It means that 85% of the people without heart disease were correctly identified
When a confusion matrix has more than 2 rows, how do we calculate the sensitivity?
For each category, we divide its true positives by the true positives plus the sum of its false negatives (summing the false negatives across all the other categories)
What is the function of specificity and sensitivity?
They help us decide which machine learning method would be best for our data
If correctly identifying positives is the most important thing, which one should I choose? Sensitivity or specificity?
Sensitivity
If correctly identifying negatives is the most important thing, which one should I choose? Sensitivity or specificity?
Specificity
ROC
Receiver Operating Characteristic
ROC function
To provide a simple way to summarize all the information, instead of making several confusion matrices
The y axis, in ROC, is the same thing as
Sensitivity
The x axis, in ROC, is the same thing as
1 - Specificity (the false positive rate)
True positive rate =
Sensitivity
False positive rate =
1 - Specificity
In other words, ROC allows us to
Set the right threshold
When the true positive rate equals the false positive rate,
the point lies on the ROC plot's diagonal line, where the model classifies no better than random guessing
The ROC summarizes…
All of the confusion matrices that each threshold produced
AUC
Area under the curve
AUC function
To compare one ROC curve to another
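A short sketch of how a ROC curve and its AUC can be computed with scikit-learn, assuming it is installed; the labels and scores are made up for illustration:

    from sklearn.metrics import roc_curve, roc_auc_score

    # Hypothetical true labels and predicted probabilities
    y_true = [0, 0, 1, 1, 0, 1, 1, 0]
    y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.65, 0.3]

    # roc_curve sweeps the thresholds: one (FPR, TPR) point per threshold,
    # instead of building a separate confusion matrix for each one
    fpr, tpr, thresholds = roc_curve(y_true, y_scores)

    # AUC summarizes the whole curve in a single number for comparing methods
    print(roc_auc_score(y_true, y_scores))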
Precision equation
Precision = True Positives / (True Positives + False Positives)
Precision
the proportion of positive results that were correctly classified
Precision is not affected by imbalance because
It does not include the number of true negatives
Example of when imbalance occurs
When studying a rare disease: the study will contain many more people without the disease than with it
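A tiny sketch showing why imbalance does not distort precision (counts hypothetical): true negatives never enter the equation, so adding more disease-free people changes nothing.

    def precision(tp, fp):
        # Precision = TP / (TP + FP); TN does not appear anywhere
        return tp / (tp + fp)

    # Same precision whether the study has 100 or 100,000 true negatives
    print(precision(tp=90, fp=10))  # 0.9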
ROC Curves make it easy to
Identify the best threshold for making a decision
AUC makes it easy to
Decide which categorization method is better
Entropy can also be used to
Build classification trees
Entropy is also the basis of
Mutual Information
Mutual Information
Quantifies the relationship between 2 things
Entropy is also the basis of
Relative entropy (the Kullback-Leibler distance) and cross entropy
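A minimal numpy sketch of relative entropy and cross entropy for two hypothetical discrete distributions p and q:

    import numpy as np

    p = np.array([0.5, 0.25, 0.25])  # hypothetical "true" distribution
    q = np.array([0.4, 0.4, 0.2])    # hypothetical approximating distribution

    kl = np.sum(p * np.log2(p / q))  # relative entropy D(p || q)
    cross = -np.sum(p * np.log2(q))  # cross entropy H(p, q)
    h = -np.sum(p * np.log2(p))      # plain entropy H(p)

    print(kl, cross, h)  # note that cross = h + kl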
Entropy is used to
quantify similarities and differences
If the probability is low, the surprise is
high
If the probability is high, the surprise is
low
The entropy of the result of X is
The expected surprise every time we observe X
Entropy IS
The expected value of the surprise
We can rewrite entropy using
The sigma notation
Equation for surprise
Surprise = log(1 / p(x)) = -log(p(x))
Equation for entropy
Entropy = Σ p(x) · log(1 / p(x)) = -Σ p(x) · log(p(x))
Surprise
Is the log of the inverse of the probability
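A small Python sketch of both equations; the biased-coin probabilities are hypothetical:

    import numpy as np

    def surprise(p):
        # Surprise = log2(1 / p): low probability means high surprise
        return np.log2(1 / p)

    def entropy(probs):
        # Entropy = expected surprise = sum of p(x) * log2(1 / p(x))
        probs = np.asarray(probs)
        return np.sum(probs * np.log2(1 / probs))

    print(surprise(0.9))        # small surprise for a likely outcome
    print(surprise(0.1))        # large surprise for an unlikely outcome
    print(entropy([0.9, 0.1]))  # about 0.47 bits for a biased coin
    print(entropy([0.5, 0.5]))  # 1 bit: a fair coin is maximally uncertain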
R² (R squared) does not work for
Binary data (yes or no)
R squared works for
Continuous data
Mutual information is
A numeric value that gives us a sense of how closely related two variables are
Equation for mutual information
MI(X, Y) = Σ over x and y of p(x, y) · log( p(x, y) / (p(x) · p(y)) )
Joint probabilities
The probability of two things occurring at the same time
Marginal probabilities
The counterpart of joint probabilities: the probability of one thing occurring on its own, found by summing the joint probabilities over the other variable
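A numpy sketch tying these cards together: a (hypothetical) joint probability table gives the marginals by summing over the other variable, and both feed the mutual information equation.

    import numpy as np

    # Hypothetical joint probabilities p(x, y) for two binary variables
    joint = np.array([[0.4, 0.1],
                      [0.1, 0.4]])

    px = joint.sum(axis=1)  # marginal p(x): sum the joint over y
    py = joint.sum(axis=0)  # marginal p(y): sum the joint over x

    # MI = sum over x, y of p(x, y) * log2( p(x, y) / (p(x) * p(y)) )
    mi = np.sum(joint * np.log2(joint / np.outer(px, py)))
    print(mi)  # positive here, because the two variables are clearly related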
Least squares =
Linear regression
squaring ensures
That each term is positive
Sum of Squared Residuals
How well the line fits the data
Sum of Squared Residuals function
The residuals are the differences between the real data and the line, and we sum the squares of these values
The sum of squared residuals must be
as low as possible
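A minimal sketch of the sum of squared residuals for one candidate line; the data and the line's parameters are made up:

    import numpy as np

    def sum_of_squared_residuals(y, y_pred):
        # Residual = real value minus the line's value; squaring keeps each term positive
        residuals = np.asarray(y) - np.asarray(y_pred)
        return np.sum(residuals ** 2)

    # Hypothetical data and a candidate line y = 0.5 + 1.2 * x
    x = np.array([1.0, 2.0, 3.0, 4.0])
    y = np.array([1.8, 2.9, 4.2, 5.1])
    print(sum_of_squared_residuals(y, 0.5 + 1.2 * x))  # lower is better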
First step when working with bias and variance
Split the data in 2 sets, one for training and one for testing
How do we find the optimal rotation for the line?
We take the derivative of the sum of squared residuals; the derivative tells us the slope of the function at every point, and the optimal rotation is where that slope equals zero
Least squares final line
The final line is the one that minimizes the distance (the sum of squared residuals) between it and the real data
The first thing you do in linear regression
Use least squares to fit a line to the data
The second thing you do in linear regression
calculate r squared
The third thing you do in linear regression
calculate a p-value for R squared
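These three steps map directly onto scipy.stats.linregress, assuming SciPy is available; the x and y values are hypothetical:

    from scipy.stats import linregress

    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [1.8, 2.9, 4.2, 5.1, 6.3]

    fit = linregress(x, y)           # step 1: least squares fit
    print(fit.slope, fit.intercept)  # the fitted line's parameters
    print(fit.rvalue ** 2)           # step 2: R squared
    print(fit.pvalue)                # step 3: p-value for the fit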
Residual
The distance from the line to a data point
SS(Mean)
Sum of squares around the mean
SS(Fit)
Sum of squares around the least squares fit
Linear regression is also called:
Least squares
What is bias?
The inability of a machine learning method, like linear regression, to capture the true relationship
How do we calculate how the lines fit the training set?
By calculating the sum of squares: we measure how far the dots are from the line
How do we calculate how the lines fit the testing set?
The same way: by calculating the sum of squares between the testing data and the line
Overfit
When the line fits the training set well but does not fit the testing set well
Ideal algorithm
Low bias (it accurately captures the true relationship) and low variability (it produces consistent predictions across different datasets)
How does least squares determine the values for the equation's parameters?
It chooses the values that minimize the sum of squared residuals
Y = Y-intercept + slope × X
Linear regression
Y = Y-intercept + slope₁ × X + slope₂ × Z
Multiple regression
Equation for R² (R squared)
R² = (SS(mean) - SS(fit)) / SS(mean)
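The same equation as a few lines of Python, with hypothetical sums of squares:

    ss_mean = 32.0  # hypothetical sum of squares around the mean
    ss_fit = 8.0    # hypothetical sum of squares around the fitted line

    # R^2 = (SS(mean) - SS(fit)) / SS(mean): the fraction of the variation
    # around the mean that the fitted line explains
    r_squared = (ss_mean - ss_fit) / ss_mean
    print(r_squared)  # 0.75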
Goal of a t test
Compare means and see if they are significantly different from each other
Odds are NOT
Probabilities
Odds are
The ratio of something happening (e.g., the team winning) to something not happening (e.g., the team NOT winning)
Logit function
The log of the ratio of the probabilities, log(p / (1 - p)); it forms the basis of logistic regression
log(odds)
Log of the odds
Log(odds) use?
The log of the odds is useful for working with win/lose, yes/no, or true/false probabilities
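A tiny sketch of odds versus probability and the log(odds); the probability is made up:

    import numpy as np

    p = 0.8                  # hypothetical probability of the team winning
    odds = p / (1 - p)       # winning vs. NOT winning: 4.0, i.e. 4 to 1
    log_odds = np.log(odds)  # the log(odds); symmetric around 0
    print(odds, log_odds)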
Odds ratio
The ratio of two odds, e.g., the odds of having cancer with a mutated gene divided by the odds of having cancer without it
Relationship between the odds ratio and the log(odds ratio)
They indicate a relationship between two things (e.g., between a mutated gene and cancer), like whether or not having the mutated gene increases the odds of having cancer
Tests used to determine p values for log (odds ratio)
Fisher's exact test, the chi-squared test, and the Wald test
Large r squared implies…
A large effect
Machine Learning
Using data to predict something
Example of continuous data
Weight and age
Example of discrete data
Genotype and astrological sign
Which curve is better: the one with the maximum likelihood or the minimum?
Maximum likelihood
Type of regression used to assess which variables are useful for classifying samples
Logistic regression
Examples of GLMs (Generalized Linear Models)
Logistic regression and linear models
The slope indicates
the rate at which the probability of a particular event occurring changes as the independent variable changes.
Logit function
log(p / (1 - p))
If the coefficient estimate in logistic regression is negative, the odds are
Against; e.g., if you don't weigh anything, the odds are against you being obese
If the coefficient estimate is positive, that means that
For every unit of x gained, the log(odds) of y increases by the coefficient's value
In logistic regression, how do we use the z value to confirm that a coefficient is statistically significant?
It should be greater than 2 (e.g., 2.255), with a p-value less than 0.05 (e.g., 0.0241)
What's the difference between the coefficients used for linear models and logistic regression?
They work exactly the same way, except the logistic regression coefficients are in terms of log(odds)
In logistic regression, what is the scale of the coefficients?
Log(odds)
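A sketch of fitting a logistic regression with statsmodels, assuming it is installed; the weight/obesity data are invented. Its output contains exactly the pieces from the cards above: coefficients on the log(odds) scale, z values, and p-values.

    import numpy as np
    import statsmodels.api as sm

    # Hypothetical data: weight (x) and obese yes/no (y)
    weight = np.array([60, 65, 70, 75, 80, 85, 90, 95, 100, 105], dtype=float)
    obese = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1])

    X = sm.add_constant(weight)        # adds the intercept column
    result = sm.Logit(obese, X).fit()

    print(result.params)   # coefficients, in terms of log(odds)
    print(result.tvalues)  # z values (coefficient / standard error)
    print(result.pvalues)  # p-values for each coefficient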
How are lines fit in linear regression?
By using least squares: we measure the residuals (the distances between the data and the line) and then square them so that negative values do not cancel out positive values
Line with the smallest sum of squared residuals is
The best line
Line with the biggest sum of squared residuals is
The worst line