Positive
Observation is positive
Negative
Observation is negative
True Positive
observation +, prediction +
False Positive
observation -, prediction +
True Negative
observation -, prediction -
False Negative
observation +, prediction -
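The four outcomes above can be counted directly from paired observations and predictions. A minimal sketch, assuming labels are coded as 1 (positive) and 0 (negative); the function name `confusion_counts` and the sample data are made up for illustration.

```python
def confusion_counts(observed, predicted):
    """Count TP, FP, TN, FN for binary labels (1 = positive, 0 = negative)."""
    tp = fp = tn = fn = 0
    for obs, pred in zip(observed, predicted):
        if obs == 1 and pred == 1:
            tp += 1          # observation +, prediction +
        elif obs == 0 and pred == 1:
            fp += 1          # observation -, prediction +
        elif obs == 0 and pred == 0:
            tn += 1          # observation -, prediction -
        else:
            fn += 1          # observation +, prediction - (assumes only 0/1 labels)
    return tp, fp, tn, fn

# Hypothetical example data
observed = [1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0]
tp, fp, tn, fn = confusion_counts(observed, predicted)
print(tp, fp, tn, fn)  # 2 1 2 1
```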
Accuracy
How close the predictions are to the actual values: the share of all observations that are predicted correctly
Accuracy Formula
(TP+TN)/N, where N = TP+TN+FP+FN
Precision
How many selected values are relevant
Precision Formula
TP/(TP+FP)
Recall (Sensitivity)
How many relevant items are selected
Recall/Sensitivity/TPR Formula
TP/(TP+FN)
F1 Score Formula
2* ((recall*precision)/(recall+precision))
F1 Score
Combines precision and recall into one number (their harmonic mean), so it is high only when both are high
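The accuracy, precision, recall, and F1 formulas above can be checked on a small set of counts. A minimal sketch; the counts are invented for illustration, and the function does not guard against division by zero (for example when nothing is predicted positive).

```python
def classification_metrics(tp, fp, tn, fn):
    """Apply the card formulas to confusion-matrix counts."""
    n = tp + fp + tn + fn
    accuracy = (tp + tn) / n
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * ((recall * precision) / (recall + precision))
    return accuracy, precision, recall, f1

# Hypothetical counts: 100 observations in total
accuracy, precision, recall, f1 = classification_metrics(tp=8, fp=2, tn=85, fn=5)
print(round(accuracy, 3), round(precision, 3), round(recall, 3), round(f1, 3))
```

With these counts accuracy is 0.93 even though recall is only about 0.62, which is why accuracy alone can mislead when positives are rare.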
Normalization
(x - min(x)) / (max(x) - min(x))
Standardization
(x - mean) / standard deviation
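Both rescaling formulas can be sketched in a few lines. This assumes the population standard deviation (`statistics.pstdev`); the cards do not say whether the population or sample version is meant. The sample values are made up.

```python
from statistics import mean, pstdev

def normalize(xs):
    """Min-max normalization: rescale values to the range [0, 1]."""
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def standardize(xs):
    """Standardization: shift to mean 0 and scale to standard deviation 1."""
    mu, sigma = mean(xs), pstdev(xs)  # population std dev is an assumption
    return [(x - mu) / sigma for x in xs]

values = [2, 4, 6, 8, 10]
print(normalize(values))    # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standardize(values))  # centered on 0, symmetric around it
```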
Overfit
When the model is too complex for the data given, so it fits the noise in the training data and generalizes poorly to new data. Think kNN where the dividing line is too specific to the training points
Underfit
When the model is too simple for the data given, so it misses the underlying pattern. Example: using a linear model for non-linear data
Two ways of removing outliers
Removal, iterative Removal
Removal (Outliers)
Simply removing outliers
Iterative Removal (Outliers)
Removing and replacing outliers with values found from a model
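A rough sketch of both approaches. The cards do not say how outliers are detected or which model supplies the replacement values, so two assumptions are made here: a point is flagged when its z-score exceeds a hypothetical threshold `z_max`, and the "model" is simply the mean of the remaining points.

```python
from statistics import mean, pstdev

def is_outlier(x, xs, z_max=2.0):
    """Flag x if it lies more than z_max standard deviations from the mean (assumed rule)."""
    mu, sigma = mean(xs), pstdev(xs)
    return abs(x - mu) / sigma > z_max

def remove_outliers(xs, z_max=2.0):
    """Removal: drop the flagged points."""
    return [x for x in xs if not is_outlier(x, xs, z_max)]

def replace_outliers(xs, z_max=2.0):
    """Replacement: swap flagged points for a model value (here, the mean of the kept points)."""
    fill = mean(remove_outliers(xs, z_max))  # stand-in for a real model
    return [fill if is_outlier(x, xs, z_max) else x for x in xs]

data = [10, 12, 11, 13, 12, 95]
print(remove_outliers(data))   # [10, 12, 11, 13, 12]
print(replace_outliers(data))  # the 95 becomes 11.6
```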
2 ways of handling Missing Data
Deletion, imputation
deletion
Listwise, Pairwise, Variable Dropping
Imputation
LOCF (last observation carried forward), NOCB (next observation carried backward), interpolation, extrapolation
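Three of the imputation methods can be sketched on a list where `None` marks a missing value. This assumes evenly spaced observations and linear interpolation; the helper names are made up for illustration.

```python
def locf(xs):
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for x in xs:
        if x is not None:
            last = x
        out.append(last)
    return out

def nocb(xs):
    """Next observation carried backward: LOCF applied to the reversed list."""
    return locf(xs[::-1])[::-1]

def interpolate(xs):
    """Fill gaps between known values along a straight line."""
    out = list(xs)
    known = [i for i, x in enumerate(xs) if x is not None]
    for left, right in zip(known, known[1:]):
        step = (xs[right] - xs[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = xs[left] + step * (i - left)
    return out

series = [1.0, None, None, 4.0, None, 6.0]
print(locf(series))         # [1.0, 1.0, 1.0, 4.0, 4.0, 6.0]
print(nocb(series))         # [1.0, 4.0, 4.0, 4.0, 6.0, 6.0]
print(interpolate(series))  # [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
```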
Category Encoding
Giving categorical variables numeric values (Binary, Target-Based)
Binary Encoding
Giving each category of a two-level variable a corresponding number (male 0, female 1)
Target-Based Encoding
Like binary encoding, but each category is replaced by the proportion of the target observed for that category
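A minimal sketch of both encodings. For the target-based version it assumes a 0/1 target, so the mean target per category is the proportion of positives; the column names and data are invented.

```python
def binary_encode(values, mapping):
    """Replace each category with its assigned number."""
    return [mapping[v] for v in values]

def target_encode(categories, targets):
    """Replace each category with the mean of the target within that category."""
    totals, counts = {}, {}
    for c, t in zip(categories, targets):
        totals[c] = totals.get(c, 0) + t
        counts[c] = counts.get(c, 0) + 1
    rates = {c: totals[c] / counts[c] for c in totals}
    return [rates[c] for c in categories]

sex = ["male", "female", "female", "male"]
print(binary_encode(sex, {"male": 0, "female": 1}))  # [0, 1, 1, 0]

city = ["A", "A", "B", "B", "B"]   # hypothetical categorical column
bought = [1, 0, 1, 1, 0]           # hypothetical 0/1 target
print(target_encode(city, bought)) # A -> 0.5, B -> 2/3
```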
Supervised
Model is given a data set with labels
Unsupervised
Working with data without labels
Classification
Supervised machine learning that predicts a categorical value
Regression
Supervised machine learning that predicts a numerical value
algorithm
a methodical, logical rule or procedure that guarantees solving a particular problem. Contrasts with speedier heuristics
hyperparameter
a setting chosen before training that changes how the model learns its normal parameters (for example, k in kNN)
FPR - False Positive Rate
FP/(FP+TN)
ROC
Graphs the model's performance at different threshold points, to see how each threshold trades true positives against false positives. y-axis: TPR, x-axis: FPR
AUC
Area under the ROC curve, used to compare models. The model with the higher AUC value generally separates the classes better
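The ROC and AUC cards can be tied together in a short sketch: sweep the threshold over the predicted scores, record (FPR, TPR) at each step, then sum the area with the trapezoid rule. The scores and labels are made up, and treating "score >= threshold" as a positive prediction is an assumption.

```python
def roc_points(observed, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(observed)
    negatives = len(observed) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for o, s in zip(observed, scores) if s >= threshold and o == 1)
        fp = sum(1 for o, s in zip(observed, scores) if s >= threshold and o == 0)
        points.append((fp / negatives, tp / positives))  # FPR = FP/(FP+TN), TPR = TP/(TP+FN)
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

observed = [1, 1, 0, 1, 0, 0]              # hypothetical labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]    # hypothetical model scores
points = roc_points(observed, scores)
print(round(auc(points), 3))  # 0.889
```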
KNN
Distance-based classification: find the k closest neighbors of a point, then assign the point the class that the majority of those neighbors belong to
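A minimal kNN sketch, assuming Euclidean distance and a plain majority vote (the cards do not name a distance metric or a tie-breaking rule). The points and labels are invented.

```python
from collections import Counter
from math import dist  # Euclidean distance, Python 3.8+

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    ranked = sorted(zip(train_points, train_labels),
                    key=lambda pair: dist(pair[0], query))
    nearest_labels = [label for _, label in ranked[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(1, 1), (1, 2), (2, 1), (6, 6), (7, 6), (6, 7)]
labels = ["red", "red", "red", "blue", "blue", "blue"]
print(knn_predict(points, labels, (2, 2), k=3))  # red
print(knn_predict(points, labels, (6, 5), k=3))  # blue
```

Here k is a hyperparameter: a very small k tends toward overfitting, while a very large k tends toward underfitting.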
linear regression
a statistical method used to fit a linear model to a given data set
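A minimal sketch of fitting a line y = slope * x + intercept, assuming ordinary least squares with a single input variable (the card does not name the fitting method). The data are made up and lie exactly on a line so the result is easy to check.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one input variable."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

xs = [1, 2, 3, 4]
ys = [3, 5, 7, 9]   # exactly y = 2x + 1
slope, intercept = fit_line(xs, ys)
print(slope, intercept)  # 2.0 1.0
```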