Naive Bayes
Data-driven method that makes no assumptions about the form of the relationship; based on Bayes' theorem, named after Thomas Bayes
Naive Bayes Usage
Requires categorical variables and can be used with large data sets
Exact Bayes Classifier
Relies on finding other records that share the same predictor values and computing the probability of class membership among them; requires an exact match on all predictors
Naive Bayes - solution to exact bayes
Assume the predictor variables are independent and apply the multiplication rule to combine their conditional probabilities, arriving at approximately the same probability estimate without needing exact matches
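The multiplication rule above can be sketched in plain Python. The toy records, category names, and class labels below are hypothetical, chosen only to echo the fraud example on this deck:

```python
from collections import Counter

# Hypothetical training records: (predictor values, class label).
# Predictors: firm size, charges filed.
records = [
    (("small", "yes"), "fraud"),
    (("small", "no"), "truthful"),
    (("large", "no"), "truthful"),
    (("small", "yes"), "fraud"),
    (("large", "yes"), "truthful"),
]

def naive_bayes_score(x, records):
    """Unnormalized Naive Bayes score per class:
    P(class) * product over predictors of P(x_j | class)."""
    class_counts = Counter(y for _, y in records)
    scores = {}
    for c, n_c in class_counts.items():
        score = n_c / len(records)              # prior P(class = c)
        for j, value in enumerate(x):
            match = sum(1 for xs, y in records if y == c and xs[j] == value)
            score *= match / n_c                # conditional P(x_j = value | c)
        scores[c] = score
    return scores

scores = naive_bayes_score(("small", "yes"), records)
```

Dividing each score by their sum would give the posterior probabilities; for classification only the ranking matters.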
Example of Naive Bayes
Financial Fraud
Exact Bayes Calculations Example
Classify a small firm with charges filed
Naive bayes calculations
Classify a small firm with charges filed, using the conditional probability quantities
Other notes about Naive Bayes
Probability estimates do not differ greatly from the exact Bayes estimates, all records are used in the calculations, and it is more practical
Independence Assumption
Not strictly justified, but often good enough in practice
Advantages of Naive Bayes
Handles purely categorical data well, works with large data sets, and is simple and efficient
Disadvantages of Naive Bayes
Requires a large number of records; problematic when a predictor category is not present in the training data (its conditional probability becomes zero)
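A common remedy for the missing-category problem (not named on this deck) is Laplace (add-one) smoothing of the conditional probability estimates. A minimal sketch with made-up counts:

```python
def smoothed_conditional(value_class_count, class_count, n_categories, alpha=1.0):
    """Laplace-smoothed estimate of P(predictor = value | class).

    alpha = 1 (add-one smoothing) keeps the estimate strictly positive even
    when the category was never observed with this class in training."""
    return (value_class_count + alpha) / (class_count + alpha * n_categories)

# Hypothetical counts: category seen 0 times among 10 records of this class,
# with 3 possible categories for the predictor.
unsmoothed = 0 / 10                        # zeroes out the whole product
smoothed = smoothed_conditional(0, 10, 3)  # small but positive: 1/13
```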
Assumptions of MLR
Linearity, Independence, Homoscedasticity, Normality, No multicollinearity, No autocorrelation
Linearity
Relationship between predictors and outcome is linear
Independence
Observations are independent of each other
Homoscedasticity
Constant variance of the residuals
Normality
Residuals are normally distributed
No multicollinearity
Predictors aren’t highly correlated
No auto-correlation
Residuals are independent
When does multicollinearity occur
When two or more independent variables are highly correlated, which can inflate the standard errors of the coefficients
Variance Inflation Factor
Quantifies how much the variance of a regression coefficient is inflated due to multicollinearity
VIF = 1
No multicollinearity; the ideal value
VIF < 5
Low to moderate multicollinearity and generally acceptable
VIF > 5
High multicollinearity and is problematic
VIF > 10
Severe multicollinearity; strong evidence to remove or combine variables
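The VIF can be computed directly from its definition: regress each predictor on the others and take 1 / (1 − R²). A NumPy sketch with simulated data (the variable names and data are illustrative, not from the deck):

```python
import numpy as np

def vif(X):
    """VIF for each column of X: regress it on the other columns,
    then VIF_j = 1 / (1 - R_j^2)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)  # nearly collinear with x1 -> large VIF
x3 = rng.normal(size=200)             # independent -> VIF near 1
vifs = vif(np.column_stack([x1, x2, x3]))
```

Here `vifs[0]` and `vifs[1]` land well above 10 (severe, per the thresholds above) while `vifs[2]` stays near 1.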
How to detect violations
Linearity, Independence, Homoscedasticity, Normality, Multicollinearity, Auto-correlation
Linearity
Residual plots, scatterplots
Independence
Study design, Durbin-Watson test
Homoscedasticity
Residual vs fitted plot
Normality
Q-Q plot
Multicollinearity
Variance Inflation Factor
Autocorrelation
Durbin-Watson Test
Durbin-Watson test
A statistic ranging from 0 to 4 used to test for autocorrelation in the residuals
Durbin-Watson test result of 2.0
No autocorrelation; the ideal case
Durbin-Watson test result between 0 and 2
Positive autocorrelation
Durbin-Watson test result between 2.0 and 4
Negative autocorrelation
Durbin-Watson test result between 1.5 and 2.5
Generally acceptable
Durbin-Watson test result below 1.5 or above 2.5
Suggests potential autocorrelation problems
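The statistic itself is simple to compute: the sum of squared successive differences of the residuals divided by the sum of squared residuals. A sketch with simulated residuals (the data here are illustrative):

```python
import numpy as np

def durbin_watson(resid):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2); near 2 means no autocorrelation."""
    resid = np.asarray(resid, dtype=float)
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(1)
white = rng.normal(size=500)            # independent residuals -> DW near 2

rho = 0.8
ar = np.empty(500)
ar[0] = white[0]
for t in range(1, 500):
    ar[t] = rho * ar[t - 1] + rng.normal()  # positive autocorrelation -> DW well below 2
```

For an AR(1) process the statistic is approximately 2(1 − ρ), which is why values near 2 indicate no autocorrelation.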
How to fix autocorrelation
Add lagged variables or omitted time-dependent predictors, or use time-series estimation methods such as the Cochrane-Orcutt procedure or generalized least squares
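A standard remedy for autocorrelated residuals (not spelled out on the deck) is the Cochrane-Orcutt procedure: estimate the lag-1 autocorrelation ρ of the residuals, quasi-difference y and X by ρ, and re-fit. A sketch with simulated AR(1) residuals; in practice the residuals would come from an initial OLS fit, but the true errors are used here for illustration:

```python
import numpy as np

def cochrane_orcutt_step(y, X, resid):
    """One Cochrane-Orcutt step: estimate rho from the lag-1 residual
    autocorrelation, then quasi-difference y and X by rho."""
    rho = np.sum(resid[1:] * resid[:-1]) / np.sum(resid[:-1] ** 2)
    return rho, y[1:] - rho * y[:-1], X[1:] - rho * X[:-1]

rng = np.random.default_rng(2)
n, true_rho = 2000, 0.8
e = np.empty(n)
e[0] = rng.normal()
for t in range(1, n):
    e[t] = true_rho * e[t - 1] + rng.normal()   # AR(1) errors

X = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * X[:, 0] + e
rho, y_star, X_star = cochrane_orcutt_step(y, X, e)  # rho should be near 0.8
```

Re-fitting OLS on the quasi-differenced `y_star` and `X_star` then yields residuals with much weaker autocorrelation; the step can be iterated until ρ stabilizes.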