Data Mining Quiz 4

38 Terms

1

Naive Bayes

Data-driven (not model-driven) and makes no assumptions about the data; named after Thomas Bayes, whose theorem it applies

2

Naive Bayes Usage

Requires categorical predictors (numerical predictors must be binned into categories); can be used with very large data sets

3

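Exact Bayes Classifier

Finds all other records with exactly the same predictor values as the record to be classified, estimates the probability of each class from those records, and assigns the most probable class; requires an exact match on every predictor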
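A minimal Python sketch of the exact Bayes rule, assuming each record is a dictionary of categorical predictor values. The function name `exact_bayes_proba` and the toy records are hypothetical, chosen only for illustration.

```python
from collections import Counter

def exact_bayes_proba(records, labels, new_record):
    """Exact Bayes (hypothetical helper): keep only training records whose
    predictor values match new_record exactly, then return the class
    proportions among those matches."""
    matches = [
        label
        for record, label in zip(records, labels)
        if all(record[key] == value for key, value in new_record.items())
    ]
    if not matches:
        # No exact match exists, so exact Bayes cannot produce an estimate.
        return None
    counts = Counter(matches)
    # Classes absent from the matches implicitly get probability 0.
    return {cls: n / len(matches) for cls, n in counts.items()}

# Hypothetical toy data: (charges filed?, firm size) -> class
records = [
    {"charges": "y", "size": "small"},
    {"charges": "n", "size": "small"},
    {"charges": "y", "size": "small"},
    {"charges": "n", "size": "large"},
    {"charges": "y", "size": "large"},
]
labels = ["truthful", "truthful", "fraud", "truthful", "fraud"]

print(exact_bayes_proba(records, labels, {"charges": "y", "size": "small"}))
# → {'truthful': 0.5, 'fraud': 0.5}
```

If no training record matches every predictor value, the sketch returns `None`; that gap is the weakness naive Bayes is meant to address.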

4

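Naive Bayes: solution to the exact-match problem of exact Bayes

Assume the predictor variables are independent within each class, use the multiplication rule to estimate the probability of the record's predictor values in each class, then compute the same class probability that exact Bayes would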
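A minimal sketch of the naive Bayes calculation. This is an illustration, not a library implementation: `naive_bayes_proba` and the toy records are hypothetical, and every probability is estimated as a plain proportion with no smoothing.

```python
from collections import Counter

def naive_bayes_proba(records, labels, new_record):
    """Naive Bayes (hypothetical helper): assume predictors are independent
    within each class and multiply the per-predictor conditional proportions
    (multiplication rule), then normalize across classes."""
    n = len(labels)
    class_counts = Counter(labels)
    scores = {}
    for cls, n_cls in class_counts.items():
        score = n_cls / n  # prior: proportion of records in this class
        for key, value in new_record.items():
            # P(predictor = value | class), estimated from ALL records of this class
            n_match = sum(
                1
                for record, label in zip(records, labels)
                if label == cls and record[key] == value
            )
            # If value never occurs in this class, this factor is 0 and so is the product.
            score *= n_match / n_cls
        scores[cls] = score
    total = sum(scores.values())
    # Normalize so the class probabilities sum to 1 (None if every product is 0).
    return {cls: s / total for cls, s in scores.items()} if total > 0 else None

# Hypothetical toy data: (charges filed?, firm size) -> class
records = [
    {"charges": "y", "size": "small"},
    {"charges": "n", "size": "small"},
    {"charges": "y", "size": "small"},
    {"charges": "n", "size": "large"},
    {"charges": "y", "size": "large"},
]
labels = ["truthful", "truthful", "fraud", "truthful", "fraud"]

result = naive_bayes_proba(records, labels, {"charges": "y", "size": "small"})
print({cls: round(prob, 3) for cls, prob in result.items()})
# → {'truthful': 0.4, 'fraud': 0.6}
```

On these hypothetical records, every record of each class contributes to the proportions, not only exact matches. A category never seen with a class makes that class's product zero, which is the zero-frequency problem.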

5

Example of Naive Bayes

Financial fraud: classifying a firm's financial reporting as fraudulent or truthful

6

Exact Bayes Calculations Example

Classify a small firm with charges filed against it, using only the training records that are also small firms with charges filed

7

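Naive Bayes Calculations

Classify a small firm with charges filed by multiplying, for each class, the class proportion by the proportions of that class's records that are small firms and that have charges filed, then normalizing across classes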
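For the small-firm, charges-filed example, the naive Bayes estimate can be written as below. This is a sketch in standard notation: each hat-P is a proportion estimated from the training records, and the class names fraud and truthful are assumed from the financial-fraud example.

```latex
\[
\hat{P}_{nb}(\text{fraud} \mid \text{charges}=y,\ \text{size}=\text{small})
= \frac{\hat{P}(\text{fraud})\,\hat{P}(\text{charges}=y \mid \text{fraud})\,\hat{P}(\text{size}=\text{small} \mid \text{fraud})}
       {\sum_{c \in \{\text{fraud},\ \text{truthful}\}} \hat{P}(c)\,\hat{P}(\text{charges}=y \mid c)\,\hat{P}(\text{size}=\text{small} \mid c)}
\]
```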

8

Other notes about Naive Bayes

The probability estimate usually does not differ greatly from the exact Bayes estimate, all records are used in the calculations (not just exact matches), and it is far more practical than exact Bayes

9

Independence Assumption

Not strictly justified (predictors are often correlated), but often good enough in practice

10

Advantages of Naive Bayes

Handles purely categorical data well, works well with very large data sets, and is simple and computationally efficient

11

Disadvantages of Naive Bayes

Requires a large number of records; problematic when a predictor category is not present in the training data (its estimated probability is zero, which zeroes out the whole product)

12

Assumptions of MLR (multiple linear regression)

Linearity, Independence, Homoscedasticity, Normality, No multicollinearity, No autocorrelation

13

Linearity

Relationship between predictors and outcome is linear

14

Independence

Observations are independent of each other

15

Homoscedasticity

Constant variance of residuals across all fitted values

16

Normality

Residuals are normally distributed

17

No multicollinearity

Predictors aren’t highly correlated

18

No autocorrelation

Residuals are independent of one another (e.g., successive residuals in time-ordered data are uncorrelated)

19

When does multicollinearity occur?

When two or more independent variables are highly correlated; it can inflate the standard errors of the regression coefficients

20

Variance Inflation Factor

Quantifies how much the variance of a regression coefficient is inflated due to multicollinearity
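The definition translates directly into code: regress each predictor on all the others and compute VIF_j = 1 / (1 - R_j^2). The sketch below assumes NumPy is available; the `vif` helper and the simulated predictors are hypothetical.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (n_samples x n_predictors).
    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j
    on all the other predictors (with an intercept)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = []
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # add intercept column
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        # Perfect collinearity (R^2 = 1) gives an infinite VIF.
        vifs.append(1.0 / (1.0 - r2) if r2 < 1 else np.inf)
    return np.array(vifs)

rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
x3 = x1 + 0.1 * rng.normal(size=200)  # nearly a copy of x1, so collinear with it
print(vif(np.column_stack([x1, x2, x3])).round(1))
```

With `x3` built as a noisy copy of `x1`, both get large VIFs (well above 10), while the unrelated `x2` stays close to 1.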

21

VIF = 1

No multicollinearity; the ideal case

22

VIF < 5

Low to moderate multicollinearity and generally acceptable

23

VIF > 5

High multicollinearity; problematic

24

VIF > 10

Severe multicollinearity; strong evidence to remove or combine variables

25

How to detect violations

Linearity, Independence, Homoscedasticity, Normality, Multicollinearity, Auto-correlation

26

Linearity

Residual plots, scatterplots

27

Independence

Study design, Durbin-Watson test

28

Homoscedasticity

Residuals vs. fitted values plot

29

Normality

Q-Q plot

30

Multicollinearity

Variance Inflation Factor

31

Autocorrelation

Durbin-Watson Test

32

Durbin-Watson test

Test statistic ranging from 0 to 4, used to test for autocorrelation in the residuals
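The statistic is DW = sum over t = 2..n of (e_t - e_{t-1})^2, divided by the sum over t = 1..n of e_t^2, where e_t are the residuals in time order. It is roughly 2(1 - r), with r the lag-1 correlation of the residuals, which is why 2 means no autocorrelation. A minimal NumPy sketch, with a hypothetical `durbin_watson` helper and simulated residuals:

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic for residuals ordered in time:
    sum of squared successive differences divided by the sum of squares."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(1)
white = rng.normal(size=500)  # independent residuals
ar = np.empty(500)            # positively autocorrelated residuals (AR(1), phi = 0.8)
ar[0] = white[0]
for t in range(1, 500):
    ar[t] = 0.8 * ar[t - 1] + white[t]

print(round(durbin_watson(white), 2))  # close to 2
print(round(durbin_watson(ar), 2))     # well below 2
```

statsmodels also provides `statsmodels.stats.stattools.durbin_watson`, which computes the same statistic from a residual array.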

33

Durbin-Watson test result of 2.0

No autocorrelation; the ideal case

34

Durbin-Watson test between 0 and 2

Positive autocorrelation (stronger as the value approaches 0)

35

Durbin-Watson test between 2 and 4

Negative autocorrelation (stronger as the value approaches 4)

36

Durbin-Watson test between 1.5 and 2.5

Generally acceptable

37

Durbin-Watson test below 1.5 or above 2.5

Suggests potential autocorrelation problems

38

How to fix autocorrelation

Linearity, Independence, Homoscedasticity, Normality, Multicollinearity, Auto-correlation