11-Naive Bayes

Classification Model

  • Objective: To find P(Y=1 | X1, X2, ..., Xm)

  • Examples:

    • P(Y=Default | Income < 30K, Education = College, ...)

    • P(Y=Spam Email | X1 = "lottery", X2 = "win", ...)

  • Characteristics:

    • Unlike neural networks, Naïve Bayes models have no parameters trained by iterative optimization.

    • Instead, the model uses probability theory to estimate the likelihood of the event of interest directly from the data, with no iterative training process.

Probability and Conditional Probability

  • Definitions:

    • Probability P(A): Likelihood of event A occurring

      • Example: 23% of days are rainy, P(rain) = 0.23

    • Joint Probability P(A, B): Likelihood of events A and B occurring together

    • Conditional Probability P(A|B): Likelihood of event A given event B.

      • Example: P(rain | cloudy) = 0.62, i.e., cloudy days have a 62% chance of rain.

Example of Probability Calculation

  • Survey of 26 Persons:

    • Own a Pet Distribution:

      • Female: 8 own, 6 don’t

      • Male: 5 own, 7 don’t

    • Calculations (see the code sketch after this list):

      • P(own a pet) = 13/26 = 0.50

      • P(own a pet | female) = 8/14 ≈ 0.57

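A minimal Python check of the two calculations above, working directly from the survey counts (only the numbers given in the example are used):

    # Survey of 26 persons: counts by sex and pet ownership
    counts = {
        ("female", "own"): 8, ("female", "not_own"): 6,
        ("male", "own"): 5, ("male", "not_own"): 7,
    }
    total = sum(counts.values())  # 26 people in total

    # P(own a pet): pet owners among everyone surveyed
    p_own = (counts[("female", "own")] + counts[("male", "own")]) / total
    print(p_own)  # 13/26 = 0.50

    # P(own a pet | female): pet owners among the 14 women only
    n_female = counts[("female", "own")] + counts[("female", "not_own")]
    p_own_given_female = counts[("female", "own")] / n_female
    print(round(p_own_given_female, 2))  # 8/14 ≈ 0.57
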
Probability Chain Rule

  • Chain Rule Formula: P(A, B) = P(B|A)P(A)

  • Example of Calculations:

    • P(own a pet, female) can be calculated from P(female) and P(own a pet | female), as checked in the sketch below.

    • Formula: P(own a pet, female) = P(female) x P(own a pet | female) = (14/26) x (8/14) = 8/26

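Using the same survey numbers, a quick check that the chain rule reproduces the joint probability read directly off the table:

    # Chain rule: P(own a pet, female) = P(female) * P(own a pet | female)
    p_female = 14 / 26
    p_own_given_female = 8 / 14
    joint_via_chain_rule = p_female * p_own_given_female  # (14/26) * (8/14)

    joint_from_table = 8 / 26  # women who own a pet, out of all 26 people
    assert abs(joint_via_chain_rule - joint_from_table) < 1e-12
    print(round(joint_via_chain_rule, 4))  # 8/26 ≈ 0.3077
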
Bayes’ Theorem

  • Formula:

    • From the chain rule, P(A, B) = P(B|A)P(A) = P(A|B)P(B)

    • Rearranging gives Bayes’ theorem: P(A|B) = P(B|A)P(A) / P(B)

  • Also known as Bayes’ Rule

Bayes’ Theorem Example

  • Disease Pre-screening:

    • Sensitivity: the test is positive 95% of the time when the patient has the disease (P(positive|disease) = 0.95).

    • Probability of Disease: P(disease) = 0.001 (1 in 1000).

  • Calculating Probability of Disease Given Positive Result:

    • Formula: P(disease|positive) = P(positive|disease) * P(disease) / P(positive)

    • These figures are consistent with a 5% false-positive rate on healthy patients (not stated above): then P(positive) = 0.95 × 0.001 + 0.05 × 0.999 ≈ 0.0509, so P(disease|positive) ≈ 0.00095 / 0.0509 ≈ 0.0187 (about 1.87%); the sketch below works this out.

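These numbers can be reproduced with a short sketch. The 5% false-positive rate on healthy patients is an assumption made here to match the figures above; it is not stated explicitly in the notes:

    # Bayes' theorem for the disease pre-screening example
    p_disease = 0.001            # prior: 1 in 1000 people have the disease
    p_pos_given_disease = 0.95   # test is positive 95% of the time when diseased
    p_pos_given_healthy = 0.05   # ASSUMED false-positive rate on healthy patients

    # Law of total probability: the denominator P(positive)
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))  # ≈ 0.0509

    # Bayes' theorem: P(disease | positive)
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(round(p_pos, 4), round(p_disease_given_pos, 4))  # 0.0509 0.0187
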
Use of Bayes’ Theorem in Machine Learning

  • Components:

    • Prior Probability of Labels

    • Posterior Probability of Predictions based on Evidence

  • Evidence: Observed features (attribute values)

  • Relationship between posterior, likelihood, prior, and evidence:

    • P(Label | Data) = P(Data | Label) * P(Label) / P(Data)

Bayesian Classification

  • Objective: Predict value of Y using m features

  • Apply Bayes’ Rule:

    • P(Y=1 | X1 = a1, X2 = a2, ..., Xm = am)

    • The denominator P(X1 = a1, ..., Xm = am) is the same for every class, so it can be disregarded; the prediction only needs to maximize the numerator.

Naïve Bayes Model

  • Naïve Assumption: All features are conditionally independent given the class.

    • Formula: P(Y = 1 | X1 = a1, ..., Xm = am) ∝ P(X1 = a1 | Y = 1) * P(X2 = a2 | Y = 1) * ... * P(Xm = am | Y = 1) * P(Y = 1) (see the sketch at the end of this section)

  • Characteristics of Naïve Bayes:

    • Classifies based on input feature probabilities

    • Pros:

      • Simplicity

      • Strong performance in many real-world applications

      • Handles multi-class classification effectively

    • Cons:

      • Continuous numerical features require adjustments (e.g., discretization or a Gaussian variant)

      • Strong independence assumption may not always hold

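To make the model concrete, here is a minimal from-scratch Naïve Bayes classifier for categorical features. The tiny weather-style dataset and the Laplace smoothing are illustrative choices, not part of the notes; scikit-learn's CategoricalNB provides a ready-made equivalent.

    from collections import Counter, defaultdict

    class CategoricalNaiveBayes:
        """Counting-based Naive Bayes for categorical features, with Laplace smoothing."""

        def fit(self, X, y):
            self.classes = sorted(set(y))
            self.class_counts = Counter(y)
            self.priors = {c: self.class_counts[c] / len(y) for c in self.classes}
            self.n_features = len(X[0])
            # cond_counts[(feature_index, class)] counts feature values within a class
            self.cond_counts = defaultdict(Counter)
            self.feature_values = [set() for _ in range(self.n_features)]
            for row, label in zip(X, y):
                for i, value in enumerate(row):
                    self.cond_counts[(i, label)][value] += 1
                    self.feature_values[i].add(value)
            return self

        def predict(self, row):
            # Choose the class maximizing P(Y=c) * prod_i P(X_i = a_i | Y=c)
            best_class, best_score = None, -1.0
            for c in self.classes:
                score = self.priors[c]
                for i, value in enumerate(row):
                    count = self.cond_counts[(i, c)][value]
                    # Laplace smoothing: avoid zero probability for unseen values
                    score *= (count + 1) / (self.class_counts[c] + len(self.feature_values[i]))
                if score > best_score:
                    best_class, best_score = c, score
            return best_class

    # Illustrative toy data: (outlook, humidity) -> whether the person plays outside
    X = [("sunny", "high"), ("sunny", "normal"), ("rain", "high"),
         ("overcast", "high"), ("rain", "normal")]
    y = ["no", "yes", "no", "yes", "yes"]

    model = CategoricalNaiveBayes().fit(X, y)
    print(model.predict(("sunny", "high")))  # expected: "no"
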
Learning Objective

  • Understand the Naive Bayes model thoroughly