Objective: To find P(Y=1 | X1, X2, ..., Xm)
Examples:
P(Y=Default | Income < 30K, Education = College, ...)
P(Y=Spam Email | X1 = "lottery", X2 = "win", ...)
Characteristics:
Naïve Bayes models have no parameters trained by iterative optimization, unlike neural networks.
They use probability theory to estimate the likelihood of the event of interest directly from counted frequencies, with no iterative training process.
Definitions:
Probability P(A): Likelihood of event A occurring
Example: 23% of days are rainy, P(rain) = 0.23
Joint Probability P(A, B): Likelihood of events A and B occurring together
Conditional Probability P(A|B): Likelihood of event A given event B.
Example: P(rain|cloudy) = 0.62, i.e., cloudy days have a 62% chance of rain.
Survey of 26 Persons:
Own a Pet Distribution:
Female: 8 own, 6 don’t
Male: 5 own, 7 don’t
Calculations:
P(own a pet) = 13/26 = 0.50
P(own a pet | female) = 8/14 ≈ 0.57
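These figures can be reproduced in a few lines of Python (a minimal sketch; the variable names are ours, the counts are from the survey above):

```python
# Survey counts: pet ownership by sex (26 respondents).
female_own, female_not = 8, 6
male_own, male_not = 5, 7

total = female_own + female_not + male_own + male_not        # 26
p_own = (female_own + male_own) / total                      # 13/26 = 0.50
p_own_given_female = female_own / (female_own + female_not)  # 8/14 ≈ 0.57

print(f"P(own a pet) = {p_own:.2f}")
print(f"P(own a pet | female) = {p_own_given_female:.2f}")
```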
Chain Rule Formula: P(A, B) = P(B|A)P(A)
Example of Calculations:
P(own a pet, female) can be calculated using P(female) and P(own a pet | female).
Formula: P(own a pet, female) = P(female) x P(own a pet | female) = (14/26) x (8/14) = 8/26
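As a check, the chain rule gives the same joint probability in either order, P(B|A)P(A) = P(A|B)P(B) (a sketch; P(female | own a pet) = 8/13 comes from the same survey table):

```python
# Chain rule in both orders; each product equals P(own a pet, female) = 8/26.
p_female = 14 / 26
p_own_given_female = 8 / 14
p_own = 13 / 26
p_female_given_own = 8 / 13            # 8 of the 13 pet owners are female

lhs = p_own_given_female * p_female    # P(own|female) * P(female)
rhs = p_female_given_own * p_own       # P(female|own) * P(own)
assert abs(lhs - rhs) < 1e-12
print(lhs, rhs)                        # both ≈ 0.3077
```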
Formula:
From the chain rule, P(A, B) = P(B|A)P(A) = P(A|B)P(B)
Rearranging gives Bayes’ Rule: P(A|B) = P(B|A)P(A) / P(B)
Disease Pre-screening:
Sensitivity: the test is positive in 95% of patients who have the disease (P(positive|disease) = 0.95).
Probability of Disease: P(disease) = 0.001 (1 in 1000).
Calculating Probability of Disease Given Positive Result:
Formula: P(disease|positive) = P(positive|disease) * P(disease) / P(positive)
With a 5% false-positive rate (P(positive|no disease) = 0.05), P(positive) = 0.95 * 0.001 + 0.05 * 0.999 ≈ 0.0509, so P(disease|positive) = 0.00095 / 0.0509 ≈ 0.0187 (1.87%): even after a positive result, the disease remains unlikely.
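The full computation, including the total-probability step for P(positive), looks like this (a sketch; the 5% false-positive rate is the assumption implied by the 0.0509 figure, not stated in the original problem):

```python
# Pre-screening example: how likely is disease after a positive test?
p_pos_given_disease = 0.95     # sensitivity
p_disease = 0.001              # prevalence, 1 in 1000
p_pos_given_healthy = 0.05     # assumed false-positive rate

# Law of total probability: P(positive) over diseased and healthy patients.
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' Rule: P(disease | positive).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(f"P(positive) = {p_pos:.4f}")                          # 0.0509
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")  # 0.0187
```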
Components:
Prior: P(Label), the probability of each label before seeing any evidence
Posterior: P(Label | Data), the probability of a label given the observed evidence
Evidence: P(Data), the probability of the observed features (attribute values)
Likelihood: P(Data | Label), the probability of the evidence under a given label
Bayes’ Rule ties them together: P(Label | Data) = P(Data | Label) * P(Label) / P(Data)
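These names map directly onto the pre-screening example (a small sketch reusing the numbers computed above):

```python
prior      = 0.001    # P(disease): belief before the test
likelihood = 0.95     # P(positive | disease)
evidence   = 0.0509   # P(positive), from the total-probability step above
posterior  = likelihood * prior / evidence   # P(disease | positive)
print(f"{posterior:.4f}")                    # 0.0187
```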
Objective: Predict value of Y using m features
Apply Bayes’ Rule:
P(Y=1 | X1 = a1, X2 = a2, ..., Xm = am) = P(X1 = a1, ..., Xm = am | Y=1) * P(Y=1) / P(X1 = a1, ..., Xm = am)
The denominator is the same for every class, so it can be disregarded; classification only requires maximizing the numerator.
Naïve Assumption: All features are conditionally independent given the class.
Formula: P(X1 = a1 | Y = 1) * P(X2 = a2 | Y = 1) * ... * P(Xm = am | Y = 1) * P(Y = 1)
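Putting the pieces together, here is a minimal categorical Naïve Bayes classifier built from counted frequencies (a sketch under assumptions: the toy dataset is made up, and add-one smoothing is used so unseen feature values don't zero out the product):

```python
from collections import Counter, defaultdict

class NaiveBayes:
    def fit(self, X, y):
        self.n = len(y)
        self.priors = Counter(y)              # label counts -> P(Y=c)
        self.cond = defaultdict(Counter)      # (feature j, label) -> value counts
        self.values = defaultdict(set)        # feature j -> distinct values seen
        for xs, label in zip(X, y):
            for j, a in enumerate(xs):
                self.cond[(j, label)][a] += 1
                self.values[j].add(a)
        return self

    def predict(self, xs):
        # argmax over labels c of P(Y=c) * prod_j P(Xj = aj | Y=c)
        def score(label):
            s = self.priors[label] / self.n   # prior P(Y=c)
            for j, a in enumerate(xs):
                # add-one (Laplace) smoothing for unseen feature values
                num = self.cond[(j, label)][a] + 1
                den = self.priors[label] + len(self.values[j])
                s *= num / den
            return s
        return max(self.priors, key=score)

# Toy usage: predict pet ownership from (sex, has_yard) -- made-up data.
X = [("female", "yes"), ("female", "no"), ("male", "yes"), ("male", "no")]
y = ["own", "own", "own", "not"]
print(NaiveBayes().fit(X, y).predict(("female", "yes")))   # -> "own"
```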
Characteristics of Naïve Bayes:
Classifies based on input feature probabilities
Pros:
Simplicity
Strong performance in many real-world applications (e.g., spam filtering) despite its simplicity
Handles multi-class classification effectively
Cons:
Struggles with numerical features unless they are discretized or modeled with a distribution (see the Gaussian sketch after this list)
Strong independence assumption may not always hold
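One standard fix for the numerical-feature weakness is Gaussian Naïve Bayes, which models P(Xj | Y) as a normal distribution per class. A short sketch using scikit-learn's GaussianNB (the tiny income dataset is invented for illustration):

```python
from sklearn.naive_bayes import GaussianNB

# Made-up data: [income, some binary flag]; labels 1 = default, 0 = no default.
X = [[25_000, 1], [28_000, 0], [60_000, 1], [75_000, 0]]
y = [1, 1, 0, 0]

clf = GaussianNB().fit(X, y)
print(clf.predict([[30_000, 1]]))         # class 1 on this toy data
print(clf.predict_proba([[30_000, 1]]))   # per-class probabilities
```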
Understand the Naïve Bayes model thoroughly