11-Naive Bayes
Classification Model
Objective: To find P(Y=1 | X1, X2, ..., Xm)
Examples:
P(Y=Default | Income < 30K, Education = College, ...)
P(Y=Spam Email | X1 = "lottery", X2 = "win", ...)
Characteristics:
Unlike neural networks, Naïve Bayes models have no parameters that are learned through iterative training.
They use probability theory to estimate the probability of the event of interest directly from the data, without an iterative optimization process.
Probability and Conditional Probability
Definitions:
Probability P(A): Likelihood of event A occurring
Example: 23% of days are rainy, P(rain) = 0.23
Joint Probability P(A, B): Likelihood of events A and B occurring together
Conditional Probability P(A|B): Likelihood of event A given event B.
Example: P(rain|cloudy) = 0.62, i.e., cloudy days have a 62% chance of rain.
Example of Probability Calculation
Survey of 26 Persons:
Own a Pet Distribution:
Female: 8 own, 6 don’t
Male: 5 own, 7 don’t
Calculations:
P(own a pet) = 13/26 = 0.50
P(own a pet | female) = 8/14 ≈ 0.57
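As an illustration, the two probabilities above can be computed directly from the survey counts; the following is a minimal Python sketch (the dictionary layout and variable names are ours, not part of the original example).

```python
# Survey counts (26 persons) from the example above
counts = {
    ("female", "own"): 8, ("female", "not_own"): 6,
    ("male", "own"): 5,   ("male", "not_own"): 7,
}
total = sum(counts.values())                                   # 26

# Marginal probability: P(own a pet)
p_own = sum(v for (g, pet), v in counts.items() if pet == "own") / total
print(p_own)                                                   # 13/26 = 0.50

# Conditional probability: P(own a pet | female)
n_female = sum(v for (g, pet), v in counts.items() if g == "female")
p_own_given_female = counts[("female", "own")] / n_female
print(round(p_own_given_female, 2))                            # 8/14 ≈ 0.57
```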
Probability Chain Rule
Chain Rule Formula: P(A, B) = P(B|A)P(A)
Example of Calculations:
P(own a pet, female) can be calculated using P(female) and P(own a pet | female).
Formula: P(own a pet, female) = P(female) x P(own a pet | female) = (14/26) x (8/14) = 8/26
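A short continuation of the sketch above, checking the chain-rule result against a direct count:

```python
# Chain rule with the survey counts: P(own a pet, female) = P(female) * P(own a pet | female)
p_female = 14 / 26
p_own_given_female = 8 / 14
p_joint_chain_rule = p_female * p_own_given_female   # (14/26) * (8/14)

# Direct count: 8 of the 26 persons are women who own a pet
p_joint_direct = 8 / 26

print(round(p_joint_chain_rule, 4), round(p_joint_direct, 4))  # both ≈ 0.3077
```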
Bayes’ Theorem
Formula: P(A|B) = P(B|A)P(A) / P(B)
Derivation: from the chain rule, P(A, B) = P(B|A)P(A) = P(A|B)P(B); dividing both sides by P(B) gives the formula above.
Also known as Bayes' Rule
Bayes’ Theorem Example
Disease Pre-screening:
Accuracy: 95% when the patient has the disease (P(positive|disease) = 0.95).
Probability of Disease: P(disease) = 0.001 (1 in 1000).
Calculating Probability of Disease Given Positive Result:
Formula: P(disease|positive) = P(positive|disease) * P(disease) / P(positive)
Computation: assuming the test is also 95% accurate for healthy patients (P(positive|no disease) = 0.05), P(positive) = 0.95 × 0.001 + 0.05 × 0.999 ≈ 0.0509, so P(disease|positive) ≈ 0.0187 (1.87%).
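A minimal Python sketch of this computation, assuming (as the 0.0509 figure implies) a 5% false-positive rate:

```python
# Bayes' theorem for the disease pre-screening example
p_disease = 0.001                  # prior: 1 in 1000
p_pos_given_disease = 0.95         # test accuracy when the disease is present
p_pos_given_healthy = 0.05         # assumed false-positive rate (implied, not stated explicitly above)

# Total probability of a positive result
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Posterior: probability of disease given a positive test
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_pos, 4), round(p_disease_given_pos, 4))   # 0.0509, ≈ 0.0187 (1.87%)
```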
Use of Bayes’ Theorem in Machine Learning
Components:
Prior: P(Label), the probability of each label before observing the evidence
Posterior: P(Label | Data), the probability of a prediction given the evidence
Evidence: the observed features (attribute values), with overall probability P(Data)
Likelihood: P(Data | Label); Bayes' theorem relates the likelihood and the prior to the posterior:
P(Label | Data) = P(Data | Label) * P(Label) / P(Data)
Bayesian Classification
Objective: Predict value of Y using m features
Apply Bayes’ Rule:
P(Y=1 | X1 = a1, X2 = a2, ..., Xm = am) = P(X1 = a1, ..., Xm = am | Y=1) P(Y=1) / P(X1 = a1, ..., Xm = am)
The denominator is the same for every class, so it can be disregarded; classification reduces to maximizing the numerator over the possible values of Y.
Naïve Bayes Model
Naïve Assumption: All features are conditionally independent given the class.
Formula: P(Y = 1 | X1 = a1, ..., Xm = am) ∝ P(X1 = a1 | Y = 1) * P(X2 = a2 | Y = 1) * ... * P(Xm = am | Y = 1) * P(Y = 1)
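To make the factorization concrete, below is a minimal from-scratch sketch of a categorical Naïve Bayes classifier in Python; the toy spam-style data, variable names, and Laplace smoothing are illustrative additions, not part of the original notes.

```python
from collections import Counter, defaultdict

# Toy training data: each row is (features, label), in the spirit of the spam example
data = [
    ({"word_lottery": 1, "word_win": 1}, "spam"),
    ({"word_lottery": 1, "word_win": 0}, "spam"),
    ({"word_lottery": 0, "word_win": 0}, "ham"),
    ({"word_lottery": 0, "word_win": 1}, "ham"),
    ({"word_lottery": 0, "word_win": 0}, "ham"),
]

# Estimate the prior P(Y) and the conditionals P(Xi = a | Y) by counting
label_counts = Counter(label for _, label in data)
feature_counts = defaultdict(Counter)          # (label, feature name) -> Counter over values
for features, label in data:
    for name, value in features.items():
        feature_counts[(label, name)][value] += 1

def posterior_score(features, label):
    """Numerator of Bayes' rule: P(X1=a1|Y) * ... * P(Xm=am|Y) * P(Y)."""
    score = label_counts[label] / len(data)    # prior P(Y)
    for name, value in features.items():
        value_counts = feature_counts[(label, name)]
        # Laplace smoothing (+1, +2 for binary features) so unseen values do not zero out the product
        score *= (value_counts[value] + 1) / (label_counts[label] + 2)
    return score

def predict(features):
    # Pick the label that maximizes the numerator; the shared denominator is ignored
    return max(label_counts, key=lambda label: posterior_score(features, label))

print(predict({"word_lottery": 1, "word_win": 1}))   # expected: "spam"
```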
Characteristics of Naïve Bayes:
Classifies based on input feature probabilities
Pros:
Simplicity
Good performance in many real-world applications (e.g., spam filtering)
Handles multi-class classification effectively
Cons:
Struggles with numerical features unless they are discretized or modeled with a distributional assumption such as a Gaussian (see the sketch after this list)
Strong independence assumption may not always hold
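As an illustration of the numerical-features point, one common adjustment is to assume each continuous feature is Gaussian within a class, as scikit-learn's GaussianNB does; the data below is synthetic and purely for demonstration.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Synthetic numerical features (e.g., income in $K, years of education) with binary labels
rng = np.random.default_rng(0)
X_default = rng.normal(loc=[25, 12], scale=[5, 2], size=(50, 2))       # class 1: default
X_no_default = rng.normal(loc=[60, 16], scale=[10, 2], size=(50, 2))   # class 0: no default
X = np.vstack([X_default, X_no_default])
y = np.array([1] * 50 + [0] * 50)

# GaussianNB handles continuous inputs by modeling P(Xi | Y) as a per-class Gaussian
model = GaussianNB()
model.fit(X, y)
print(model.predict([[28, 12], [70, 18]]))   # likely [1 0]
```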
Learning Objective
Understand the Naive Bayes model thoroughly