11-Naive Bayes

Classification Model

  • Objective: To find P(Y=1 | X1, X2, ..., Xm)

  • Examples:

    • P(Y=Default | Income < 30K, Education = College, ...)

    • P(Y=Spam Email | X1 = "lottery", X2 = "win", ...)

  • Characteristics:

    • Unlike neural networks, Naïve Bayes models have no parameters trained by iterative optimization.

    • Instead, the model uses probability theory to estimate the likelihood of the event of interest directly from the data, with no iterative training process.

Probability and Conditional Probability

  • Definitions:

    • Probability P(A): Likelihood of event A occurring

      • Example: 23% of days are rainy, P(rain) = 0.23

    • Joint Probability P(A, B): Likelihood of events A and B occurring together

    • Conditional Probability P(A|B): Likelihood of event A given event B.

      • Example: P(rain | cloudy) = 0.62, i.e., cloudy days have a 62% chance of rain.

Example of Probability Calculation

  • Survey of 26 Persons:

    • Own a Pet Distribution:

      • Female: 8 own, 6 don’t

      • Male: 5 own, 7 don’t

    • Calculations (see the code sketch after this list):

      • P(own a pet) = 13/26 = 0.50

      • P(own a pet | female) = 8/14 ≈ 0.57

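A minimal Python check of the two calculations above, working directly from the survey counts (only the numbers given in the example are used):

    # Survey of 26 persons: counts by sex and pet ownership
    counts = {
        ("female", "own"): 8, ("female", "not_own"): 6,
        ("male", "own"): 5, ("male", "not_own"): 7,
    }
    total = sum(counts.values())  # 26 people in total

    # P(own a pet): pet owners among everyone surveyed
    p_own = (counts[("female", "own")] + counts[("male", "own")]) / total
    print(p_own)  # 13/26 = 0.50

    # P(own a pet | female): pet owners among the 14 women only
    n_female = counts[("female", "own")] + counts[("female", "not_own")]
    p_own_given_female = counts[("female", "own")] / n_female
    print(round(p_own_given_female, 2))  # 8/14 ≈ 0.57
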
Probability Chain Rule

  • Chain Rule Formula: P(A, B) = P(B|A)P(A)

  • Example of Calculations:

    • P(own a pet, female) can be calculated from P(female) and P(own a pet | female), as checked in the sketch below.

    • Formula: P(own a pet, female) = P(female) x P(own a pet | female) = (14/26) x (8/14) = 8/26

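Using the same survey numbers, a quick check that the chain rule reproduces the joint probability read directly off the table:

    # Chain rule: P(own a pet, female) = P(female) * P(own a pet | female)
    p_female = 14 / 26
    p_own_given_female = 8 / 14
    joint_via_chain_rule = p_female * p_own_given_female  # (14/26) * (8/14)

    joint_from_table = 8 / 26  # women who own a pet, out of all 26 people
    assert abs(joint_via_chain_rule - joint_from_table) < 1e-12
    print(round(joint_via_chain_rule, 4))  # 8/26 ≈ 0.3077
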
Bayes’ Theorem

  • Formula:

    • From the chain rule, P(A, B) = P(B|A)P(A) = P(A|B)P(B)

    • Rearranging gives Bayes’ theorem: P(A|B) = P(B|A)P(A) / P(B)

  • Also known as Bayes’ Rule

Bayes’ Theorem Example

  • Disease Pre-screening:

    • Sensitivity: the test is positive 95% of the time when the patient has the disease (P(positive|disease) = 0.95).

    • Probability of Disease: P(disease) = 0.001 (1 in 1000).

  • Calculating Probability of Disease Given Positive Result:

    • Formula: P(disease|positive) = P(positive|disease) * P(disease) / P(positive)

    • These figures are consistent with a 5% false-positive rate on healthy patients (not stated above): then P(positive) = 0.95 × 0.001 + 0.05 × 0.999 ≈ 0.0509, so P(disease|positive) ≈ 0.00095 / 0.0509 ≈ 0.0187 (about 1.87%); the sketch below works this out.

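These numbers can be reproduced with a short sketch. The 5% false-positive rate on healthy patients is an assumption made here to match the figures above; it is not stated explicitly in the notes:

    # Bayes' theorem for the disease pre-screening example
    p_disease = 0.001            # prior: 1 in 1000 people have the disease
    p_pos_given_disease = 0.95   # test is positive 95% of the time when diseased
    p_pos_given_healthy = 0.05   # ASSUMED false-positive rate on healthy patients

    # Law of total probability: the denominator P(positive)
    p_pos = (p_pos_given_disease * p_disease
             + p_pos_given_healthy * (1 - p_disease))  # ≈ 0.0509

    # Bayes' theorem: P(disease | positive)
    p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
    print(round(p_pos, 4), round(p_disease_given_pos, 4))  # 0.0509 0.0187
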
Use of Bayes’ Theorem in Machine Learning

  • Components:

    • Prior Probability of Labels

    • Posterior Probability of Predictions based on Evidence

  • Evidence: Observed features (attribute values)

  • Relationship between posterior, likelihood, prior, and evidence:

    • P(Label | Data) = P(Data | Label) * P(Label) / P(Data)

Bayesian Classification

  • Objective: Predict value of Y using m features

  • Apply Bayes’ Rule:

    • P(Y=1 | X1 = a1, X2 = a2, ..., Xm = am)

    • The denominator P(X1 = a1, ..., Xm = am) is the same for every class, so it can be disregarded; the prediction only needs to maximize the numerator.

Naïve Bayes Model

  • Naïve Assumption: All features are conditionally independent given the class.

    • Formula: P(Y = 1 | X1 = a1, ..., Xm = am) ∝ P(X1 = a1 | Y = 1) * P(X2 = a2 | Y = 1) * ... * P(Xm = am | Y = 1) * P(Y = 1) (see the sketch at the end of this section)

  • Characteristics of Naïve Bayes:

    • Classifies based on input feature probabilities

    • Pros:

      • Simplicity

      • Strong performance in many real-world applications

      • Handles multi-class classification effectively

    • Cons:

      • Continuous numerical features require adjustments (e.g., discretization or a Gaussian variant)

      • Strong independence assumption may not always hold

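To make the model concrete, here is a minimal from-scratch Naïve Bayes classifier for categorical features. The tiny weather-style dataset and the Laplace smoothing are illustrative choices, not part of the notes; scikit-learn's CategoricalNB provides a ready-made equivalent.

    from collections import Counter, defaultdict

    class CategoricalNaiveBayes:
        """Counting-based Naive Bayes for categorical features, with Laplace smoothing."""

        def fit(self, X, y):
            self.classes = sorted(set(y))
            self.class_counts = Counter(y)
            self.priors = {c: self.class_counts[c] / len(y) for c in self.classes}
            self.n_features = len(X[0])
            # cond_counts[(feature_index, class)] counts feature values within a class
            self.cond_counts = defaultdict(Counter)
            self.feature_values = [set() for _ in range(self.n_features)]
            for row, label in zip(X, y):
                for i, value in enumerate(row):
                    self.cond_counts[(i, label)][value] += 1
                    self.feature_values[i].add(value)
            return self

        def predict(self, row):
            # Choose the class maximizing P(Y=c) * prod_i P(X_i = a_i | Y=c)
            best_class, best_score = None, -1.0
            for c in self.classes:
                score = self.priors[c]
                for i, value in enumerate(row):
                    count = self.cond_counts[(i, c)][value]
                    # Laplace smoothing: avoid zero probability for unseen values
                    score *= (count + 1) / (self.class_counts[c] + len(self.feature_values[i]))
                if score > best_score:
                    best_class, best_score = c, score
            return best_class

    # Illustrative toy data: (outlook, humidity) -> whether the person plays outside
    X = [("sunny", "high"), ("sunny", "normal"), ("rain", "high"),
         ("overcast", "high"), ("rain", "normal")]
    y = ["no", "yes", "no", "yes", "yes"]

    model = CategoricalNaiveBayes().fit(X, y)
    print(model.predict(("sunny", "high")))  # expected: "no"
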
Learning Objective

  • Understand the Naive Bayes model thoroughly