Class Notes - Naive Bayes and KNN Algorithm Discussion

Class Overview

  • Instructor: Ferdi Eruysal

  • Attendance: Students must sign an attendance sheet circulating in the class.

  • Reminder: Upcoming assignment due next week on Sunday.

    • Students are encouraged to start early, as it may take hours to complete.

Today's Class Agenda

  • Introduction to the Naive Bayes algorithm.

    • Explanation duration: 20-25 minutes.

  • Recap of the K-Nearest Neighbors (KNN) algorithm.

  • Important note on data normalization prior to using KNN.

  • In-class exercise: Work on a long dataset to build different KNN models.

    • Allocate about 25 minutes for this exercise.

Team Projects

  • Students are expected to create their own groups for team projects (max 5 members).

  • Procedure to form teams:

    • Go to the "People" tab on the platform.

    • Select "Team Projects" and pick an empty group.

    • Note: Members must be in the same section (not mixing with students from different sections).

  • Any student not assigned a group will be randomly assigned after two weeks.

Probability Basics

  • Importance of understanding probability for the Naive Bayes algorithm.

  • Basic concepts of probability concerning dice:

    • Event A: rolling a 5 or 6.

    • Event B: rolling a 5.

    • Event C: rolling a 3.

    • Probabilities calculated as follows:

    • Probability of A: 2/6 = 1/3 for rolling either 5 or 6.

    • Probability of B: 1/6.

    • Probability of C: 1/6.

  • Application of probabilities: Predicting loan defaults based on specific conditions (e.g., age, income).
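
The dice probabilities above can be verified by simple enumeration of the sample space:

```python
# One fair six-sided die; each outcome is equally likely.
outcomes = [1, 2, 3, 4, 5, 6]

p_A = sum(1 for o in outcomes if o in (5, 6)) / len(outcomes)  # P(A): rolling a 5 or 6
p_B = sum(1 for o in outcomes if o == 5) / len(outcomes)       # P(B): rolling a 5
p_C = sum(1 for o in outcomes if o == 3) / len(outcomes)       # P(C): rolling a 3

print(p_A, p_B, p_C)  # 1/3, 1/6, 1/6
```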

Conditional Probability

  • Definition: Probability of event B occurring given that event A has occurred.

  • Notation: P(B|A)

    • If A already happened, what is the probability of B happening?

    • Example: Given that A (rolling a 5 or 6) has occurred, find P(B|A), the probability of having rolled a 5.

  • Use of conditional probabilities in predictions.

    • Example with colored coins illustrating how observing one outcome changes predictions about another.
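
A quick sketch of the dice example: P(B|A) = P(A and B) / P(A), computed by restricting the sample space to A.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
A = {5, 6}   # event A: rolling a 5 or 6
B = {5}      # event B: rolling a 5

# Conditional probability: P(B|A) = P(A and B) / P(A).
p_A = Fraction(len(A), len(outcomes))            # 2/6
p_A_and_B = Fraction(len(A & B), len(outcomes))  # 1/6
p_B_given_A = p_A_and_B / p_A
print(p_B_given_A)  # 1/2: knowing the roll was 5 or 6 raises P(rolling a 5) from 1/6 to 1/2
```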

Bayes Theorem Explanation

  • Formula representation: P(A|B) = (P(B|A) * P(A)) / P(B)

  • Bayes theorem aids in making predictions based on feature values from datasets.
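
The formula can be checked against the dice events defined earlier:

```python
def bayes(p_b_given_a, p_a, p_b):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return p_b_given_a * p_a / p_b

# Dice check with A = {5, 6} and B = {5}:
# P(B|A) = 1/2, P(A) = 1/3, P(B) = 1/6.
p_a_given_b = bayes(1/2, 1/3, 1/6)
print(p_a_given_b)  # 1.0: if we rolled a 5, event A certainly occurred
```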

Intuition Examples

  • Envelope Example:

    • Two envelopes: one with a dollar and one without.

    • Probability of picking the correct envelope before seeing its contents is 1/2 (50%).

    • Upon revealing information (picking one of the envelopes), the chances can change.

    • Illustrates how additional information can refine predictions in machine learning.

Spam Detection Using Naive Bayes

  • Example of filtering out spam emails:

    • Begin with a histogram of words from normal messages.

    • Calculate the probabilities for words like "dear" in the normal messages (e.g. P(dear|normal) = 0.47).

    • Do the same for spam messages.

  • Introduction of prior probabilities based on message classification (normal vs spam).

    • Normal messages prior probability: 0.67 based on training data.

    • Spam messages prior probability: 0.33, from the same training data.
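
A minimal sketch of the word-histogram step. The word counts below are assumed illustrative values (not given in class), chosen so that "dear" appears in 8 of 17 normal-message words, reproducing the P(dear|normal) ≈ 0.47 stated above:

```python
from collections import Counter

# Assumed toy corpora standing in for the histograms of normal and spam words.
normal_words = ["dear"] * 8 + ["friend"] * 5 + ["lunch"] * 3 + ["money"] * 1
spam_words = ["dear"] * 2 + ["friend"] * 1 + ["money"] * 4

def word_probs(words):
    """Turn a word histogram into conditional probabilities P(word | class)."""
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

p_word_given_normal = word_probs(normal_words)
p_word_given_spam = word_probs(spam_words)
print(round(p_word_given_normal["dear"], 2))  # 0.47, matching the value in the notes
```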

Naive Bayes Calculation Example

  • Spam score calculation for the message "Dear Friend":

    • Normal message score: 0.09 and spam score: 0.01.

    • The message is classified as normal, since the normal score exceeds the spam score.
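
The scores above can be reproduced by multiplying the prior by each word's conditional probability. P(dear|normal) = 0.47 and the priors come from the notes; the remaining conditionals are assumed values chosen to match the stated scores of 0.09 and 0.01:

```python
# "Dear Friend": score(class) = P(class) * P(dear|class) * P(friend|class)
# Priors 0.67/0.33 and P(dear|normal)=0.47 are from class;
# P(friend|normal)=0.29, P(dear|spam)=0.29, P(friend|spam)=0.14 are assumed.
score_normal = 0.67 * 0.47 * 0.29
score_spam = 0.33 * 0.29 * 0.14

label = "normal" if score_normal > score_spam else "spam"
print(round(score_normal, 2), round(score_spam, 2), label)  # 0.09 0.01 normal
```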

Tips for Marketing Students

  • Importance of word choice in emails to avoid spam filters.

  • Be cautious with words like “money,” as they may trigger spam detection algorithms.

K-Nearest Neighbors (KNN) Implementation

  • Emphasis on normalizing data before KNN implementations to prevent features with larger scales from dominating the distance calculations.

  • Visualization of data before and after normalization to illustrate its effect.
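
A minimal sketch of min-max normalization, using assumed toy age/income data in the spirit of the loan-default example: without rescaling, income (tens of thousands) would dominate age (tens) in KNN's distance calculations.

```python
import numpy as np

# Assumed toy data; columns are age and income.
X = np.array([[25,  40_000.0],
              [35,  60_000.0],
              [50, 120_000.0]])

# Min-max normalization: rescale each feature (column) to the [0, 1] range
# so every feature contributes comparably to the distance calculation.
X_norm = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))
print(X_norm)
```

In practice the same rescaling is often done with a library transformer (e.g. scikit-learn's MinMaxScaler), fit on the training data only and then applied to the test data.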

Conclusion

  • Reminder: KNN requires normalization to ensure all features contribute equally.

  • Upcoming tasks include completing the assignment and optimizing model parameters.

  • Encourage students to submit assignments early and practice with the material learned in class.