Introduction
Data Analytics & Machine Learning Week 4: AI isn’t magic, but it’s okay if it feels like it
Educators: Aimée Backiel, Kenric Borgelioen, Daan Nijs
Course: Data Analytics and Machine Learning
Program: BCS 2024 - 2025 (Toegepaste Informatica)
Semester Course Schedule
Introduction
Bringing the right equipment for the data adventure
Variable types and summary statistics
Data Visualization
Probability and Statistics
AI isn’t magic, but it’s okay if it feels like it
Data-driven decision making
Supervised Learning: linear and logistic regression
Evaluating model quality: Good vs Bad models
Model evaluation and interpretation
Cognitive processes in AI:
Decision trees
Neural networks
AI pattern recognition:
Unsupervised learning
Reinforcement learning
Review and Exam Preparation
Key Concepts for Today's Lesson
AI isn’t magic, but it’s okay if it feels like it
Machine Learning Paradigms
Supervised Learning
Classification
Regression
Machine Learning Types
Supervised Learning
A process in which a model is trained on input data paired with labeled outputs
Types:
Classification: Predicting discrete classes (e.g., yes/no decisions)
Regression: Predicting continuous output values
Classification Techniques
Given predictor variables, determine a discrete class y (e.g., predicting whether a loan will be repaid)
Regression Techniques
Given predictor variables, determine a numeric output (e.g., the number of months until a loan is repaid); a sketch contrasting classification and regression follows below
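To make the distinction concrete, here is a minimal sketch in Python with scikit-learn (the library choice and the toy loan data are assumptions for illustration, not taken from the slides):

    # Same predictors, framed first as classification, then as regression.
    import numpy as np
    from sklearn.linear_model import LogisticRegression, LinearRegression

    # Predictors: [income, loan amount] for six hypothetical applicants
    X = np.array([[30, 5], [55, 10], [20, 8], [80, 15], [40, 4], [25, 12]])

    # Classification target: did the applicant repay? (1 = yes, 0 = no)
    y_class = np.array([1, 1, 0, 1, 1, 0])
    clf = LogisticRegression(max_iter=1000).fit(X, y_class)
    print(clf.predict([[50, 9]]))    # predicted class: repay or not

    # Regression target: months until the loan was fully repaid
    y_reg = np.array([12, 18, 30, 20, 10, 36])
    reg = LinearRegression().fit(X, y_reg)
    print(reg.predict([[50, 9]]))    # predicted number of months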
Supervised Learning Techniques
Classification:
K-Nearest Neighbors (KNN)
Naïve Bayes
Decision Tree
Random Forest
Logistic Regression
Support Vector Machines
Artificial Neural Network
Regression:
Linear Regression
Non-Linear Regression
K-Nearest Neighbors Regression
Decision Trees Regression
Support Vector Regression
Artificial Neural Network Regression
K-Nearest Neighbors (KNN)
Lazy Learner: no real training phase; the model stores the training data and predicts based on the distance between a new instance and the stored instances
Predicting Class:
K=1: Assigns the class of the single closest neighbor
K>1: Assigns the majority class among the K closest neighbors
Distance Metrics:
Hamming Distance: Counts the positions at which values differ; suitable for binary variables
Euclidean Distance: Straight-line distance; suitable for continuous variables
Manhattan Distance: Sum of absolute differences; suitable for grid-like structures
Chebyshev Distance: Maximum difference along any single dimension; suitable when diagonal moves cost the same as straight ones, as for a chess king (a KNN sketch follows below)
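A minimal KNN sketch with scikit-learn (the toy data and library choice are assumptions for illustration): the model simply stores the training points and predicts by majority vote among the K nearest ones.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X_train = np.array([[1, 1], [2, 1], [1, 2], [6, 5], [7, 6], [6, 7]])
    y_train = np.array([0, 0, 0, 1, 1, 1])

    # K=1 uses only the closest neighbor; K=3 takes a majority vote.
    # The metric parameter switches the distance function ('euclidean',
    # 'manhattan', 'chebyshev'; 'hamming' for binary features).
    for k in (1, 3):
        knn = KNeighborsClassifier(n_neighbors=k, metric='euclidean')
        knn.fit(X_train, y_train)    # "lazy": fit only stores the data
        print(k, knn.predict([[2, 2], [6, 6]]))    # -> [0 1] for both k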
Decision Trees
Structure: Nodes and edges without loops
Nodes: Root, Internal, Leaf
Edges: Connections from parent to child nodes
Key Metrics for Splitting:
Entropy: Measures how mixed a node’s labels are (0 means perfectly pure); Entropy = -Σ p_i·log2(p_i)
Information Gain: Reduction in entropy after a split
Gini Index: The probability of misclassifying a randomly chosen instance if it were labeled according to the node’s class distribution; Gini = 1 - Σ p_i² (a sketch computing these metrics follows below)
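A minimal sketch of the splitting metrics, computed directly from class-label counts (the toy split is invented for illustration):

    import numpy as np

    def entropy(labels):
        # Entropy = -sum(p_i * log2(p_i)); 0 when all labels agree (pure)
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def gini(labels):
        # Gini = 1 - sum(p_i^2): chance of misclassifying a random instance
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return 1 - np.sum(p ** 2)

    parent = np.array([1, 1, 1, 0, 0, 0])    # 50/50 node: entropy = 1.0
    left, right = np.array([1, 1, 1]), np.array([0, 0, 0])  # perfect split

    # Information gain = parent entropy - weighted average child entropy
    gain = entropy(parent) - (len(left) * entropy(left)
                              + len(right) * entropy(right)) / len(parent)
    print(entropy(parent), gini(parent), gain)    # 1.0, 0.5, 1.0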
Random Forest
Combines multiple decision trees to improve predictions
Each tree is trained on a bootstrap sample of the data, and each split considers only a random subset of features, which reduces overfitting (a short sketch follows this list)
Majority vote: For classification problems
Averaging: For regression problems
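A minimal random forest sketch with scikit-learn (synthetic data; the library choice is an assumption):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=200, n_features=8, random_state=0)

    # n_estimators trees vote; max_features caps the features tried per split
    forest = RandomForestClassifier(n_estimators=100, max_features='sqrt',
                                    random_state=0)
    forest.fit(X, y)
    print(forest.predict(X[:3]))    # majority vote across the 100 trees
    # For regression, RandomForestRegressor averages the trees' outputs.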
Linear Regression
Simple Linear Regression: Predicts Y based on a single variable X
Equation: Y = β0 + β1·X + ε
Multiple Linear Regression: Predicts Y based on multiple variables X1, X2, ..., Xp
Equation: Y = β0 + β1·X1 + β2·X2 + ... + βp·Xp + ε (a fitting sketch follows below)
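A minimal fitting sketch (toy data invented; scikit-learn assumed as the library):

    import numpy as np
    from sklearn.linear_model import LinearRegression

    X = np.array([[1], [2], [3], [4], [5]])    # single predictor (simple LR)
    Y = np.array([2.1, 4.0, 6.2, 7.9, 10.1])   # roughly Y = 2X + noise

    model = LinearRegression().fit(X, Y)
    print(model.intercept_, model.coef_)    # estimates of β0 and β1
    print(model.predict([[6]]))             # prediction for a new X
    # Multiple linear regression: the same call with more columns in X.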
Logistic Regression
Models the probability of a binary outcome using a logistic function
Uses the sigmoid function σ(z) = 1 / (1 + e^(-z)) to map any score to the 0-1 range
Threshold (commonly 0.5): Probabilities at or above it are assigned the positive class (see the sketch below)
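A minimal sketch of the sigmoid and the classification threshold (the example scores are invented):

    import numpy as np

    def sigmoid(z):
        # Maps any real-valued score into the (0, 1) range
        return 1 / (1 + np.exp(-z))

    scores = np.array([-3.0, -0.5, 0.0, 0.8, 2.5])   # linear scores β0 + β1·x
    probs = sigmoid(scores)
    labels = (probs >= 0.5).astype(int)    # 0.5 is the usual threshold

    print(probs)     # approx. [0.047 0.378 0.5 0.69 0.924]
    print(labels)    # [0 0 1 1 1]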
Regularization in Regression
Purpose: Simplify models and prevent overfitting by penalizing complexity
Methods:
Lasso Regression (L1 penalty): Shrinks the coefficients of less significant predictors exactly to zero, effectively removing them
Ridge Regression (L2 penalty): Shrinks all coefficients toward zero without eliminating any (a comparison sketch follows below)
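A minimal sketch contrasting the two penalties on the same data (synthetic data; scikit-learn assumed): Lasso zeroes out the noise coefficients, Ridge only shrinks them.

    import numpy as np
    from sklearn.linear_model import Lasso, Ridge

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    # Only the first two features matter; the other three are noise.
    y = 3 * X[:, 0] + 2 * X[:, 1] + rng.normal(scale=0.5, size=100)

    lasso = Lasso(alpha=0.5).fit(X, y)    # alpha sets penalty strength
    ridge = Ridge(alpha=0.5).fit(X, y)

    print(lasso.coef_)    # noise coefficients typically end up exactly 0.0
    print(ridge.coef_)    # noise coefficients are small but non-zero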
Upcoming Topics
Next Week: What Makes Good Models Good and Bad Models Bad?
Focus on Model Evaluation and Interpretation