Meru University Machine Learning Exam Notes

Meru University Exam Notes 2021/2022

Machine Learning Strategies

Supervised Learning: Learning from labeled data.
Unsupervised Learning: Learning from unlabeled data, finding patterns.
Reinforcement Learning: Learning through trial and error, receiving rewards or penalties.

Steps in Designing a Learning System

Define the problem.
Gather and prepare data.
Select a learning algorithm.
Train the model.
Evaluate and tune the model.
Deploy the model.

Application Areas of Machine Learning

Healthcare: Disease prediction and diagnostics.
Finance: Fraud detection and credit scoring.
Retail: Customer segmentation and recommendation systems.

Decision Tree Algorithms

Basic idea: A tree structure in which each internal node tests an attribute, each branch represents one outcome of that test, and each leaf node assigns a class label, typically the majority class of the training examples that reach it.
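
As a rough illustration of that structure, the sketch below stores a small tree as nested dictionaries and walks it from the root to a leaf. The attribute names, values, and the tree itself are assumptions chosen for illustration (a weather-based tennis decision); they are not taken from these notes.

```python
# Hypothetical tree: internal nodes are dicts, leaves are class labels.
tree = {
    "attribute": "Outlook",
    "branches": {
        "Sunny": {"attribute": "Humidity", "branches": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"attribute": "Wind", "branches": {"Strong": "No", "Weak": "Yes"}},
    },
}

def classify(node, example):
    """Follow the branch matching each tested attribute until a leaf is reached."""
    while isinstance(node, dict):
        value = example[node["attribute"]]   # outcome of the test at this node
        node = node["branches"][value]       # move down the matching branch
    return node                              # leaf: the predicted class label

day = {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}
prediction = classify(tree, day)
```

Each loop iteration corresponds to one attribute test, so the number of iterations equals the depth of the path taken.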

Bank System and Requirements

Before Deployment:
- Ample historical data for model training.
Problems:
1. Data privacy and compliance issues.
2. Data bias leading to flawed predictions.

Machine Learning in Computer Vision

Example: Facial recognition involves identifying human faces in images using patterns learned from data.


Bias vs. Variance

Bias: Error due to overly simplistic assumptions in the learning algorithm, which leads to underfitting.
Variance: Error due to the model's sensitivity to small fluctuations in the training data; an overly complex model fits noise and therefore overfits.
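
The two error sources can be made precise with the standard decomposition of expected squared error. The notation is an assumption introduced here: the target is $y = f(x) + \varepsilon$ with noise variance $\sigma^2$, $\hat{f}$ is the learned model, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Reducing one term usually increases the other, which is why model complexity has to be chosen as a trade-off.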

Dimensionality Reduction

Definition: The process of reducing the number of input variables in a dataset.
Methods:
1. Principal Component Analysis (PCA).
2. Singular Value Decomposition (SVD).
3. t-Distributed Stochastic Neighbor Embedding (t-SNE).
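
The first two methods are closely related: PCA can be computed by applying SVD to the mean-centered data. The sketch below assumes NumPy is available; the function name `pca` and the toy data are illustrative only.

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components using SVD."""
    X_centered = X - X.mean(axis=0)                 # PCA requires mean-centered data
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]                  # rows are principal directions
    projected = X_centered @ components.T           # coordinates in the reduced space
    explained_variance = (S ** 2) / (X.shape[0] - 1)
    return projected, components, explained_variance[:n_components]

# Toy data lying exactly on the line y = x, so one component captures everything.
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [4.0, 4.0]])
projected, components, variance = pca(X, n_components=1)
```

The sign of each principal direction is arbitrary, so only the direction up to sign should be interpreted.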

Find-S Algorithm

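A method used in concept learning to find the most specific hypothesis consistent with all positive training examples. It starts from the most specific hypothesis and generalizes only as far as each positive example requires. Negative examples are ignored, so consistency with them is not checked and holds only if the target concept is in the hypothesis space and the data are noise-free.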
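
A minimal sketch of Find-S for conjunctive hypotheses, where "?" means "any value is acceptable". The training rows are illustrative values in the style of the familiar textbook sport-enjoyment example and are not taken from these notes.

```python
def find_s(examples):
    """Return the most specific conjunctive hypothesis covering all positive examples."""
    hypothesis = None                          # None = nothing covered yet
    for attributes, is_positive in examples:
        if not is_positive:
            continue                           # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)      # first positive: copy it exactly
        else:
            # Generalize only the attributes that disagree with this example.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis

# Illustrative data: (sky, air temp, humidity, wind, water, forecast), label
examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
hypothesis = find_s(examples)
# hypothesis is ["Sunny", "Warm", "?", "Strong", "?", "?"]
```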

Decision Tree Attributes for Tennis Game Prediction

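Entropy Calculation: Entropy measures the impurity of a set of examples. At each split, the attribute with the highest information gain (the largest reduction in entropy) is selected.
Tree Pruning: A technique to reduce the complexity of a decision tree by removing branches that contribute little to predictive accuracy, which helps prevent overfitting.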
- Methods: Cost complexity pruning and reduced error pruning.
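
The entropy and information gain calculation used for attribute selection can be sketched as follows. The 14-row table is an assumption in the style of the classic tennis example; only two attributes are included to keep the sketch short.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of labels minus the weighted entropy after splitting on values."""
    total = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [lab for val, lab in zip(values, labels) if val == v]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder

# Illustrative data (assumed): one entry per day.
outlook = ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
           "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"]
wind = ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
        "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"]
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]

gain_outlook = information_gain(outlook, play)   # about 0.247 bits
gain_wind = information_gain(wind, play)         # about 0.048 bits
```

With these values Outlook has the higher gain, so it would be chosen ahead of Wind at the root.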

Candidate Elimination Algorithm

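Derives the version space, the set of all hypotheses consistent with the training instances, by maintaining two boundaries: the most specific hypotheses (S) and the most general hypotheses (G). Positive examples generalize S and negative examples specialize G.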
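
A simplified sketch of the algorithm for conjunctive hypotheses ("?" = any value). It assumes noise-free data, a single hypothesis on the specific boundary, and that the first example is positive, so that general hypotheses can be specialized using values from the specific boundary. The data are illustrative and not taken from these notes.

```python
def covers(hypothesis, example):
    """True if the hypothesis classifies the example as positive."""
    return all(h == "?" or h == e for h, e in zip(hypothesis, example))

def more_general(h1, h2):
    """True if h1 is at least as general as h2."""
    return all(a == "?" or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = None                                   # specific boundary (set by first positive)
    G = [tuple("?" for _ in range(n))]         # general boundary starts fully general
    for attrs, is_positive in examples:
        if is_positive:
            G = [g for g in G if covers(g, attrs)]          # drop inconsistent g
            if S is None:
                S = tuple(attrs)
            else:                                           # minimally generalize S
                S = tuple(s if s == a else "?" for s, a in zip(S, attrs))
        else:
            if S is None:
                raise ValueError("this sketch needs a positive example first")
            specialized = []
            for g in G:
                if not covers(g, attrs):
                    specialized.append(g)                   # already excludes the negative
                    continue
                for i in range(n):                          # minimal specializations
                    if g[i] == "?" and S[i] != "?" and S[i] != attrs[i]:
                        specialized.append(g[:i] + (S[i],) + g[i + 1:])
            specialized = list(dict.fromkeys(specialized))  # remove duplicates
            G = [g for g in specialized
                 if not any(h != g and more_general(h, g) for h in specialized)]
    return S, G

examples = [
    (("Sunny", "Warm", "Normal", "Strong", "Warm", "Same"), True),
    (("Sunny", "Warm", "High", "Strong", "Warm", "Same"), True),
    (("Rainy", "Cold", "High", "Strong", "Warm", "Change"), False),
    (("Sunny", "Warm", "High", "Strong", "Cool", "Change"), True),
]
S, G = candidate_elimination(examples)
```

Every hypothesis lying between S and G in generality is consistent with the four examples; that set is the version space.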

Overfitting in Machine Learning

Definition: A model performs well on training data but poorly on unseen data.
Causes: Excessive model complexity, noise in the training data, and insufficient training data.

Cross Validation vs. Hyper-Parameter Optimization

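Cross Validation: Technique for estimating how well a model will generalize to an independent dataset, by repeatedly splitting the data into training and validation folds and averaging the validation scores.
Hyper-Parameter Optimization: The process of tuning hyper-parameters, the settings fixed before training (such as tree depth or the number of neighbors) rather than learned from the data, to improve performance. Cross validation is commonly used to score each candidate setting.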
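
The two ideas combine naturally: each candidate hyper-parameter value is scored by cross validation and the best-scoring value is kept. The sketch below uses a tiny 1-D k-nearest-neighbors classifier with made-up data; the function names and the data are assumptions for illustration, and folds are contiguous for simplicity (in practice the data would normally be shuffled first).

```python
from collections import Counter

def k_fold_splits(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each contiguous fold."""
    indices = list(range(n_samples))
    start = 0
    for fold in range(n_folds):
        size = n_samples // n_folds + (1 if fold < n_samples % n_folds else 0)
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size

def knn_predict(train_x, train_y, query, k):
    """Majority label among the k training points closest to query (1-D data)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - query))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def cross_val_accuracy(xs, ys, k, n_folds=4):
    """Mean validation accuracy of k-NN over all folds."""
    fold_scores = []
    for train, validation in k_fold_splits(len(xs), n_folds):
        train_x = [xs[i] for i in train]
        train_y = [ys[i] for i in train]
        hits = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in validation)
        fold_scores.append(hits / len(validation))
    return sum(fold_scores) / len(fold_scores)

# Made-up data: two well-separated clusters.
xs = [1, 2, 3, 4, 10, 11, 12, 13]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

# Hyper-parameter search: score each candidate k by cross validation.
scores = {k: cross_val_accuracy(xs, ys, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
```

Here k is the hyper-parameter: it is never learned from the training folds, only chosen by comparing validation scores.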

Deep Learning Concept

A subset of machine learning involving neural networks with many layers, designed to automatically learn representations from data.

Distance Measures in K-Nearest Neighbors

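Common measures are Euclidean, Manhattan, Minkowski, and Hamming distance. The chosen measure determines which training points count as the nearest neighbors of a query point. Euclidean and Manhattan are the Minkowski distance with p = 2 and p = 1 respectively, while Hamming distance counts mismatched positions and suits categorical data.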
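
A minimal sketch of the four measures in plain Python. The function names are illustrative; inputs are assumed to be equal-length sequences.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def euclidean(a, b):
    return minkowski(a, b, 2)      # straight-line distance

def manhattan(a, b):
    return minkowski(a, b, 1)      # sum of absolute coordinate differences

def hamming(a, b):
    """Number of positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b))

d_euclid = euclidean((0, 0), (3, 4))        # 5.0
d_manhattan = manhattan((0, 0), (3, 4))     # 7.0
d_hamming = hamming("karolin", "kathrin")   # 3
```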

Bayes Theorem in AI

Definition: A mathematical formula that describes how to update the probabilities of hypotheses given new evidence.
Application: Used in probabilistic models such as Naive Bayes classification in machine learning to make predictions based on prior knowledge and observed data.
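
The update rule can be worked through numerically. The sketch below applies P(H | E) = P(E | H) P(H) / P(E) to a two-hypothesis case; the function name and the probabilities (a diagnostic test scenario) are hypothetical and chosen only for illustration.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis H given evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of the evidence across both hypotheses.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical numbers: 1% prevalence, 99% sensitivity, 5% false positives.
p = posterior(prior=0.01, likelihood=0.99, false_positive_rate=0.05)
# p is about 0.167: a positive result raises the probability from 1% to roughly 17%
```

Naive Bayes applies the same rule per class, with the likelihood computed as a product of per-feature probabilities under an independence assumption.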