Meru University Machine Learning Exam Notes

Meru University Exam Notes 2021/2022

Machine Learning Strategies

  • Supervised Learning: Learning from labeled data.

  • Unsupervised Learning: Learning from unlabeled data, finding patterns.

  • Reinforcement Learning: Learning through trial and error, receiving rewards or penalties.

Steps in Designing a Learning System

  1. Define the problem.

  2. Gather and prepare data.

  3. Select a learning algorithm.

  4. Train the model.

  5. Evaluate and tune the model.

  6. Deploy the model.

Application Areas of Machine Learning

  1. Healthcare: Disease prediction and diagnostics.

  2. Finance: Fraud detection and credit scoring.

  3. Retail: Customer segmentation and recommendation systems.

Decision Tree Algorithms

  • Basic idea: A tree structure in which each internal node tests one attribute, each branch corresponds to an outcome of that test, and each leaf node holds the predicted class (typically the majority class of the training examples that reach that leaf).
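A minimal sketch of how such a tree classifies one example. The tree below is a hypothetical hand-built one for a play-tennis style problem; the attribute names and values are assumptions for illustration, not taken from these notes.

```python
# Hypothetical hand-built tree: a dict is an internal node (tests one attribute),
# its "branches" map attribute values to subtrees, and a plain string is a leaf label.
tree = {
    "attribute": "outlook",
    "branches": {
        "sunny": {"attribute": "humidity",
                  "branches": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain": {"attribute": "wind",
                 "branches": {"strong": "no", "weak": "yes"}},
    },
}


def classify(node, example):
    """Follow the branch matching the example's value until a leaf is reached."""
    while isinstance(node, dict):
        value = example[node["attribute"]]
        node = node["branches"][value]
    return node


day = {"outlook": "sunny", "humidity": "normal", "wind": "weak"}
print(classify(tree, day))
```

A learning algorithm such as ID3 would build the tree from data instead of writing it by hand; only the prediction step is shown here.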

Bank System and Requirements

  • Before Deployment:

    • Ample historical data for model training.

  • Problems:

    1. Data privacy and compliance issues.

    2. Data bias leading to flawed predictions.

Machine Learning in Computer Vision

  • Example: Facial recognition involves identifying human faces in images using patterns learned from data.

Bias vs. Variance

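  • Bias: Error due to overly simplistic assumptions in the learning algorithm. High bias leads to underfitting.

  • Variance: Error due to the model's sensitivity to small fluctuations in the training data, usually caused by excessive model complexity. High variance leads to overfitting.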
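The two error sources can be written as one decomposition. The form below is the standard one for squared-error loss, under the assumption that the data follow y = f(x) + ε with zero-mean noise of variance σ². The symbols f, f̂, and σ² are introduced here for illustration and do not appear in the notes.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making a model more flexible usually lowers the bias term and raises the variance term, which is why the two are traded off against each other.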

Dimensionality Reduction

  • Definition: The process of reducing the number of input variables in a dataset.

  • Methods:

    1. Principal Component Analysis (PCA).

    2. Singular Value Decomposition (SVD).

    3. t-Distributed Stochastic Neighbor Embedding (t-SNE).
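As an illustration of the first method, here is a minimal pure-Python sketch that finds the first principal component and projects the data onto it. It uses power iteration on the covariance matrix, which is a simple stand-in chosen for this example rather than the routine a numerical library would use. The function name and the sample points are hypothetical.

```python
def first_principal_component(points, iterations=100):
    """Return (direction, projections) for the top principal component.

    Assumes the data have non-zero variance, so the norm below is never zero.
    """
    n, d = len(points), len(points[0])
    means = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - means[j] for j in range(d)] for p in points]
    # Sample covariance matrix (d x d).
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration: repeatedly multiply by cov and renormalise.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # Reduce each d-dimensional point to a single coordinate along v.
    projections = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, projections


points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # all on the line y = 2x
direction, reduced = first_principal_component(points)
print(direction, reduced)
```

Because the sample points lie on a single line, one component captures all of their variance, so two input variables are reduced to one with no loss.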

Find-S Algorithm

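  • A method used in concept learning to find the most specific hypothesis consistent with all positive training examples. It starts from the most specific hypothesis and generalizes it only as far as each positive example requires. Negative examples are ignored, so the result is consistent with them only if the target concept is in the hypothesis space and the training data are noise-free.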
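A minimal sketch of Find-S for conjunctive hypotheses over discrete attributes, where "?" means "any value is acceptable". The function name and the four weather-style training examples are assumptions made for illustration.

```python
def find_s(examples):
    """examples: list of (attribute_tuple, is_positive) pairs."""
    hypothesis = None  # None stands for the most specific hypothesis (matches nothing)
    for attributes, is_positive in examples:
        if not is_positive:
            continue  # Find-S ignores negative examples
        if hypothesis is None:
            hypothesis = list(attributes)  # first positive example is copied as-is
        else:
            # Keep a constraint only where it agrees with the new positive example.
            hypothesis = [h if h == a else "?" for h, a in zip(hypothesis, attributes)]
    return hypothesis


training = [
    (("sunny", "warm", "normal", "strong", "warm", "same"), True),
    (("sunny", "warm", "high", "strong", "warm", "same"), True),
    (("rainy", "cold", "high", "strong", "warm", "change"), False),
    (("sunny", "warm", "high", "strong", "cool", "change"), True),
]
print(find_s(training))  # ['sunny', 'warm', '?', 'strong', '?', '?']
```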

Decision Tree Attributes for Tennis Game Prediction

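  • Entropy Calculation: Entropy measures the impurity of a set of examples. At each node, the attribute with the highest information gain (the largest reduction in entropy after the split) is selected.

  • Tree Pruning: A technique to reduce the complexity of a decision tree by removing branches that contribute little to accuracy on unseen data, which helps limit overfitting.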

    • Methods: Cost complexity pruning and reduced error pruning.
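The entropy and information gain calculation can be sketched as follows. The rows are the commonly used 14-day play-tennis toy data, reduced to the outlook and wind attributes. The notes do not include a dataset, so treat these rows as an assumption.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy in bits of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute_index, label_index=-1):
    """Entropy of all labels minus the weighted entropy after splitting on one attribute."""
    labels = [row[label_index] for row in rows]
    remainder = 0.0
    for value in set(row[attribute_index] for row in rows):
        subset = [row[label_index] for row in rows if row[attribute_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


# Columns: outlook, wind, play (assumed toy data: 9 "yes" and 5 "no").
rows = [
    ("sunny", "weak", "no"), ("sunny", "strong", "no"), ("overcast", "weak", "yes"),
    ("rain", "weak", "yes"), ("rain", "weak", "yes"), ("rain", "strong", "no"),
    ("overcast", "strong", "yes"), ("sunny", "weak", "no"), ("sunny", "weak", "yes"),
    ("rain", "weak", "yes"), ("sunny", "strong", "yes"), ("overcast", "strong", "yes"),
    ("overcast", "weak", "yes"), ("rain", "strong", "no"),
]
print(round(information_gain(rows, 0), 3))  # outlook
print(round(information_gain(rows, 1), 3))  # wind
```

On this data outlook has the larger gain, so it would be chosen ahead of wind as the split attribute.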

Candidate Elimination Algorithm

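  • Derives the version space (the set of all hypotheses consistent with the training instances) by maintaining two boundaries: the most specific consistent hypotheses (S) and the most general consistent hypotheses (G). Positive examples generalize S, and negative examples specialize G.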

Overfitting in Machine Learning

  • Definition: A model performs well on training data but poorly on unseen data.

  • Causes: Excessive model complexity, noise in the training data, and insufficient training data.

Cross Validation vs. Hyper-Parameter Optimization

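  • Cross Validation: A technique for estimating how well a model will generalize to independent data. The data are split into several folds, and the model is repeatedly trained on some folds and evaluated on the held-out fold.

  • Hyper-Parameter Optimization: The process of choosing the settings that are fixed before training (for example, tree depth, learning rate, or the number of neighbors k) to improve performance. Unlike model parameters, these are not learned from the data during training. Cross validation is often used to score each candidate setting.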
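A minimal sketch of the splitting step in k-fold cross validation, written in plain Python. The function name is hypothetical, and the sketch assumes the samples are already shuffled.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


for train, validation in k_fold_indices(10, 3):
    print(train, validation)
```

For hyper-parameter optimization, one would typically run this loop once per candidate setting, average the validation scores over the folds, and keep the setting with the best average.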

Deep Learning Concept

  • A subset of machine learning involving neural networks with many layers, designed to automatically learn representations from data.

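Distance Measures in K-Nearest Neighbors (KNN)

  • Common measures are Euclidean, Manhattan, and Minkowski distance for numeric features, and Hamming distance for categorical or binary features. The chosen measure determines which training points count as the nearest neighbors of a query point.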
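The four measures can be sketched in a few lines of plain Python. Euclidean and Manhattan distance are the Minkowski distance with p = 2 and p = 1 respectively. The function names are chosen here for illustration.

```python
def minkowski(a, b, p):
    """Minkowski distance of order p between two equal-length numeric vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def euclidean(a, b):
    return minkowski(a, b, 2)


def manhattan(a, b):
    return minkowski(a, b, 1)


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


print(euclidean((0, 0), (3, 4)))
print(manhattan((0, 0), (3, 4)))
print(hamming("karolin", "kathrin"))
```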

Bayes Theorem in AI

  • Definition: A mathematical formula that describes how to update the probability of a hypothesis H given new evidence E: P(H | E) = P(E | H) P(H) / P(E).

  • Application: Used in probabilistic models such as Naive Bayes classification in machine learning to make predictions based on prior knowledge and observed data.
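A small numeric sketch of the update P(H | E) = P(E | H) P(H) / P(E) for a yes/no hypothesis. The function name and the probabilities are made-up illustrative values, not figures from these notes.

```python
def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) for a binary hypothesis H and observed evidence E.

    prior               = P(H)
    likelihood          = P(E | H)
    false_positive_rate = P(E | not H)
    """
    # Total probability of the evidence, P(E), over both cases of H.
    evidence = likelihood * prior + false_positive_rate * (1 - prior)
    return likelihood * prior / evidence


# Assumed numbers: 1% prior, 90% chance of the evidence if H is true, 5% if it is false.
p = posterior(prior=0.01, likelihood=0.9, false_positive_rate=0.05)
print(round(p, 3))  # 0.154
```

Even with strong evidence, the posterior stays low here because the prior is small. A Naive Bayes classifier applies the same update per class, with the extra assumption that the features are conditionally independent given the class.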