Comprehensive Bullet-Point Notes on Machine Learning Lecture
Introductory Remarks
Lecturer begins by checking students’ concerns about the assignment; postpones Q&A until after theory.
Emphasizes that the morning session covers theory; the afternoon session will be hands-on programming.
Today’s focus: Machine Learning (ML) within the larger Artificial Intelligence (AI) timeline.
Historical Context
Artificial Intelligence (AI) era: ≈ 1950 – 1980s.
Goal: create programs that “behave like a brain.”
Machine Learning (ML) era: ≈ 1980s – 2010.
Foundation for the modern data-analytics curriculum.
Deep Learning (DL) era: 2010 → present.
State of the art; useful for “very big” data when traditional ML struggles.
Considered “the future.”
What Is Machine Learning?
Essence: use data to answer questions by learning patterns from past data/experience.
Human analogy: humans consult memory; machines consult stored data.
Core task = pattern extraction & prediction.
Works well with large data; may fail when data are too large → shift to DL.
Simple Prediction / Regression Example
Dataset: number of people vs. ice-cream cost.
Data points: (1 person, $5); (2, $10); (4, $20).
Derived ratio: 5/1 = 10/2 = 20/4 = 5 dollars per person.
Model (slope) = 5.
Prediction for 3 people: 5 × 3 = $15.
Name: simple linear regression / prediction model.
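A minimal sketch of this example in Python (assuming the numbers above); a least-squares fit through the origin recovers the slope:

```python
import numpy as np

# Ice-cream example from the lecture: (people, cost in dollars)
people = np.array([1, 2, 4])
cost = np.array([5, 10, 20])

# Least-squares fit of a line through the origin: cost ≈ slope * people
slope = np.sum(people * cost) / np.sum(people ** 2)   # = 5.0

print("slope:", slope)            # 5.0
print("3 people ->", slope * 3)   # 15.0 dollars
```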
Best-Fit Line & Error Minimisation
Real-world points rarely lie perfectly on a line.
Procedure to pick the best line:
1. Draw a candidate line.
2. For each data point, draw the vertical deviation (error) from the line.
3. Sum the absolute (or squared) errors.
4. Compare multiple candidate lines; choose the line with the minimum total error.
Fundamental ML viewpoint: learning = optimising parameters to minimise error.
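A rough illustration of the procedure above on hypothetical noisy points: evaluate several candidate slopes and keep the one with the smallest total squared error.

```python
import numpy as np

# Hypothetical noisy data points (x, y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.2, 9.8, 15.4, 19.9, 25.3])

candidate_slopes = np.linspace(3.0, 7.0, 41)   # candidate lines y = m * x

def total_squared_error(m):
    # Vertical deviation of each point from the candidate line, squared and summed
    return np.sum((y - m * x) ** 2)

errors = [total_squared_error(m) for m in candidate_slopes]
best = candidate_slopes[int(np.argmin(errors))]
print("best slope ≈", best)   # close to 5
```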
Three Visual Analogies for Error Minimisation
Robot-to-Cake path: program robot with right/left/forward moves so distance → 0.
Hiker descending mountain: choose path where gradient/height continuously decreases.
Separating red & blue dots (classification line): rotate/shift line until misclassification count → 0.
Applications Highlighted
Facebook friend suggestions, Amazon/Flipkart product recommendations, Netflix/YouTube content suggestions, etc.—all rely on ML models reducing error in predictions.
Concept of Features
Features = measurable properties extracted from raw data.
Image example (face recognition): nose length, face width, eye size, hair colour, presence of spectacles, texture, ear length, lip width, etc.
Signal (1-D) example: mean, standard deviation, entropy, max, min, kurtosis.
Guideline: the richer & more discriminative the features, the more powerful the model.
From Continuous Data to Features
Raw signal: 23.6 s sampled at 100 Hz → 23.6 * 100 = 2360 discrete samples.
X-axis switches from time domain to sample index domain.
Instead of feeding 2360 values, compute representative features (e.g., mean, std, entropy, max, min) → dramatic dimensionality reduction.
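A sketch of this reduction on a synthetic signal; the entropy here is a simple histogram-based Shannon entropy, one common choice among several:

```python
import numpy as np

fs = 100          # sampling frequency in Hz (samples per second)
duration = 23.6   # seconds
t = np.arange(int(duration * fs)) / fs                                 # 2360 sample instants
signal = np.sin(2 * np.pi * 1.5 * t) + 0.1 * np.random.randn(t.size)   # toy signal

def extract_features(x, bins=20):
    # Histogram-based Shannon entropy (one simple definition of signal entropy)
    p, _ = np.histogram(x, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    return {
        "mean": x.mean(),
        "std": x.std(),
        "max": x.max(),
        "min": x.min(),
        "entropy": entropy,
    }

features = extract_features(signal)
print(len(signal), "samples reduced to", len(features), "features:", features)
```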
Training–Testing Split
Typical example: 70 % of feature rows for training, 30 % for testing.
Split ratio can vary (e.g. 80-20, 60-40) depending on dataset size & task.
Analogy:
Training = parent shows the child multiple versions of A & B.
Testing = exam with unseen shapes; accuracy = (correct answers / total questions) * 100.
Example accuracy: (3/4) * 100 = 75%.
Building a Training Spreadsheet
Example flower dataset with three classes (rose, sunflower, jasmine).
Columns: sepal-length, sepal-width, petal-length, petal-width, class-label.
Only features + labels are fed to ML algorithm, not raw images.
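A sketch of what such a training table might look like in code; the measurement values below are invented for illustration:

```python
import pandas as pd

# Hypothetical feature table; the numbers are illustrative, not real measurements
data = pd.DataFrame({
    "sepal_length": [5.1, 6.3, 4.9, 5.8],
    "sepal_width":  [3.5, 2.9, 3.1, 2.7],
    "petal_length": [1.4, 5.6, 1.5, 4.1],
    "petal_width":  [0.2, 1.8, 0.1, 1.3],
    "class_label":  ["rose", "sunflower", "jasmine", "rose"],
})

X = data.drop(columns="class_label")   # features fed to the ML algorithm
y = data["class_label"]                # target labels
print(data)
```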
End-to-End ML Workflow (diagram verbally described)
Input features.
Train algorithm (on 70 %).
Feed unknown test features (30 %).
Model outputs prediction.
Evaluate performance (accuracy, error metrics).
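A minimal end-to-end sketch of this workflow with scikit-learn, using a Decision Tree (covered in the afternoon session) and the built-in iris data as a stand-in for the lecture's flower example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # 1. input features + labels

# 2. split: 70 % for training, 30 % held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # 3. train the algorithm
predictions = model.predict(X_test)                      # 4. predict on unseen test features
print("accuracy:", accuracy_score(y_test, predictions) * 100, "%")   # 5. evaluate
```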
Three Categories of Machine Learning
1. Supervised Learning
Data come with class labels (target outputs).
Examples: coin weight → country, flower features → species; a coin-weight sketch follows this subsection.
Algorithms discussed in the course:
Linear Regression
Logistic Regression
Support Vector Machine (SVM)
Naïve Bayes
Decision Tree / Random Forest
Applications: weather prediction, biometrics (fingerprint/iris), credit scoring, hospital readmission prediction, customer purchase likelihood.
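A minimal sketch of the coin-weight example above, using Gaussian Naïve Bayes (one of the listed algorithms); the weights and countries are made up for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical labelled data: coin weight in grams -> country of origin
weights = np.array([[3.1], [3.0], [3.2], [5.6], [5.5], [5.7]])
country = ["India", "India", "India", "USA", "USA", "USA"]

clf = GaussianNB().fit(weights, country)
print(clf.predict([[3.05], [5.65]]))   # expected: ['India' 'USA']
```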
2. Unsupervised Learning
No labels; algorithm must discover structure (clusters, components).
Concept demo: English vs. Chinese exam scores automatically form two clusters (English-strong vs. Chinese-strong); see the K-Means sketch after this list.
Algorithms highlighted:
K-Means clustering
Principal Component Analysis (PCA)
Hidden Markov Model (HMM, used for speech).
Applications: item identification & grouping, customer segmentation, medical imaging (e.g., MRI tumour detection, chest-X-ray nodule finding), recommendation systems.
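A minimal K-Means sketch of the exam-score demo above; the score pairs are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (English score, Chinese score) pairs with no labels attached
scores = np.array([
    [92, 45], [88, 50], [95, 40],   # English-strong students
    [42, 90], [48, 85], [35, 95],   # Chinese-strong students
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scores)
print(kmeans.labels_)            # two groups discovered without any labels
print(kmeans.cluster_centers_)   # roughly (high English, low Chinese) and vice versa
```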
3. Reinforcement Learning
Model interacts with environment, receives feedback/reward, learns optimal actions.
Example: vision system mislabels apple as mango → feedback “wrong”, updates policy.
Used in robotics, game playing (e.g., chess/Go), self-driving cars.
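Reinforcement learning is not covered in depth here; as a rough sketch of the idea, a tabular Q-learning loop (named in the summary below) on a toy corridor task might look like this:

```python
import numpy as np

# Toy corridor: states 0..4, actions 0 = left, 1 = right; reward 1 for reaching state 4
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    while s != 4:
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: move Q(s, a) toward reward + discounted best future value
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))   # learned policy: move right (1) in the non-terminal states
```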
Summary of Algorithms Mentioned
Classification (Supervised): Naïve Bayes, SVM, Decision Tree, Random Forest.
Regression (Supervised): Linear Regression, Logistic Regression.
Clustering (Unsupervised): K-Means, HMM.
Dimensionality Reduction (Unsupervised): PCA, ICA (a PCA sketch follows this list).
Reinforcement: Q-Learning, Policy Gradients (not deeply covered).
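A small PCA sketch for the dimensionality-reduction entry above, using scikit-learn's built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)            # 150 samples x 4 features
X_2d = PCA(n_components=2).fit_transform(X)  # project onto 2 principal components
print(X_2d.shape)                            # (150, 2)
```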
Common Limitations & Challenges
Requires large, high-quality datasets; small or noisy data hurt performance.
Feature extraction is often the bottleneck; poor features → weak models.
Must sometimes add texture, shape, or higher-order statistics to resolve overlapping classes.
Computation & memory cost increase with data volume and feature dimensionality.
Manual feature design is domain-expert intensive; motivates shift toward Deep Learning (automatic feature learning).
Q&A Highlights & Clarifications
Image vs. Feature table: feeding raw images is possible (esp. with DL), but for classic ML we extract numeric descriptors to cut computation. More features → better separation, but watch for redundancy & overfitting.
Continuous model improvement: after deployment, newly labelled cases (e.g., 10 001st MRI scan) can be appended to training set, creating a continually learning system.
Sampling Frequency: Hz = samples per second; 100 Hz ⇒ 100 discrete points each second.
Accuracy formula reiterated: Accuracy = (Correct / Total) * 100.
Supervised necessity: Training/testing split is essential when labels exist; otherwise unsupervised methods apply.
Planned Practical Sessions (Afternoon)
Hands-on coding for:
Decision Tree & Random Forest
Linear Regression demo
Possible SVM example
Further deep dive into feature extraction, PCA, K-Means in subsequent lectures.
Key Take-Away Points
ML = learning parameters that minimise error on data.
Quality & quantity of features dictate success; extraction is non-trivial.
Understand differences: Supervised (labels), Unsupervised (clusters), Reinforcement (interaction).
Always keep distinct training vs. testing sets for unbiased evaluation.
Deep Learning eclipses ML when feature extraction & scale become unmanageable.