Comprehensive Bullet-Point Notes on Machine Learning Lecture
Introductory Remarks
Lecturer begins by checking students’ concerns about the assignment; postpones Q&A until after theory.
Emphasizes that the morning session covers theory; the afternoon session will be hands-on programming.
Today’s focus: Machine Learning (ML) within the larger Artificial Intelligence (AI) timeline.
Historical Context
Artificial Intelligence (AI) era: ≈ 1950 – 1980s.
Goal: create programs that “behave like a brain.”
Machine Learning (ML) era: ≈ 1980s – 2010.
Foundation for the modern data-analytics curriculum.
Deep Learning (DL) era: 2010 → present.
State of the art; useful for “very big” data when traditional ML struggles.
Considered “the future.”
What Is Machine Learning?
Essence: use data to answer questions by learning patterns from past data/experience.
Human analogy: humans consult memory; machines consult stored data.
Core task = pattern extraction & prediction.
Works well with large data; may fail when data are too large → shift to DL.
Simple Prediction / Regression Example
Dataset: number of people vs. ice-cream cost.
Data points: (1 person, $5); (2, $10); (4, $20).
Derived ratio: 5/1 = 10/2 = 20/4 = 5 dollars per person.
Model (slope) = 5.
Prediction for 3 people: 5 × 3 = $15.
Name: simple linear regression / prediction model.
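A minimal sketch of this example in Python (assuming the numbers above); a least-squares fit through the origin recovers the slope:

```python
import numpy as np

# Ice-cream example from the lecture: (people, cost in dollars)
people = np.array([1, 2, 4])
cost = np.array([5, 10, 20])

# Least-squares fit of a line through the origin: cost ≈ slope * people
slope = np.sum(people * cost) / np.sum(people ** 2)   # = 5.0

print("slope:", slope)            # 5.0
print("3 people ->", slope * 3)   # 15.0 dollars
```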
Best-Fit Line & Error Minimisation
Real-world points rarely lie perfectly on a line.
Procedure to pick the best line:
1. Draw a candidate line.
2. For each data point, draw the vertical deviation (error) from the line.
3. Sum the absolute (or squared) errors.
4. Compare multiple candidate lines; choose the line with the minimum total error.
Fundamental ML viewpoint: learning = optimising parameters to minimise error.
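A rough illustration of the procedure above on hypothetical noisy points: evaluate several candidate slopes and keep the one with the smallest total squared error.

```python
import numpy as np

# Hypothetical noisy data points (x, y)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.2, 9.8, 15.4, 19.9, 25.3])

candidate_slopes = np.linspace(3.0, 7.0, 41)   # candidate lines y = m * x

def total_squared_error(m):
    # Vertical deviation of each point from the candidate line, squared and summed
    return np.sum((y - m * x) ** 2)

errors = [total_squared_error(m) for m in candidate_slopes]
best = candidate_slopes[int(np.argmin(errors))]
print("best slope ≈", best)   # close to 5
```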
Three Visual Analogies for Error Minimisation
Robot-to-Cake path: program robot with right/left/forward moves so distance → 0.
Hiker descending mountain: choose path where gradient/height continuously decreases.
Separating red & blue dots (classification line): rotate/shift line until misclassification count → 0.
Applications Highlighted
Facebook friend suggestions, Amazon/Flipkart product recommendations, Netflix/YouTube content suggestions, etc.—all rely on ML models reducing error in predictions.
Concept of Features
Features = measurable properties extracted from raw data.
Image example (face recognition): nose length, face width, eye size, hair colour, presence of spectacles, texture, ear length, lip width, etc.
Signal (1-D) example: mean, standard deviation, entropy, max, min, kurtosis.
Guideline: the richer & more discriminative the features, the more powerful the model.
From Continuous Data to Features
Raw signal: 23.6 s sampled at 100 Hz → 23.6 * 100 = 2360 discrete samples.
X-axis switches from time domain to sample index domain.
Instead of feeding 2360 values, compute representative features (e.g., mean, std, entropy, max, min) → dramatic dimensionality reduction.
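A sketch of this reduction on a synthetic signal; the entropy here is a simple histogram-based Shannon entropy, one common choice among several:

```python
import numpy as np

fs = 100          # sampling frequency in Hz (samples per second)
duration = 23.6   # seconds
t = np.arange(int(duration * fs)) / fs                                 # 2360 sample instants
signal = np.sin(2 * np.pi * 1.5 * t) + 0.1 * np.random.randn(t.size)   # toy signal

def extract_features(x, bins=20):
    # Histogram-based Shannon entropy (one simple definition of signal entropy)
    p, _ = np.histogram(x, bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    entropy = -np.sum(p * np.log2(p))
    return {
        "mean": x.mean(),
        "std": x.std(),
        "max": x.max(),
        "min": x.min(),
        "entropy": entropy,
    }

features = extract_features(signal)
print(len(signal), "samples reduced to", len(features), "features:", features)
```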
Training–Testing Split
Typical example: 70 % of feature rows for training, 30 % for testing.
Split ratio can vary (e.g. 80-20, 60-40) depending on dataset size & task.
Analogy:
Training = parent shows the child multiple versions of A & B.
Testing = exam with unseen shapes; accuracy = (correct answers / total questions) * 100.
Example accuracy: (3/4) * 100 = 75%.
Building a Training Spreadsheet
Example flower dataset with three classes (rose, sunflower, jasmine).
Columns: sepal-length, sepal-width, petal-length, petal-width, class-label.
Only features + labels are fed to ML algorithm, not raw images.
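A sketch of what such a training table might look like in code; the measurement values below are invented for illustration:

```python
import pandas as pd

# Hypothetical feature table; the numbers are illustrative, not real measurements
data = pd.DataFrame({
    "sepal_length": [5.1, 6.3, 4.9, 5.8],
    "sepal_width":  [3.5, 2.9, 3.1, 2.7],
    "petal_length": [1.4, 5.6, 1.5, 4.1],
    "petal_width":  [0.2, 1.8, 0.1, 1.3],
    "class_label":  ["rose", "sunflower", "jasmine", "rose"],
})

X = data.drop(columns="class_label")   # features fed to the ML algorithm
y = data["class_label"]                # target labels
print(data)
```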
End-to-End ML Workflow (diagram verbally described)
Input features.
Train algorithm (on 70 %).
Feed unknown test features (30 %).
Model outputs prediction.
Evaluate performance (accuracy, error metrics).
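A minimal end-to-end sketch of this workflow with scikit-learn, using a Decision Tree (covered in the afternoon session) and the built-in iris data as a stand-in for the lecture's flower example:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)            # 1. input features + labels

# 2. split: 70 % for training, 30 % held out for testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier().fit(X_train, y_train)   # 3. train the algorithm
predictions = model.predict(X_test)                      # 4. predict on unseen test features
print("accuracy:", accuracy_score(y_test, predictions) * 100, "%")   # 5. evaluate
```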
Three Categories of Machine Learning
1. Supervised Learning
Data come with class labels (target outputs).
Examples: coin weight → country, flower features → species; a coin-weight sketch follows this subsection.
Algorithms discussed in the course:
Linear Regression
Logistic Regression
Support Vector Machine (SVM)
Naïve Bayes
Decision Tree / Random Forest
Applications: weather prediction, biometrics (fingerprint/iris), credit scoring, hospital readmission prediction, customer purchase likelihood.
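A minimal sketch of the coin-weight example above, using Gaussian Naïve Bayes (one of the listed algorithms); the weights and countries are made up for illustration:

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB

# Hypothetical labelled data: coin weight in grams -> country of origin
weights = np.array([[3.1], [3.0], [3.2], [5.6], [5.5], [5.7]])
country = ["India", "India", "India", "USA", "USA", "USA"]

clf = GaussianNB().fit(weights, country)
print(clf.predict([[3.05], [5.65]]))   # expected: ['India' 'USA']
```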
2. Unsupervised Learning
No labels; algorithm must discover structure (clusters, components).
Concept demo: English vs. Chinese exam scores automatically form two clusters (English-strong vs. Chinese-strong); see the K-Means sketch after this list.
Algorithms highlighted:
K-Means clustering
Principal Component Analysis (PCA)
Hidden Markov Model (HMM, used for speech).
Applications: item identification & grouping, customer segmentation, medical imaging (e.g., MRI tumour detection, chest-X-ray nodule finding), recommendation systems.
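A minimal K-Means sketch of the exam-score demo above; the score pairs are invented for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (English score, Chinese score) pairs with no labels attached
scores = np.array([
    [92, 45], [88, 50], [95, 40],   # English-strong students
    [42, 90], [48, 85], [35, 95],   # Chinese-strong students
])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scores)
print(kmeans.labels_)            # two groups discovered without any labels
print(kmeans.cluster_centers_)   # roughly (high English, low Chinese) and vice versa
```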
3. Reinforcement Learning
Model interacts with environment, receives feedback/reward, learns optimal actions.
Example: vision system mislabels apple as mango → feedback “wrong”, updates policy.
Used in robotics, game playing (e.g., chess/Go), self-driving cars.
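Reinforcement learning is not covered in depth here; as a rough sketch of the idea, a tabular Q-learning loop (named in the summary below) on a toy corridor task might look like this:

```python
import numpy as np

# Toy corridor: states 0..4, actions 0 = left, 1 = right; reward 1 for reaching state 4
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration rate

rng = np.random.default_rng(0)
for episode in range(200):
    s = 0
    while s != 4:
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s_next = max(0, s - 1) if a == 0 else min(4, s + 1)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: move Q(s, a) toward reward + discounted best future value
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print(np.argmax(Q, axis=1))   # learned policy: move right (1) in the non-terminal states
```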
Summary of Algorithms Mentioned
Classification (Supervised): Naïve Bayes, SVM, Decision Tree, Random Forest.
Regression (Supervised): Linear Regression, Logistic Regression.
Clustering (Unsupervised): K-Means, HMM.
Dimensionality Reduction (Unsupervised): PCA, ICA (a PCA sketch follows this list).
Reinforcement: Q-Learning, Policy Gradients (not deeply covered).
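A small PCA sketch for the dimensionality-reduction entry above, using scikit-learn's built-in iris data:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)            # 150 samples x 4 features
X_2d = PCA(n_components=2).fit_transform(X)  # project onto 2 principal components
print(X_2d.shape)                            # (150, 2)
```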
Common Limitations & Challenges
Requires large, high-quality datasets; small or noisy data hurt performance.
Feature extraction is often the bottleneck; poor features → weak models.
Must sometimes add texture, shape, or higher-order statistics to resolve overlapping classes.
Computation & memory cost increase with data volume and feature dimensionality.
Manual feature design is domain-expert intensive; motivates shift toward Deep Learning (automatic feature learning).
Q&A Highlights & Clarifications
Image vs. Feature table: feeding raw images is possible (esp. with DL), but for classic ML we extract numeric descriptors to cut computation. More features → better separation, but watch for redundancy & overfitting.
Continuous model improvement: after deployment, newly labelled cases (e.g., 10 001st MRI scan) can be appended to training set, creating a continually learning system.
Sampling Frequency: Hz = samples per second; 100 Hz ⇒ 100 discrete points each second.
Accuracy formula reiterated: Accuracy = (Correct / Total) * 100.
Supervised necessity: Training/testing split is essential when labels exist; otherwise unsupervised methods apply.
Planned Practical Sessions (Afternoon)
Hands-on coding for:
Decision Tree & Random Forest
Linear Regression demo
Possible SVM example
Further deep dive into feature extraction, PCA, K-Means in subsequent lectures.
Key Take-Away Points
ML = learning parameters that minimise error on data.
Quality & quantity of features dictate success; extraction is non-trivial.
Understand differences: Supervised (labels), Unsupervised (clusters), Reinforcement (interaction).
Always keep distinct training vs. testing sets for unbiased evaluation.
Deep Learning eclipses ML when feature extraction & scale become unmanageable.