
Comprehensive Bullet-Point Notes on Machine Learning Lecture

Introductory Remarks
  • Lecturer begins by checking students’ concerns about the assignment; postpones Q&A until after theory.

  • Emphasizes that the morning session covers theory; the afternoon session will be hands-on programming.

  • Today’s focus: Machine Learning (ML) within the larger Artificial Intelligence (AI) timeline.

Historical Context
  • Artificial Intelligence (AI) era: ≈ 1950 – 1980s.

    • Goal: create programs that “behave like a brain.”

  • Machine Learning (ML) era: ≈ 1980s – 2010.

    • Foundation for the modern data-analytics curriculum.

  • Deep Learning (DL) era: 2010 → present.

    • State of the art; useful for “very big” data when traditional ML struggles.

    • Considered “the future.”

What Is Machine Learning?
  • Essence: use data to answer questions by learning patterns from past data/experience.

  • Human analogy: humans consult memory; machines consult stored data.

  • Core task = pattern extraction & prediction.

  • Works well with large data; may struggle when data become extremely large or complex → shift to DL.

Simple Prediction / Regression Example
  • Dataset: number of people vs. ice-cream cost.

    • Points: (1 person, $5); (2, $10); (4, $20).

    • Derived ratio: $5/1 = $10/2 = $20/4 = 5 dollars per person.

  • Model (slope) = 5.

    • Prediction for 3 people: 5 * 3 = $15.

  • Name: simple linear regression / prediction model.
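
A minimal sketch of this prediction in Python (the data points are the lecture's example; NumPy's polyfit is just one convenient way to recover the slope):

```python
import numpy as np

# Lecture example: number of people vs. ice-cream cost
people = np.array([1, 2, 4])
cost = np.array([5.0, 10.0, 20.0])

# Fit a line cost = slope * people + intercept by least squares
slope, intercept = np.polyfit(people, cost, deg=1)
print(slope, intercept)        # ≈ 5.0, ≈ 0.0

# Predict the cost for 3 people
print(slope * 3 + intercept)   # ≈ 15.0
```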

Best-Fit Line & Error Minimisation
  • Real-world points rarely lie perfectly on a line.

  • Procedure to pick the best line (sketched in code below):

    1. Draw a candidate line.

    2. For each data point, measure the vertical deviation (error) from the line.

    3. Sum the absolute (or squared) errors.

    4. Compare multiple candidate lines; choose the line with the minimum total error.

  • Fundamental ML viewpoint: learning = optimising parameters to minimise error.
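
A minimal sketch of this procedure, assuming some made-up noisy points and squared error; it compares a few candidate slopes and keeps the one with the smallest total error:

```python
import numpy as np

# Hypothetical noisy data points (illustration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.2, 9.7, 15.4, 19.8, 25.6])

def total_squared_error(slope, intercept):
    """Sum of squared vertical deviations between the data and a candidate line."""
    predictions = slope * x + intercept
    return np.sum((y - predictions) ** 2)

# Step 4: compare candidate lines, keep the one with the minimum total error
candidates = [(4.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
best = min(candidates, key=lambda c: total_squared_error(*c))
print(best)   # (5.0, 0.0) gives the lowest summed error here
```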

Three Visual Analogies for Error Minimisation
  1. Robot-to-Cake path: program robot with right/left/forward moves so distance → 0.

  2. Hiker descending mountain: choose path where gradient/height continuously decreases.

  3. Separating red & blue dots (classification line): rotate/shift line until misclassification count → 0.
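
The hiker analogy is essentially gradient descent; below is a minimal sketch on the same squared-error objective, where the data, starting point, and step size are illustrative assumptions:

```python
import numpy as np

# Same hypothetical points as above
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([5.2, 9.7, 15.4, 19.8, 25.6])

slope = 0.0              # start somewhere on the "mountain"
learning_rate = 0.005    # size of each downhill step
for _ in range(200):
    predictions = slope * x
    # Gradient of the summed squared error with respect to the slope
    gradient = -2.0 * np.sum(x * (y - predictions))
    slope -= learning_rate * gradient   # step in the downhill direction
print(slope)             # settles near 5
```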

Applications Highlighted
  • Facebook friend suggestions, Amazon/Flipkart product recommendations, Netflix/YouTube content suggestions, etc.—all rely on ML models reducing error in predictions.

Concept of Features
  • Features = measurable properties extracted from raw data.

  • Image example (face recognition): nose length, face width, eye size, hair colour, presence of spectacles, texture, ear length, lip width, etc.

  • Signal (1-D) example: mean, standard deviation, entropy, max, min, kurtosis.

  • Guideline: the richer & more discriminative the features, the more powerful the model.

From Continuous Data to Features
  • Raw signal: 23.6 s sampled at 100 Hz → 23.6 * 100 = 2360 discrete samples.

  • X-axis switches from time domain to sample index domain.

  • Instead of feeding 2360 values, compute representative features (e.g., mean, std, entropy, max, min) → dramatic dimensionality reduction.
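
A minimal sketch of this reduction, assuming a synthetic NumPy signal and SciPy for the higher-order statistics:

```python
import numpy as np
from scipy import stats

# Hypothetical 1-D signal: 23.6 s sampled at 100 Hz -> 2360 samples
fs = 100
signal = np.random.randn(int(23.6 * fs))

features = {
    "mean": np.mean(signal),
    "std": np.std(signal),
    "max": np.max(signal),
    "min": np.min(signal),
    "kurtosis": stats.kurtosis(signal),
    # Entropy of a histogram estimate of the signal's distribution
    "entropy": stats.entropy(np.histogram(signal, bins=20)[0] + 1e-12),
}
print(len(signal), "samples reduced to", len(features), "features")
print(features)
```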

Training–Testing Split
  • Typical example: 70 % of feature rows for training, 30 % for testing.

    • Split ratio can vary (e.g. 80-20, 60-40) depending on dataset size & task.

  • Analogy:

    • Training = parent shows child multiple versions of A & B.

    • Testing = exam with unseen shapes; accuracy = (correct answers / total questions) * 100.

    • Example accuracy: (3/4) * 100 = 75 %.
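
A minimal sketch of the split and the accuracy formula, assuming scikit-learn's train_test_split on made-up rows (the other ratios are just different test_size values):

```python
from sklearn.model_selection import train_test_split

# Hypothetical feature rows and labels (10 samples, 2 features each)
X = [[i, i * 2] for i in range(10)]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# 70 % training, 30 % testing (use test_size=0.2 or 0.4 for other ratios)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)
print(len(X_train), len(X_test))   # 7 training rows, 3 testing rows

# Accuracy as in the analogy: (correct answers / total questions) * 100
correct, total = 3, 4
print(correct / total * 100)       # 75.0
```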

Building a Training Spreadsheet
  • Example flower dataset with three classes (rose, sunflower, jasmine).

    • Columns: sepal-length, sepal-width, petal-length, petal-width, class-label.

    • Only features + labels are fed to the ML algorithm, not raw images.
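
A minimal sketch of such a spreadsheet as a pandas DataFrame; the numeric values are made-up placeholders, not measurements from the lecture:

```python
import pandas as pd

# Hypothetical feature table; values are placeholders for illustration
table = pd.DataFrame({
    "sepal_length": [5.1, 6.3, 4.8],
    "sepal_width":  [3.5, 2.9, 3.1],
    "petal_length": [1.4, 5.0, 1.6],
    "petal_width":  [0.2, 1.8, 0.3],
    "class_label":  ["rose", "sunflower", "jasmine"],
})

X = table.drop(columns="class_label")   # features fed to the algorithm
y = table["class_label"]                # labels
print(X.shape, list(y))
```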

End-to-End ML Workflow (diagram verbally described)
  1. Input features.

  2. Train algorithm (on 70 %).

  3. Feed unknown test features (30 %).

  4. Model outputs prediction.

  5. Evaluate performance (accuracy, error metrics).
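
A minimal end-to-end sketch of these five steps, assuming scikit-learn, synthetic features, and a decision tree (one of the supervised algorithms named below):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 1. Input features (synthetic stand-in for an extracted feature table)
X, y = make_classification(n_samples=200, n_features=4, random_state=0)

# 2. Train on 70 % of the rows
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)
model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# 3.–4. Feed the unseen 30 % and collect the model's predictions
predictions = model.predict(X_test)

# 5. Evaluate performance
print(accuracy_score(y_test, predictions) * 100)
```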

Three Categories of Machine Learning
1. Supervised Learning
  • Data come with class labels (target outputs).

  • Examples: coin weight → country, flower features → species.

  • Algorithms discussed in the course:

    • Linear Regression

    • Logistic Regression

    • Support Vector Machine (SVM)

    • Naïve Bayes

    • Decision Tree / Random Forest

  • Applications: weather prediction, biometrics (fingerprint/iris), credit scoring, hospital readmission prediction, customer purchase likelihood.

2. Unsupervised Learning
  • No labels; algorithm must discover structure (clusters, components).

  • Concept demo: English vs. Chinese exam scores automatically form two clusters (English-strong, Chinese-strong).

  • Algorithms highlighted (a K-Means sketch of the exam-score demo follows this list):

    • K-Means clustering

    • Principal Component Analysis (PCA)

    • Hidden Markov Model (HMM—for speech).

  • Applications: item identification & grouping, customer segmentation, medical imaging (e.g., MRI tumour detection, chest-X-ray nodule finding), recommendation systems.
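
A minimal sketch of the exam-score demo with K-Means, assuming scikit-learn; the scores are made-up illustrations and no labels are supplied:

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical (English score, Chinese score) pairs, no class labels given
scores = np.array([
    [85, 40], [90, 35], [88, 42],   # English-strong students
    [38, 92], [45, 88], [40, 95],   # Chinese-strong students
])

# Ask for two clusters; the algorithm discovers the grouping on its own
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scores)
print(kmeans.labels_)   # e.g. [0 0 0 1 1 1] (cluster IDs, not class names)
```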

3. Reinforcement Learning
  • Model interacts with environment, receives feedback/reward, learns optimal actions.

  • Example: vision system mislabels apple as mango → feedback “wrong”, updates policy.

  • Used in robotics, game playing (e.g., chess/Go), self-driving cars.
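
A minimal sketch of the feedback/reward loop using a simplified, one-step form of the Q-learning update (the tiny two-state environment and the parameters are illustrative assumptions, not from the lecture):

```python
import random

# Illustrative setup: 2 "images" (states), 2 candidate labels (actions)
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]   # value of each action
alpha, epsilon = 0.5, 0.2                          # learning rate, exploration
correct_action = {0: 1, 1: 0}                      # hidden ground truth giving reward

for _ in range(200):
    state = random.randrange(n_states)
    # Explore occasionally, otherwise act greedily on current Q-values
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])
    reward = 1.0 if action == correct_action[state] else -1.0   # feedback
    # Move the value estimate toward the received reward
    Q[state][action] += alpha * (reward - Q[state][action])

print(Q)   # higher values end up on the rewarded actions
```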

Summary of Algorithms Mentioned
  • Classification (Supervised): Naïve Bayes, SVM, Decision Tree, Random Forest.

  • Regression (Supervised): Linear Regression, Logistic Regression.

  • Clustering (Unsupervised): K-Means, HMM.

  • Dimensionality Reduction (Unsupervised): PCA, ICA.

  • Reinforcement: Q-Learning, Policy Gradients (not deeply covered).

Common Limitations & Challenges
  • Requires large, high-quality datasets; small or noisy data hurt performance.

  • Feature extraction is often the bottleneck; poor features → weak models.

    • Must sometimes add texture, shape, or higher-order statistics to resolve overlapping classes.

  • Computation & memory cost increase with data volume and feature dimensionality.

  • Manual feature design is domain-expert intensive; motivates shift toward Deep Learning (automatic feature learning).

Q&A Highlights & Clarifications
  • Image vs. Feature table: feeding raw images is possible (esp. with DL), but for classic ML we extract numeric descriptors to cut computation. More features → better separation, but watch for redundancy & overfitting.

  • Continuous model improvement: after deployment, newly labelled cases (e.g., 10 001st MRI scan) can be appended to training set, creating a continually learning system.

  • Sampling Frequency: Hz = samples per second; 100 Hz ⇒ 100 discrete points each second.

  • Accuracy formula reiterated: Accuracy = (Correct / Total) * 100.

  • Supervised necessity: Training/testing split is essential when labels exist; otherwise unsupervised methods apply.

Planned Practical Sessions (Afternoon)
  • Hands-on coding for:

    • Decision Tree & Random Forest

    • Linear Regression demo

    • Possible SVM example

  • Further deep dive into feature extraction, PCA, K-Means in subsequent lectures.

Key Take-Away Points
  • ML = learning parameters that minimise error on data.

  • Quality & quantity of features dictate success; extraction is non-trivial.

  • Understand differences: Supervised (labels), Unsupervised (clusters), Reinforcement (interaction).

  • Always keep distinct training vs. testing sets for unbiased evaluation.

  • Deep Learning eclipses ML when feature extraction & scale become unmanageable.