Applied Machine Learning – Comprehensive Bullet-Point Notes

Course Information & Context

  • Institution: BITS-Pilani – Work Integrated Learning Programmes (WILP)
  • Core paper: ZG568 – “Introduction / Applied Machine Learning” (multi-campus offering, flipped-classroom pedagogy)
  • Authors & editors: Dr Sugata Ghosal, Dr Rama Satish K V, Prof Brahma Naidu, Team-AI, etc.
  • Delivery model
    • Recorded videos + live contact sessions (CS-xx)
    • Jupyter/Colab demos; local Anaconda install recommended
    • Attendance counted only if the student stays logged in until session end and participates interactively
  • Principal textbooks / references
    • T1 Hands-On ML (A. Géron 2e/3e)
    • T2 Tan, Steinbach, Kumar – Data Mining
    • R1 Interpretable ML (C. Molnar)
    • R2 P. Domingos – “A Few Useful Things to Know About Machine Learning”

Modular Structure (10 Modules)

  • M1 Introduction → definitions, types, challenges
  • M2 & M3 “Big Picture” → End-to-End pipeline (problem framing → EDA → model → deploy)
  • M4 Linear Prediction (LR, GD, regularisation, bias/variance)
  • M5–M6 Classification I/II (LogReg, SVM, NB, DT, Ensemble)
  • M7 Unsupervised (PCA, k-means, EM, apps)
  • M8–M9 NN & Deep Nets (MLP, CNN, RNN, apps)
  • M10 FAccT ML (Fairness, Accountability, Transparency, Robustness)

Machine Learning: What & Why

  • ML = algorithms \mathcal{A} that improve performance P on task T with experience E.
  • Traditional programming vs ML pipeline diagrams: traditional code maps rules + data → answers, while ML learns the rules (program) from data + answers
  • Typical tasks & examples
    • Classification, Regression, Sequence-decision (RL)
    • Spam filter, OCR, medical imaging, autonomous driving, VQA, credit-fraud, recommender, GAN image generation, speech-ASR

Framing an ML Problem (Housing-price demo)

  • Steps: business objective → choose supervised multivariate regression → pick metric (RMSE, MAE; computed in the sketch after this list) → verify assumptions (batch vs online, instance vs model-based)
  • Key elements of a full project:
    1. Framing
    2. Data types
    3. Pre-processing
    4. EDA / visualisation
    5. Feature engineering
    6. Model build / test
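
A minimal NumPy sketch of the two candidate metrics from the framing step (the price arrays are hypothetical):

```python
import numpy as np

y_true = np.array([250_000, 320_000, 180_000])   # hypothetical house prices
y_pred = np.array([240_000, 350_000, 200_000])   # hypothetical model output

rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # penalises large errors more
mae = np.mean(np.abs(y_true - y_pred))           # less sensitive to outliers

print(f"RMSE = {rmse:.0f}, MAE = {mae:.0f}")
```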

Data Types & Representation

  • Attribute categories: Nominal, Ordinal, Interval, Ratio; Continuous vs Discrete
  • Data containers: record table, data-matrix, document-term, transactions, graphs, sequences, spatio-temporal
  • Important characteristics: dimensionality, sparsity, resolution, size

Data Pre-processing

  • Quality issues: insufficient, non-representative, noise/outliers, missing, irrelevant features
  • Remedies: cleaning, imputation, feature engineering (selection & extraction), regularisation
  • Transformation ops: aggregation, sampling, binning, scaling, encoding, DR (PCA/SVD, t-SNE), curse-of-dimensionality note; see the preprocessing sketch below
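
A scikit-learn sketch of the imputation → scaling → encoding chain; the table and column names are hypothetical:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw table with a missing value and a nominal attribute
df = pd.DataFrame({"rooms": [3, 4, None, 2],
                   "area": [120.0, 200.0, 150.0, 80.0],
                   "city": ["A", "B", "A", "C"]})

numeric = Pipeline([("impute", SimpleImputer(strategy="median")),  # imputation
                    ("scale", StandardScaler())])                  # scaling
prep = ColumnTransformer([("num", numeric, ["rooms", "area"]),
                          ("cat", OneHotEncoder(), ["city"])])     # encoding

X = prep.fit_transform(df)
print(X.shape)   # (4, 5): 2 scaled numeric columns + 3 one-hot columns
```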

Summary Statistics & Proximity

  • Location: mean \mu, median, percentiles (p-tile)
  • Spread: range, variance \sigma^2, std, MAD
  • Distances
    • Euclidean d_2(x,y)=\sqrt{\sum_k (x_k-y_k)^2}; Minkowski family d_r: Manhattan (r=1), Chebyshev (r\to\infty); see the distance sketch below
    • Mahalanobis d_M=\sqrt{(x-\mu)^T\Sigma^{-1}(x-\mu)}
    • Cosine sim \cos(\theta)=\frac{x\cdot y}{|x|\,|y|}, Correlation \rho
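
These measures computed with NumPy/SciPy; the vectors are toy values and the data matrix used for the Mahalanobis covariance is synthetic:

```python
import numpy as np
from scipy.spatial import distance

x, y = np.array([1.0, 2.0, 3.0]), np.array([2.0, 4.0, 1.0])

print(distance.euclidean(x, y))      # d_2
print(distance.cityblock(x, y))      # Manhattan, r = 1
print(distance.chebyshev(x, y))      # r -> infinity
print(1 - distance.cosine(x, y))     # cosine similarity (SciPy returns 1 - cos)

# Mahalanobis needs the inverse covariance of the data distribution
data = np.random.default_rng(0).normal(size=(100, 3))
VI = np.linalg.inv(np.cov(data, rowvar=False))
print(distance.mahalanobis(x, y, VI))
```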

Visualisation Cheatsheet

  • 1-D: histogram, boxplot
  • 2-D/3-D: scatter, heatmap, contour, pair-plot
  • Matrix/correlation plots for high-dimensional data; see the plotting sketch below
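
A minimal matplotlib sketch of the 1-D and 2-D staples, on random data purely for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)
x, y = rng.normal(size=500), rng.normal(size=500)

fig, axes = plt.subplots(1, 3, figsize=(12, 3))
axes[0].hist(x, bins=30)        # 1-D distribution shape
axes[1].boxplot(x)              # spread & outliers at a glance
axes[2].scatter(x, y, s=5)      # 2-D relationship
plt.show()
```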

Supervised Learning

Regression (Linear)

  • Model \hat{y}=w^Tx+b; cost J=\frac1{2m}\sum (\hat{y}-y)^2
  • Closed-form w=(X^TX)^{-1}X^Ty &
    GD update w:=w-\eta\,\nabla J; variants (batch, mini-batch, SGD); see the NumPy sketch after this list
  • Regularisation:
    Ridge +\lambda\|w\|_2^2, Lasso +\lambda\|w\|_1, Early-stopping
  • Bias–Variance decomposition E[(y-\hat{f})^2]=\text{Bias}^2+\text{Var}+\sigma^2
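
A NumPy sketch contrasting the closed-form normal equation with batch gradient descent on synthetic data; the learning rate and iteration count are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.c_[np.ones(100), rng.uniform(0, 2, 100)]   # bias column + 1 feature
y = 4 + 3 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Closed form: w = (X^T X)^{-1} X^T y
w_closed = np.linalg.inv(X.T @ X) @ X.T @ y

# Batch GD on J = (1/2m) sum (Xw - y)^2, gradient (1/m) X^T (Xw - y)
w, eta, m = np.zeros(2), 0.1, len(y)
for _ in range(2000):
    w -= eta * (X.T @ (X @ w - y)) / m            # w := w - eta * grad J
print(w_closed, w)                                # both near [4, 3]
```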

Classification

  • Logistic regression \sigma(z)=1/(1+e^{-z}); predict class by P(y=1|x)
  • Naïve Bayes:
    P(y|x)\propto P(y)\prod_i P(x_i|y); independence assumption; Laplace smoothing
  • Linear SVM, kernel SVM (soft margin, C, kernels)
  • Decision Tree (ID3 info-gain Gain(S,A)=H(S)-\sum_v\frac{|S_v|}{|S|}H(S_v)); overfit control via pruning; CART Gini 1-\sum_i p_i^2
  • Ensembles:
    • Bagging / Random Forest (bootstrap, decorrelation)
    • Boosting / AdaBoost (weight update w_i\leftarrow w_i\,e^{\alpha_t I(y_i\ne h_t(x_i))})
    • Ensembles reduce error when base learners are diverse and better than chance; see the classifier sketch below
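
A scikit-learn sketch running these classifier families side by side on one synthetic dataset; the hyper-parameter values are illustrative, not tuned:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {"logreg": LogisticRegression(max_iter=1000),
          "naive_bayes": GaussianNB(),
          "svm_rbf": SVC(C=1.0, kernel="rbf"),             # soft-margin kernel SVM
          "tree": DecisionTreeClassifier(max_depth=5),     # depth cap ~ pruning
          "forest": RandomForestClassifier(),              # bagging + decorrelation
          "adaboost": AdaBoostClassifier()}                # boosting on stumps
for name, m in models.items():
    print(name, m.fit(X_tr, y_tr).score(X_te, y_te))
```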

Evaluation & Model Selection

  • Confusion matrix terms: TP, FP, FN, TN
  • Metrics: Accuracy, Precision \frac{TP}{TP+FP}, Recall \frac{TP}{TP+FN}, F1, ROC & AUC, cost-matrix, class-imbalance
  • Data split: hold-out, k-fold CV (k≈10), LOOCV, bootstrap (.632)
  • Hyper-parameter tuning via grid/random/Bayesian search; nested CV; see the evaluation sketch below
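
A sketch of the split-and-score machinery on a deliberately imbalanced synthetic dataset; the model and grid are placeholders:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (confusion_matrix, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

X, y = make_classification(n_samples=500, weights=[0.8], random_state=0)  # imbalance
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)
print(confusion_matrix(y_te, y_hat))              # [[TN FP] [FN TP]]
print(precision_score(y_te, y_hat), recall_score(y_te, y_hat), f1_score(y_te, y_hat))
print(roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1]))   # AUC needs scores

print(cross_val_score(clf, X, y, cv=10).mean())   # k-fold CV, k = 10
grid = GridSearchCV(clf, {"C": [0.01, 0.1, 1, 10]}, cv=5).fit(X_tr, y_tr)
print(grid.best_params_)                          # grid-search tuning
```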

Unsupervised Learning

Dimensionality Reduction

  • PCA: maximise retained variance; eigen-decomposition of \Sigma; projection Z=U^T X; scree plot of variance explained (see the sketch below)
  • Application: eigenfaces, compression
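
A sketch of PCA as the eigen-problem, cross-checked against scikit-learn on the Iris data:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
Xc = X - X.mean(axis=0)                      # centre first

# Eigen-decomposition of the covariance matrix Sigma
eigval, eigvec = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = eigval.argsort()[::-1]               # largest variance first
print(eigval[order] / eigval.sum())          # scree: variance explained

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)         # matches the top two eigenvalues
Z = pca.transform(X)                         # projection onto the top-2 axes
```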

Clustering

  • K-means: minimise SSE; steps init→assign→update; issues (initialisation, scale, k, shapes); Elbow & silhouette
  • K-medoids, Hierarchical (agglomerative, linkage), density-based (DBSCAN)
  • GMM & EM; soft clustering; likelihood \sum_k \pi_k \mathcal{N}(x|\mu_k,\Sigma_k); see the clustering sketch below
  • Cluster validity indices: SSE, Silhouette, Entropy/PI, Dunn, external ARI
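
A sketch of k-means with an SSE/silhouette sweep over k, plus a GMM for soft memberships; blob data is synthetic:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)

for k in range(2, 7):                              # elbow / silhouette sweep
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, km.inertia_, silhouette_score(X, km.labels_))  # SSE + silhouette

gmm = GaussianMixture(n_components=4, random_state=0).fit(X)   # EM under the hood
print(gmm.predict_proba(X[:3]).round(2))           # soft cluster memberships
```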

Neural Networks & Deep Learning

  • Perceptron learning rule w:=w+\eta (t-o)x; limitations (linear separability)
  • MLP: layers, activations (ReLU, LeakyReLU, \tanh, sigmoid); back-prop chain rule; vanishing/exploding gradients
  • Initialisation: Xavier \sigma=\sqrt{\frac{1}{n_{in}}}, He \sqrt{\frac{2}{n_{in}}}
  • Optimisers: Momentum, Nesterov, RMSProp, Adam; LR scheduling, batch-norm
  • Regularisation: Dropout (p≈0.5), L1/L2, early stopping; see the Keras sketch below
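
These pieces combined in a minimal Keras sketch; the layer sizes, learning rate, and patience are illustrative choices:

```python
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu",
                          kernel_initializer="he_normal"),   # He init for ReLU
    tf.keras.layers.BatchNormalization(),                    # batch-norm
    tf.keras.layers.Dropout(0.5),                            # dropout, p ~ 0.5
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="binary_crossentropy", metrics=["accuracy"])

early = tf.keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True)
# model.fit(X_train, y_train, validation_split=0.2, epochs=100, callbacks=[early])
```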

Convolutional NN (CNN)

  • Convolution layer (kernel f\times f, stride s, padding p) output size (n+2p-f)/s+1
  • Feature maps, channels; parameter count f^2\,c_{in}\,c_{out} (+ c_{out} biases); verified in the sketch below
  • Pooling (max/avg, stride 2), flatten, FC
  • Typical stacks: [conv-BN-ReLU]* → pool → dense
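
A sketch verifying the output-size and parameter-count formulas with a toy Keras layer; the channel counts are arbitrary:

```python
import tensorflow as tf

n, f, s, pad = 28, 3, 1, 1                     # 28x28 input, 3x3 kernel
print((n + 2 * pad - f) // s + 1)              # (n+2p-f)/s+1 = 28 ("same" size)

conv = tf.keras.layers.Conv2D(32, f, strides=s, padding="same")
conv.build((None, n, n, 16))                   # c_in = 16, c_out = 32
# weights f^2 * c_in * c_out + c_out biases = 9*16*32 + 32 = 4640
print(conv.count_params())
```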

Recurrent NN (RNN)

  • Sequence modelling h_t=\sigma(W_h h_{t-1}+W_x x_t+b); many-to-one, many-to-many tasks
  • LSTM gates (forget f_t, input i_t, candidate g_t=\tilde c_t, output o_t) with cell state c_t; see the NumPy gate sketch below
  • GRU simplified (reset & update gates)
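
A single LSTM step in NumPy, following the gate equations listed under “Key Mathematical Expressions” below; weights are random and sizes are arbitrary, purely for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

d, h = 4, 3                                    # input and hidden sizes
rng = np.random.default_rng(0)
Wf, Wi, Wc, Wo = (rng.normal(size=(h, d + h)) for _ in range(4))
bf = bi = bc = bo = np.zeros(h)
x_t, h_prev, c_prev = rng.normal(size=d), np.zeros(h), np.zeros(h)

z = np.concatenate([x_t, h_prev])              # [x_t, h_{t-1}]
f_t, i_t, o_t = sigmoid(Wf @ z + bf), sigmoid(Wi @ z + bi), sigmoid(Wo @ z + bo)
c_tilde = np.tanh(Wc @ z + bc)                 # candidate cell state
c_t = f_t * c_prev + i_t * c_tilde             # forget old + admit new
h_t = o_t * np.tanh(c_t)                       # gated hidden output
print(h_t)
```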

End-to-End ML Pipeline Recap

  1. Problem & metric
  2. Data acquire & store
  3. EDA / visualise
  4. Pre-process & feature eng.
  5. Split train/val/test
  6. Select & train models
  7. Tune hyper-params
  8. Evaluate, cross-validate
  9. Deploy – monitor, A/B, retrain

Fairness, Accountability, Transparency & Ethics (FAccT)

  • Fairness definitions
    • Demographic parity P(\hat y=1|A=0)=P(\hat y=1|A=1)
    • Equalised odds P(\hat y=1|A=0,Y=y)=P(\hat y=1|A=1,Y=y) for y\in\{0,1\} (\hat y independent of A given Y)
    • Equal opportunity (TPR parity); parity and TPR are checked numerically in the sketch after this list
  • Bias sources: prejudice in data, under-estimation, sampling bias
  • Mitigation: balanced datasets, re-weigh, regularisers (Prejudice Index PI=\sum P(y,s)\log\frac{P(y,s)}{P(y)P(s)}), fair-representation, post-processing thresholds
  • Interpretability
    • Intrinsic (linear, DT, rule-lists) vs Post-hoc (LIME, SHAP)
    • Global vs Local; model-specific vs model-agnostic
  • Privacy (data masking, DP), Security (poisoning, adversarial), Accountability (model cards, audits)
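
A sketch checking demographic parity and equal opportunity on hypothetical labels and predictions, with A as the sensitive attribute; all arrays are random stand-ins for real model output:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 2, 1000)                 # sensitive attribute
y = rng.integers(0, 2, 1000)                 # true labels
y_hat = rng.integers(0, 2, 1000)             # hypothetical model predictions

# Demographic parity: P(y_hat=1 | A=0) vs P(y_hat=1 | A=1)
for a in (0, 1):
    print(f"P(y_hat=1 | A={a}) =", y_hat[A == a].mean())

# Equal opportunity: TPR parity, P(y_hat=1 | A=a, Y=1)
for a in (0, 1):
    mask = (A == a) & (y == 1)
    print(f"TPR for A={a}:", y_hat[mask].mean())
```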

Key Mathematical Expressions

  • Gradient descent w^{(k+1)}=w^{(k)}-\eta\,\nabla J(w^{(k)})
  • AdaBoost weight \alpha_t=\frac12\ln\frac{1-\epsilon_t}{\epsilon_t}
  • Logistic cost J= -\frac1m\sum \big[y\log\hat y+(1-y)\log(1-\hat y)\big]; evaluated, with the AdaBoost weight, in the sketch after this list
  • PCA eigen-problem \Sigma u_i = \lambda_i u_i
  • LSTM equations
    f_t=\sigma(W_f[x_t,h_{t-1}]+b_f)   i_t=\sigma(W_i[x_t,h_{t-1}]+b_i)
    \tilde c_t=\tanh(W_c[x_t,h_{t-1}]+b_c)   c_t=f_t\odot c_{t-1}+i_t\odot\tilde c_t
    o_t=\sigma(W_o[x_t,h_{t-1}]+b_o)   h_t=o_t\odot\tanh(c_t)
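
Two of these expressions evaluated numerically on toy values, a quick sanity-check sketch:

```python
import numpy as np

# Logistic cost J for toy labels and predicted probabilities
y = np.array([1, 0, 1, 1])
y_hat = np.array([0.9, 0.2, 0.7, 0.6])
J = -np.mean(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat))
print(J)

# AdaBoost learner weight for an error rate of 0.2
eps = 0.2
alpha = 0.5 * np.log((1 - eps) / eps)
print(alpha)   # ~0.693: better-than-chance learners get positive weight
```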

Practical Tips & Colab Resources

  • Hands-On-ML notebooks:
    • 02_end_to_end_machine_learning_project.ipynb (housing)
    • 11_training_deep_neural_networks.ipynb (BN, LR sched, etc.)
  • Use a GPU runtime in Colab; remember to set random seeds for reproducibility (see the seeding sketch below); monitor training with TensorBoard.
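
A common seeding sketch for Colab/Keras experiments; SEED = 42 is an arbitrary choice:

```python
import random
import numpy as np
import tensorflow as tf

SEED = 42
random.seed(SEED)          # Python stdlib RNG
np.random.seed(SEED)       # NumPy (legacy global RNG)
tf.random.set_seed(SEED)   # TensorFlow ops and weight initialisers
```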

Closing Reminders

  • Select metrics aligning with business cost, esp. in imbalanced or regulated settings.
  • Always validate assumptions & monitor post-deployment drift.
  • Strive for interpretable, fair & robust models alongside accuracy.