
ICT202 Machine Learning - Topic 1: Introduction to Machine Learning

Introduction to Machine Learning

What is Machine Learning?
  • Machine Learning (ML) is a part of AI where computers learn from data.

  • ML helps computers make predictions without being directly programmed.

  • It's a way to automatically learn rules from data to improve predictions and decisions.

  • ML allows computers to learn without specific instructions.

From Learning to Machine Learning
  • Learning: Getting skills from experience.

  • Machine Learning: Getting skills from data.

  • Skill: an improvement in some performance measure (e.g., prediction accuracy).

  • Example: In healthcare, ML uses patient data to predict health outcomes.

Why Machine Learning?
  • Some problems are hard to solve with simple rules.

  • Example: Recognizing trees is difficult to program by hand.

  • ML can automatically learn rules from images to recognize trees.

  • ML helps build complex systems.

  • Use Cases:

    • Navigating a rover on Mars, where humans cannot easily define the solution in advance.

    • Recognizing speech and images, where decisions must be made quickly.

    • High-speed trading.

Key Essence of Machine Learning
  • A pattern exists that can be learned to improve performance.

  • The pattern cannot be pinned down with explicit programming, so ML is used.

  • Data about the pattern is available to learn from.

ML Applications
  • Food Data: Predict food poisoning risk using Twitter data.

  • Clothing Data: Suggest clothes using sales and surveys.

  • Housing Data: Predict energy use of buildings.

  • Transportation Data: Recognize traffic signs.

  • Recommender System Data: Predict movie ratings.

Example: Credit Approval
  • Data: Age, Gender, Salary, Years at Address, Debt.

  • Output: Credit Approval (yes/no).

Formalizing the Learning Problem
  • Basic Notations:

    • Input: x (customer applications).

    • Output: y (good/bad credit risk).

    • Target Function: f: X \rightarrow Y (ideal approval formula).

    • Data: D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} (bank records).

    • Hypothesis: g (learned formula).

Formalizing the Learning Problem cont.
  • Target function: f: X \rightarrow Y.

  • Training examples: D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}.

  • Final hypothesis: g \approx f.

  • ML uses data to find g that is close to f.
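
A minimal sketch of these notations in Python, assuming scikit-learn (listed later in this topic) and invented credit-style records; the decision tree here is just one possible way to obtain a hypothesis g from the data D:

# D = {(x_1, y_1), ..., (x_n, y_n)}: invented bank records.
# Each x is a feature vector (age, salary, years at address, debt); y is +1 (good) or -1 (bad risk).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

X = np.array([[23, 35000,  1,  5000],
              [45, 82000, 10,  2000],
              [31, 52000,  3, 15000],
              [52, 91000, 20,  1000]])
y = np.array([-1, +1, -1, +1])

# Learn a hypothesis g from D, hoping that g approximates the unknown target f.
g = DecisionTreeClassifier(random_state=0).fit(X, y)

# Apply g to a new, unseen application.
print(g.predict([[38, 60000, 6, 4000]]))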

Types of Machine Learning
  • Supervised Learning

  • Unsupervised Learning

  • Semi-Supervised Learning

  • Reinforcement Learning

Supervised vs. Unsupervised Learning
  • Supervised Learning:

    • Has labels.

    • Gets direct feedback.

    • Predicts outcomes.

  • Unsupervised Learning:

    • No labels.

    • No feedback.

    • Finds hidden structures.

Supervised Learning

Labeled data --> ML Algorithm --> Model

  • Learn a model to predict future data using labeled data.

  • Supervised means we know the correct answers.

Classification

  • Predict categories using past data.

  • Categories are distinct groups.

  • Example: Spam filtering.

Binary Classification

  • y = {+1, -1}

  • Two categories.

  • Learn a boundary to separate classes.
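
A small sketch of binary classification, assuming scikit-learn; the two-cluster data is synthetic and the labels are recoded as +1/-1 to match the notation above:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Synthetic data: two well-separated groups of points in 2D.
X, y = make_blobs(n_samples=200, centers=2, random_state=0)
y = np.where(y == 0, -1, +1)   # relabel the classes as -1 / +1

# Learn a (linear) boundary that separates the two classes.
clf = LogisticRegression().fit(X, y)
print(clf.predict(X[:5]), y[:5])   # predicted vs. true labels for a few points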

Multiclass Classification

  • y = {1, 2, …, K}

  • Assign each new example to one of the K categories seen in the training data.

  • Example: Character recognition, email sorting.
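
A similar sketch for the multiclass case with K = 3 categories, again assuming scikit-learn and synthetic data:

from sklearn.datasets import make_blobs
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data with three groups; the labels y take values in {0, 1, 2}.
X, y = make_blobs(n_samples=300, centers=3, random_state=1)

# Assign each new point to one of the 3 categories learned from the training data.
clf = KNeighborsClassifier(n_neighbors=5).fit(X, y)
print(clf.predict(X[:3]))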

Regression

  • y = [a, b] \subset R

  • Predict continuous results.

  • Find relationships between variables.

  • Example: Predict student scores based on study time.

Regression Details

  • Fit a line that minimizes the distance (typically the squared vertical distance) between the data points and the line.

  • Use the line to predict new outcomes.
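
A sketch of the study-time example with invented numbers, assuming scikit-learn: the line is fitted by least squares (minimizing the squared vertical distances) and then used to predict a new score:

import numpy as np
from sklearn.linear_model import LinearRegression

hours = np.array([[1], [2], [3], [4], [5]])   # study time in hours (invented)
score = np.array([52, 58, 65, 70, 78])        # exam scores (invented)

# Fit the line that minimizes the squared distances between points and the line.
reg = LinearRegression().fit(hours, score)

# Use the fitted line to predict the outcome for a new input.
print(reg.predict([[6]]))   # predicted score after 6 hours of study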

Unsupervised Learning
  • Supervised: knows the answer.

  • Reinforcement: defines a reward.

  • Unsupervised: uses unlabeled data.

  • Explore data structure without a known outcome.

Unsupervised Learning Problems

  • Clustering: {x_i} \Rightarrow {C_i}, like "unsupervised classification". Example: articles to topics.

  • Density Estimation: {x_i} \Rightarrow p(x), like "unsupervised regression". Example: traffic reports -> dangerous areas.

  • Outlier Detection: {x_i} \Rightarrow {0, 1}, like "unsupervised binary classification". Example: internet logs -> intrusion alert.
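
As one concrete example from this list, a small outlier-detection sketch, assuming scikit-learn's IsolationForest: without any labels, each point is mapped to a flag (+1 normal, -1 outlier), analogous to the {0, 1} output above:

import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.RandomState(0)
# Synthetic "normal" points plus a few unusual ones far away from the rest.
X = np.concatenate([rng.normal(0, 1, size=(100, 2)),
                    rng.uniform(6, 8, size=(5, 2))])

flags = IsolationForest(random_state=0).fit_predict(X)
print(flags[-5:])   # the injected unusual points are typically flagged as -1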

Clustering

  • Organize data into subgroups without knowing groups beforehand.

  • Clusters group similar objects.

  • Good for structuring info.

  • Example: Finding customer groups.

  • Other examples: search engines, image analysis.
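
A minimal clustering sketch, assuming scikit-learn's KMeans: the points are grouped into k subgroups (e.g., customer segments) without using any labels:

from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic data; the true group labels are ignored, as in a real unsupervised setting.
X, _ = make_blobs(n_samples=300, centers=3, random_state=2)

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])   # cluster index assigned to each of the first 10 points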

Unsupervised Dimensionality Reduction

  • High-dimensional data is hard to store and process.

  • Reduce noise and compress data while keeping important info.

Unsupervised Dimensionality Reduction cont.

  • Data visualization: Show high-dimensional data in 2D or 3D plots.
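
A sketch of dimensionality reduction for visualization, assuming scikit-learn's PCA: the 4-dimensional Iris measurements (used again later in this topic) are projected down to 2 components for a 2D plot:

from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data                          # shape (150, 4): four features per flower
X_2d = PCA(n_components=2).fit_transform(X)   # keep the 2 directions with most variance
print(X_2d.shape)                             # (150, 2): ready for a 2D scatter plot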

Semi-Supervised Learning
  • Uses a small amount of labeled data together with a larger amount of unlabeled data.

  • Examples:

    • Face images with some labels -> face identifier.

    • Health data with some labels -> predict medicine effects.

  • Avoids expensive labeling.
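
A small semi-supervised sketch, assuming scikit-learn's LabelSpreading: most labels are hidden (marked -1, i.e. unlabeled) and the model propagates the few known labels through the data:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.semi_supervised import LabelSpreading

X, y = load_iris(return_X_y=True)

# Hide roughly 90% of the labels; -1 marks an unlabeled sample.
rng = np.random.RandomState(0)
y_partial = y.copy()
y_partial[rng.rand(len(y)) < 0.9] = -1

model = LabelSpreading().fit(X, y_partial)
print((model.transduction_ == y).mean())   # fraction of samples labeled correctly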

Basic Terminology and Notations

Iris Dataset example

  • Samples (instances)

  • Features (attributes)

  • Class labels (targets)

  • Example: Sepal length, Sepal width, Petal length, Petal width, Class labels (Setosa, Versicolor, Virginica)

  • X = \begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{14} \\ x_{21} & x_{22} & x_{23} & x_{24} \\ \vdots & \vdots & \vdots & \vdots \\ x_{150,1} & x_{150,2} & x_{150,3} & x_{150,4} \end{bmatrix}

  • y \in {Setosa, Versicolor, Virginica} (one class label for each of the 150 samples)
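
These notations can be checked directly with scikit-learn's built-in copy of the Iris data: 150 samples (rows of X), 4 features (columns of X), and 3 class labels:

from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)       # (150, 4) -> the matrix X above
print(iris.feature_names)    # sepal/petal length and width
print(iris.target_names)     # ['setosa' 'versicolor' 'virginica']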

Roadmap for Building Machine Learning Systems

Raw Data --> Preprocessing --> Training & Test Datasets --> Training, Evaluation, Prediction

  • Preprocessing: cleaning and preparing the raw data.

  • Learning: choosing a suitable algorithm and training it on the training data.

  • Model Selection: comparing trained models and keeping the best one.

Preprocessing
  • Data preprocessing is very important.

  • Raw data is often not ready for ML.

  • Many ML algorithms perform best when features are on the same scale.

Preprocessing cont.
  • Some features may be highly correlated and therefore carry redundant information.

  • Reduce dimensions to compress features.

  • Split data into training and test sets.
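
A minimal preprocessing sketch, assuming scikit-learn: split the data first, then fit the scaler on the training set only and reuse it for the test set:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

scaler = StandardScaler().fit(X_train)   # learn mean/std from the training data only
X_train_std = scaler.transform(X_train)
X_test_std = scaler.transform(X_test)    # apply the same scaling to the test data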

Training and Selecting a Predictive Model
  • No Free Lunch theorems: no single algorithm works best for every problem.

  • Each algorithm has biases; compare different ones to pick the best.

Training and Selecting a Predictive Model cont.
  • Measure performance with an appropriate metric, such as classification accuracy.

  • Use cross-validation: repeatedly split the training data into training and validation folds to estimate how well the model generalizes (see the sketch below).
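
A sketch of k-fold cross-validation, assuming scikit-learn: the data is split into 5 folds, and the model is trained and validated 5 times, each time holding out a different fold:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=5)   # one accuracy score per fold
print(scores.mean())                                           # average validation accuracy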

Evaluation and Prediction
  • Test the model on unseen data.

  • If it works well, use it to predict new data.

  • Apply the same preprocessing steps, fitted on the training data, to the new data (see the sketch below).
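
A sketch of evaluation and prediction, assuming scikit-learn's Pipeline, which ensures that the preprocessing fitted during training is reapplied to unseen data:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Scaler and classifier are fitted together on the training data only.
model = make_pipeline(StandardScaler(), KNeighborsClassifier())
model.fit(X_train, y_train)

print(model.score(X_test, y_test))   # accuracy on unseen test data
print(model.predict(X_test[:3]))     # predictions for new samples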

Python for Machine Learning
  • Python is good for data science with many libraries.

  • Pure Python can be slow, but libraries such as NumPy and SciPy build on fast, lower-level code written in C and Fortran.

Python Libraries
  • NumPy

  • SciPy

  • Pandas

  • Scikit-learn

  • TensorFlow

  • Keras

  • Visualization: matplotlib, Seaborn

  • Resources: Python in ML, TensorFlow course

Python for Machine Learning – Coding Platform
  • Jupyter Notebook: A tool to write and share code, equations, and visuals.

  • Uses your computer's resources.

  • Install libraries with pip or Anaconda.

Python for Machine Learning – Coding Platform (Colab)
  • Colab: Google's cloud-based notebook.

  • Write and run Python code in a browser.

  • Edit together with others.

  • Save notebooks to Google Drive/GitHub.

  • Free to use with free GPUs.

  • Has common libraries pre-installed.