ICT202 Machine Learning - Topic 1: Introduction to Machine Learning
Introduction to Machine Learning
What is Machine Learning?
Machine Learning (ML) is a subfield of artificial intelligence (AI) in which computers learn from data.
ML lets computers make predictions without being explicitly programmed for the task.
It automatically learns rules from data to improve predictions and decisions.
From Learning to Machine Learning
Learning: Getting skills from experience.
Machine Learning: Getting skills from data.
Skill: Improving some performance measure (e.g., prediction accuracy).
Example: In healthcare, ML uses patient data to predict health outcomes.
Why Machine Learning?
Some problems are hard to solve with simple rules.
Example: Recognizing trees is difficult to program by hand.
ML can automatically learn rules from images to recognize trees.
ML helps build complex systems.
Use Cases:
Navigating on Mars: humans cannot program the system by hand.
Recognizing speech and images: humans cannot easily define the solution.
High-speed trading: decisions are needed faster than humans can make them.
Key Essence of Machine Learning
There is an underlying pattern to learn, so performance can be improved.
The pattern has no easy programmable definition, so ML is needed.
Data about the pattern is available, so ML has something to learn from.
ML Applications
Food Data: Predict food poisoning risk using Twitter data.
Clothing Data: Suggest clothes using sales and surveys.
Housing Data: Predict energy use of buildings.
Transportation Data: Recognize traffic signs.
Recommender System Data: Predict movie ratings.
Example: Credit Approval
Data: Age, Gender, Salary, Years at Address, Debt.
Output: Credit Approval (yes/no).
Formalizing the Learning Problem
Basic Notations:
Input: x (customer applications).
Output: y (good/bad credit risk).
Target Function: f: X \rightarrow Y (ideal approval formula).
Data: D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)} (bank records).
Hypothesis: g (learned formula).
Formalizing the Learning Problem cont.
Target function: f: X \rightarrow Y.
Training examples: D = {(x_1, y_1), (x_2, y_2), …, (x_n, y_n)}.
Final hypothesis: g \approx f.
ML uses data to find g that is close to f.
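A minimal sketch of this setup for the credit approval example, using only the standard library. In practice f is unknown; here a hypothetical f (approve when salary exceeds twice the debt) is defined purely so that the agreement between g and f can be measured. The hypothesis family, the candidate ratios, and the salary/debt ranges are illustrative assumptions, not part of the course material.

```python
import random


def f(x):
    """Hypothetical target function (unknown in practice): approve if salary > 2 * debt."""
    salary, debt = x
    return +1 if salary > 2 * debt else -1


def make_hypothesis(ratio):
    """One candidate formula: approve if salary > ratio * debt."""
    return lambda x: +1 if x[0] > ratio * x[1] else -1


def learn(D, candidate_ratios):
    """Return the candidate with the fewest mistakes on the training examples D."""
    def training_errors(ratio):
        h = make_hypothesis(ratio)
        return sum(h(x) != y for x, y in D)

    best_ratio = min(candidate_ratios, key=training_errors)
    return best_ratio, make_hypothesis(best_ratio)


rng = random.Random(0)


def sample_applicant():
    # (salary, debt) in made-up units
    return (rng.uniform(20, 120), rng.uniform(0, 60))


# Training examples D = {(x_i, y_i)} labelled by the target f ("bank records").
D = [(x, f(x)) for x in (sample_applicant() for _ in range(200))]

# Hypothesis set: ratios 0.5, 0.6, ..., 4.0
ratios = [r / 10 for r in range(5, 41)]
best_ratio, g = learn(D, ratios)

# Check how closely g approximates f on applicants not seen during learning.
fresh = [sample_applicant() for _ in range(1000)]
agreement = sum(g(x) == f(x) for x in fresh) / len(fresh)
print(f"learned ratio = {best_ratio}, agreement with f = {agreement:.3f}")
```

The learning step only ever sees D, never f itself; f is called afterwards only to score how close g came.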
Types of Machine Learning
Supervised Learning
Unsupervised Learning
Semi-Supervised Learning
Reinforcement Learning
Supervised vs. Unsupervised Learning
Supervised Learning:
Has labels.
Gets direct feedback.
Predicts outcomes.
Unsupervised Learning:
No labels.
No feedback.
Finds hidden structures.
Supervised Learning
Labeled data --> ML Algorithm --> Model
Learn a model to predict future data using labeled data.
"Supervised" means the correct answers (labels) are known for the training data.
Classification
Predict the category (class label) of new data based on past observations.
Categories are distinct groups.
Example: Spam filtering.
Binary Classification
y \in {+1, -1}
Two categories.
Learn a boundary to separate classes.
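One classic way to learn such a boundary is the perceptron learning rule, sketched below with the standard library only. This is an illustration rather than the method the course prescribes; the two point clouds are made-up data chosen to be linearly separable, and the names `perceptron` and `sign` are introduced here for the example.

```python
import random


def sign(v):
    return 1 if v > 0 else -1


def perceptron(D, max_epochs=1000):
    """Perceptron learning rule for labels in {+1, -1}.

    Returns weights [w0, w1, w2] of the boundary w0 + w1*x1 + w2*x2 = 0.
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for (x1, x2), y in D:
            if sign(w[0] + w[1] * x1 + w[2] * x2) != y:
                # Misclassified point: nudge the boundary toward the correct side.
                w[0] += y
                w[1] += y * x1
                w[2] += y * x2
                mistakes += 1
        if mistakes == 0:  # every training point is on the correct side
            break
    return w


rng = random.Random(1)
# Made-up data: class +1 around (2, 2), class -1 around (-2, -2).
D = [((2 + rng.uniform(-1, 1), 2 + rng.uniform(-1, 1)), +1) for _ in range(50)]
D += [((-2 + rng.uniform(-1, 1), -2 + rng.uniform(-1, 1)), -1) for _ in range(50)]
rng.shuffle(D)

w = perceptron(D)
training_errors = sum(
    sign(w[0] + w[1] * x1 + w[2] * x2) != y for (x1, x2), y in D
)
print("weights:", w, "training errors:", training_errors)
```

The rule is only guaranteed to stop with zero mistakes when the two classes can be separated by a straight line, which is why the example data is constructed that way.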
Multiclass Classification
y \in {1, 2, …, K}
Assign one of K categories, learned from training data, to new data.
Example: Character recognition, email sorting.
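A small sketch of multiclass classification with K = 3, using a nearest-centroid rule: each class is summarized by the mean of its training points, and a new point gets the label of the closest mean. The rule, the function names, and the 2-D points are assumptions chosen for brevity, not something taken from the slides.

```python
import math
from collections import defaultdict


def fit_centroids(X, y):
    """Compute one centroid (mean point) per class label."""
    groups = defaultdict(list)
    for xi, yi in zip(X, y):
        groups[yi].append(xi)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in groups.items()
    }


def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(x, centroids[label]))


# Made-up training data with labels in {1, 2, 3}.
X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
     (5.0, 5.0), (5.2, 4.9), (4.8, 5.3),
     (9.0, 1.0), (9.1, 1.2), (8.8, 0.9)]
y = [1, 1, 1, 2, 2, 2, 3, 3, 3]

centroids = fit_centroids(X, y)
predictions = [predict(centroids, p) for p in [(0.9, 0.9), (5.1, 5.1), (9.0, 1.1)]]
print(predictions)
```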
Regression
y \in [a, b] \subset \mathbb{R}
Predict continuous results.
Find relationships between variables.
Example: Predict student scores based on study time.
Regression Details
Fit a line that minimizes the distance (commonly the average squared vertical distance) between the data points and the line.
Use the line to predict new outcomes.
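A minimal sketch of fitting such a line by ordinary least squares, assuming "distance" means squared vertical distance. The study-time and score values are made up for illustration, and `fit_line` is a name introduced here.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up data: hours studied and exam score.
hours = [1, 2, 3, 4, 5]
scores = [52, 58, 61, 70, 74]

slope, intercept = fit_line(hours, scores)
predicted_score = intercept + slope * 6  # prediction for a new student who studies 6 hours
print(f"score = {intercept:.1f} + {slope:.1f} * hours; prediction at 6 h: {predicted_score:.1f}")
```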
Unsupervised Learning
Supervised: knows the answer.
Reinforcement: defines a reward.
Unsupervised: uses unlabeled data.
Explore data structure without a known outcome.
Unsupervised Learning Problems
Clustering: {x_i} \Rightarrow {C_i}, like "unsupervised classification". Example: articles to topics.
Density Estimation: {x_i} \Rightarrow p(x), like "unsupervised regression". Example: traffic reports -> dangerous areas.
Outlier Detection: {x_i} \Rightarrow {0, 1}, like "unsupervised binary classification". Example: internet logs -> intrusion alert.
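As one illustration of the mapping {x_i} \Rightarrow {0, 1}, the sketch below flags values that lie far from the mean in units of standard deviation (a z-score rule). The rule, the threshold of 2, and the requests-per-minute numbers are assumptions made for this example; real intrusion detection uses richer features and methods.

```python
import statistics


def flag_outliers(values, threshold=2.0):
    """Return 1 for values more than `threshold` standard deviations from the mean, else 0."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0 for _ in values]
    return [1 if abs(v - mean) / std > threshold else 0 for v in values]


# Made-up log data: requests per minute from one client.
requests_per_minute = [20, 22, 19, 21, 23, 20, 18, 22, 21, 250]
flags = flag_outliers(requests_per_minute)
print(flags)
```

No labels are used: the rule decides what counts as unusual from the data alone.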
Clustering
Organize data into subgroups without knowing groups beforehand.
Clusters group similar objects.
Good for structuring info.
Example: Finding customer groups.
Other examples: search engines, image analysis.
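A compact sketch of k-means, one common clustering algorithm, applied to the customer-group example. The slides do not name a specific algorithm, so k-means is an assumption here, as are the customer figures (annual spend, visits per month) and the choice k = 2.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: alternate between assigning points and recomputing centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
    return centroids, labels


# Made-up customers: (annual spend, visits per month).
customers = [(200, 2), (220, 3), (210, 2), (190, 1), (205, 3),
             (900, 12), (950, 10), (920, 11), (880, 13), (910, 12)]

centroids, labels = kmeans(customers, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which customers end up sharing a number.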
Unsupervised Dimensionality Reduction
High-dimensional data is hard to store and process.
Reduce noise and compress data while keeping important info.
Unsupervised Dimensionality Reduction cont.
Data visualization: Show high-dimensional data in 2D or 3D plots.
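A sketch of one widely used technique, principal component analysis (PCA), written with NumPy's singular value decomposition. The slides do not name a method, so PCA is an assumption; the synthetic 5-dimensional data is constructed so that two directions carry almost all of the variance, and `pca_project` is a name introduced for this example.

```python
import numpy as np


def pca_project(X, n_components=2):
    """Project X onto its first principal components.

    Returns the projected data and the fraction of variance each kept component explains.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)
    return X_centered @ components.T, explained[:n_components]


rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 2))   # the "real" 2-D structure
mixing = rng.normal(size=(2, 5))     # embed it in 5 dimensions
X = latent @ mixing + 0.01 * rng.normal(size=(100, 5))  # plus a little noise

Z, explained = pca_project(X, n_components=2)
print(Z.shape, float(explained.sum()))
```

The two columns of `Z` can be passed straight to a 2D scatter plot, which is the visualization use case described above.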
Semi-Supervised Learning
Uses a small amount of labeled data together with a larger amount of unlabeled data.
Examples:
Face images with some labels -> face identifier.
Health data with some labels -> predict medicine effects.
Leverages unlabeled data to avoid the cost of labeling everything.
Basic Terminology and Notations
Iris Dataset example
Samples (instances)
Features (attributes)
Class labels (targets)
Example: Sepal length, Sepal width, Petal length, Petal width, Class labels (Setosa, Versicolor, Virginica)
X = \begin{bmatrix} x_{11} & x_{12} & x_{13} & x_{14} \\ x_{21} & x_{22} & x_{23} & x_{24} \\ \vdots & \vdots & \vdots & \vdots \\ x_{150,1} & x_{150,2} & x_{150,3} & x_{150,4} \end{bmatrix}
y = [y_1, y_2, …, y_{150}], with each y_i \in {Setosa, Versicolor, Virginica}
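The notation maps directly onto Python containers: each row of X is one sample, each column one feature, and y holds one class label per row. The sketch below uses only three rows (one per class, with values as commonly listed for the Iris dataset) to stay short; the full dataset has 150 rows.

```python
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]

# Three sample rows for illustration; the full X has 150 rows and 4 columns.
X = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]
y = ["Setosa", "Versicolor", "Virginica"]  # one class label per row of X

n_samples, n_features = len(X), len(X[0])

# x_{23} in the notation above is sample 2, feature 3.
# Python indexes from 0, so that entry is X[1][2].
x_23 = X[1][2]
print(n_samples, n_features, feature_names[2], x_23)
```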
Roadmap for Building Machine Learning Systems
Raw Data --> Preprocessing --> Training & Test Datasets --> Training, Evaluation, Prediction
Preprocessing: cleaning data.
Learning Algorithm: picking the right method.
Model Selection: choosing the best model.
Preprocessing
Data preprocessing is very important.
Raw data is often not ready for ML.
Many ML algorithms perform best when features are on the same scale.
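A minimal sketch of one common way to put features on the same scale: min-max scaling to the range [0, 1]. The feature values are made up, and `min_max_scale` is a name introduced for this example.

```python
def min_max_scale(column):
    """Rescale a list of numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant feature: nothing to scale
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]


# Made-up features on very different scales.
salary = [30000, 60000, 90000]
age = [25, 35, 45]

salary_scaled = min_max_scale(salary)
age_scaled = min_max_scale(age)
print(salary_scaled, age_scaled)
```

After scaling, salary no longer dominates age simply because its raw numbers are larger.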
Preprocessing cont.
Some features may be highly correlated and therefore partly redundant.
Dimensionality reduction compresses the features into a lower-dimensional space.
Split data into training and test sets.
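A sketch of a random train/test split using only the standard library. The 30% test fraction, the fixed seed, and the function name `split_train_test` are assumptions for this example; scikit-learn offers `train_test_split` for the same purpose.

```python
import random


def split_train_test(samples, test_fraction=0.3, seed=42):
    """Shuffle sample indices, then hold out a fraction of them as the test set."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(samples) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return [samples[i] for i in train_idx], [samples[i] for i in test_idx]


samples = list(range(10))  # stand-ins for 10 data rows
train, test = split_train_test(samples)
print(len(train), len(test))
```

Shuffling first matters when the raw data is ordered, for example sorted by class.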
Training and Selecting a Predictive Model
No Free Lunch theorems: no single algorithm works best for every problem.
Each algorithm has biases; compare different ones to pick the best.
Training and Selecting a Predictive Model cont.
Measure performance with specific metrics.
Use cross-validation: further split the training data into training and validation subsets to estimate how well the model generalizes.
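A sketch of k-fold cross-validation with accuracy as the metric, using the standard library only. The 1-nearest-neighbour classifier, the one-dimensional data, and k = 5 are assumptions chosen to keep the example short; scikit-learn's `cross_val_score` covers the general case.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def predict_1nn(train, x):
    """Label of the training point closest to x (1-nearest neighbour)."""
    _, nearest_label = min(train, key=lambda pair: abs(pair[0] - x))
    return nearest_label


def cross_val_accuracy(data, k):
    """Accuracy on each held-out fold, training on the remaining folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(data), k):
        train = [data[i] for i in train_idx]
        correct = sum(predict_1nn(train, data[i][0]) == data[i][1] for i in val_idx)
        scores.append(correct / len(val_idx))
    return scores


# Made-up (feature, label) pairs, interleaved so each fold contains both classes.
data = [(1.0, "a"), (9.0, "b"), (1.5, "a"), (8.5, "b"), (2.0, "a"),
        (8.0, "b"), (2.5, "a"), (7.5, "b"), (3.0, "a"), (7.0, "b")]

scores = cross_val_accuracy(data, k=5)
print(scores, sum(scores) / len(scores))
```

Each sample is used for validation exactly once, so the average score uses all of the data without ever testing on a point the model was trained on.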
Evaluation and Prediction
Test the model on unseen data.
If it works well, use it to predict new data.
Apply the preprocessing parameters estimated on the training data (e.g., for scaling) to the test set and to new data.
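A sketch of what reusing training-time preprocessing looks like for standardization: the mean and standard deviation are estimated once on the training data and then applied unchanged to new data. The class name `SimpleStandardizer` and the numbers are hypothetical; the fit/transform split mirrors the pattern used by scikit-learn's preprocessing classes.

```python
import statistics


class SimpleStandardizer:
    """Standardize one feature using statistics estimated on training data only."""

    def fit(self, values):
        self.mean_ = statistics.mean(values)
        self.std_ = statistics.pstdev(values) or 1.0  # avoid division by zero
        return self

    def transform(self, values):
        return [(v - self.mean_) / self.std_ for v in values]


train_values = [2.0, 4.0, 6.0, 8.0]
new_values = [5.0, 10.0]  # unseen data arriving later

scaler = SimpleStandardizer().fit(train_values)   # parameters come from training data
train_scaled = scaler.transform(train_values)
new_scaled = scaler.transform(new_values)         # same parameters, no refitting
print(train_scaled, new_scaled)
```

Refitting the scaler on the new data would put it on a different scale from the one the model was trained on.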
Python for Machine Learning
Python is popular for data science and has many libraries for it.
Pure Python can be slow for numerical work, but libraries such as NumPy and SciPy run vectorized operations in fast compiled code.
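A small sketch comparing a pure-Python loop with the equivalent NumPy call for a dot product. The array size is arbitrary and the timings depend on the machine, so only the equality of the two results should be relied on; `dot_loop` is a name introduced for this example.

```python
import math
import time

import numpy as np


def dot_loop(a, b):
    """Dot product with an explicit Python loop."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


n = 200_000
a = [i * 0.001 for i in range(n)]
b = [1.0 + i * 0.002 for i in range(n)]
a_np, b_np = np.array(a), np.array(b)

start = time.perf_counter()
result_loop = dot_loop(a, b)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
result_numpy = float(np.dot(a_np, b_np))  # one vectorized call, executed in compiled code
numpy_seconds = time.perf_counter() - start

same_result = math.isclose(result_loop, result_numpy, rel_tol=1e-9)
print(f"loop: {loop_seconds:.4f} s, NumPy: {numpy_seconds:.4f} s, same result: {same_result}")
```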
Python Libraries
NumPy
SciPy
Pandas
Scikit-learn
TensorFlow
Keras
Visualization: Matplotlib, seaborn
Resources: Python in ML, TensorFlow course
Python for Machine Learning – Coding Platform
Jupyter Notebook: A tool to write and share code, equations, and visuals.
Uses your computer's resources.
Install libraries with pip or Anaconda.
Python for Machine Learning – Coding Platform (Colab)
Colab: Google's cloud-based notebook.
Write and run Python code in a browser.
Edit together with others.
Save notebooks to Google Drive/GitHub.
Free to use, with free (limited) access to GPUs.
Has common libraries pre-installed.