Definition: Machine Learning (ML) allows computers to learn from data without being explicitly programmed.
Alan Turing famously posed the question: "Can machines think?"
Example: Turing Test - A human judge interacts with a human and a computer to distinguish between them. If the judge cannot consistently tell which is which, the computer passes the test.
KEY DEFINITIONS
Arthur Samuel (1959): Described ML as the field of study that gives computers the ability to learn from experience without being explicitly programmed.
T. Mitchell: A computer program is said to learn from experience E with respect to tasks T and performance measure P if its performance on T, as measured by P, improves with experience E.
Applications of ML:
Spam detection
Voice recognition
Stock trading
Robotics
Healthcare diagnostics
SEO and e-commerce optimization.
TYPES OF MACHINE LEARNING
SUPERVISED LEARNING
Involves an input-output pairing where the correct output is known.
Types of problems:
Regression: Predicting a continuous output.
Example: Predict house price based on size.
Classification: Predicting discrete categories.
Example: Classifying a tumor as benign or malignant.
UNSUPERVISED LEARNING
The model finds hidden patterns in data without labeled responses.
There are no right or wrong answers; it is used to explore data and discover structure.
Clustering: Grouping data based on similarity.
Example: Grouping genes based on shared characteristics.
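As an illustration of clustering, here is a minimal k-means sketch in NumPy (the data, number of clusters, and iteration count are made-up example values, not part of the notes):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Minimal k-means: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k randomly chosen data points
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance)
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(axis=2), axis=1)
        # Move each centroid to the mean of its assigned points
        centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
    return labels, centroids

# Two obvious groups: points near 0 and points near 10
X = np.array([[0.0], [0.5], [1.0], [10.0], [10.5], [11.0]])
labels, centroids = kmeans(X, k=2)
```

With no labeled responses, the algorithm discovers the two groups purely from similarity between points.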
FUNCTIONALITY IN SUPERVISED LEARNING
MODELING
Models define the relationship between input (X) and output (Y).
Training set is represented as pairs of input and output: ( \{(x^{(i)}, y^{(i)});\ i = 1, \dots, m\} ).
HYPOTHESIS
A function ( h: X \rightarrow Y ) that predicts the output.
Example of hypothesis in linear regression:
( h(x) = mx + c ) where ( m ) is the slope and ( c ) is the intercept.
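The linear hypothesis can be sketched directly in code; the slope and intercept values below are made-up for illustration, not fitted parameters:

```python
import numpy as np

def h(x, m=2.0, c=1.0):
    """Linear hypothesis h(x) = m*x + c (m and c are example values)."""
    return m * x + c

sizes = np.array([1.0, 2.0, 3.0])  # e.g. house sizes, arbitrary units
preds = h(sizes)                   # → array([3., 5., 7.])
```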
LOSS FUNCTION
Measures accuracy of predictions by comparing predicted values to actual outcomes.
Often minimized using techniques like gradient descent.
For example, the Mean Squared Error is calculated as:
\frac{1}{m} \sum_{i=1}^{m} \left( h(x^{(i)}) - y^{(i)} \right)^2
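The Mean Squared Error is a one-liner in NumPy; the prediction and target arrays below are arbitrary example values:

```python
import numpy as np

def mse(predictions, targets):
    """Mean Squared Error: average of squared prediction errors."""
    return np.mean((predictions - targets) ** 2)

y_hat = np.array([2.5, 0.0, 2.0])
y     = np.array([3.0, -0.5, 2.0])
# Errors: -0.5, 0.5, 0.0 → squared: 0.25, 0.25, 0.0 → mean: 0.5/3
error = mse(y_hat, y)
```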
GRADIENT DESCENT
An optimization algorithm used to find the minimum of a function.
Iteratively adjusts parameters to reduce the loss function value.
Involves computing the derivative (slope) of the cost function and stepping each parameter in the direction that decreases it, scaled by the learning rate ( \alpha ).
With multiple parameters, the same update rule is applied to every partial derivative simultaneously.
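The steps above can be sketched for simple linear regression; the data, learning rate, and iteration count are assumed example values:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, iters=1000):
    """Fit h(x) = m*x + c by gradient descent on the MSE loss."""
    m, c = 0.0, 0.0
    n = len(x)
    for _ in range(iters):
        error = (m * x + c) - y
        # Partial derivatives of the MSE with respect to m and c
        grad_m = (2.0 / n) * np.sum(error * x)
        grad_c = (2.0 / n) * np.sum(error)
        # Step against the gradient, scaled by the learning rate alpha
        m -= alpha * grad_m
        c -= alpha * grad_c
    return m, c

# Data generated from y = 2x + 1, so the fit should recover m ≈ 2, c ≈ 1
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2 * x + 1
m, c = gradient_descent(x, y)
```

Both parameters are updated in each iteration, which is the multi-variable form of the update rule.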
POLYNOMIAL REGRESSION
Used when data does not fit a linear model.
Example: Quadratic function ( ax^2 + bx + c ).
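A quadratic can be fitted with NumPy's least-squares polynomial fit; the data here is generated from a known quadratic purely for illustration:

```python
import numpy as np

x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
y = 3 * x**2 + 2 * x + 1           # noiseless data from a known quadratic
# polyfit returns coefficients highest degree first: [a, b, c]
a, b, c = np.polyfit(x, y, deg=2)  # should recover a ≈ 3, b ≈ 2, c ≈ 1
```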
LOSS FUNCTION IN POLYNOMIAL REGRESSION
Similar to linear regression; evaluates prediction errors in the same way, but the hypothesis includes higher-degree polynomial terms.
LOGISTIC REGRESSION
Used for binary classification tasks (output is discrete).
Uses the sigmoid function to convert any value into a probability between 0 and 1:
h(x) = g(\theta^T x) = \frac{1}{1 + e^{-z}} \text{, where } z = \theta^T x
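The sigmoid's squashing behavior is easy to verify numerically; the z values below are arbitrary examples:

```python
import numpy as np

def sigmoid(z):
    """Maps any real number into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# sigmoid(0) = 0.5; large positive z → near 1, large negative z → near 0
p = sigmoid(np.array([-10.0, 0.0, 10.0]))
```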
COST FUNCTION FOR LOGISTIC REGRESSION
The cost function differs from linear regression due to non-linear output, defined as:
J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right]
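The cross-entropy cost can be sketched as follows; the tiny dataset and the two theta vectors are made-up values meant only to show that correct confident predictions give low cost and wrong confident predictions give high cost:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost(theta, X, y):
    """Average cross-entropy cost J(theta) over m examples."""
    h = sigmoid(X @ theta)
    m = len(y)
    return -(1.0 / m) * np.sum(y * np.log(h) + (1 - y) * np.log(1 - h))

# Tiny made-up dataset: first column of X is a bias term of 1s
X = np.array([[1.0, 2.0], [1.0, -2.0]])
y = np.array([1.0, 0.0])
good = logistic_cost(np.array([0.0, 3.0]), X, y)   # confident, correct
bad  = logistic_cost(np.array([0.0, -3.0]), X, y)  # confident, wrong
```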
OVERFITTING & UNDERFITTING
Overfitting: Model learns noise from the training data, performing poorly on unseen data.
Underfitting: Model is too simplistic to capture underlying structure.
Overfitting is addressed through regularization, which penalizes large coefficients to keep the model simple.
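Penalizing large coefficients can be sketched as an L2 (ridge-style) penalty added to the MSE loss; the penalty weight `lam`, the dataset, and both theta vectors are assumed example values:

```python
import numpy as np

def ridge_loss(theta, X, y, lam=0.1):
    """MSE plus an L2 penalty lam * sum(theta^2) on the coefficients."""
    residuals = X @ theta - y
    return np.mean(residuals ** 2) + lam * np.sum(theta ** 2)

# Tiny made-up dataset: first column of X is a bias term of 1s
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([2.0, 3.0, 4.0])
small = ridge_loss(np.array([1.0, 1.0]), X, y)      # exact fit, small weights
large = ridge_loss(np.array([100.0, -32.0]), X, y)  # large weights are penalized
```

Minimizing this combined loss pushes the model toward smaller coefficients, which is how regularization discourages overfitting.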