Machine Learning Notes

MACHINE LEARNING

  • Definition: Machine Learning (ML) allows computers to learn from data without being explicitly programmed.
    • Alan Turing framed the founding question in 1950: "Can machines think?"
    • Example: Turing Test - A human judge interacts with a human and a computer to distinguish between them. If the judge cannot consistently tell which is which, the computer passes the test.

KEY DEFINITIONS

  • Arthur Samuel (1959): Defined ML as the field of study that gives computers the ability to learn without being explicitly programmed.
  • T. Mitchell: A computer program is said to learn from experience E with respect to tasks T and performance measure P if its performance at T, as measured by P, improves with experience E.
  • Applications of ML:
    • Spam detection
    • Voice recognition
    • Stock trading
    • Robotics
    • Healthcare diagnostics
    • SEO and e-commerce optimization

TYPES OF MACHINE LEARNING

SUPERVISED LEARNING

  • Involves an input-output pairing where the correct output is known.
  • Types of problems:
    • Regression: Predicting a continuous output.
      • Example: Predicting house price based on size.
    • Classification: Predicting discrete categories.
      • Example: Classifying a tumor as benign or malignant.

UNSUPERVISED LEARNING

  • The model finds hidden patterns in data without labeled responses.
  • No right or wrong answers, used to explore data and discover patterns.
  • Clustering: Grouping data based on similarity.
    • Example: Classifying genes based on various characteristics.
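Clustering can be sketched with a minimal 1-D k-means loop (a toy illustration, not tied to any example in these notes): assign each point to its nearest center, then move each center to the mean of its assigned points.

```python
def kmeans_1d(points, centers, iters=10):
    """Toy 1-D k-means: alternate nearest-center assignment
    and center-update steps for a fixed number of iterations."""
    for _ in range(iters):
        clusters = [[] for _ in centers]
        for p in points:
            # Assign p to the index of the closest current center.
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if empty).
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

# Two clear groups near 1.0 and 9.0; centers converge near [1.0, 9.0].
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.5], [0.0, 10.0]))
```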

FUNCTIONALITY IN SUPERVISED LEARNING

MODELING

  • Models define the relationship between input (X) and output (Y).
  • Training set is represented as pairs of input and output: ( \{(x^{(i)}, y^{(i)});\ i = 1, \ldots, m\} ).

HYPOTHESIS

  • A function ( h: X \rightarrow Y ) that predicts the output.
  • Example of hypothesis in linear regression:
    • ( h(x) = mx + c ), where ( m ) is the slope and ( c ) is the intercept.

LOSS FUNCTION

  • Measures accuracy of predictions by comparing predicted values to actual outcomes.
  • Often minimized using techniques like gradient descent.
  • For example, the Mean Squared Error is calculated as:
    \frac{1}{n} \sum_{i=1}^{n} (h(x^{(i)}) - y^{(i)})^2
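The MSE above can be computed directly (a minimal sketch with a hypothetical hypothesis function `h`):

```python
def mse(h, xs, ys):
    """Mean squared error of hypothesis h over paired samples (xs, ys)."""
    n = len(xs)
    return sum((h(x) - y) ** 2 for x, y in zip(xs, ys)) / n

# h(x) = 2x fits y = 2x exactly, so the error is zero.
print(mse(lambda x: 2 * x, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))  # 0.0
```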

GRADIENT DESCENT

  • An optimization algorithm used to find the minimum of a function.
  • Iteratively adjusts parameters to reduce the loss function value.
  • Involves calculating the derivative to find the slope of the cost function and adjusting with learning rate ( \alpha ).
  • Formula:
    \theta_{\text{new}} = \theta_{\text{old}} - \alpha \frac{dJ(\theta)}{d\theta}
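The update rule can be sketched for the one-variable hypothesis ( h(x) = mx + c ) with the MSE cost; the learning rate and step count here are illustrative choices:

```python
def gradient_descent(xs, ys, alpha=0.01, steps=5000):
    """Fit h(x) = m*x + c by gradient descent on the MSE cost J(m, c)."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of J = (1/n) * sum((m*x + c - y)^2).
        grad_m = (2 / n) * sum((m * x + c - y) * x for x, y in zip(xs, ys))
        grad_c = (2 / n) * sum((m * x + c - y) for x, y in zip(xs, ys))
        # theta_new = theta_old - alpha * dJ/dtheta, per parameter.
        m -= alpha * grad_m
        c -= alpha * grad_c
    return m, c

# Data drawn from y = 2x + 1; the fit converges near m = 2, c = 1.
m, c = gradient_descent([1, 2, 3, 4], [3, 5, 7, 9])
print(m, c)
```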

MULTIVARIATE REGRESSION

  • Extends linear regression to multiple input features:
    • For multiple features: ( h(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \ldots + \theta_n x_n )
  • Gradient descent then adapts accordingly to multiple variables.

POLYNOMIAL REGRESSION

  • Used when data does not fit a linear model.
  • Example: Quadratic function ( ax^2 + bx + c ).

LOSS FUNCTION IN POLYNOMIAL REGRESSION

  • Similar to linear regression; evaluates differences but considers polynomial degrees.
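Polynomial regression is linear regression on expanded features: each scalar x becomes the vector [1, x, x², …], and the hypothesis is the dot product with the coefficients. A minimal sketch (function names are illustrative):

```python
def poly_features(x, degree):
    """Expand a scalar x into [1, x, x^2, ..., x^degree]."""
    return [x ** d for d in range(degree + 1)]

def h(theta, x, degree=2):
    """Polynomial hypothesis: theta dotted with the expanded features."""
    return sum(t * f for t, f in zip(theta, poly_features(x, degree)))

# theta = [c, b, a] encodes a*x^2 + b*x + c; at x = 2: 3 + 2*2 + 1*4 = 11.
print(h([3.0, 2.0, 1.0], 2))  # 11.0
```

With this expansion, the same loss function and gradient descent used for linear regression apply unchanged.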

LOGISTIC REGRESSION

  • Used for binary classification tasks (output is discrete).
  • Uses the sigmoid function to convert any value into a probability between 0 and 1:
    h(x) = g(\theta^T x) = \frac{1}{1 + e^{-z}} \text{, where } z = \theta^T x
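The sigmoid hypothesis above is a few lines of code (a sketch; the feature vector x is assumed to include a leading 1 for the intercept):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    """Logistic hypothesis h(x) = g(theta^T x) for feature vector x."""
    z = sum(t * xi for t, xi in zip(theta, x))
    return sigmoid(z)

print(sigmoid(0))  # 0.5
```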

COST FUNCTION FOR LOGISTIC REGRESSION

  • The cost function differs from linear regression due to non-linear output, defined as:
    J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[ y^{(i)} \log(h(x^{(i)})) + (1 - y^{(i)}) \log(1 - h(x^{(i)})) \right]
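This cost (often called cross-entropy or log loss) averages the per-sample terms over the m training pairs; a self-contained sketch:

```python
import math

def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^{-z})."""
    return 1.0 / (1.0 + math.exp(-z))

def cross_entropy(theta, xs, ys):
    """Logistic-regression cost J(theta) averaged over m labelled samples."""
    m = len(xs)
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / m

# Confident, correct predictions give a cost near zero.
print(cross_entropy([5.0], [[1.0], [-1.0]], [1, 0]))
```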

OVERFITTING & UNDERFITTING

  • Overfitting: Model learns noise in the training data and performs poorly on unseen data.
    • Addressed through regularization, which simplifies the model by penalizing large coefficients.
  • Underfitting: Model is too simplistic to capture the underlying structure of the data.

CATEGORICAL DATA HANDLING

  • Machine learning algorithms require numerical input, necessitating encoding:
    • One-Hot Encoding: Converts categorical variables into binary variables.
  • Example: For "color" variable with categories like "red," "green," and "blue," represent them as three binary variables.

MULTICLASS CLASSIFICATION

  • Extends binary classification using the one-vs-all strategy: train one binary classifier per class, then predict the class whose classifier is most confident.

REGULARIZATION TECHNIQUES

  • Used to reduce overfitting while preserving all features:
    • L1 Regularization (Lasso): Adds an absolute value penalty on the size of coefficients.
    • L2 Regularization (Ridge): Adds the squared values of the coefficients to the loss function, shrinking them toward zero and yielding a simpler hypothesis that is less prone to overfitting.
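The L2-regularized cost can be sketched by adding the penalty term to the MSE (the regularization strength `lam` is a hyperparameter; by the usual convention the bias term is not penalized):

```python
def ridge_cost(theta, xs, ys, lam):
    """MSE plus an L2 penalty lam * sum(theta_j^2) on the coefficients.

    xs are feature vectors with a leading 1 for the intercept theta[0],
    which is excluded from the penalty."""
    m = len(xs)
    mse = sum(
        (sum(t * xi for t, xi in zip(theta, x)) - y) ** 2
        for x, y in zip(xs, ys)
    ) / m
    penalty = lam * sum(t ** 2 for t in theta[1:])
    return mse + penalty

# With lam = 0 this is plain MSE; lam > 0 adds the coefficient penalty.
print(ridge_cost([0.0, 2.0], [[1, 1], [1, 2]], [2, 4], 0.5))
```

An L1 (Lasso) variant would replace the squared terms with absolute values, `lam * sum(abs(t) for t in theta[1:])`.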