ML Final

Cluster Validation & Evaluation

  • Cluster Validation: This refers to the methods used to assess how well the data is grouped into clusters. It typically involves:

    • Internal validation: Assesses the clustering using only the data itself, e.g., compactness (low intra-cluster distance) and separation (high inter-cluster distance).

    • External validation: Compares the clustering results with external ground truth or labels.

    • Methods: Examples include the silhouette score and the Davies-Bouldin index (see the sketch after this list). This may be part of your homework or exam.

  • Evaluation Method: Likely refers to evaluating machine learning models in general: supervised models are scored with metrics like accuracy, precision, recall, and F1-score, while unsupervised (clustering) models are scored with the validation metrics above.

  • Finding the Number of Clusters:

    • Trial and error: Use methods like the Elbow method or the silhouette score to find a good number of clusters by testing different values of K and comparing the results.

    • Subjective: The number of clusters can often be selected based on domain knowledge, although this is not always reliable.

  • Visualization: It’s difficult to visualize clusters beyond 3D. So, dimensionality reduction techniques like PCA or t-SNE might be used to visualize higher-dimensional data.

  • Limitation of K-Means: K-means implicitly assumes clusters are roughly spherical and similar in size, so it struggles with complex cluster shapes (elongated, irregular, or non-convex clusters).
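
A minimal sketch of how these internal metrics can be computed with scikit-learn (make_blobs is placeholder data; the range of K values is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # placeholder data

for k in range(2, 8):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    sil = silhouette_score(X, labels)        # internal metric: higher is better
    dbi = davies_bouldin_score(X, labels)    # internal metric: lower is better
    print(f"k={k}: silhouette={sil:.3f}, Davies-Bouldin={dbi:.3f}")
```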


Cross-Correlation in Convolution

  • Cross-Correlation: This is the method used to compare two signals (in this case, images) by sliding one signal (the filter) over another (the image). The dot product (sum of element-wise multiplications) at each location helps measure the similarity between the filter and the local region of the image.

    • In Convolutional Neural Networks (CNNs), this process is conventionally called convolution, but it is technically cross-correlation: true convolution flips the filter before sliding it, while CNNs skip the flip (the filter weights are learned either way).

  • CNN Layers:

    • Convolution Layers: These layers use filters (also called kernels) that act as pattern detectors. The goal is to identify small patterns like edges or textures in the image. The CNN learns these filters through backpropagation, which updates their weights to optimize the model’s performance.

    • Filter Size: The size of the filter is a hyperparameter chosen before training.

    • Sliding Dot Product: The process of applying the filter (pattern detector) over the image to extract features (see the sketch after this list).

    • Pattern Detectors: Filters are designed to detect specific patterns (e.g., edges, corners) in images. They are learned during training.

    • Pooling Layers: Used to reduce the spatial dimensions of the data, making the model more computationally efficient. The most common types are:

      • Max Pooling: Selects the maximum value from a local region.

      • Average Pooling: Computes the average value of a local region.

      • Min Pooling: Selects the minimum value from a local region.

    Pooling helps achieve local invariance (small shifts of a pattern within the pooling window do not change the output) and, when stacked across layers, translation invariance (objects are recognized regardless of their position in the image).
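
A minimal sketch of the sliding dot product and max pooling, written with plain NumPy loops for clarity rather than speed (the image is random and the filter is hand-written; in a CNN the filter weights would be learned):

```python
import numpy as np

def cross_correlate(image, kernel):
    """Slide `kernel` over `image`, taking the dot product at each location."""
    H, W = image.shape
    kH, kW = kernel.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
    return out

def max_pool(fmap, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    H, W = fmap.shape
    out = np.zeros((H // size, W // size))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = fmap[i * size:(i + 1) * size, j * size:(j + 1) * size].max()
    return out

image = np.random.rand(6, 6)            # made-up 6x6 "image"
kernel = np.array([[1.0, 0.0, -1.0],    # a hand-written vertical-edge detector
                   [1.0, 0.0, -1.0],
                   [1.0, 0.0, -1.0]])
fmap = cross_correlate(image, kernel)   # 4x4 feature map
print(max_pool(fmap).shape)             # (2, 2)
```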


CNN Architecture

  • Goal: The goal of CNNs is feature extraction. CNNs use multiple layers (convolutional and pooling) to progressively extract features from raw image data.

  • Fully Connected Network: After several convolution and pooling layers, the data is flattened and passed through fully connected layers for classification.

  • Filters and Weights: The filters in convolution layers act like pattern detectors. The weights in fully connected layers are used to make final predictions.

  • Difference Between CNNs and Fully Connected Networks (MLP): In CNNs, each unit is connected only to a small local region of the previous layer (determined by the filter size), and the filter weights are shared across all locations. This greatly reduces the number of parameters and allows the model to learn spatial hierarchies in the data.
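
A minimal sketch of this architecture in PyTorch (the TinyCNN name and all layer sizes are illustrative, not from the lecture): convolution and pooling extract features, then a fully connected layer classifies the flattened result.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3),   # 8 learned 3x3 pattern detectors
            nn.ReLU(),
            nn.MaxPool2d(2),                  # halve the spatial dimensions
            nn.Conv2d(8, 16, kernel_size=3),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(16 * 5 * 5, num_classes)  # fully connected head

    def forward(self, x):
        x = self.features(x)        # convolution + pooling: feature extraction
        x = torch.flatten(x, 1)     # flatten the feature maps per example
        return self.classifier(x)   # fully connected layer: classification

out = TinyCNN()(torch.randn(4, 1, 28, 28))  # e.g., a batch of four 28x28 images
print(out.shape)                            # torch.Size([4, 10])
```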


Activation Functions

  • Sigmoid: The sigmoid function outputs values between 0 and 1. It is often used for binary classification outputs but is rarely used in the hidden layers of deep networks because its gradient is near zero for large positive or negative inputs (the vanishing gradient problem).

  • Hyperbolic Tangent (tanh): Similar to the sigmoid but with outputs between -1 and 1; it is a scaled and shifted sigmoid (tanh(x) = 2·sigmoid(2x) − 1). It still suffers from the vanishing gradient problem.

  • ReLU (Rectified Linear Unit): The most widely used activation function in CNNs. It outputs 0 for negative inputs and passes positive values as-is (f(x) = max(0, x)). It helps mitigate the vanishing gradient problem and speeds up training.

    • Leaky ReLU: A variant that allows a small, nonzero output for negative inputs (a small slope, e.g., 0.01x for x < 0), which keeps units from "dying."

    • Maxout: A generalization of ReLU that outputs the maximum of several learned linear functions; in effect, the network learns the (piecewise-linear) activation function itself, with ReLU as a special case.
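
A minimal sketch of these activations in NumPy, so the formulas are explicit (the input values are arbitrary):

```python
import numpy as np

def sigmoid(x):                  # squashes to (0, 1); saturates for large |x|
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                     # squashes to (-1, 1); zero-centered
    return np.tanh(x)

def relu(x):                     # max(0, x): 0 for negatives, identity for positives
    return np.maximum(0.0, x)

def leaky_relu(x, slope=0.01):   # small slope keeps negative inputs "alive"
    return np.where(x > 0, x, slope * x)

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for f in (sigmoid, tanh, relu, leaky_relu):
    print(f.__name__, np.round(f(x), 3))
```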


Backpropagation

  • Backpropagation is a method used to train neural networks. It consists of two steps:

    1. Forward Pass: Compute the network's output and the loss using the current weights.

    2. Backward Pass: Calculate the gradients of the loss with respect to each weight, and update the weights to minimize the loss using gradient descent or a variant.

  • Weight Update: In backpropagation, weights are updated by gradient descent: w ← w − η · ∂L/∂w, where the learning rate η controls the step size of the update (see the sketch below).
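
A minimal sketch of one forward/backward step in PyTorch (the model and the batch are placeholders):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)             # lr = learning rate
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(16, 4)           # dummy batch of 16 inputs
y = torch.randint(0, 2, (16,))   # dummy class labels

# 1. Forward pass: compute the output and the loss with the current weights.
loss = loss_fn(model(x), y)

# 2. Backward pass: compute dLoss/dWeight for every weight, then update.
optimizer.zero_grad()   # clear gradients left over from the previous step
loss.backward()         # backpropagation fills .grad on every parameter
optimizer.step()        # gradient descent: w <- w - lr * w.grad
```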


K-Means & EM Algorithm for Clustering

  • K-Means Algorithm: An iterative algorithm that partitions data into K clusters by minimizing within-cluster variance: assign each point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat until the assignments stop changing.

  • Expectation-Maximization (EM): A more general framework for clustering, often used with Gaussian Mixture Models (GMM). EM alternates between two steps:

    • Expectation (E-step): Compute the probability (responsibility) of each data point belonging to each cluster, using the current parameters.

    • Maximization (M-step): Update the parameters (e.g., cluster means, covariances) to maximize the likelihood of the observed data.
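
A minimal sketch contrasting the two in scikit-learn: K-means gives hard assignments, while a GMM fit by EM gives soft ones (make_blobs is placeholder data):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)  # placeholder data

# K-means: hard assignment of every point to its nearest centroid.
hard_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Gaussian Mixture Model fit by EM: soft assignments (one probability per cluster).
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)
soft_probs = gmm.predict_proba(X)   # the E-step's responsibilities
print(hard_labels[:5])
print(np.round(soft_probs[:2], 3))  # each row sums to 1
```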


Neural Networks & Perceptron

  • Artificial Neural Network (ANN): A computational model inspired by biological neurons, where each neuron computes a weighted sum of its inputs and passes it through an activation function.

  • Perceptron: A single artificial neuron used for binary classification. It is the simplest neural network: it computes a weighted sum of the inputs and classifies by the sign of the result, which establishes a linear decision boundary (hyperplane).

  • Limitations of MLP (Multilayer Perceptrons): A single perceptron can only solve linearly separable problems (it famously fails on XOR). MLPs overcome this with hidden layers, but fully connected MLPs scale poorly to high-dimensional inputs such as images, since they ignore spatial structure and require many parameters.
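
A minimal sketch of the classic perceptron learning rule in NumPy (the toy data is made up and linearly separable; labels are ±1). Each mistake nudges the hyperplane toward the misclassified point.

```python
import numpy as np

X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-2.0, -1.0]])  # toy inputs
y = np.array([1, 1, -1, -1])                                        # labels in {+1, -1}

w, b, lr = np.zeros(2), 0.0, 1.0
for epoch in range(10):                 # a few passes over the data
    for xi, yi in zip(X, y):
        if yi * (w @ xi + b) <= 0:      # misclassified: wrong side of the hyperplane
            w += lr * yi * xi           # nudge the hyperplane toward xi
            b += lr * yi

print(np.sign(X @ w + b))               # [ 1.  1. -1. -1.] -- all points separated
```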


Hyperplane and Constraints

  • Hyperplane: In machine learning, a hyperplane is a decision boundary that separates different classes. It is defined by the model's weight vector w and bias b, as the set of points where w·x + b = 0.

  • Constraint Satisfaction: In the context of classification, this amounts to determining which side of the boundary a data point lies on: the sign of w·x + b tells whether the point is above or below the hyperplane.
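
A minimal sketch of the side-of-hyperplane test (w, b, and the points are made-up values):

```python
import numpy as np

w, b = np.array([1.0, -2.0]), 0.5        # hyperplane: 1*x1 - 2*x2 + 0.5 = 0
points = np.array([[3.0, 1.0], [0.0, 2.0]])

scores = points @ w + b                  # signed score for each point
print(scores)                            # [ 1.5 -3.5]
print(np.sign(scores))                   # +1: one side of the boundary, -1: the other
```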


PyTorch & Nautilus

  • Nautilus: A Kubernetes-based research cluster (the Pacific Research Platform's Nautilus), used as an environment for running machine learning jobs.

  • Pod vs Node:

    • Pod: The smallest deployable unit in Kubernetes: one or more containers that run your application or code.

    • Node: A virtual machine or physical server where pods run. A node can contain multiple pods.


GAN Architecture

  • GAN (Generative Adversarial Network):

    • Generator: Generates fake samples from random noise, trying to mimic real data.

    • Discriminator: A classifier that learns to distinguish between real and fake samples.

    The generator and discriminator compete in a zero-sum (minimax) game, each improving the other over time (see the sketch below).
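
A minimal sketch of the adversarial training loop in PyTorch, reduced to its skeleton (the network sizes, noise dimension, and synthetic "real" data are placeholders):

```python
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 2))  # noise -> fake sample
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))   # sample -> real/fake logit
opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

real = torch.randn(64, 2) + 3.0                       # stand-in for a real data batch
ones, zeros = torch.ones(64, 1), torch.zeros(64, 1)   # "real" and "fake" labels

for step in range(100):
    # Discriminator step: push D(real) toward 1 and D(fake) toward 0.
    fake = G(torch.randn(64, 16)).detach()   # detach so G is not updated here
    d_loss = bce(D(real), ones) + bce(D(fake), zeros)
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: try to make D classify fresh fakes as real.
    g_loss = bce(D(G(torch.randn(64, 16))), ones)
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()
```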


Miscellaneous Concepts

  • Gradient Descent: The optimization algorithm used to minimize loss in neural networks. It adjusts the model’s weights based on the gradient of the loss function with respect to those weights.

  • Momentum: A technique to accelerate gradient descent by considering previous weight updates to smooth the learning trajectory.

  • Stopping Training (early stopping): Stop when the validation loss starts to increase (a sign of overfitting), or when the weights stop changing significantly (convergence).
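
A minimal sketch of gradient descent with momentum on a toy quadratic loss L(w) = (w − 3)², whose gradient is 2(w − 3); the hyperparameter values are illustrative:

```python
lr, beta = 0.1, 0.9   # learning rate and momentum coefficient (illustrative)
w, v = 0.0, 0.0       # weight and velocity (a running blend of past updates)

for step in range(200):
    grad = 2.0 * (w - 3.0)       # gradient of the loss at the current weight
    v = beta * v - lr * grad     # momentum: mix the previous update with the new gradient
    w = w + v                    # weight update

print(round(w, 4))               # ~3.0: converges to the minimum
```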


Key Takeaways

  • Understand the mechanics of CNNs, including the role of convolution, filters, and pooling.

  • Familiarize yourself with the different activation functions, especially ReLU and sigmoid.

  • Know the steps involved in training a model, particularly backpropagation.

  • Be able to differentiate between K-means and the EM algorithm for clustering.

  • Understand how to handle hyperplanes and constraint satisfaction in machine learning tasks.