Support Vector Machine
Supervised ML method that finds the optimal separating hyperplane and uses it to categorize new examples.
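A minimal sketch of fitting a linear SVM with scikit-learn; the toy points, labels, and test sample below are invented purely for illustration:

    # Linear SVM sketch (toy data invented for illustration).
    from sklearn.svm import SVC

    X = [[0, 0], [1, 1], [2, 2], [3, 3]]   # toy 2D points
    y = [0, 0, 1, 1]                       # two classes

    clf = SVC(kernel="linear")             # find the maximum-margin hyperplane
    clf.fit(X, y)
    print(clf.predict([[2.5, 2.5]]))       # categorize a new example
    print(clf.support_vectors_)            # the points that define the margin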
Hyperplane
A decision boundary with dimensionality one lower than the data, such as a line for 2D data or a point for 1D data.
Linearly separable
Data that can be separated by a straight line.
Non-linearly separable
Data that cannot be separated by a straight line.
Margin
The distance from the hyperplane to the closest data points; SVM chooses the hyperplane that maximizes this distance.
Support Vectors
The data points closest to the hyperplane; they alone determine where the hyperplane lies.
Dot Product
An operation that projects one vector onto another; SVM classifies a sample using the dot product of the sample with the weight vector.
Decision Rule
In SVM, a sample u is assigned to the positive class when w · u + b ≥ 0; training finds the (w, b) that maximize the margin subject to the classification constraints.
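A small numpy sketch of this decision rule; the weight vector w and bias b below are illustrative placeholders, not values learned from data:

    import numpy as np

    # Illustrative (not learned) parameters of a trained SVM.
    w = np.array([1.0, -2.0])   # weight vector, normal to the hyperplane
    b = 0.5                     # bias term

    def classify(u):
        # Positive class if w . u + b >= 0, otherwise negative class.
        return 1 if np.dot(w, u) + b >= 0 else -1

    print(classify(np.array([3.0, 1.0])))   # -> 1
    print(classify(np.array([0.0, 2.0])))   # -> -1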
Soft Margins
In SVM, allowing some data points to fall inside the margin or on the wrong side of the hyperplane in exchange for a wider, more robust margin.
Hyperparameter
In soft-margin SVM, a tunable constant that weights the misclassification penalty, where each point contributes 0 if it is classified correctly and its distance past the margin if it is not.
Kernel Trick
Implicitly mapping data from a lower-dimensional space into a higher-dimensional space using kernel functions, so that non-linearly separable data becomes separable without computing the transformation explicitly.
Polynomial Kernel
A kernel in SVM of the form (x · y + c)^d, where the degree d is a hyperparameter controlling the flexibility of the decision boundary.
Radial Basis Function Kernel
A kernel in SVM based on the distance between points, exp(−γ‖x − y‖²), which creates non-linear combinations of features so that classes can be separated.
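A short sketch of selecting these kernels in scikit-learn; the degree, coef0, and gamma values are illustrative, and X_train / y_train are placeholder names:

    from sklearn.svm import SVC

    # Polynomial kernel: (x . y + coef0)^degree, with the degree d as a hyperparameter.
    poly_svm = SVC(kernel="poly", degree=3, coef0=1.0)

    # RBF kernel: exp(-gamma * ||x - y||^2), giving non-linear decision boundaries.
    rbf_svm = SVC(kernel="rbf", gamma=0.5)

    # Both are fitted the same way, e.g. poly_svm.fit(X_train, y_train).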
Pros of SVMs
Effective in high dimensional spaces, memory-efficient, and works well with clear margins.
Cons of SVMs
Performs poorly on large datasets and on heavily overlapping classes, and does not provide direct probability estimates.
Decision Trees
An intuitive classification algorithm that works by recursively splitting the data on feature values.
Entropy
Measures the impurity (lack of homogeneity) of the labels at a node in a decision tree; a node whose samples all share one class has zero entropy.
Information Gain
In decision trees, the reduction in entropy (or another impurity measure) achieved by a split; the split with the highest gain is chosen.
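A small sketch of entropy and information gain for a binary split; the label arrays are toy examples:

    import numpy as np

    def entropy(labels):
        # Shannon entropy of a label array; 0 when all labels are identical.
        _, counts = np.unique(labels, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    def information_gain(parent, left, right):
        # Entropy of the parent minus the weighted entropy of the two children.
        n = len(parent)
        weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
        return entropy(parent) - weighted

    parent = np.array([0, 0, 1, 1, 1, 0])
    print(information_gain(parent, parent[:3], parent[3:]))  # gain from a toy split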
Boosting + Bagging
Methods to overcome weak learner limitations by combining models.
Boosting
Training each new learner on the errors of the previous learner, as in gradient-boosted decision trees.
Bagging
Training learners independently on random bootstrapped samples of the data, as in random forests.
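A minimal random-forest (bagging) sketch with scikit-learn; X_train, y_train, and X_test are placeholder names for a dataset:

    from sklearn.ensemble import RandomForestClassifier

    # Each tree is trained on a bootstrapped sample of the data and a random
    # subset of the features; predictions are combined by majority vote.
    forest = RandomForestClassifier(n_estimators=100, random_state=0)
    # forest.fit(X_train, y_train)   # X_train / y_train are placeholders
    # forest.predict(X_test)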
Boosted Decision Trees
A method that uses a learning rate to train new models based on errors of previous models.
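A gradient-boosting sketch with scikit-learn showing the learning rate and number of trees as hyperparameters; the data names are placeholders:

    from sklearn.ensemble import GradientBoostingClassifier

    # Each new tree is fitted to the errors of the current ensemble;
    # learning_rate scales how much each new tree contributes.
    gbdt = GradientBoostingClassifier(n_estimators=200, learning_rate=0.1, max_depth=3)
    # gbdt.fit(X_train, y_train)     # X_train / y_train are placeholders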
Learning rate
A parameter that controls how much the weights of the model are adjusted with respect to the loss gradient.
New model
A model trained in boosting that learns from the errors of the previous model.
Previous model
The model that precedes the current model in boosting.
Sub model of errors
A model that focuses on the errors made by the previous model in boosting.
Loss
A measure of how far a model's predictions are from the expected outcomes; training aims to minimize it.
Number of trees
The quantity of decision trees in an ensemble model.
Decision trees
Models that work by splitting data to maximize information gain.
Support Vector Machines (SVMs)
Powerful classical machine learning approaches for supervised learning that find an optimal hyperplane to separate data.
Neural Networks
Models in which each neuron produces a single output from its inputs, weights, and biases; stacking many such neurons in layers lets the network approximate complex functions.
Deep Neural Networks (DNN)
Neural networks with multiple layers that can learn complex patterns in data.
Activation functions
Non-linear functions applied to each neuron's weighted sum that determine its output, such as ReLU, sigmoid, or tanh.
Backpropagation
A method for adjusting the weights of a neural network based on the gradient of the loss function.
Optimisation
The process of adjusting the model to reduce errors and improve performance.
Regularisation
Techniques used to prevent overfitting in machine learning models.
Hidden layers
Layers in a neural network between the input and output layers, where complex patterns are learned.
Dense layers
Also known as fully connected layers, where each neuron is connected to every neuron in the previous layer.
Sequential (Neural Network Programming)
A way to program a neural network in TensorFlow/Keras that is quick and easy.
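A quick Sequential-style sketch in TensorFlow/Keras; the layer sizes and input shape are arbitrary examples:

    import tensorflow as tf

    # Sequential API: layers are simply stacked one after another.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])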
Functional (Neural Network Programming)
A more complex but flexible way to program a neural network in TensorFlow/Keras.
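The same arbitrary example written with the Functional API, which also allows multiple inputs, multiple outputs, and non-linear topologies:

    import tensorflow as tf

    # Functional API: layers are called on tensors, so arbitrary graphs are possible.
    inputs = tf.keras.Input(shape=(10,))
    x = tf.keras.layers.Dense(64, activation="relu")(inputs)
    x = tf.keras.layers.Dense(32, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer="adam", loss="binary_crossentropy")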
Vanishing gradient problem
A problem in neural networks where gradients become extremely small, hindering learning.
Back Propagation
The process of calculating the gradient of the loss function with respect to the neural network's parameters to update them efficiently.
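A hedged sketch of the gradient computation that backpropagation performs, using TensorFlow's GradientTape on a toy single-weight model; the numbers are invented:

    import tensorflow as tf

    w = tf.Variable(2.0)
    b = tf.Variable(0.5)
    x, y_true = 3.0, 10.0

    with tf.GradientTape() as tape:
        y_pred = w * x + b              # forward pass
        loss = (y_true - y_pred) ** 2   # squared-error loss

    # Gradient of the loss with respect to each parameter, used to update them.
    dw, db = tape.gradient(loss, [w, b])
    print(dw.numpy(), db.numpy())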
Loss Function
A function that measures how well a neural network model predicts the expected outcome.
Stochastic Gradient Descent (SGD)
An optimization algorithm that estimates the gradient of the loss from one sample at a time rather than from the entire dataset.
Mini-batch SGD
An optimization technique that computes the loss gradient on batches of a fixed size, combining the stability of full-batch gradient descent with the speed of per-sample SGD.
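A numpy sketch of mini-batch SGD on a simple linear model; the synthetic data, learning rate, and batch size are illustrative:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

    w = np.zeros(3)
    lr, batch_size = 0.1, 16

    for epoch in range(20):
        idx = rng.permutation(len(X))                     # shuffle each epoch
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ w - yb) / len(batch)  # gradient on this mini-batch only
            w -= lr * grad                                # SGD update step

    print(w)   # should approach [1.0, -2.0, 0.5]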
Momentum
An optimization technique that reduces oscillations in the gradient descent process, aiding convergence.
Nesterov Accelerated Gradient
An optimization method that corrects the momentum direction to prevent overshooting the minimum during parameter updates.
Adaptive Methods
Optimization techniques that adjust learning rates for different parameters based on their update frequencies.
Adagrad
An adaptive optimization algorithm that modifies the learning rate at each time step based on past gradients computed for each parameter.
AdaDelta
An optimization method that restricts the accumulation of past squared gradients to a decaying window, preventing the learning rate from continually shrinking as it does in Adagrad.
Adam
An optimization algorithm that combines features of AdaDelta and momentum to achieve efficient parameter updates in neural networks.
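A sketch of choosing these optimizers in TensorFlow/Keras; the learning-rate values shown are illustrative, not recommendations:

    import tensorflow as tf

    adagrad = tf.keras.optimizers.Adagrad(learning_rate=0.01)
    adadelta = tf.keras.optimizers.Adadelta(learning_rate=1.0)
    adam = tf.keras.optimizers.Adam(learning_rate=0.001)

    # Any of them can be passed when compiling a model, e.g.:
    # model.compile(optimizer=adam, loss="mse")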
Dropout
A regularization technique that randomly sets a fraction of input units to zero during training to prevent overfitting.
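A dropout sketch in Keras; with a rate of 0.5, roughly half of the layer's activations are zeroed at each training step (the architecture is arbitrary):

    import tensorflow as tf

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(10,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dropout(0.5),        # active only during training
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])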
Adding to Memory
Combining and storing new information
Forgetting Operation
Removing outdated information from memory
Updating Memory
Performing forgetting and adding operations on the memory
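These three cards read like the memory (cell-state) update of a gated recurrent network such as an LSTM; a minimal numpy sketch under that assumption, with made-up gate values:

    import numpy as np

    memory = np.array([0.2, 0.8, -0.1])       # previous memory state (made-up values)
    forget_gate = np.array([0.9, 0.1, 0.5])   # how much of each entry to keep
    add_gate = np.array([0.3, 0.7, 0.2])      # how much new information to write
    candidate = np.array([1.0, -1.0, 0.4])    # the new information itself

    # Updating memory = forgetting operation followed by adding to memory.
    memory = forget_gate * memory + add_gate * candidate
    print(memory)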