Essential terms and definitions spanning AI foundations, deep-learning architectures, optimization, transformers, tokenisation, anomaly-detection algorithms and key cybersecurity applications.
Artificial Intelligence (AI)
The field devoted to creating systems able to emulate human reasoning, learning and problem-solving without hard-coded rules.
Machine Learning (ML)
A subset of AI that learns patterns from data to make decisions or predictions.
Neural Network (NN)
A data-driven model composed of interconnected layers of nodes (neurons) that process information via weighted links.
Perceptron
The simplest neural-network unit; computes a weighted sum of inputs, applies an activation function and outputs a value.
Activation Function
A non-linear function (e.g., ReLU, Sigmoid) applied to a neuron’s weighted sum to introduce non-linearity into the model.
ReLU (Rectified Linear Unit)
Activation: f(x)=max(0,x). Fast, non-saturating and widely used in deep nets.
Sigmoid
Activation squashing real numbers into (0,1); useful for probabilities but prone to vanishing gradients.
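As a quick illustration, both activations above can be written in a couple of lines of NumPy; this is a generic sketch, not tied to any particular framework.

```python
import numpy as np

def relu(x):
    # ReLU: f(x) = max(0, x), applied element-wise
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes any real number into the open interval (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

z = np.array([-2.0, 0.0, 3.0])
print(relu(z))     # [0. 0. 3.]
print(sigmoid(z))  # approx [0.119 0.5   0.953]
```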
Deep Learning (DL)
Branch of ML using neural networks with three or more hidden layers to learn hierarchical features.
Shallow Learning
Traditional ML relying on manual feature engineering and simpler models with few layers.
Supervised Learning
ML paradigm using labelled data to learn a mapping from inputs to known outputs.
Unsupervised Learning
ML paradigm discovering hidden patterns in unlabelled data (e.g., clustering, anomaly detection).
Semi-Supervised Learning
Technique combining a small set of labelled data with a large set of unlabelled data to improve learning.
Self-Supervised Learning (SSL)
Learning approach that creates surrogate labels from the data itself (e.g., mask prediction) to learn representations without external labels.
Reinforcement Learning (RL)
Learning paradigm where an agent interacts with an environment, receiving rewards to learn optimal actions over time.
Federated Learning (FL)
Collaborative training of a shared global model from multiple decentralised devices without exchanging raw data.
Transfer Learning (TL)
Technique that adapts a model pre-trained on one task to a related but different task with little new data.
Generative Model
A model that learns the data distribution in order to create synthetic, previously unseen samples.
Adversarial Machine Learning
Field studying how ML models can be fooled (e.g., data poisoning, adversarial examples) and how to defend them.
Gradient Descent (GD)
Optimization algorithm that iteratively adjusts parameters in the direction opposite to the loss gradient.
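A minimal sketch of the update rule on a toy one-dimensional loss, L(w) = (w − 3)², chosen purely for illustration:

```python
# Minimise L(w) = (w - 3)^2 with plain gradient descent.
# Gradient: dL/dw = 2 * (w - 3)
w = 0.0             # initial parameter
lr = 0.1            # learning rate (step size)
for step in range(100):
    grad = 2.0 * (w - 3.0)
    w -= lr * grad  # step in the direction opposite to the gradient
print(round(w, 4))  # converges towards the minimiser w = 3
```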
Stochastic Gradient Descent (SGD)
GD variant that updates parameters using randomly selected mini-batches, adding stochasticity and typically speeding up convergence.
Momentum
Optimization enhancement that accumulates past gradients to smooth updates and reduce oscillations.
Adam
Adaptive Moment Estimation; optimizer combining momentum and per-parameter adaptive learning rates.
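For reference, a NumPy sketch of the Adam update rule; the hyperparameter defaults follow the commonly cited values, and the toy loss ||w||² is just for illustration.

```python
import numpy as np

def adam_step(w, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: momentum (m) plus per-parameter adaptive scaling (v)."""
    m = beta1 * m + (1 - beta1) * grad          # first moment (momentum)
    v = beta2 * v + (1 - beta2) * grad ** 2     # second moment (per-parameter scale)
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

# Toy usage: minimise L(w) = ||w||^2, whose gradient is 2w.
w = np.array([1.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 1001):
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.01)
print(w)  # both components end up near 0
```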
Batch Size
Number of samples processed before the model’s parameters are updated once.
Underfitting
Modeling error where a model is too simple to capture the underlying pattern, yielding high training loss.
Overfitting
Modeling error where a model fits training data too closely, performing poorly on unseen data.
Feed-Forward Neural Network (FFNN)
Network where information flows in one direction from input to output without cycles.
Recurrent Neural Network (RNN)
Network designed for sequence data, using hidden state to capture temporal dependencies.
Long Short-Term Memory (LSTM)
RNN variant with gated cells (input, forget and output gates) that mitigate vanishing gradients and learn long-term dependencies.
Convolutional Neural Network (CNN)
Architecture using convolution, activation and pooling layers to extract hierarchical features, especially from images.
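A minimal CNN sketch in PyTorch, assuming 1-channel 28×28 inputs and 10 classes; the layer sizes are arbitrary choices for illustration.

```python
import torch
import torch.nn as nn

# Convolution -> activation -> pooling, repeated, then a linear classifier.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # low-level feature extraction
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # class logits
)

x = torch.randn(8, 1, 28, 28)   # batch of 8 fake images
print(model(x).shape)           # torch.Size([8, 10])
```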
Graph Neural Network (GNN)
Model that propagates and aggregates information over graph structures to learn node, edge or graph embeddings.
Attention Mechanism
Neural component that weighs the importance of different inputs, enabling models to focus on relevant parts.
Transformer
Encoder-decoder architecture built on self-attention layers, enabling parallel sequence processing and long-range context.
Positional Encoding
Numerical encoding added to transformer inputs to inject information about token positions.
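The widely used sinusoidal variant can be sketched in NumPy as follows, following the original Transformer formulation:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Sinusoidal encoding:
    PE[pos, 2i]   = sin(pos / 10000**(2i / d_model))
    PE[pos, 2i+1] = cos(pos / 10000**(2i / d_model))
    """
    pos = np.arange(seq_len)[:, None]             # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]          # (1, d_model/2)
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

pe = sinusoidal_positional_encoding(seq_len=50, d_model=64)
print(pe.shape)  # (50, 64) -- added element-wise to the token embeddings
```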
Tokenizer
Module that splits text into tokens, maps tokens to IDs, and handles padding/truncation for transformer inputs.
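A toy sketch of the tokenizer interface, using plain whitespace splitting and an invented mini-vocabulary; real subword tokenizers (BPE, WordPiece) are more involved, but expose the same text-to-IDs workflow.

```python
# Toy tokenizer: whitespace splitting, a fixed vocabulary, padding/truncation.
PAD, UNK = "<pad>", "<unk>"
vocab = {PAD: 0, UNK: 1, "malware": 2, "detected": 3, "on": 4, "host": 5}

def encode(text, max_len=8):
    tokens = text.lower().split()
    ids = [vocab.get(tok, vocab[UNK]) for tok in tokens][:max_len]  # truncate
    ids += [vocab[PAD]] * (max_len - len(ids))                      # pad
    return ids

print(encode("Malware detected on host 10.0.0.5"))
# [2, 3, 4, 5, 1, 0, 0, 0]  -- unknown token mapped to <unk>, rest padded
```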
Word2Vec
Technique that learns fixed, non-contextual word embeddings via Skip-Gram or CBOW objectives.
Skip-Gram Model
Word2Vec variant predicting surrounding words given a central word, learning embeddings from co-occurrence.
Masked Language Modeling (MLM)
Pre-training task (e.g., BERT) where random tokens are masked and the model predicts the original words.
Autoencoder (AE)
Neural network trained to reconstruct its input; encodes data to a latent space then decodes back.
Variational Autoencoder (VAE)
Probabilistic AE that learns a latent distribution, enabling generative sampling and regularised representations.
Denoising Autoencoder (DAE)
AE trained to reconstruct clean data from corrupted inputs, learning robust features.
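A minimal PyTorch autoencoder sketch, with invented layer sizes, that also computes the per-sample reconstruction error used later for anomaly scoring:

```python
import torch
import torch.nn as nn

class AutoEncoder(nn.Module):
    """Encoder compresses 20-d inputs to a 4-d latent space; decoder reconstructs."""
    def __init__(self, n_features=20, latent_dim=4):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, 8), nn.ReLU(),
                                     nn.Linear(8, latent_dim))
        self.decoder = nn.Sequential(nn.Linear(latent_dim, 8), nn.ReLU(),
                                     nn.Linear(8, n_features))

    def forward(self, x):
        return self.decoder(self.encoder(x))

model = AutoEncoder()
loss_fn = nn.MSELoss()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(64, 20)          # stand-in for a batch of "normal" data
for _ in range(5):               # a few training steps
    loss = loss_fn(model(x), x)  # reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

# Per-sample reconstruction error; unusually high values suggest anomalies.
errors = ((model(x) - x) ** 2).mean(dim=1)
print(errors.shape)  # torch.Size([64])
```

For a denoising autoencoder, the same model would receive a corrupted copy of x as input while still being trained to reconstruct the clean x.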
Contrastive Learning
Self-supervised approach that pulls similar (positive) pairs together and pushes dissimilar (negative) pairs apart in embedding space.
Anomaly Detection
Process of identifying data instances that deviate significantly from normal patterns.
One-Class SVM
SVM variant that learns the boundary of normal data and flags points outside as anomalies.
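An illustrative scikit-learn sketch, with made-up Gaussian "normal" data and an arbitrarily chosen nu:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))      # training data: "normal" only
outliers = rng.uniform(-6, 6, size=(10, 2))   # points far from the normal cloud

ocsvm = OneClassSVM(kernel="rbf", nu=0.05)    # nu ~ expected outlier fraction
ocsvm.fit(normal)

print(ocsvm.predict(outliers))  # mostly -1 (anomaly); +1 means "normal"
```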
K-Means
Partitional clustering algorithm assigning data to K clusters by minimizing within-cluster variance.
DBSCAN
Density-based clustering algorithm that groups dense regions and labels sparse points as noise/outliers.
Local Outlier Factor (LOF)
Algorithm identifying local density anomalies by comparing a point’s density to that of its neighbors.
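The three methods above can be sketched with scikit-learn on invented two-cluster data plus one planted outlier; all parameter values are arbitrary illustrations.

```python
import numpy as np
from sklearn.cluster import KMeans, DBSCAN
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, size=(100, 2)),   # dense cluster at the origin
               rng.normal(5, 0.5, size=(100, 2)),   # dense cluster at (5, 5)
               [[10.0, -10.0]]])                    # one obvious outlier

kmeans = KMeans(n_clusters=2, n_init=10).fit(X)          # partition into 2 clusters
dbscan = DBSCAN(eps=0.8, min_samples=5).fit(X)           # label -1 marks noise/outliers
lof = LocalOutlierFactor(n_neighbors=20).fit_predict(X)  # -1 marks local density outliers

print(kmeans.labels_[-1], dbscan.labels_[-1], lof[-1])   # the planted point is flagged
```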
Principal Component Analysis (PCA)
Linear dimensionality-reduction method projecting data onto orthogonal components with maximum variance.
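A compact NumPy sketch of PCA via the SVD of the centred data matrix:

```python
import numpy as np

def pca(X, n_components=2):
    """Project centred data onto the orthogonal directions of maximum variance."""
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal components (right singular vectors).
    U, S, Vt = np.linalg.svd(X_centred, full_matrices=False)
    return X_centred @ Vt[:n_components].T

X = np.random.default_rng(0).normal(size=(100, 10))
print(pca(X, n_components=2).shape)  # (100, 2)
```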
t-SNE
Non-linear dimensionality-reduction technique that preserves local similarities for visualising high-dimensional data.
Reconstruction Error
Difference between original input and autoencoder output; high error often signals anomalies.
Latent Space
Compressed representation learned by an encoder, capturing salient features of input data.
Self-Attention
Attention applied within a single sequence, letting tokens weigh relevance of other tokens in the same sequence.
Cross-Attention
Attention where queries come from the decoder and keys/values from the encoder, enabling encoder-decoder interaction.
Causal Masking
Mask used in transformer decoders to prevent a position from attending to future tokens.
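Self-attention and causal masking can be sketched together in NumPy; for simplicity the queries, keys and values are all the raw inputs rather than learned projections.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, causal=False):
    """Scaled dot-product self-attention on a (seq_len, d) sequence, with Q = K = V = X."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)             # pairwise token relevance
    if causal:
        # Causal mask: a position may not attend to later (future) positions.
        mask = np.triu(np.ones_like(scores, dtype=bool), k=1)
        scores = np.where(mask, -1e9, scores)
    return softmax(scores, axis=-1) @ X       # weighted mix of value vectors

X = np.random.default_rng(0).normal(size=(5, 8))   # 5 tokens, 8-dim embeddings
print(self_attention(X, causal=True).shape)        # (5, 8)
```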
Tokenizer Vocabulary
Set of all tokens a tokenizer recognizes; its size determines the number of rows in the embedding matrix.
Perplexity (t-SNE)
Parameter controlling the trade-off between preserving local and global structure when embedding data with t-SNE.
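An illustrative scikit-learn call showing where perplexity enters; the data and parameter values are arbitrary.

```python
import numpy as np
from sklearn.manifold import TSNE

X = np.random.default_rng(0).normal(size=(300, 50))   # 300 high-dimensional points

# Lower perplexity emphasises local neighbourhoods; higher values pull in
# more global structure. Typical values lie roughly between 5 and 50.
X_2d = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
print(X_2d.shape)  # (300, 2)
```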