Key terms, acronyms and concepts you must know to understand and discuss fine-tuning, deploying and safeguarding Large Language Models.
Large Language Model (LLM)
A neural network with billions of parameters trained on vast text corpora to understand and generate human-like language.
Fine-Tuning
Further training a pre-trained model on a smaller, domain-specific dataset to specialize it for a new task.
Parameter-Efficient Fine-Tuning (PEFT)
Any technique that adapts an LLM by updating only a small subset of parameters, reducing compute and memory costs.
Low-Rank Adaptation (LoRA)
A PEFT method that inserts small low-rank matrices into weight layers and trains only these matrices while freezing the original weights.
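A minimal PyTorch sketch of the idea (class name, rank and scaling are illustrative, not taken from the source): only the low-rank matrices A and B are trainable.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                    # original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))                       # only A and B receive gradients
```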
QLoRA
A memory-efficient variant of LoRA that fine-tunes 4-bit quantised weights while training low-rank adapters.
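A hedged sketch using the Hugging Face transformers, bitsandbytes and peft stack; the model id is a placeholder and the target_modules names assume a Llama-style architecture.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantisation
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained("your-base-model-id",  # placeholder
                                             quantization_config=bnb_config)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"]))
model.print_trainable_parameters()          # only the LoRA adapters are trainable
```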
DoRA
Weight-Decomposed Low-Rank Adaptation; splits each weight into magnitude and direction, applying a LoRA-style low-rank update to the direction while also training the magnitude.
Half Fine-Tuning (HFT)
Technique that freezes half of a model’s parameters each round, preserving pre-trained knowledge while learning new tasks.
Mixture of Experts (MoE)
Architecture with multiple specialist sub-networks (experts); a router activates only a subset per token for efficiency.
Mixture of Agents (MoA)
Framework where several complete LLMs collaborate layer-by-layer, combining proposals and aggregations to improve output quality.
Retrieval-Augmented Generation (RAG)
Pipeline that retrieves external documents at query time and feeds them into an LLM to ground responses in fresh knowledge.
Seven-Stage Fine-Tuning Pipeline
End-to-end process: Dataset Preparation, Model Initialization, Training Setup, Fine-Tuning, Evaluation, Deployment, Monitoring.
Dataset Preparation
Collecting, cleaning, formatting and splitting data plus handling imbalance, augmentation and annotation.
Model Initialization
Loading a pre-trained checkpoint and setting initial configurations before training or inference.
Training Environment Setup
Configuring hardware (GPU/TPU), software libraries, hyper-parameters, optimiser and loss functions for efficient training.
Hyper-parameter
A training setting (e.g., learning rate, batch size, epochs) chosen before training that governs model learning behaviour.
Gradient Descent
Optimisation algorithm updating weights by moving them opposite to the gradient of the loss function.
Stochastic Gradient Descent (SGD)
Gradient descent variant that updates weights using one (or few) training samples per step, adding randomness.
Mini-Batch Gradient Descent
Updates parameters with gradients computed on small batches, balancing stability and speed.
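A small NumPy sketch (an illustrative least-squares example) showing how the three preceding variants differ only in batch size: batch_size=1 gives SGD, batch_size=len(X) gives full-batch gradient descent.

```python
import numpy as np

def mini_batch_gd(X, y, lr=0.01, batch_size=32, epochs=5):
    """Linear regression trained with mini-batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)  # dLoss/dw
            w -= lr * grad                                              # step against the gradient
    return w

X = np.random.randn(256, 4)
y = X @ np.array([1.0, -2.0, 0.5, 3.0])
print(mini_batch_gd(X, y))
```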
Adam Optimizer
Adaptive learning-rate optimiser that combines momentum and RMSprop ideas; widely used for LLM fine-tuning.
AdamW
Adam variant that decouples weight decay from gradient updates, improving regularisation for transformers.
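Typical usage with PyTorch's built-in optimiser; the learning rate and decay values below are illustrative defaults for transformer fine-tuning, not prescriptions.

```python
import torch

model = torch.nn.Linear(10, 2)
# AdamW applies weight decay directly to the parameters instead of folding it into the gradient.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

x, target = torch.randn(8, 10), torch.randn(8, 2)
loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```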
Quantisation
Technique that stores weights/activations in lower precision (e.g., 8-bit, 4-bit) to cut memory and speed up inference.
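A toy NumPy sketch of symmetric 8-bit quantisation (real libraries add per-channel scales, calibration and specialised kernels).

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantisation: store int8 values plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())   # small reconstruction error
```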
Pruning
Removing unimportant weights, neurons or filters from a network to make it smaller and faster.
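A minimal example with PyTorch's pruning utilities (the 30% amount is chosen arbitrarily for illustration).

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(128, 64)
# Zero out the 30% of weights with the smallest magnitude (unstructured L1 pruning).
prune.l1_unstructured(layer, name="weight", amount=0.3)
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}")
```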
Cross-Entropy Loss
Primary objective for language models measuring divergence between predicted token distribution and true distribution.
Perplexity
Exponentiated cross-entropy; lower values mean the model is less ‘surprised’ and predicts text better.
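A short PyTorch sketch tying the two previous entries together: perplexity is simply exp(cross-entropy). Tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 6, 50_000)            # (batch, sequence, vocab) scores from a model
targets = torch.randint(0, 50_000, (1, 6))    # true next-token ids

ce = F.cross_entropy(logits.view(-1, 50_000), targets.view(-1))  # mean cross-entropy in nats
perplexity = torch.exp(ce)                                       # lower is better
print(ce.item(), perplexity.item())
```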
Safety Benchmark
Suite of tests (e.g., DecodingTrust) that probe toxicity, bias, privacy, hallucination and adversarial robustness of LLMs.
Llama Guard
Meta’s safeguard model that classifies prompts and responses into risk categories to filter unsafe content.
ShieldGemma
Google Gemma-based moderation model that filters hate, violence, sexual and other harmful content across parameter scales.
WildGuard
Open-source multitask moderation model fine-tuned on adversarial datasets to detect harmful prompts, risky outputs and refusals.
Proximal Policy Optimisation (PPO)
Reinforcement-learning algorithm that aligns LLMs by maximising a learned reward while constraining policy updates.
Direct Preference Optimisation (DPO)
Alignment method that directly maximises the likelihood of preferred over rejected responses without a reward model.
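A hedged sketch of the DPO objective, given summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model (beta and the example numbers are illustrative).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from response log-probabilities."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                torch.tensor([-13.0]), torch.tensor([-14.9]))
print(loss.item())
```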
Odds-Ratio Preference Optimisation (ORPO)
Single-stage objective that boosts preferred answers and penalises disfavoured ones via a log-odds loss.
Adapters
Small trainable layers inserted into a frozen model; only adapter weights are updated during fine-tuning.
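A minimal bottleneck-adapter sketch in PyTorch (hidden and bottleneck sizes are illustrative); in practice these modules are inserted inside frozen transformer blocks.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual connection."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```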
Soft Prompt Tuning
PEFT technique that learns a short sequence of virtual tokens prepended to every input instead of changing weights.
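A minimal sketch, assuming the frozen model accepts pre-computed input embeddings; the number of virtual tokens is illustrative.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learns n_virtual embedding vectors that are prepended to every input sequence."""
    def __init__(self, n_virtual=20, hidden_size=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_virtual, hidden_size) * 0.02)

    def forward(self, input_embeds):                      # (batch, seq, hidden)
        batch = input_embeds.size(0)
        virtual = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([virtual, input_embeds], dim=1)  # frozen LLM consumes the result
```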
Data Augmentation
Creating synthetic examples (e.g., back-translation, paraphrasing) to enlarge training data and improve robustness.
Synthetic Data Generation
Using LLMs to produce new labelled samples that resemble target-domain data for fine-tuning.
Data Imbalance
Unequal class distribution in a dataset; mitigated via over-/under-sampling, class-weighted losses or focal loss.
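One common mitigation, sketched with a class-weighted cross-entropy loss in PyTorch (the weights here are illustrative).

```python
import torch
import torch.nn as nn

# Suppose class 1 is rare: give it a larger weight so mistakes on it cost more.
class_weights = torch.tensor([1.0, 5.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(16, 2)
labels = torch.randint(0, 2, (16,))
loss = criterion(logits, labels)
print(loss.item())
```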
Federated Learning
Framework where models train across multiple devices holding local data, improving privacy by keeping data in place.
Differential Privacy
Mathematical guarantee that a training procedure limits what can be inferred about any individual training example, typically achieved by clipping gradients and injecting calibrated noise.
Fairness
Model characteristic of producing equitable performance across demographic groups, avoiding algorithmic bias.
Bias
Systematic error favouring certain outputs or groups, often inherited from training data.
Data Drift
Shift in input distribution over time that can degrade model performance post-deployment.
Influence Score
Metric estimating each training example’s effect on model predictions; useful for data pruning (e.g., DEFT).
Data-Efficient Fine-Tuning (DEFT)
Approach that prunes training data by influence and effort scores to fine-tune LLMs with minimal samples.
Sparse Fine-Tuning
Updating only a small set of high-impact parameters (e.g., SpIEL), reducing memory and compute cost.
AutoTrain
HuggingFace web service that automates data prep, hyper-parameter search, fine-tuning and deployment.
Transformers Library
HuggingFace Python package providing pre-trained models, tokenisers and Trainer API for fine-tuning.
Trainer API
High-level class in transformers that abstracts training loops, evaluation and distributed training setup.
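A condensed sketch of a Trainer run (model id, dataset and hyper-parameters are illustrative; real runs need more configuration).

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased",
                                                           num_labels=2)

dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)
dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=8,
                         num_train_epochs=1, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),
                  eval_dataset=dataset["test"].select(range(500)))
trainer.train()   # the loop, logging and checkpointing are handled by Trainer
```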
Optimum
HuggingFace toolkit that applies hardware-aware optimisation (quantisation, pruning, distillation) for efficient inference.
Amazon SageMaker JumpStart
AWS service offering ready LLMs and automated pipelines for fine-tuning and deploying on SageMaker.
Amazon Bedrock
Fully-managed AWS service giving API access to foundation models and tools for fine-tuning and RAG.
OpenAI Fine-Tuning API
Endpoint that lets users upload datasets and customise GPT-3.5/4 models via simple API calls.
NVIDIA NeMo
Framework and set of micro-services for training, customising and serving LLMs with GPU acceleration.
Generative AI
Field of AI focused on creating new content—text, code, images, audio—rather than just analysing data.
Multimodal LLM
Model that processes and generates across multiple modalities, e.g., text + images or audio.
Vision-Language Model (VLM)
Multimodal model jointly trained on images and text, enabling tasks like captioning and VQA.
Contrastive Learning
Technique that teaches models by bringing paired representations (e.g., image–text) closer and pushing mismatched ones apart.
CLIP
OpenAI’s contrastive model that aligns image and text embeddings, enabling zero-shot vision tasks.
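A zero-shot classification sketch via the transformers CLIP classes; the image path and label prompts are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")                       # any local image
labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```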
AdapterFusion
Method that learns to combine multiple task-specific adapters through an attention-style fusion layer, improving multi-task performance.
Data Cleaning
Removing noise, errors and inconsistencies from raw data to improve fine-tuning quality.
Tokenizer
Algorithm that splits raw text (or audio) into discrete tokens usable by a language model.
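A quick illustration with a Hugging Face tokenizer (the GPT-2 tokenizer is just an example).

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("Fine-tuning large language models")["input_ids"]
print(ids)                                     # token ids
print(tokenizer.convert_ids_to_tokens(ids))    # the sub-word pieces they map to
```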
PagedAttention
vLLM memory-management algorithm that stores key-value cache in paged blocks, reducing fragmentation.
vLLM
Inference engine using PagedAttention plus smart scheduling to serve LLMs with high throughput and low memory.
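A hedged usage sketch of vLLM's offline-inference interface; the model id is illustrative and exact arguments may differ between vLLM versions.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")   # illustrative model id
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain LoRA in one sentence."], params)
print(outputs[0].outputs[0].text)
```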
Petals
Decentralised framework that splits LLM layers across volunteer GPUs, enabling torrent-style inference/fine-tuning.
WebGPU
Browser API enabling GPU compute inside web apps, letting LLMs run locally via projects like WebLLM.
Quantised LLM
Model whose weights are stored in reduced precision (4/8-bit) to cut memory and accelerate inference.
Pruning Schedule
Planned strategy dictating when and how much of a model’s weights or neurons to prune during training.
Model Card
Standardised report documenting a model’s purpose, data, performance, limitations and ethical considerations.
GLUE Benchmark
Suite of nine NLP tasks used to gauge general language understanding of models.
MMLU
Massive Multitask Language Understanding; benchmark spanning 57 subjects that tests broad knowledge and reasoning.
DecodingTrust
Comprehensive framework assessing LLM trustworthiness in toxicity, bias, privacy, robustness and ethics.
LLM Guardrails
Intermediary policies or models that filter or rewrite prompts/responses to enforce safety and compliance.