Key terms, acronyms and concepts you must know to understand and discuss fine-tuning, deploying and safeguarding Large Language Models.
Large Language Model (LLM)
A neural network with billions of parameters trained on vast text corpora to understand and generate human-like language.
Fine-Tuning
Further training a pre-trained model on a smaller, domain-specific dataset to specialize it for a new task.
Parameter-Efficient Fine-Tuning (PEFT)
Any technique that adapts an LLM by updating only a small subset of parameters, reducing compute and memory costs.
Low-Rank Adaptation (LoRA)
A PEFT method that inserts small low-rank matrices into weight layers and trains only these matrices while freezing the original weights.
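A minimal PyTorch sketch of the idea (class name, rank and scaling are illustrative, not taken from the source): only the low-rank matrices A and B are trainable.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update B @ A."""
    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                    # original weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))  # zero init: no change at start
        self.scaling = alpha / rank

    def forward(self, x):
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scaling

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))                       # only A and B receive gradients
```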
QLoRA
A memory-efficient variant of LoRA that fine-tunes 4-bit quantised weights while training low-rank adapters.
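A hedged sketch using the Hugging Face transformers, bitsandbytes and peft stack; the model id is a placeholder and the target_modules names assume a Llama-style architecture.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantisation
    bnb_4bit_compute_dtype=torch.bfloat16,  # compute in bf16
)
model = AutoModelForCausalLM.from_pretrained("your-base-model-id",  # placeholder
                                             quantization_config=bnb_config)
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         target_modules=["q_proj", "v_proj"]))
model.print_trainable_parameters()          # only the LoRA adapters are trainable
```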
DoRA
Weight-Decomposed Low-Rank Adaptation; splits each weight into magnitude and direction, applying a LoRA-style low-rank update to the direction while also training the magnitude.
Half Fine-Tuning (HFT)
Technique that freezes half of a model’s parameters each round, preserving pre-trained knowledge while learning new tasks.
Mixture of Experts (MoE)
Architecture with multiple specialist sub-networks (experts); a router activates only a subset per token for efficiency.
Mixture of Agents (MoA)
Framework where several complete LLMs collaborate layer-by-layer, combining proposals and aggregations to improve output quality.
Retrieval-Augmented Generation (RAG)
Pipeline that retrieves external documents at query time and feeds them into an LLM to ground responses in fresh knowledge.
Seven-Stage Fine-Tuning Pipeline
End-to-end process: Dataset Preparation, Model Initialization, Training Setup, Fine-Tuning, Evaluation, Deployment, Monitoring.
Dataset Preparation
Collecting, cleaning, formatting and splitting data plus handling imbalance, augmentation and annotation.
Model Initialization
Loading a pre-trained checkpoint and setting initial configurations before training or inference.
Training Environment Setup
Configuring hardware (GPU/TPU), software libraries, hyper-parameters, optimiser and loss functions for efficient training.
Hyper-parameter
A training setting (e.g., learning rate, batch size, epochs) chosen before training that governs model learning behaviour.
Gradient Descent
Optimisation algorithm updating weights by moving them opposite to the gradient of the loss function.
Stochastic Gradient Descent (SGD)
Gradient descent variant that updates weights using one (or few) training samples per step, adding randomness.
Mini-Batch Gradient Descent
Updates parameters with gradients computed on small batches, balancing stability and speed.
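A small NumPy sketch (an illustrative least-squares example) showing how the three preceding variants differ only in batch size: batch_size=1 gives SGD, batch_size=len(X) gives full-batch gradient descent.

```python
import numpy as np

def mini_batch_gd(X, y, lr=0.01, batch_size=32, epochs=5):
    """Linear regression trained with mini-batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = np.random.permutation(len(X))
        for start in range(0, len(X), batch_size):
            batch = idx[start:start + batch_size]
            grad = X[batch].T @ (X[batch] @ w - y[batch]) / len(batch)  # dLoss/dw
            w -= lr * grad                                              # step against the gradient
    return w

X = np.random.randn(256, 4)
y = X @ np.array([1.0, -2.0, 0.5, 3.0])
print(mini_batch_gd(X, y))
```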
Adam Optimizer
Adaptive learning-rate optimiser that combines momentum and RMSprop ideas; widely used for LLM fine-tuning.
AdamW
Adam variant that decouples weight decay from gradient updates, improving regularisation for transformers.
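Typical usage with PyTorch's built-in optimiser; the learning rate and decay values below are illustrative defaults for transformer fine-tuning, not prescriptions.

```python
import torch

model = torch.nn.Linear(10, 2)
# AdamW applies weight decay directly to the parameters instead of folding it into the gradient.
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=0.01)

x, target = torch.randn(8, 10), torch.randn(8, 2)
loss = torch.nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()
optimizer.zero_grad()
```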
Quantisation
Technique that stores weights/activations in lower precision (e.g., 8-bit, 4-bit) to cut memory and speed up inference.
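A toy NumPy sketch of symmetric 8-bit quantisation (real libraries add per-channel scales, calibration and specialised kernels).

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantisation: store int8 values plus one float scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())   # small reconstruction error
```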
Pruning
Removing unimportant weights, neurons or filters from a network to make it smaller and faster.
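A minimal example with PyTorch's pruning utilities (the 30% amount is chosen arbitrarily for illustration).

```python
import torch
import torch.nn.utils.prune as prune

layer = torch.nn.Linear(128, 64)
# Zero out the 30% of weights with the smallest magnitude (unstructured L1 pruning).
prune.l1_unstructured(layer, name="weight", amount=0.3)
sparsity = (layer.weight == 0).float().mean().item()
print(f"sparsity: {sparsity:.0%}")
```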
Cross-Entropy Loss
Primary objective for language models measuring divergence between predicted token distribution and true distribution.
Perplexity
Exponentiated cross-entropy; lower values mean the model is less ‘surprised’ and predicts text better.
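A short PyTorch sketch tying the two previous entries together: perplexity is simply exp(cross-entropy). Tensor shapes are illustrative.

```python
import torch
import torch.nn.functional as F

logits = torch.randn(1, 6, 50_000)            # (batch, sequence, vocab) scores from a model
targets = torch.randint(0, 50_000, (1, 6))    # true next-token ids

ce = F.cross_entropy(logits.view(-1, 50_000), targets.view(-1))  # mean cross-entropy in nats
perplexity = torch.exp(ce)                                       # lower is better
print(ce.item(), perplexity.item())
```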
Safety Benchmark
Suite of tests (e.g., DecodingTrust) that probe toxicity, bias, privacy, hallucination and adversarial robustness of LLMs.
Llama Guard
Meta’s safeguard model that classifies prompts and responses into risk categories to filter unsafe content.
ShieldGemma
Google Gemma-based moderation model that filters hate, violence, sexual and other harmful content across parameter scales.
WildGuard
Open-source multitask moderation model fine-tuned on adversarial datasets to detect harmful prompts, risky outputs and refusals.
Proximal Policy Optimisation (PPO)
Reinforcement-learning algorithm that aligns LLMs by maximising a learned reward while constraining policy updates.
Direct Preference Optimisation (DPO)
Alignment method that directly maximises the likelihood of preferred over rejected responses without a reward model.
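A hedged sketch of the DPO objective, given summed log-probabilities of the chosen and rejected responses under the policy and a frozen reference model (beta and the example numbers are illustrative).

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO loss from response log-probabilities."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return -F.logsigmoid(beta * (chosen_margin - rejected_margin)).mean()

loss = dpo_loss(torch.tensor([-12.3]), torch.tensor([-15.8]),
                torch.tensor([-13.0]), torch.tensor([-14.9]))
print(loss.item())
```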
Odds-Ratio Preference Optimisation (ORPO)
Single-stage objective that boosts preferred answers and penalises disfavoured ones via a log-odds loss.
Adapters
Small trainable layers inserted into a frozen model; only adapter weights are updated during fine-tuning.
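A minimal bottleneck-adapter sketch in PyTorch (hidden and bottleneck sizes are illustrative); in practice these modules are inserted inside frozen transformer blocks.

```python
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter: down-project, non-linearity, up-project, residual connection."""
    def __init__(self, hidden_size=768, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.act = nn.GELU()

    def forward(self, hidden_states):
        return hidden_states + self.up(self.act(self.down(hidden_states)))
```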
Soft Prompt Tuning
PEFT technique that learns a short sequence of virtual tokens prepended to every input instead of changing weights.
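A minimal sketch, assuming the frozen model accepts pre-computed input embeddings; the number of virtual tokens is illustrative.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """Learns n_virtual embedding vectors that are prepended to every input sequence."""
    def __init__(self, n_virtual=20, hidden_size=768):
        super().__init__()
        self.prompt = nn.Parameter(torch.randn(n_virtual, hidden_size) * 0.02)

    def forward(self, input_embeds):                      # (batch, seq, hidden)
        batch = input_embeds.size(0)
        virtual = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([virtual, input_embeds], dim=1)  # frozen LLM consumes the result
```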
Data Augmentation
Creating synthetic examples (e.g., back-translation, paraphrasing) to enlarge training data and improve robustness.
Synthetic Data Generation
Using LLMs to produce new labelled samples that resemble target-domain data for fine-tuning.
Data Imbalance
Unequal class distribution in a dataset; mitigated via over-/under-sampling, class-weighted losses or focal loss.
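One common mitigation, sketched with a class-weighted cross-entropy loss in PyTorch (the weights here are illustrative).

```python
import torch
import torch.nn as nn

# Suppose class 1 is rare: give it a larger weight so mistakes on it cost more.
class_weights = torch.tensor([1.0, 5.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.randn(16, 2)
labels = torch.randint(0, 2, (16,))
loss = criterion(logits, labels)
print(loss.item())
```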
Federated Learning
Framework where models train across multiple devices holding local data, improving privacy by keeping data in place.
Differential Privacy
Mathematical guarantee that a training procedure limits what can be inferred about any individual training example, typically achieved by clipping gradients and injecting calibrated noise.
Fairness
Model characteristic of producing equitable performance across demographic groups, avoiding algorithmic bias.
Bias
Systematic error favouring certain outputs or groups, often inherited from training data.
Data Drift
Shift in input distribution over time that can degrade model performance post-deployment.
Influence Score
Metric estimating each training example’s effect on model predictions; useful for data pruning (e.g., DEFT).
Data-Efficient Fine-Tuning (DEFT)
Approach that prunes training data by influence and effort scores to fine-tune LLMs with minimal samples.
Sparse Fine-Tuning
Updating only a small set of high-impact parameters (e.g., SpIEL), reducing memory and compute cost.
AutoTrain
HuggingFace web service that automates data prep, hyper-parameter search, fine-tuning and deployment.
Transformers Library
HuggingFace Python package providing pre-trained models, tokenisers and Trainer API for fine-tuning.
Trainer API
High-level class in transformers that abstracts training loops, evaluation and distributed training setup.
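A condensed sketch of a Trainer run (model id, dataset and hyper-parameters are illustrative; real runs need more configuration).

```python
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("distilbert-base-uncased",
                                                           num_labels=2)

dataset = load_dataset("imdb")
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=256)
dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(output_dir="out", per_device_train_batch_size=8,
                         num_train_epochs=1, learning_rate=2e-5)
trainer = Trainer(model=model, args=args,
                  train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),
                  eval_dataset=dataset["test"].select(range(500)))
trainer.train()   # the loop, logging and checkpointing are handled by Trainer
```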
Optimum
HuggingFace toolkit that applies hardware-aware optimisation (quantisation, pruning, distillation) for efficient inference.
Amazon SageMaker JumpStart
AWS service offering ready LLMs and automated pipelines for fine-tuning and deploying on SageMaker.
Amazon Bedrock
Fully-managed AWS service giving API access to foundation models and tools for fine-tuning and RAG.
OpenAI Fine-Tuning API
Endpoint that lets users upload datasets and customise GPT-3.5/4 models via simple API calls.
NVIDIA NeMo
Framework and set of micro-services for training, customising and serving LLMs with GPU acceleration.
Generative AI
Field of AI focused on creating new content—text, code, images, audio—rather than just analysing data.
Multimodal LLM
Model that processes and generates across multiple modalities, e.g., text + images or audio.
Vision-Language Model (VLM)
Multimodal model jointly trained on images and text, enabling tasks like captioning and VQA.
Contrastive Learning
Technique that teaches models by bringing paired representations (e.g., image–text) closer and pushing mismatched ones apart.
CLIP
OpenAI’s contrastive model that aligns image and text embeddings, enabling zero-shot vision tasks.
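A zero-shot classification sketch via the transformers CLIP classes; the image path and label prompts are placeholders.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")                       # any local image
labels = ["a photo of a cat", "a photo of a dog"]
inputs = processor(text=labels, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    probs = model(**inputs).logits_per_image.softmax(dim=-1)
print(dict(zip(labels, probs[0].tolist())))
```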
AdapterFusion
Method that learns to combine multiple task-specific adapters through an attention-style fusion layer, improving multi-task performance.
Data Cleaning
Removing noise, errors and inconsistencies from raw data to improve fine-tuning quality.
Tokenizer
Algorithm that splits raw text (or audio) into discrete tokens usable by a language model.
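A quick illustration with a Hugging Face tokenizer (the GPT-2 tokenizer is just an example).

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
ids = tokenizer("Fine-tuning large language models")["input_ids"]
print(ids)                                     # token ids
print(tokenizer.convert_ids_to_tokens(ids))    # the sub-word pieces they map to
```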
PagedAttention
vLLM memory-management algorithm that stores key-value cache in paged blocks, reducing fragmentation.
vLLM
Inference engine using PagedAttention plus smart scheduling to serve LLMs with high throughput and low memory.
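A hedged usage sketch of vLLM's offline-inference interface; the model id is illustrative and exact arguments may differ between vLLM versions.

```python
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")   # illustrative model id
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Explain LoRA in one sentence."], params)
print(outputs[0].outputs[0].text)
```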
Petals
Decentralised framework that splits LLM layers across volunteer GPUs, enabling torrent-style inference/fine-tuning.
WebGPU
Browser API enabling GPU compute inside web apps, letting LLMs run locally via projects like WebLLM.
Quantised LLM
Model whose weights are stored in reduced precision (4/8-bit) to cut memory and accelerate inference.
Pruning Schedule
Planned strategy dictating when and how much of a model’s weights or neurons to prune during training.
Model Card
Standardised report documenting a model’s purpose, data, performance, limitations and ethical considerations.
GLUE Benchmark
Suite of nine NLP tasks used to gauge general language understanding of models.
MMLU
Massive Multitask Language Understanding; benchmark spanning 57 subjects that tests broad knowledge and reasoning.
DecodingTrust
Comprehensive framework assessing LLM trustworthiness in toxicity, bias, privacy, robustness and ethics.
LLM Guardrails
Intermediary policies or models that filter or rewrite prompts/responses to enforce safety and compliance.