Comprehensive Guide to Mixture of Experts, RAG, Quantization, and Fine-Tuning in Large Language Models

35 Terms

1. Mixture of Experts (MoE)

A neural architecture that activates only a subset of "expert" networks per input, enabling large capacity with reduced compute per token.

2. MoE Gating Network

A small network that decides which experts process each token, typically by taking a softmax over routing scores and keeping the top-k experts.
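
A sketch of top-k routing in PyTorch; the function name and toy shapes are illustrative, not from any particular library:

```python
import torch
import torch.nn.functional as F

def top_k_gate(x, w_gate, k=2):
    """Pick the top-k experts per token and renormalize their scores.

    x:      (num_tokens, d_model) token representations
    w_gate: (d_model, num_experts) learned routing weights
    """
    logits = x @ w_gate                       # (num_tokens, num_experts)
    topk_logits, topk_idx = logits.topk(k, dim=-1)
    weights = F.softmax(topk_logits, dim=-1)  # softmax over the chosen k
    return topk_idx, weights

# Toy usage: 4 tokens, d_model=8, 4 experts, top-2 routing.
idx, w = top_k_gate(torch.randn(4, 8), torch.randn(8, 4))
print(idx.shape, w.shape)  # torch.Size([4, 2]) torch.Size([4, 2])
```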

3. MoE Load Balancing

A regularization technique, usually an auxiliary loss, that discourages expert over- and under-use by encouraging roughly uniform routing across experts.
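
One common formulation is the Switch Transformer auxiliary loss, which pushes both the fraction of tokens per expert and the mean routing probability per expert toward uniform; a minimal sketch:

```python
import torch
import torch.nn.functional as F

def load_balancing_loss(router_logits, top1_idx, num_experts):
    """Switch-Transformer-style auxiliary loss; attains its minimum (1.0)
    when routing is perfectly uniform across experts."""
    probs = F.softmax(router_logits, dim=-1)              # (tokens, experts)
    f = F.one_hot(top1_idx, num_experts).float().mean(0)  # token fraction per expert
    p = probs.mean(0)                                     # mean router prob per expert
    return num_experts * torch.sum(f * p)

logits = torch.randn(16, 4)  # router logits: 16 tokens, 4 experts
print(load_balancing_loss(logits, logits.argmax(dim=-1), 4))
```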

4. Dense vs Sparse MoE

Dense uses all experts per input; Sparse activates a few, saving compute but increasing routing complexity.

5. MoE Trade-offs

High capacity at low per-token compute, but at the cost of a larger memory footprint, routing overhead, and training instability.

6. Retrieval-Augmented Generation (RAG)

A system that retrieves relevant documents from a knowledge base before generation to ground outputs and reduce hallucination.

7. RAG Components

Retriever (finds documents) + Generator (LLM that conditions on retrieved context).
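
A minimal end-to-end sketch; `embed` and `generate` stand in for a real embedding model and LLM and are not defined here:

```python
import numpy as np

def retrieve(query_vec, doc_vecs, docs, k=3):
    """Dense retrieval: return the k documents most similar to the query."""
    sims = doc_vecs @ query_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec) + 1e-9)
    return [docs[i] for i in np.argsort(-sims)[:k]]

def build_prompt(question, passages):
    """Ground the generator by putting retrieved text in the prompt."""
    context = "\n\n".join(passages)
    return (f"Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}\nAnswer:")

# Hypothetical usage with an embedding model `embed` and an LLM `generate`:
# answer = generate(build_prompt(q, retrieve(embed(q), doc_vecs, docs)))
```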

8. RAG Challenges

Hallucinations, irrelevant retrievals, context window limits, and inconsistent document grounding.

9. Hybrid Retrieval

Combines dense (vector-based) and sparse (keyword-based) retrieval for better coverage and precision.
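
One simple fusion rule is reciprocal rank fusion (RRF), which needs only the two ranked lists rather than comparable scores; a sketch:

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of doc IDs (best first); k=60 follows the RRF paper."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

dense  = ["d3", "d1", "d7"]  # e.g., from vector search
sparse = ["d1", "d9", "d3"]  # e.g., from BM25
print(reciprocal_rank_fusion([dense, sparse]))  # d1 and d3 rank highest
```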

10. Improving RAG Accuracy

Use re-ranking, better chunking, and retrieval fine-tuning, and instruct the model to cite its sources.

11. Quantization

Representing model weights and activations in lower precision (e.g., 8-bit or 4-bit) to reduce memory use and speed up inference.
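
A minimal sketch of symmetric per-tensor int8 quantization in NumPy:

```python
import numpy as np

def quantize_int8(w):
    """Symmetric per-tensor quantization: w ≈ scale * q, q in [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_int8(w)
print(np.abs(w - dequantize(q, scale)).max())  # error is at most ~scale/2
```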

12. Post-Training Quantization (PTQ)

Quantizes a pre-trained model without retraining; fast, but can reduce accuracy.

13. Quantization-Aware Training (QAT)

Simulates quantization effects during training so that the model retains accuracy after conversion to low precision.

14. Symmetric vs Asymmetric Quantization

Symmetric uses a zero-centered scale; asymmetric adds a zero-point offset to handle value ranges not centered at zero.
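
Complementing the symmetric example under "Quantization", a sketch of the asymmetric scheme with an explicit zero point (helper name is illustrative):

```python
import numpy as np

def quantize_asymmetric_uint8(x):
    """Asymmetric quantization: x ≈ scale * (q - zero_point).

    The zero point lets the uint8 grid cover a range that is not centered
    at zero, e.g. post-ReLU activations in [0, max]."""
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / 255.0
    zero_point = int(np.clip(round(-lo / scale), 0, 255))
    q = np.clip(np.round(x / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

x = np.random.rand(4, 4).astype(np.float32) * 6.0  # all-positive, ReLU-like
q, scale, zp = quantize_asymmetric_uint8(x)
print(np.abs(x - scale * (q.astype(np.float32) - zp)).max())
```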

15. Quantization Trade-offs

Improves speed and efficiency but risks numerical instability and performance loss.

16. Knowledge Distillation

Training a smaller student model to mimic a larger teacher's outputs, reducing size and latency.

17. Distillation Loss

Combines the student's task loss with a temperature-scaled KL divergence between the teacher's and student's softened output distributions.
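
A sketch of the classic soft-target formulation (Hinton et al.); `alpha` and the temperature `T` are tunable hyperparameters:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Hard-label task loss plus temperature-softened KL term.

    The T**2 factor keeps soft-target gradients comparable in magnitude
    across temperatures."""
    task = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    return alpha * task + (1.0 - alpha) * kd

s, t = torch.randn(8, 10), torch.randn(8, 10)  # student / teacher logits
print(distillation_loss(s, t, torch.randint(0, 10, (8,))))
```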

18. Intermediate Layer Distillation

Matching hidden states or attention maps between teacher and student for richer transfer.

19. Distillation Benefits

Retains performance while reducing model size, latency, and energy consumption.

20. Distillation Limitations

May underperform if teacher outputs are noisy or student is too small.

21. Attention Mechanism

Computes weighted combinations of values based on query-key similarity, allowing models to focus on relevant tokens.
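
In code, the whole mechanism is a few lines; a NumPy sketch of scaled dot-product attention:

```python
import numpy as np

def attention(Q, K, V):
    """softmax(Q K^T / sqrt(d_k)) V — a weighted combination of values."""
    scores = Q @ K.T / np.sqrt(Q.shape[-1])        # query-key similarity
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over keys
    return weights @ V

Q, K, V = np.random.randn(5, 16), np.random.randn(5, 16), np.random.randn(5, 32)
print(attention(Q, K, V).shape)  # (5, 32)
```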

22. Self-Attention

Each token attends to every other token in the sequence, capturing global dependencies.

23. Multi-Head Attention

Multiple attention heads learn diverse relationships in parallel subspaces.

24. Positional Encoding

Injects sequence order information since attention is permutation-invariant.
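
A sketch of the fixed sinusoidal encoding from "Attention Is All You Need" (assumes an even d_model):

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """PE[pos, 2i] = sin(pos / 10000^(2i/d)); PE[pos, 2i+1] = cos(same)."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(0, d_model, 2)[None, :]
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)  # even dimensions
    pe[:, 1::2] = np.cos(angles)  # odd dimensions
    return pe  # added to token embeddings before the first layer

print(sinusoidal_positional_encoding(128, 64).shape)  # (128, 64)
```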

25. Attention Complexity

O(n²) time and memory in sequence length n; this cost motivates efficient attention variants.

26. Fine-Tuning

Adapting a pre-trained model to new data or tasks by updating its weights or small subsets of parameters.

27. Full Fine-Tuning

Updates all weights; high flexibility but expensive and risks overfitting.

28. LoRA (Low-Rank Adaptation)

Freezes the base weights and trains low-rank update matrices added to them; parameter-efficient, and the update can be merged in or removed.
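
A minimal sketch of a LoRA-wrapped linear layer in PyTorch (class name is illustrative, not the `peft` API):

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen base weight plus a trainable low-rank update (alpha/r) * B A."""
    def __init__(self, base: nn.Linear, r=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # backbone stays frozen
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # starts as a no-op
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(512, 512))
print(layer(torch.randn(2, 512)).shape)  # torch.Size([2, 512])
```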

29. Adapter Tuning

Adds small trainable modules between layers while freezing the backbone model.

30. Prefix or Prompt Tuning

Learns task-specific prompt vectors without modifying model weights.

31. Prompt Engineering

Crafting effective input prompts to elicit desired outputs from LLMs without retraining.

32. Chain-of-Thought Prompting

Encouraging step-by-step reasoning to improve problem-solving accuracy.
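
A minimal illustration of the prompting pattern (the exact wording is arbitrary):

```
Q: A store has 23 apples, sells 9, then receives 12 more. How many now?
A: Let's think step by step. 23 - 9 = 14. 14 + 12 = 26. The answer is 26.
```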

33. Prompt Optimization

Automating prompt improvement via gradient-based or search-based methods.

34. Prompt Injection

Adversarial input that manipulates model behavior; mitigated by input sanitization and filtering.

35. Prompt Engineering Goal

Maximize LLM accuracy, faithfulness, and control using structure, context, and examples.