Hands-On Large Language Models – Chapters 1-5 Review

Description and Tags

Flashcards covering key terms, concepts, models, algorithms, and techniques from Chapters 1–5 of “Hands-On Large Language Models.”


45 Terms

1

What is a Large Language Model (LLM)?

A neural network—usually Transformer-based—with many parameters that is pretrained on massive text corpora to understand and generate human language.

2

Name the three main families of Transformer models discussed.

Encoder-only (representation, e.g., BERT), decoder-only (generative, e.g., GPT), and encoder-decoder (sequence-to-sequence, e.g., T5).

3

Which 2017 paper introduced the Transformer architecture?

“Attention Is All You Need.”

4

Why did GPT-2 (2019) cause a stir?

It generated strikingly human-like text and was the first widely publicised generative LLM, demonstrating the power of scaling to 1.5 billion parameters.

5

Define ‘representation model’.

An encoder-only model that focuses on producing embeddings or intermediate representations of text rather than generating it.

6

Define ‘generative model’.

A decoder-only (or encoder-decoder) model that autocompletes a prompt, generating new text one token at a time.

7

Bag-of-Words limitation

Ignores word order and semantics, treating documents as unordered token counts.
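A quick illustration (using scikit-learn's CountVectorizer, which is an assumption here rather than the book's exact code): two sentences with opposite meanings get identical bag-of-words vectors because order is discarded.

```python
# A bag-of-words view: word order is lost, so these opposite sentences
# produce identical count vectors.
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the dog bit the man", "the man bit the dog"]
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

print(vectorizer.get_feature_names_out())  # ['bit' 'dog' 'man' 'the']
print(X.toarray())                         # both rows: [1 1 1 2]
```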

8

What are embeddings?

Dense vectors that capture semantic properties of tokens, sentences, or documents.

9

How does word2vec learn embeddings?

By training a shallow neural network to predict context words from a target word (skip-gram) or a target word from its context (CBOW), typically sped up with negative sampling.
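A toy skip-gram run, sketched with gensim (my choice of library for illustration); real embeddings need far larger corpora than these two sentences.

```python
# Toy skip-gram training with negative sampling (sg=1, negative=5).
# A real model needs millions of sentences; this only shows the API shape.
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
]
model = Word2Vec(sentences, vector_size=50, window=2, sg=1, negative=5, min_count=1)
print(model.wv["cat"].shape)          # (50,)
print(model.wv.most_similar("cat"))   # neighbours by cosine similarity
```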

10

Static vs contextual embeddings

Static (word2vec) assigns one vector per word; contextual (BERT) changes the vector based on sentence context.
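A minimal sketch of the contrast, assuming the Hugging Face transformers library and bert-base-uncased: the same word receives different vectors in different sentences.

```python
# The same word "bank" gets different contextual vectors from BERT
# depending on the sentence it appears in.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def word_vector(sentence, word):
    enc = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state[0]                    # (seq_len, 768)
    tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0].tolist())
    return hidden[tokens.index(word)]                                 # vector for that token

v1 = word_vector("i deposited cash at the bank.", "bank")
v2 = word_vector("we sat on the bank of the river.", "bank")
print(torch.cosine_similarity(v1, v2, dim=0))  # < 1.0: context changes the vector
```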

11

Purpose of attention mechanism

Allows the model to focus on relevant parts of the input when computing representations or generating output.

12

Self-Attention

Attention applied within a single sequence, letting every token attend to all others.
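A NumPy sketch of single-head scaled dot-product self-attention with random projection matrices, purely to show the computation; it is not the book's implementation.

```python
# Scaled dot-product self-attention for one head, no masking.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])            # token-to-token affinities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)     # softmax over each row
    return weights @ V                                  # weighted mix of values

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                             # 4 tokens, embedding dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)              # (4, 8)
```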

13

Multi-Head Attention

Parallel attention heads that capture different relational patterns between tokens.

14

Tokenization

Process of splitting raw text into tokens that the model’s vocabulary can handle.

15

Subword tokenization advantage

Balances vocabulary size and ability to represent rare or unseen words by splitting them into meaningful chunks.

16

Byte Pair Encoding (BPE)

Popular subword algorithm that builds its vocabulary by iteratively merging the most frequent pairs of symbols, starting from individual characters or bytes.
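A quick look at byte-level BPE in practice, assuming the transformers library and GPT-2's pretrained tokenizer.

```python
# GPT-2's byte-level BPE splits rare words into frequent subword pieces.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
print(tokenizer.tokenize("Tokenization handles unfathomability."))
# Common words stay whole; rare words come back as several subword pieces.
```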

17

Context window (context length)

Maximum number of tokens an LLM can process in a single pass.

18

Why are GPUs important for LLMs?

They accelerate the matrix operations needed for training and inference; available VRAM limits how large a model can be loaded.

19

Difference between open and proprietary LLMs

Open models release weights/architecture (e.g., Llama 2); proprietary models stay behind an API (e.g., GPT-4).

20

Two-step training paradigm for LLMs

(1) Pretraining on large unlabeled text, (2) fine-tuning/alignment on task-specific or preference data.

21

Masked Language Modeling (MLM)

Pretraining task where the model predicts masked tokens (used in BERT).
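A fill-mask sketch with the transformers pipeline; the model name bert-base-uncased is an illustrative choice.

```python
# BERT-style masked language modeling: predict the [MASK] token from context.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("The capital of France is [MASK]."):
    print(pred["token_str"], round(pred["score"], 3))
```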

22

Instruction fine-tuning

Teaching a model to follow natural-language instructions by training on (prompt, desired answer) pairs.

23

Reinforcement Learning from Human Feedback (RLHF)

Alignment method in which humans rank model outputs, a reward model is trained on those rankings, and the LLM is then optimised against that reward model so its behaviour matches human preferences.

24

Primary use cases of LLMs

Text generation, translation, summarisation, classification, code assistance, semantic search, chatbots, etc.

25

Retrieval-Augmented Generation (RAG)

Technique that injects external documents into the prompt to supply up-to-date or domain knowledge.
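A minimal retrieval-then-prompt sketch, assuming sentence-transformers for retrieval; generate() stands in for whichever generative LLM is used and is hypothetical.

```python
# Minimal RAG: retrieve the most relevant document, then inject it into the prompt.
from sentence_transformers import SentenceTransformer, util

docs = [
    "BERTopic combines embeddings, UMAP, HDBSCAN and c-TF-IDF.",
    "Llama 2 is an open-weight decoder-only model from Meta.",
]
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_emb = embedder.encode(docs, convert_to_tensor=True)

question = "Which components does BERTopic combine?"
q_emb = embedder.encode(question, convert_to_tensor=True)
best = int(util.cos_sim(q_emb, doc_emb).argmax())

prompt = f"Answer using only this context:\n{docs[best]}\n\nQuestion: {question}"
# response = generate(prompt)   # hypothetical call to any generative LLM
print(prompt)
```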

26

Ethical concerns around LLMs

Bias, hallucination, harmful content, intellectual-property questions, transparency, regulation.

27

What is UMAP used for in text clustering?

Reducing high-dimensional embeddings to lower dimensions while preserving structure for clustering/visualisation.

28

HDBSCAN role in clustering pipeline

Density-based algorithm that groups similar documents and labels outliers without pre-setting cluster count.

29

c-TF-IDF in BERTopic

Class-based TF-IDF that weights terms by importance within each cluster (topic) across the corpus.
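A sketch of how the three pieces above (UMAP, HDBSCAN, c-TF-IDF) are wired together in a typical BERTopic pipeline; the model name and parameters are illustrative, not the book's exact settings.

```python
# A typical BERTopic pipeline: embed -> reduce (UMAP) -> cluster (HDBSCAN)
# -> describe each cluster with c-TF-IDF keywords.
from sentence_transformers import SentenceTransformer
from umap import UMAP
from hdbscan import HDBSCAN
from bertopic import BERTopic

embedding_model = SentenceTransformer("all-MiniLM-L6-v2")
umap_model = UMAP(n_components=5, min_dist=0.0, metric="cosine", random_state=42)
hdbscan_model = HDBSCAN(min_cluster_size=15, metric="euclidean")

topic_model = BERTopic(
    embedding_model=embedding_model,
    umap_model=umap_model,
    hdbscan_model=hdbscan_model,
)
# topics, probs = topic_model.fit_transform(docs)   # docs: a list of strings
# topic_model.get_topic_info()                      # keywords per topic via c-TF-IDF
```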

30

KeyBERTInspired representation

Reranks topic words by comparing candidate-word embeddings with average document embeddings per topic.

31

Maximal Marginal Relevance (MMR)

Diversifies topic keywords by balancing relevance and redundancy.
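A simplified NumPy version of MMR over keyword embeddings (not BERTopic's exact implementation): each step picks the candidate most relevant to the topic yet least similar to keywords already chosen.

```python
# Simplified Maximal Marginal Relevance for keyword selection.
import numpy as np

def _normalize(M):
    return M / np.linalg.norm(M, axis=-1, keepdims=True)

def mmr(topic_emb, word_embs, words, top_n=5, diversity=0.5):
    topic_sim = _normalize(word_embs) @ _normalize(topic_emb[None, :]).T   # (n, 1)
    word_sim = _normalize(word_embs) @ _normalize(word_embs).T             # (n, n)

    selected = [int(topic_sim.argmax())]
    while len(selected) < min(top_n, len(words)):
        candidates = [i for i in range(len(words)) if i not in selected]
        # relevance to the topic minus redundancy w.r.t. already selected keywords
        scores = [(1 - diversity) * topic_sim[i, 0]
                  - diversity * word_sim[i, selected].max()
                  for i in candidates]
        selected.append(candidates[int(np.argmax(scores))])
    return [words[i] for i in selected]
```

A higher diversity value trades keyword relevance for less redundancy among the selected words.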

32

Prompt Engineering

Crafting and iteratively refining instructions to guide generative LLMs toward desired outputs.

33

Zero-shot classification via embeddings

Assigning labels by comparing document embeddings with label-description embeddings using cosine similarity—no training data needed.
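A sketch of this zero-shot classifier with sentence-transformers; the model name and label descriptions are illustrative assumptions.

```python
# Zero-shot classification: compare document embeddings with label-description
# embeddings by cosine similarity; no labelled training data needed.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
labels = ["A positive movie review", "A negative movie review"]
docs = ["A stunning film, I loved every minute.",
        "Two hours of my life I will never get back."]

label_emb = model.encode(labels, convert_to_tensor=True)
doc_emb = model.encode(docs, convert_to_tensor=True)

sims = util.cos_sim(doc_emb, label_emb)          # (num_docs, num_labels)
for doc, idx in zip(docs, sims.argmax(dim=1)):
    print(doc, "->", labels[int(idx)])
```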

34

SentenceTransformer library

Python package that wraps Transformer models for easy embedding generation of sentences/documents.

35

Flash Attention purpose

Optimised GPU kernel that speeds attention computation by reducing memory traffic.

36

Grouped-Query Attention (GQA)

Efficiency improvement where sets of heads share key/value projections to lower memory during inference (used in Llama 2/3).

37

Rotary Positional Embeddings (RoPE)

Technique that encodes positions as rotations in embedding space, enabling longer context and packed training.

38

Why is dimensionality reduction helpful before clustering?

Mitigates the curse of dimensionality and reduces noise, making density or distance measures more meaningful.

39

Difference between PCA and UMAP

PCA is linear and maximises retained variance; UMAP is non-linear and aims to preserve both local and global manifold structure.

40

Major limitation of bag-of-words for topic modeling

Cannot capture synonymy, polysemy, or word order; purely frequency-based.

41

How does BERTopic label topics with LLMs?

Feeds representative documents and keywords into a generative model (e.g., GPT-3.5) to output a concise topic name.

42

GPU-poor workaround

Use quantised or smaller models, call external APIs, or run inference on a free Colab T4 GPU with 16 GB of VRAM.

43

Sparse attention motivation

Scale Transformers to longer sequences by limiting each token’s attention scope, reducing quadratic cost.

44

Word vs character vs byte tokens—impact on context

Smaller tokens (characters/bytes) eliminate out-of-vocabulary issues but inflate sequence length, reducing the effective context window.

45

Why are open-source frameworks like Hugging Face important?

They provide a model hub, tokenizers, and training/inference utilities, fostering reproducibility and experimentation.