Generative AI Exam 2

77 Terms

1
New cards

RAG purpose

Ground LLM outputs in external knowledge to reduce hallucinations

2
New cards

RAG pipeline

Index, retrieve, generate

3
New cards

Index steps

Ingest and clean, chunk, embed, and store (vector DB + metadata)

4
New cards

Ingest and clean

Convert to plain texts and clean so it's readable (extract text, fix broken encodings and headers/footers, capture source URL, last-updated, and owner as metadata)

5
New cards

Chunk

Segment text into digestible units of roughly 200-500 tokens, with small overlaps and attached metadata
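
A minimal sketch of the chunking step, assuming a simple whitespace token count and illustrative window/overlap sizes:

```python
def chunk_text(text, chunk_size=400, overlap=50):
    """Split text into overlapping chunks of roughly chunk_size tokens.

    Whitespace tokens stand in for model tokens here.
    """
    tokens = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        window = tokens[start:start + chunk_size]
        if window:
            chunks.append(" ".join(window))
    return chunks

# A long cleaned document becomes ~200-500 token chunks with small overlaps
doc = "plain text produced by the ingest/clean step ..."
chunks = chunk_text(doc, chunk_size=400, overlap=50)
```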

6
New cards

Embed

Compute embeddings for each chunk (numerical representation that captures meaning)
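
A short sketch of the embed step; the sentence-transformers library and the all-MiniLM-L6-v2 model are assumptions (the cards don't name a specific embedding model):

```python
from sentence_transformers import SentenceTransformer  # assumed library choice

model = SentenceTransformer("all-MiniLM-L6-v2")  # any embedding model works

chunks = [
    "RAG grounds LLM outputs in external knowledge.",
    "Chunks are embedded and stored in a vector DB.",
]

# Each chunk becomes a fixed-length vector that captures its meaning
embeddings = model.encode(chunks, normalize_embeddings=True)
print(embeddings.shape)  # (num_chunks, embedding_dim)
```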

7
New cards

Store

Vector database stores, indexes, and searches embeddings

8
New cards

Retrieval

Encode query, compute similarity scores, and retrieve top-K most relevant chunks to build context

9
New cards

Similarity scores between query vector and document chunks

Cosine similarity or dot product
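
A numpy sketch of the retrieval step: score stored chunk embeddings against the query vector with cosine similarity (equivalently, a dot product once both sides are unit-normalized) and keep the top-K. The vectors here are random stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)
doc_vecs = rng.normal(size=(1000, 384))           # stored chunk embeddings (stand-ins)
doc_vecs /= np.linalg.norm(doc_vecs, axis=1, keepdims=True)

query_vec = rng.normal(size=384)
query_vec /= np.linalg.norm(query_vec)

# Cosine similarity == dot product on unit-normalized vectors
scores = doc_vecs @ query_vec

k = 5
top_k = np.argsort(scores)[::-1][:k]              # indices of the K most similar chunks
print(top_k, scores[top_k])
```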

10
New cards

Sparse approach

Token matches used for keyword queries and exact matches

11
New cards

Dense approach

Semantic vectors used for similarity search and semantic QA

12
New cards

Hybrid approach

Combines sparse and dense signals for improved recall and precision

13
New cards

Precision

If a model predicts a positive outcome, how likely is that prediction to be correct?

14
New cards

Recall

Given all relevant instances, how many did the model actually detect?
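
Both metrics as tiny functions over confusion counts (the counts below are made up):

```python
def precision(tp, fp):
    # Of everything predicted positive, how much was actually positive?
    return tp / (tp + fp)

def recall(tp, fn):
    # Of everything actually positive, how much did the model find?
    return tp / (tp + fn)

# Toy counts: 8 true positives, 2 false positives, 4 false negatives
print(precision(8, 2))  # 0.8
print(recall(8, 4))     # ~0.667
```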

15
New cards

Generate prompt

Query + retrieved chunks + instructions
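
A minimal sketch of assembling the generation prompt; the exact template wording is an assumption:

```python
def build_prompt(query, retrieved_chunks):
    # Number each retrieved chunk so the model can cite its context
    context = "\n\n".join(
        f"[{i + 1}] {chunk}" for i, chunk in enumerate(retrieved_chunks)
    )
    return (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}\nAnswer:"
    )

print(build_prompt("What is RAG?", ["RAG grounds LLM outputs in external knowledge."]))
```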

16
New cards

Hit rate

Fraction of queries for which a correct document appears in the retrieved results

17
New cards

Mean reciprocal rank (MRR)

How early does the first correct answer appear? (1/rank of first relevant result)

18
New cards

Normalized discounted cumulative gain (NDCG)

Weighted rating of the entire ranking, not just the first hit
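
A sketch of the three ranking metrics above (hit rate, MRR, NDCG) over binary relevance lists; the example relevance labels are stand-ins:

```python
import numpy as np

def hit_rate(ranked_relevance, k):
    # Fraction of queries with at least one relevant doc in the top-K
    return np.mean([any(rels[:k]) for rels in ranked_relevance])

def mrr(ranked_relevance):
    # Mean of 1/rank of the first relevant result (0 if none retrieved)
    recips = [
        next((1.0 / (i + 1) for i, r in enumerate(rels) if r), 0.0)
        for rels in ranked_relevance
    ]
    return np.mean(recips)

def ndcg(rels, k):
    # Discounted gain of the actual ranking vs. the ideal ranking
    gains = np.array(rels[:k], dtype=float)
    discounts = 1.0 / np.log2(np.arange(2, len(gains) + 2))
    dcg = float(np.sum(gains * discounts))
    idcg = float(np.sum(np.sort(gains)[::-1] * discounts))
    return dcg / idcg if idcg > 0 else 0.0

# Two queries; 1 = relevant, 0 = not, in retrieved order
ranked = [[0, 1, 0, 0, 0], [0, 0, 0, 0, 0]]
print(hit_rate(ranked, k=5))   # 0.5
print(mrr(ranked))             # 0.25
print(ndcg(ranked[0], k=5))    # < 1.0 because the hit is at rank 2
```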

19
New cards

Exact match (EM)/F1

Exact-match and token-overlap scores of generated answers against reference answers

20
New cards

Pitfalls of oversized chunks

Matches become vague and the context window fills with irrelevant text

21
New cards

Pitfalls of chunks that are too small

Passages lose meaning and the model receives fragmented text without enough context

22
New cards

Purpose of post-training

Tailor outputs to domain needs (format, tone, policy, tool-use), cheaper than pre-training, works with RAG

23
New cards

Supervised Fine-Tuning (SFT)

Next-token cross-entropy on target responses (mask system/prompt as needed; length-normalize)
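
A toy sketch of the SFT objective: next-token cross-entropy on the response tokens only, masking prompt positions with -100 (the Hugging Face label-masking convention, assumed here):

```python
import torch
import torch.nn.functional as F

# Toy shapes: batch of 1, sequence of 8 tokens, vocab of 100
logits = torch.randn(1, 8, 100)            # model outputs at each position
input_ids = torch.randint(0, 100, (1, 8))  # prompt + response token ids

labels = input_ids.clone()
labels[:, :4] = -100                       # mask the prompt: no loss on those positions

# Shift so position t predicts token t+1, then average cross-entropy over
# the unmasked (response) tokens, i.e. length-normalized
shift_logits = logits[:, :-1, :].reshape(-1, 100)
shift_labels = labels[:, 1:].reshape(-1)
loss = F.cross_entropy(shift_logits, shift_labels, ignore_index=-100)
print(loss)
```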

24
New cards

SFT data

(instruction, response) pairs; normalize templates, deduplicate, filter unsafe content and personally identifiable information (PII), tag metadata

25
New cards

Parameter-efficient fine tuning (PEFT)

Low-Rank Adaptation (LoRA) or Quantized LoRA (QLoRA) to train small, reusable modules on top of frozen base weights

26
New cards

RLHF (Reinforcement Learning from Human Feedback)

Reward model + PPO with KL regularization to a reference

27
New cards

28
New cards

RLHF benefits

29
New cards

RLHF drawbacks

Higher operational complexity

30
New cards

LoRA

Inserts low-rank adapters into attention/MLP layers and trains only the adapters, giving massive parameter savings with small quality loss
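
A hedged sketch of attaching LoRA adapters with the peft library; the base model, rank, alpha, and target modules are illustrative assumptions:

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("gpt2")  # stand-in base model

config = LoraConfig(
    r=8,                        # rank of the low-rank update
    lora_alpha=16,              # scaling factor
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projections to adapt (model-specific)
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, config)
model.print_trainable_parameters()  # only the adapters are trainable
```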

31
New cards

QLoRA

4-bit quantized base + LoRA adapters to reduce memory further

32
New cards

Hyperparameters to reason about

Epochs, learning rate, warmup steps/ratio, weight decay, effective batch size (per-device batch size × gradient accumulation steps)
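
The same knobs expressed as a hedged transformers TrainingArguments sketch; all values are illustrative, and the effective batch size here is 4 × 8 = 32:

```python
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="sft-out",
    num_train_epochs=3,                 # epochs
    learning_rate=2e-4,                 # learning rate
    warmup_ratio=0.03,                  # warmup steps/ratio
    weight_decay=0.01,                  # weight decay
    per_device_train_batch_size=4,      # batch size per step
    gradient_accumulation_steps=8,      # effective batch = 4 * 8 = 32
)
```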

33
New cards

Overfitting

Aggressive LR/epochs lead to repetition or echoing

34
New cards

SFT deliverables pattern

Saved adapters/tokenizer, prompt template, inference function for product integration

35
New cards

DPO (Direct Preference Optimization)

Logistic loss on log-probability gaps (chosen vs. rejected) with reference correction; beta controls preference strength
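
A numeric sketch of that objective: a sigmoid/logistic loss on the reference-corrected log-probability gap between chosen and rejected, scaled by beta (the log-probs are toy numbers):

```python
import torch
import torch.nn.functional as F

# Summed log-probs of the chosen/rejected responses under the policy and
# the frozen reference model (toy numbers)
policy_chosen, policy_rejected = torch.tensor(-12.0), torch.tensor(-15.0)
ref_chosen, ref_rejected = torch.tensor(-13.0), torch.tensor(-14.0)

beta = 0.1  # preference strength

# Reference-corrected margins
chosen_gap = policy_chosen - ref_chosen        # how much the policy upweights "chosen"
rejected_gap = policy_rejected - ref_rejected  # how much it upweights "rejected"

loss = -F.logsigmoid(beta * (chosen_gap - rejected_gap))
print(loss)  # smaller when the policy prefers "chosen" more than the reference does
```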

36
New cards

DPO data

Prompt plus (chosen, rejected) pairs; uses a frozen reference policy

37
New cards

Reference policy

Compares gaps vs. frozen base/SFT model; beta sweeps are implementation-dependent

38
New cards

When to use DPO

Complements SFT for subjective qualities (helpfulness, tone, refusals)

39
New cards

DPO evaluation

Formatting adherence, pairwise win-rate, safety/jailbreak tests, business KPIs

40
New cards

DPO vs. RLHF

Avoids a separate reward model and PPO loop; turns alignment into a direct logistic objective

41
New cards

Typical workflow

SFT, collect preference pairs, DPO fine-tuning, evaluate HHH (helpful, honest, harmless) + task KPIs

42
New cards

Tooling

TRL's 'DPOTrainer' (with beta and other hyperparameters as in SFT) for pairwise preferences
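
A rough usage sketch for TRL's DPOTrainer; constructor argument names vary across TRL versions, so treat the exact call (DPOConfig, processing_class, the gpt2 stand-ins, the sample data) as assumptions rather than a fixed API:

```python
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer  # recent TRL versions

model = AutoModelForCausalLM.from_pretrained("gpt2")   # stand-in policy model
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

# Pairwise preference data: prompt, chosen, rejected
train_dataset = Dataset.from_list([
    {"prompt": "Summarize our refund policy.",
     "chosen": "Refunds are available within 30 days with a receipt.",
     "rejected": "idk just ask support"},
])

config = DPOConfig(output_dir="dpo-out", beta=0.1)     # beta = preference strength

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # older TRL versions take tokenizer= instead
)
trainer.train()
```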

43
New cards

AI agents

Goal-directed loops that plan, call tools, write/execute code, and refine using feedback

44
New cards

45
New cards

Vibe coding

Specify the "feel" and constraints of the solution (architecture, style, design rules, performance/latency targets, acceptance criteria) rather than line-by-line instructions, guide with examples and guardrails

46
New cards

Vibe coding workflow

Define vibe, provide scaffolds, add exemplars, set guardrails, iterate, log decisions, lock rules

47
New cards

Define vibe

Goals, constraints, tests, non-goals

48
New cards

Provide scaffolds

Interfaces, stubs, layout, freeze public APIs

49
New cards

Add exemplars

Code/style snippets to follow/avoid

50
New cards

Set guardrails

Tests, types, linters, CI as hard checks

51
New cards

Iterate

Generate, run, review and refine in tight cycles

52
New cards

Log decisions

Record rationale and update criteria

53
New cards

Lock rules

CI/pre-commit to enforce vibe

54
New cards

Limitations of vibe coding

Ambiguity and drift, non-determinism under parallel edits, hallucinated APIs, style inconsistency, security/secret mishandling, missing refactors, dependency/version surprises

55
New cards

Steering vs. Fine-tuning (SFT/DPO)

Fine-tuning makes more durable changes, but is slower, coarser (it changes weights), and compute-intensive

56
New cards

Interpretation

Uses sparse autoencoders (SAEs) to untangle activations into human-named features (e.g., polite tone, numbers, lists), then inspects the residual stream encodings

57
New cards

Interpreting benefits

Enables bias audits, compliance checks, debugging, etc.

58
New cards

Steering

Control model behavior along interpretable axes without retraining

59
New cards

2 common steering methods

SAE feature steering (select a feature and increase/decrease activation) and steering vectors (activation addition adds a learned direction to the hidden state)

60
New cards

Steering benefits

Brand-tone control, toxicity reduction, truthfulness nudges, region/policy alignment, and rapid iteration without costly retraining

61
New cards

Residual stream

The residual connection adds each sublayer's (attention/MLP) output back onto the hidden state, forming a running stream through the transformer

62
New cards

Autoencoder

Compresses and decompresses activations with the goal of minimizing reconstruction loss
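
A minimal torch sketch of a sparse autoencoder over residual-stream activations: encode, decode, and minimize reconstruction error plus an L1 sparsity penalty (the dimensions and penalty weight are assumptions):

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=768, d_features=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_features)  # expand into many candidate features
        self.decoder = nn.Linear(d_features, d_model)  # reconstruct the activation

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse, non-negative feature activations
        return self.decoder(features), features

sae = SparseAutoencoder()
acts = torch.randn(32, 768)  # stand-in residual-stream activations

recon, features = sae(acts)
# Reconstruction loss plus L1 sparsity penalty on the feature activations
loss = ((recon - acts) ** 2).mean() + 1e-3 * features.abs().mean()
loss.backward()
```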

63
New cards

Benefits of SAE feature steering

Interpretability, transparency, fine-grained control, and knowledge audit

64
New cards

Steering vector in activation engineering

In place of a trained SAE, you can derive a steering direction from contrasting prompts and apply it to the residual stream

65
New cards

Alpha in steering vectors

Controls how hard and in which direction you push along the steering vector
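
A sketch of activation addition: build a steering direction from the difference of mean activations on contrasting prompts, then add alpha times that direction to the hidden state at inference (all tensors here are stand-ins):

```python
import torch

d_model = 768

# Stand-in hidden states gathered from contrasting prompts
polite_acts = torch.randn(16, d_model)  # e.g., prompts written in a polite tone
blunt_acts = torch.randn(16, d_model)   # e.g., the same prompts written bluntly

# Steering vector = difference of the mean activations, normalized
steer = polite_acts.mean(dim=0) - blunt_acts.mean(dim=0)
steer = steer / steer.norm()

def apply_steering(hidden, alpha):
    # alpha > 0 pushes toward "polite"; alpha < 0 flips the sign and reverses the behavior
    return hidden + alpha * steer

hidden = torch.randn(1, 10, d_model)    # residual-stream activations at some layer
steered = apply_steering(hidden, alpha=4.0)
```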

66
New cards

Operational advantages of steering vectors

Instant, reversible control at inference; direct and reproducible effect (less context/wording sensitive than prompting); easier than fine-tuning (no labels, no weight updates, immediate rollback)

67
New cards

Fine-grained control of steering vector

Adjust alpha for strength and flip the sign to reverse behavior; cheaper and more precise than fine-tuning, which needs new hyperparameter runs and data curation for each adjustment

68
New cards

Interpretability and governance of steering vectors

Each vector is a contrast that can be labeled in plain English; simple to review, share, and roll back

69
New cards

Steering vector applications

Customer support tone shift, safety/refusal and overclaiming controls for compliance, and brand voice and regional personalization at scale

70
New cards

Epoch

One full pass through the training dataset (more epochs = more aggressive learning)

71
New cards

Learning rate

How fast the model learns (higher = more aggressive learning)

72
New cards

Warmup steps/ratio

A short startup phase where learning speed gradually increases to avoid sudden shocks (more = more conservative training)

73
New cards

Weight decay

A small penalty that nudges weights toward zero to prevent overfitting (higher = more conservative training)

74
New cards

Effective batch size

The total number of examples used before each update (larger = more aggressive learning)

75
New cards

KL divergence

Used in RLHF to penalize the policy for diverging too far from the original (reference) model
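
A toy sketch of the KL penalty as it is typically used in RLHF reward shaping; the log-probs and coefficient are made up:

```python
import torch

# Per-token log-probs of the sampled response under the policy and the reference
logp_policy = torch.tensor([-0.8, -0.5, -1.0])
logp_ref = torch.tensor([-1.2, -0.9, -1.6])

kl_per_token = logp_policy - logp_ref  # simple per-token KL estimate
reward_model_score = 2.3               # scalar score from the reward model
kl_coeff = 0.1                         # how strongly to penalize drifting

# Penalty grows as the policy puts more probability on its samples than the reference does
shaped_reward = reward_model_score - kl_coeff * kl_per_token.sum()
print(shaped_reward)
```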

76
New cards

PPO

Proximal Policy Optimization (the reinforcement learning algorithm used in RLHF)

77
New cards
