2. Transformers

5 Terms

1

How much (GPU) RAM is needed for an LLM?

  • X billion parameters × 2 bytes per parameter (16-bit weights) → ~2X GB of RAM

  • Plus some extra space for intermediate computations (activations, plus gradients etc. when training).

E.g. a 7B model → ~14 GB of RAM for the weights alone (see the sketch below).
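A minimal back-of-the-envelope sketch of this arithmetic in Python; the function name estimate_llm_ram_gb and the 1.2× overhead factor are illustrative assumptions, not standard values:

```python
def estimate_llm_ram_gb(params_billions: float,
                        bytes_per_param: int = 2,
                        overhead_factor: float = 1.2) -> float:
    """Rough GPU RAM estimate for loading an LLM.

    bytes_per_param=2 assumes 16-bit (fp16/bf16) weights; overhead_factor is
    an illustrative allowance for activations, KV cache and other
    intermediate buffers, not a standard value.
    """
    weights_gb = params_billions * bytes_per_param   # 1e9 params * 2 bytes ~= 2 GB per billion
    return weights_gb * overhead_factor


print(estimate_llm_ram_gb(7))   # ~16.8 GB total; the weights alone are ~14 GB
```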

2

What is Self-Attention in a Transformer?

  • When processing a token, how much should each other token in the sequence contribute to its representation? (See the sketch below.)

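A minimal NumPy sketch of single-head, scaled dot-product self-attention to make this concrete; the weight matrices and the tiny 4-token example are illustrative, not any particular model's values:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X has shape (seq_len, d_model). Each output row is a weighted mix of all
    tokens' value vectors; the weights express how much every other token
    contributes when processing that token.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project tokens to queries / keys / values
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (seq_len, seq_len) pairwise relevance
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: contributions sum to 1 per token
    return weights @ V                               # blend value vectors for every position

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                          # 4 tokens, model dimension 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)           # (4, 8)
```
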
3

What is Multi-headed Attention in a Transformer?

  • Multiple Attention patterns are calculated with distinct V, K and Q matrices.

  • Gives the model the capacity to learn many distinct ways in which context changes the meaning of a token (see the sketch below).

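A minimal NumPy sketch along the same lines, with several heads that each get their own Q, K and V projections; the head count, head dimension and the omitted final output projection are simplifying assumptions:

```python
import numpy as np

def attention_head(X, Wq, Wk, Wv):
    # One attention head: scaled dot-product attention (as in the previous card).
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return (w / w.sum(axis=-1, keepdims=True)) @ V

def multi_head_attention(X, heads):
    """heads is a list of (Wq, Wk, Wv) triples, one per head.

    Each head computes its own attention pattern with its own projections,
    so each head can capture a different way context shifts a token's
    meaning. The head outputs are concatenated (a real Transformer also
    applies a final output projection, omitted here).
    """
    return np.concatenate([attention_head(X, *h) for h in heads], axis=-1)

rng = np.random.default_rng(1)
X = rng.normal(size=(4, 8))                                                   # 4 tokens, model dim 8
heads = [tuple(rng.normal(size=(8, 2)) for _ in range(3)) for _ in range(4)]  # 4 heads, dim 2 each
print(multi_head_attention(X, heads).shape)                                   # (4, 8)
```
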
4

Which part of a Transformer model is computationally most expensive?

The self-attention mechanism: it requires significant computation and memory for each input token because every token interacts with all other tokens, so the cost grows quadratically with sequence length (see the illustration below).
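A rough illustration of how this scales, assuming a single head's (seq_len × seq_len) score matrix stored at 2 bytes per entry:

```python
# Memory for one head's (seq_len x seq_len) attention score matrix at 2 bytes per entry.
for seq_len in (1_024, 8_192, 65_536):
    score_bytes = seq_len * seq_len * 2
    print(f"{seq_len:>6} tokens -> {score_bytes / 2**20:>8,.0f} MiB of scores per head")
# Doubling the sequence length roughly quadruples this memory and the matching compute.
```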

5

What is KV-Caching in a Transformer?

  • A way to reduce the computational cost of Attention during autoregressive (token-by-token) generation, at the price of some extra memory.

  • The Key and Value projections computed for previous tokens are cached and reused at the next generation step, so only the newest token's K and V need to be computed (see the sketch below).

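A minimal single-head NumPy sketch of the idea for token-by-token generation, with illustrative names and shapes; only the newest token's Key and Value are computed, everything earlier comes from the cache:

```python
import numpy as np

def generate_step(x_new, kv_cache, Wq, Wk, Wv):
    """One decoding step for a single attention head with a KV cache.

    x_new: embedding of the newest token, shape (d_model,).
    kv_cache: dict holding the growing 'K' and 'V' arrays of earlier tokens.
    Only the new token's Key and Value are computed; older ones are reused.
    """
    kv_cache["K"] = np.vstack([kv_cache["K"], x_new @ Wk])   # append new key to the cache
    kv_cache["V"] = np.vstack([kv_cache["V"], x_new @ Wv])   # append new value to the cache

    q = x_new @ Wq                                           # only the newest token queries the past
    scores = kv_cache["K"] @ q / np.sqrt(q.shape[-1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ kv_cache["V"]                                 # attention output for the new token

rng = np.random.default_rng(2)
d = 8
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
cache = {"K": np.empty((0, d)), "V": np.empty((0, d))}
for _ in range(5):                                           # generate 5 tokens one at a time
    out = generate_step(rng.normal(size=d), cache, Wq, Wk, Wv)
print(cache["K"].shape)                                      # (5, 8): keys cached for all 5 tokens
```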