Flashcards covering key concepts related to attention mechanisms and Transformers, based on lecture notes.
What is the main concept behind Transformers?
To compute relations between vectors and use these relations to transform the vectors into new ones, creating a representation better suited to the task. This is achieved through self-attention mechanisms, which allow the model to weigh the influence of different input elements.
What is the formula for scaled dot-product self-attention for a single head?
attention(Q, K, V) = softmax(QKᵀ / √D) V
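A minimal sketch of this formula for a single head, assuming PyTorch and batched inputs of shape (batch, seq_len, D); the function name and shapes are illustrative, not from the lecture notes.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    # Q, K, V: (batch, seq_len, D)
    D = Q.size(-1)
    # Query-key similarity scores, scaled by sqrt(D)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(D)   # (batch, seq_len, seq_len)
    # Softmax over the key dimension yields attention weights
    weights = torch.softmax(scores, dim=-1)
    # Weighted sum of the value vectors
    return weights @ V                                # (batch, seq_len, D)
```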
What additional layers are added to the multi-head attention to form a transformer block?
A residual connection (Add & Norm) around the attention layer, a stack of Dense (Feed Forward) layers, and a residual connection around the feed forward layers.
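A sketch of a transformer block built from these pieces, assuming PyTorch; the layer sizes (d_model, n_heads, d_ff) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, d_ff),
            nn.ReLU(),
            nn.Linear(d_ff, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Residual connection (Add & Norm) around the multi-head attention layer
        attn_out, _ = self.attn(x, x, x)
        x = self.norm1(x + attn_out)
        # Residual connection (Add & Norm) around the feed-forward (Dense) layers
        x = self.norm2(x + self.ff(x))
        return x
```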
What are the options for positional encoding?
Count token positions and embed them (a learned positional embedding), or use sinusoidal encoding.
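A sketch of both options, assuming PyTorch; the model dimension and maximum sequence length are illustrative assumptions.

```python
import math
import torch
import torch.nn as nn

d_model, max_len = 512, 1024

# Option 1: count token positions, then embed them (learned embedding table)
pos_embedding = nn.Embedding(max_len, d_model)
positions = torch.arange(max_len)          # 0, 1, 2, ...
learned_pe = pos_embedding(positions)      # (max_len, d_model)

# Option 2: sinusoidal encoding (fixed, no learned parameters)
pos = torch.arange(max_len).unsqueeze(1).float()                  # (max_len, 1)
div = torch.exp(torch.arange(0, d_model, 2).float()
                * (-math.log(10000.0) / d_model))                 # (d_model/2,)
sinusoidal_pe = torch.zeros(max_len, d_model)
sinusoidal_pe[:, 0::2] = torch.sin(pos * div)   # sine on even dimensions
sinusoidal_pe[:, 1::2] = torch.cos(pos * div)   # cosine on odd dimensions
```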