Flashcards about The Illustrated Transformer.
The Transformer
A model that uses attention to speed up training; its architecture lends itself well to parallelization.
Black Box
A view of the whole model in a machine translation setting: a sentence in one language goes in, and its translation in another comes out.
Encoding Component
The component of the Transformer model responsible for processing the input sequence, consisting of a stack of encoders.
Decoding Component
The component of the Transformer model that generates the output sequence, consisting of a stack of decoders.
Self-Attention Layer
A layer that helps the encoder look at other words in the input sentence as it encodes a specific word.
Word Embedding
The step that turns each input word into a vector using an embedding algorithm; in the post, each word becomes a vector of size 512.
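A minimal sketch of the lookup, assuming a toy vocabulary and a randomly initialized embedding table (a real table is learned during training):

```python
import numpy as np

d_model = 512                      # embedding size used in the post
vocab = {"je": 0, "suis": 1, "etudiant": 2}
embedding_table = np.random.randn(len(vocab), d_model)  # learned in practice

# Each input word becomes a vector via a simple table lookup.
sentence = ["je", "suis", "etudiant"]
x = np.stack([embedding_table[vocab[w]] for w in sentence])  # shape (3, 512)
```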
Query, Key, and Value Vectors
Vectors created by multiplying each word's embedding by three trained weight matrices; they are the ingredients of the self-attention calculation.
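A minimal numpy sketch of the self-attention calculation built from these vectors: multiply each embedding by trained matrices WQ, WK, WV, scale the query-key scores by the square root of the key dimension, softmax them, and take the weighted sum of the values (the random matrices here are illustrative stand-ins for trained weights):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

d_model, d_k = 512, 64
x = np.random.randn(3, d_model)               # one embedding per input word

# Trained projection matrices (random here for illustration).
W_q = np.random.randn(d_model, d_k)
W_k = np.random.randn(d_model, d_k)
W_v = np.random.randn(d_model, d_k)

Q, K, V = x @ W_q, x @ W_k, x @ W_v
scores = Q @ K.T / np.sqrt(d_k)     # how much each word attends to each other word
weights = softmax(scores, axis=-1)
z = weights @ V                     # shape (3, 64): one output vector per word
```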
Multi-Headed Attention
A mechanism that improves the performance of the attention layer by allowing the model to focus on different positions and providing multiple representation subspaces.
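A sketch of the multi-head wrapper around the single-head calculation above: each of the 8 heads has its own projection matrices, the head outputs are concatenated, and a final trained matrix brings the result back to model size (all weights random here for illustration):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, W_q, W_k, W_v):
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    return softmax(Q @ K.T / np.sqrt(W_k.shape[1]), axis=-1) @ V

d_model, n_heads = 512, 8
d_k = d_model // n_heads                      # 64 dimensions per head
x = np.random.randn(3, d_model)

# One independent set of projections per head: separate representation subspaces.
heads = [attention(x,
                   np.random.randn(d_model, d_k),
                   np.random.randn(d_model, d_k),
                   np.random.randn(d_model, d_k))
         for _ in range(n_heads)]

W_o = np.random.randn(n_heads * d_k, d_model)
z = np.concatenate(heads, axis=-1) @ W_o      # condensed back to shape (3, 512)
```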
Positional Encoding
Vectors added to input embeddings to account for the order of words in the input sequence.
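One concrete recipe is the sinusoidal encoding from the original paper; a sketch (the exact formula can vary, as the post notes):

```python
import numpy as np

def positional_encoding(n_positions, d_model):
    # Sine on even indices, cosine on odd indices, at geometrically spaced
    # frequencies, so every position gets a unique pattern.
    pos = np.arange(n_positions)[:, None]
    i = np.arange(d_model // 2)[None, :]
    angles = pos / np.power(10000, 2 * i / d_model)
    pe = np.zeros((n_positions, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

x = np.random.randn(3, 512)           # word embeddings
x = x + positional_encoding(3, 512)   # inject word-order information
```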
Residual Connection
A connection around each sub-layer in the encoder (self-attention and the feed-forward neural network), followed by a layer-normalization step.
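A sketch of this Add & Normalize step, with layer normalization shown in a simplified form (without the learned gain and bias parameters):

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Simplified: normalize each position's vector to zero mean, unit variance.
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def add_and_norm(x, sublayer):
    # Residual connection: add the sub-layer's input to its output,
    # then layer-normalize the sum.
    return layer_norm(x + sublayer(x))

x = np.random.randn(3, 512)
out = add_and_norm(x, lambda h: h @ np.random.randn(512, 512) * 0.01)
```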
Encoder-Decoder Attention
A decoder layer that helps the decoder focus on relevant places in the input sequence; it creates its queries from the layer below it and takes its keys and values from the output of the encoder stack.
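A sketch of how it differs from self-attention: the queries come from the decoder side, while the keys and values come from the encoder stack's output (shapes and random weights are illustrative):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

d_model, d_k = 512, 64
decoder_x = np.random.randn(2, d_model)    # 2 output positions produced so far
encoder_out = np.random.randn(3, d_model)  # encoded input sentence (3 words)

W_q = np.random.randn(d_model, d_k)
W_k = np.random.randn(d_model, d_k)
W_v = np.random.randn(d_model, d_k)

Q = decoder_x @ W_q                  # queries from the decoder
K = encoder_out @ W_k                # keys and values from the encoder output
V = encoder_out @ W_v
z = softmax(Q @ K.T / np.sqrt(d_k), axis=-1) @ V   # shape (2, 64)
```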
Linear Layer
A fully connected neural network that projects the vector produced by the stack of decoders into a logits vector.
Softmax Layer
Turns scores from the Linear layer into probabilities for each word in the vocabulary.
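A sketch covering both of these final steps: project the decoder's output vector to vocabulary-sized logits, then softmax the logits into probabilities (toy sizes, random weights):

```python
import numpy as np

vocab_size, d_model = 10000, 512
decoder_output = np.random.randn(d_model)         # vector for one output position

W = np.random.randn(d_model, vocab_size) * 0.01   # trained projection (Linear layer)
logits = decoder_output @ W                       # one score per vocabulary word

exp = np.exp(logits - logits.max())               # Softmax layer
probs = exp / exp.sum()                           # probabilities summing to 1
predicted_word_index = int(np.argmax(probs))
```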
One-Hot Encoding
A vector representation of a word that is all zeros except for a 1 at the cell corresponding to that word's index in the vocabulary.
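For example, with the post's toy six-word vocabulary:

```python
import numpy as np

vocab = ["a", "am", "i", "thanks", "student", "<eos>"]

def one_hot(word):
    # All zeros except a 1 at the cell for this word.
    v = np.zeros(len(vocab))
    v[vocab.index(word)] = 1.0
    return v

print(one_hot("am"))   # [0. 1. 0. 0. 0. 0.]
```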
Loss Function
A metric that measures how far the model's output probability distribution is from the desired output; it is minimized during training (the post mentions cross-entropy and Kullback-Leibler divergence).
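A sketch of cross-entropy, one of the losses the post mentions, comparing the model's output distribution against a one-hot target:

```python
import numpy as np

def cross_entropy(target, predicted, eps=1e-12):
    # Lower when the predicted distribution puts more mass on the target word.
    return -np.sum(target * np.log(predicted + eps))

target = np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0])        # one-hot for "am"
good = np.array([0.05, 0.80, 0.05, 0.04, 0.03, 0.03])
bad  = np.array([0.80, 0.05, 0.05, 0.04, 0.03, 0.03])
print(cross_entropy(target, good))   # small loss
print(cross_entropy(target, bad))    # larger loss
```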
Expected Output
In the post's training example translating "merci" into "thanks", the target output: a probability distribution that assigns all probability to the word "thanks".
Greedy Decoding
A decoding method where the word with the highest probability is selected at each step.
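A sketch, assuming a hypothetical model_step(prefix) that returns a probability distribution over the vocabulary for the next position:

```python
import numpy as np

vocab = ["a", "am", "i", "thanks", "student", "<eos>"]

def greedy_decode(model_step, max_len=20):
    output = []
    for _ in range(max_len):
        probs = model_step(output)            # distribution over the vocabulary
        word = vocab[int(np.argmax(probs))]   # keep only the single best word
        if word == "<eos>":
            break
        output.append(word)
    return output

# Toy stand-in for the real model: predicts "thanks", then end-of-sentence.
def toy_step(prefix):
    p = np.full(len(vocab), 0.01)
    p[vocab.index("<eos>" if prefix else "thanks")] = 0.95
    return p

print(greedy_decode(toy_step))   # ['thanks']
```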
Beam Search
A decoding method that holds onto the top few candidate words at each position (the beam size), runs the model once for each, and keeps whichever partial translations produce less error.
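A sketch with beam size 2, again assuming a hypothetical model_step(prefix) that scores the next word given a partial translation:

```python
import numpy as np

vocab = ["a", "am", "i", "thanks", "student", "<eos>"]

def beam_search(model_step, beam_size=2, max_len=20):
    # Each beam is a (partial output, total log-probability) pair.
    beams = [([], 0.0)]
    for _ in range(max_len):
        candidates = []
        for prefix, score in beams:
            if prefix and prefix[-1] == "<eos>":
                candidates.append((prefix, score))    # finished beam is kept as-is
                continue
            probs = model_step(prefix)
            for i, p in enumerate(probs):
                candidates.append((prefix + [vocab[i]], score + np.log(p)))
        # Keep only the beam_size highest-scoring partial translations.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
        if all(b[0] and b[0][-1] == "<eos>" for b in beams):
            break
    return beams[0][0]   # best-scoring translation
```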