The Illustrated Transformer Vocabulary

Flashcards about The Illustrated Transformer.


18 Terms

1. The Transformer

A model that uses attention to speed up training and that lends itself well to parallelization.

2. Black Box

The Transformer model viewed as a single unit (a black box) that takes a sentence in one language and outputs its translation in another.

3. Encoding Component

The component of the Transformer model responsible for processing the input sequence, consisting of a stack of encoders.
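
The article notes the paper stacks six identical encoders, each feeding its output to the next. A minimal sketch of that stacking pattern; the `encoder_layer` stand-in and the sizes are illustrative assumptions, not the article's code:

```python
import numpy as np

def encoder_layer(x):
    # Stand-in for one encoder (self-attention + feed-forward network);
    # a real layer would transform x, but the stacking pattern is the point here.
    return x

x = np.random.randn(3, 512)   # embeddings for 3 input words (sizes assumed)
for _ in range(6):            # six encoders stacked on top of each other
    x = encoder_layer(x)      # each encoder consumes the output of the one below it
```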

4. Decoding Component

The component of the Transformer model that generates the output sequence, consisting of a stack of decoders.

5. Self-Attention Layer

A layer that helps the encoder look at other words in the input sentence as it encodes a specific word.
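
A minimal numpy sketch of scaled dot-product self-attention over a tiny input; the matrix sizes, random weights, and `softmax` helper are illustrative assumptions, not values from the article:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))                 # 3 words, toy embedding size 4
W_q, W_k, W_v = (rng.normal(size=(4, 4)) for _ in range(3))

q, k, v = x @ W_q, x @ W_k, x @ W_v         # query, key, value vectors per word
scores = q @ k.T / np.sqrt(k.shape[-1])     # how strongly each word attends to the others
weights = softmax(scores, axis=-1)          # rows sum to 1
z = weights @ v                             # each output row blends information from all words
```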

6. Word Embedding

Turning each input word into a vector using an embedding algorithm.
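
A minimal sketch of an embedding lookup, where each word id indexes a row of a learned table; the toy vocabulary, table size, and random initialization are assumptions for illustration:

```python
import numpy as np

vocab = {"je": 0, "suis": 1, "étudiant": 2}           # tiny assumed vocabulary
embedding_table = np.random.randn(len(vocab), 512)    # one 512-dim vector per word

sentence = ["je", "suis", "étudiant"]
ids = [vocab[w] for w in sentence]
embeddings = embedding_table[ids]                     # shape (3, 512): one vector per input word
```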

7. Query, Key, and Value Vectors

Vectors created by multiplying the embedding of each word by three trained weight matrices; they are used in calculating self-attention.
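
A minimal sketch of producing the three vectors for one word by multiplying its embedding by three trained weight matrices; the 512 → 64 sizes follow the article's walkthrough, and the random values are placeholders:

```python
import numpy as np

x = np.random.randn(512)          # embedding of one word (size 512)
W_Q = np.random.randn(512, 64)    # trained projection matrices (random placeholders here)
W_K = np.random.randn(512, 64)
W_V = np.random.randn(512, 64)

q = x @ W_Q                       # query vector (size 64)
k = x @ W_K                       # key vector   (size 64)
v = x @ W_V                       # value vector (size 64)
```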

8. Multi-Headed Attention

A mechanism that improves the performance of the attention layer by allowing the model to focus on different positions and providing multiple representation subspaces.
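
A minimal sketch of running several attention heads in parallel, each with its own projections, then concatenating the results and projecting them back to the model dimension; the 8 × 64 head layout follows the paper, while the helper names and random weights are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(x, W_q, W_k, W_v):
    q, k, v = x @ W_q, x @ W_k, x @ W_v
    return softmax(q @ k.T / np.sqrt(k.shape[-1]), axis=-1) @ v

rng = np.random.default_rng(0)
num_heads, d_model, d_head = 8, 512, 64
x = rng.normal(size=(3, d_model))                      # 3 words

heads = []
for _ in range(num_heads):                             # each head has its own W_q / W_k / W_v
    W_q, W_k, W_v = (rng.normal(size=(d_model, d_head)) for _ in range(3))
    heads.append(attention(x, W_q, W_k, W_v))

W_o = rng.normal(size=(num_heads * d_head, d_model))   # output projection
z = np.concatenate(heads, axis=-1) @ W_o               # back to shape (3, 512)
```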

9. Positional Encoding

Vectors added to input embeddings to account for the order of words in the input sequence.
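
A minimal sketch of the sinusoidal positional encodings from the paper, which are added to the word embeddings so the model can use word order; the sequence length and model dimension are assumed:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    pos = np.arange(seq_len)[:, None]        # word positions 0 .. seq_len-1
    i = np.arange(d_model)[None, :]          # embedding dimensions
    angles = pos / np.power(10000, (2 * (i // 2)) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles[:, 0::2])    # even dimensions use sine
    pe[:, 1::2] = np.cos(angles[:, 1::2])    # odd dimensions use cosine
    return pe

embeddings = np.random.randn(3, 512)                 # 3 words (random placeholder)
x = embeddings + positional_encoding(3, 512)         # order information added in
```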

10. Residual Connection

A connection around each sub-layer in the encoder (self-attention, feed-forward neural network), followed by a layer-normalization step.
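
A minimal sketch of the add-and-normalize pattern around a sub-layer; the simplified `layer_norm` (no learned scale/shift) and the `sublayer` stand-in are assumptions for illustration:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Simplified layer normalization without learned gain/bias parameters
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

def sublayer(x):
    # Stand-in for self-attention or the feed-forward network
    return x @ np.random.randn(x.shape[-1], x.shape[-1])

x = np.random.randn(3, 512)
x = layer_norm(x + sublayer(x))   # the sub-layer's input is added back, then normalized
```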

11. Encoder-Decoder Attention

Helps the decoder focus on appropriate places in the input sequence.
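
A minimal sketch of how this differs from self-attention: the queries come from the decoder, while the keys and values come from the output of the encoder stack; all sizes and weights here are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
encoder_output = rng.normal(size=(5, 512))   # 5 input words, from the top of the encoder stack
decoder_state = rng.normal(size=(2, 512))    # 2 output words generated so far
W_q, W_k, W_v = (rng.normal(size=(512, 64)) for _ in range(3))

q = decoder_state @ W_q        # queries from the decoder
k = encoder_output @ W_k       # keys and values from the encoder output
v = encoder_output @ W_v
weights = softmax(q @ k.T / np.sqrt(64), axis=-1)   # which input words to focus on
z = weights @ v
```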

12. Linear Layer

A fully connected neural network that projects the vector produced by the stack of decoders into a logits vector.
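
A minimal sketch of the final projection from the decoder's output vector to a logits vector with one score per vocabulary word; the vocabulary size and weights are assumed:

```python
import numpy as np

vocab_size, d_model = 10000, 512              # assumed sizes
W = np.random.randn(d_model, vocab_size)      # trained projection (random placeholder)
b = np.zeros(vocab_size)

decoder_output = np.random.randn(d_model)     # vector from the top of the decoder stack
logits = decoder_output @ W + b               # one raw score per word in the vocabulary
```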

13. Softmax Layer

Turns scores from the Linear layer into probabilities for each word in the vocabulary.
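
A minimal sketch of turning logits into probabilities; the cell with the highest probability points to the word produced at this step (the toy scores are assumed):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())            # subtract the max for numerical stability
    return e / e.sum()

logits = np.array([2.0, 0.5, -1.0])    # toy scores for a 3-word vocabulary
probs = softmax(logits)                # non-negative and sums to 1.0
predicted_word_index = int(probs.argmax())
```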

14. One-Hot Encoding

A vector the width of the vocabulary, with a 1 in the cell for a given word and 0 everywhere else, used to indicate each word in our vocabulary.
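
A minimal sketch using a toy vocabulary like the one in the article; the word "thanks" is indicated by a 1 in its cell and 0 everywhere else:

```python
import numpy as np

vocab = {"a": 0, "am": 1, "i": 2, "thanks": 3, "student": 4, "<eos>": 5}   # toy vocabulary
one_hot = np.zeros(len(vocab))
one_hot[vocab["thanks"]] = 1.0    # [0, 0, 0, 1, 0, 0] indicates the word "thanks"
```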

15. Loss Function

A metric that compares the model's output with the desired output; it is optimized (minimized) during the training phase to obtain an accurate model.
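
A minimal sketch using cross-entropy, one way (discussed in the article alongside KL divergence) of comparing the model's output distribution with the target distribution; the toy numbers are assumed:

```python
import numpy as np

def cross_entropy(predicted, target):
    # Lower loss means the predicted distribution is closer to the target distribution
    return -np.sum(target * np.log(predicted + 1e-12))

predicted = np.array([0.1, 0.1, 0.1, 0.6, 0.05, 0.05])   # model output (toy values)
target    = np.array([0.0, 0.0, 0.0, 1.0, 0.0,  0.0])    # one-hot target for "thanks"
loss = cross_entropy(predicted, target)                   # minimized during training
```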

16. Expected Output

The target probability distribution at an output position; in the article's example of translating "merci", a distribution whose probability mass sits on the word "thanks".

17. Greedy Decoding

A decoding method where the word with the highest probability is selected at each step.
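
A minimal sketch of a greedy decode loop; `run_decoder` is a hypothetical stand-in for a full forward pass that returns a probability distribution over the vocabulary:

```python
import numpy as np

vocab = ["a", "am", "i", "thanks", "student", "<eos>"]    # toy vocabulary (assumed)

def run_decoder(output_so_far):
    # Hypothetical stand-in: a real decoder would attend over the encoder output
    # and the words generated so far, returning probabilities over the vocabulary.
    probs = np.random.rand(len(vocab))
    return probs / probs.sum()

output = []
for _ in range(10):                        # cap the output length
    probs = run_decoder(output)
    word = vocab[int(np.argmax(probs))]    # greedily keep only the single best word
    output.append(word)
    if word == "<eos>":
        break
```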

18. Beam Search

A decoding method that holds on to several top candidate words at each step, runs the model once for each, and keeps the best-scoring partial translations in order to find the best overall translation.
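
A minimal sketch of beam search with a beam size of 2: at each step, expand each surviving hypothesis with every vocabulary word, score by summed log-probability, and keep only the top two; `run_decoder` is the same hypothetical stand-in as in the greedy example:

```python
import numpy as np

vocab = ["a", "am", "i", "thanks", "student", "<eos>"]    # toy vocabulary (assumed)

def run_decoder(output_so_far):
    # Hypothetical stand-in for a full decoder forward pass.
    probs = np.random.rand(len(vocab))
    return probs / probs.sum()

beam_size = 2
beams = [([], 0.0)]                  # (partial translation, summed log-probability)
for _ in range(10):
    candidates = []
    for words, score in beams:
        if words and words[-1] == "<eos>":
            candidates.append((words, score))    # finished hypotheses carry over unchanged
            continue
        probs = run_decoder(words)
        for i, p in enumerate(probs):
            candidates.append((words + [vocab[i]], score + np.log(p + 1e-12)))
    beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]

best_translation, best_score = beams[0]
```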