Flashcards about Transformers for NLP
How is text generated from a decoder model?
1. Process the sequence and predict the next token.
2. Append the predicted token to the sequence.
3. Process the extended sequence and again predict and append the next token.
4. Repeat, stopping when the end-of-sequence token is predicted.
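A minimal sketch of this greedy decoding loop, assuming a hypothetical `model(tokens)` function that returns next-token logits; `VOCAB_SIZE` and `EOS_ID` are made-up stand-ins for a real tokenizer's vocabulary and end-of-sequence id:

```python
import numpy as np

VOCAB_SIZE = 8
EOS_ID = 0  # assumed end-of-sequence token id

def model(tokens):
    # Stand-in for a real decoder: returns logits over the vocabulary.
    rng = np.random.default_rng(seed=len(tokens))
    return rng.normal(size=VOCAB_SIZE)

def generate(prompt, max_new_tokens=20):
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = model(tokens)               # 1. process the sequence
        next_token = int(np.argmax(logits))  # 2. predict the next token (greedy)
        tokens.append(next_token)            # 3. append it to the sequence
        if next_token == EOS_ID:             # 4. stop on end-of-sequence
            break
    return tokens

print(generate([3, 5]))
```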
How does Beam search work and what are its limitations?
Beam search keeps track of several possible branches of output sequences and selects the sequence with the highest overall probability. Its limitations: it is computationally expensive and can get stuck in repetitive loops.
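A minimal beam search sketch under the same assumptions as above (hypothetical `model(tokens)`, made-up `VOCAB_SIZE` and `EOS_ID`). Note how every branch is expanded by every vocabulary token at every step, which is where the computational cost comes from:

```python
import numpy as np

VOCAB_SIZE = 8
EOS_ID = 0  # assumed end-of-sequence token id

def model(tokens):
    # Stand-in for a real decoder: returns logits over the vocabulary.
    rng = np.random.default_rng(seed=len(tokens))
    return rng.normal(size=VOCAB_SIZE)

def log_softmax(logits):
    z = logits - logits.max()
    return z - np.log(np.exp(z).sum())

def beam_search(prompt, beam_width=3, max_new_tokens=10):
    beams = [(list(prompt), 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_new_tokens):
        candidates = []
        for tokens, score in beams:
            if tokens[-1] == EOS_ID:           # finished branches pass through
                candidates.append((tokens, score))
                continue
            logp = log_softmax(model(tokens))
            for tok in range(VOCAB_SIZE):      # expand every branch by every token
                candidates.append((tokens + [tok], score + logp[tok]))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_width]        # keep only the best branches
    return beams[0][0]  # tokens of the highest-probability sequence

print(beam_search([3, 5]))
```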
How does Top-K sampling work?
Sample only from among the K tokens with the highest scores; all other tokens are excluded.
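A minimal top-K sampling sketch on a raw logits vector (the example logits and K are made up): keep the K highest-scoring tokens, renormalize with a softmax over just those, then draw one of them.

```python
import numpy as np

def top_k_sample(logits, k, rng=np.random.default_rng()):
    top = np.argsort(logits)[-k:]           # indices of the K best tokens
    z = logits[top] - logits[top].max()
    probs = np.exp(z) / np.exp(z).sum()     # softmax over the K tokens only
    return int(rng.choice(top, p=probs))

logits = np.array([2.0, 1.0, 0.5, -1.0, -3.0])
print(top_k_sample(logits, k=3))  # always one of tokens 0, 1, 2
```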
How does Adjusted Softmax sampling work?
Add a parameter T, called temperature, to the softmax function: each logit is divided by T before the softmax. A low T concentrates sampling on the best tokens; a high T flattens the distribution across all tokens, making the output more random.
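A minimal temperature sampling sketch, computing softmax(logits / T) and drawing from it (the example logits and T values are made up):

```python
import numpy as np

def temperature_sample(logits, temperature, rng=np.random.default_rng()):
    z = logits / temperature            # low T sharpens, high T flattens
    z -= z.max()                        # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return int(rng.choice(len(logits), p=probs))

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(temperature_sample(logits, temperature=0.5))  # low T: near-greedy
print(temperature_sample(logits, temperature=2.0))  # high T: more random
```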