DAT 255 Deep Learning - Lecture 16: Transformers for natural language processing


Flashcards about Transformers for NLP


4 Terms

Card 1

How is text generated from a decoder model?

1. Process the sequence and predict the next token.
2. Append the predicted token to the sequence.
3. Process the extended sequence and again append the predicted token.
4. Repeat, stopping when the end-of-sequence token is predicted.

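A minimal sketch of this autoregressive loop, assuming a hypothetical next_token_probs stand-in for the trained decoder (it returns random probabilities just so the loop runs) and an assumed EOS token id:

```python
import numpy as np

rng = np.random.default_rng(0)
EOS = 0  # assumed end-of-sequence token id

def next_token_probs(sequence, vocab_size=100):
    # Hypothetical stand-in for the trained decoder: a real model would
    # run a forward pass over `sequence` here. We return random
    # probabilities just to make the loop runnable.
    logits = rng.normal(size=vocab_size)
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

def generate(prompt, max_new_tokens=20):
    sequence = list(prompt)
    for _ in range(max_new_tokens):
        probs = next_token_probs(sequence)   # process the sequence
        token = int(np.argmax(probs))        # predict the next token (greedy)
        sequence.append(token)               # append it to the sequence
        if token == EOS:                     # stop on end-of-sequence
            break
    return sequence

print(generate([5, 17, 42]))
```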
Card 2

How does Beam search work and what are its limitations?

Beam search keeps track of several possible branches of output sequences and selects the sequence with the highest overall probability. Its limitations: it is computationally expensive and can get stuck in repetitive loops.
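A minimal beam-search sketch under the same assumptions as above: next_token_logprobs is a hypothetical stand-in for the decoder, and token 0 is assumed to be end-of-sequence. The cost is visible in the loop structure: every step calls the model once per unfinished beam.

```python
import numpy as np

rng = np.random.default_rng(1)

def next_token_logprobs(sequence, vocab_size=50):
    # Hypothetical stand-in for the decoder: returns next-token
    # log-probabilities. A real forward pass would go here.
    logits = rng.normal(size=vocab_size)
    return logits - (logits.max() + np.log(np.exp(logits - logits.max()).sum()))

def beam_search(prompt, beam_width=3, steps=10, eos=0):
    # Each beam is a (sequence, cumulative log-probability) pair.
    beams = [(list(prompt), 0.0)]
    for _ in range(steps):
        candidates = []
        for seq, score in beams:
            if seq[-1] == eos:                 # finished beams carry over
                candidates.append((seq, score))
                continue
            lp = next_token_logprobs(seq)
            # Expand the beam with its beam_width best continuations.
            for tok in np.argsort(lp)[-beam_width:]:
                candidates.append((seq + [int(tok)], score + lp[tok]))
        # Keep only the beam_width highest-scoring sequences overall.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0]  # highest-probability sequence found

print(beam_search([5, 7]))
```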

Card 3

How does Top-K sampling work?

Sample only among the K tokens with the highest scores; all lower-scoring tokens are excluded from sampling.
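A minimal top-K sampling sketch in NumPy; the example logits array is made up and stands in for the scores a real decoder would produce:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_k_sample(logits, k):
    # Restrict sampling to the k highest-scoring tokens.
    top = np.argsort(logits)[-k:]            # indices of the k best tokens
    probs = np.exp(logits[top] - logits[top].max())
    probs /= probs.sum()                     # renormalise over the top k
    return int(rng.choice(top, p=probs))

logits = np.array([2.0, 1.5, 0.3, -0.5, -2.0])
print(top_k_sample(logits, k=2))  # only the two best tokens can be drawn
```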

Card 4

How does Adjusted Softmax sampling work?

Add a parameter T, called the temperature, to the softmax function. A low T concentrates sampling on the best tokens, while a high T spreads sampling over all tokens, making the output more random.
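A minimal sketch of the temperature-adjusted softmax in NumPy; the example logits are made up to show how T reshapes the distribution:

```python
import numpy as np

def temperature_softmax(logits, T=1.0):
    # Divide logits by T before the softmax: T < 1 sharpens the
    # distribution (near-greedy), T > 1 flattens it (more random).
    z = logits / T
    exp = np.exp(z - z.max())
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])
print(temperature_softmax(logits, T=0.5))  # concentrated on the best token
print(temperature_softmax(logits, T=2.0))  # closer to uniform
```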