sequence modeling
decoder-only, GPT-style
learn to predict the next token in a sequence
ex: language modeling, music generation, etc.
formula: P(x) = P(x1) * P(x2 | x1) * ... * P(xT | x<T)
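a minimal sketch of the chain-rule factorization above, using a toy next-token "model" with made-up probabilities (the table and numbers are illustrative, not from any real model):

```python
# toy next-token model: P(token | prefix) looked up from a tiny hypothetical table
def p_next(token, prefix):
    table = {
        ((), "the"): 0.4,
        (("the",), "cat"): 0.3,
        (("the", "cat"), "sat"): 0.5,
    }
    return table.get((tuple(prefix), token), 1e-6)

# P(x) = P(x1) * P(x2 | x1) * ... * P(xT | x<T)
def sequence_prob(tokens):
    prob = 1.0
    for t, tok in enumerate(tokens):
        prob *= p_next(tok, tokens[:t])  # condition on everything before position t
    return prob

print(sequence_prob(["the", "cat", "sat"]))  # 0.4 * 0.3 * 0.5 ≈ 0.06
```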
sequence-to-sequence (seq2seq)
encoder-decoder (original transformer), translation-style
input sequence (english sentence) → output sequence (german translation)
ex: translation, q&a, text-to-speech
formula: P(x|z) = P(x1|z) * P(x2|x1, z) * ... * P(xT|x<T, z)
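a minimal sketch of the conditional factorization: the decoder predicts each output token given the previous outputs and the encoder's representation z of the input. `encode` and `p_next_given_z` are hypothetical stand-ins, not a real API:

```python
def encode(source_tokens):
    # encoder: source sequence -> context representation z (placeholder here)
    return tuple(source_tokens)

def p_next_given_z(token, prefix, z):
    # decoder: P(x_t | x_<t, z); fixed toy value just for illustration
    return 0.5

def translation_prob(source_tokens, target_tokens):
    z = encode(source_tokens)
    prob = 1.0
    for t, tok in enumerate(target_tokens):
        prob *= p_next_given_z(tok, target_tokens[:t], z)  # P(x|z) = prod_t P(x_t | x_<t, z)
    return prob

print(translation_prob(["the", "cat"], ["die", "katze"]))  # 0.5 * 0.5 = 0.25
```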
classification
encoder-only, BERT-style
input = sequence of tokens → output = label/class
ex: sentiment analysis, spam detection
formula: learn P(c|x)
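a minimal numpy sketch of an encoder-only classification head, assuming the encoder has already produced one hidden vector per token (all shapes and weights are illustrative): pool over the sequence, project to class logits, softmax to get P(c|x).

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical encoder output: one hidden vector per input token (seq_len x d_model)
hidden = rng.normal(size=(6, 16))

# classification head: 2 classes, e.g. negative / positive sentiment
W = rng.normal(size=(16, 2))
b = np.zeros(2)

pooled = hidden.mean(axis=0)                   # simple mean pooling over tokens
logits = pooled @ W + b
probs = np.exp(logits) / np.exp(logits).sum()  # softmax -> P(c | x)

print(probs, probs.sum())  # probabilities over the 2 classes, sums to 1
```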
transformers
deep feed-forward neural networks that rely on attention mechanisms
general-purpose sequence models with 3 main use cases:
sequence modeling
seq2seq
classification
tokenization
process of representing text as tokens
subword tokenization is most common
each token converted to unique integer ID
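a toy sketch of text -> subword tokens -> integer IDs, using a tiny made-up vocabulary and greedy longest-match splitting (real subword tokenizers like BPE/WordPiece learn their vocab from data; this is just the idea):

```python
vocab = {"un": 0, "break": 1, "able": 2, "token": 3, "ization": 4, "<unk>": 5}

def tokenize(word):
    tokens, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest matching piece first
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            tokens.append("<unk>")          # no piece matched this character
            i += 1
    return tokens

tokens = tokenize("unbreakable")
ids = [vocab[t] for t in tokens]
print(tokens, ids)  # ['un', 'break', 'able'] [0, 1, 2]
```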
token embedding
converts token ID → vector
like a dictionary lookup into a learned matrix
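a minimal numpy sketch of the lookup: the embedding matrix has one learned row per token ID, and indexing by ID just selects rows (vocab size and dimensions are made up):

```python
import numpy as np

vocab_size, d_model = 100, 8
rng = np.random.default_rng(0)

# learned embedding matrix: one row (vector) per token ID
embedding_matrix = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([0, 1, 2])                  # output of the tokenizer
token_embeddings = embedding_matrix[token_ids]   # "dictionary lookup" by row index

print(token_embeddings.shape)  # (3, 8): one d_model vector per token
```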
positional embedding
adds information about word order
without it, model sees text as ‘bag of words’
can be learned (fixed maximum length) or sinusoidal (extends to arbitrary length)
final/vector embedding
token embedding + positional embedding
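a minimal numpy sketch covering both of the cards above: sinusoidal positional embeddings (sin/cos at geometrically spaced frequencies, assuming an even d_model) added element-wise to the token embeddings to give the final embedding:

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    # even dimensions get sin, odd dimensions get cos, at decreasing frequencies
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # even dimension indices
    angles = pos / (10000 ** (i / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

rng = np.random.default_rng(0)
seq_len, d_model = 3, 8
token_embeddings = rng.normal(size=(seq_len, d_model))   # stand-in for the lookup above

# final embedding = token embedding + positional embedding
final_embeddings = token_embeddings + sinusoidal_positions(seq_len, d_model)
print(final_embeddings.shape)  # (3, 8)
```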
attention
decides which parts of the sequence to focus on
attention score
similarity between a query and a key (scaled dot product); determines how much weight each value gets
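a minimal numpy sketch of scaled dot-product attention: scores = query-key similarity, softmax turns scores into weights, output = weighted sum of values (Q, K, V and their shapes are illustrative):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # query-key similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over keys
    return weights @ V                                     # weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8
Q = rng.normal(size=(seq_len, d_k))
K = rng.normal(size=(seq_len, d_k))
V = rng.normal(size=(seq_len, d_k))

out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8): one output vector per query position
```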
multi-head attention
multiple attention mechanisms run in parallel, each capturing different relationships
outputs are concatenated and linearly combined
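a minimal numpy sketch of multi-head attention under assumed shapes (d_model split evenly across heads, random weights for illustration): each head gets its own Q/K/V projections, the head outputs are concatenated, then mixed by a final linear layer W_o:

```python
import numpy as np

def attention(Q, K, V):
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(x, params):
    # each head has its own (W_q, W_k, W_v); outputs are concatenated and
    # linearly combined by W_o
    heads = [attention(x @ W_q, x @ W_k, x @ W_v)
             for W_q, W_k, W_v in params["heads"]]
    concat = np.concatenate(heads, axis=-1)
    return concat @ params["W_o"]

rng = np.random.default_rng(0)
seq_len, d_model, num_heads = 4, 16, 4
d_head = d_model // num_heads

params = {
    "heads": [tuple(rng.normal(size=(d_model, d_head)) for _ in range(3))
              for _ in range(num_heads)],
    "W_o": rng.normal(size=(d_model, d_model)),
}

x = rng.normal(size=(seq_len, d_model))
print(multi_head_attention(x, params).shape)  # (4, 16)
```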