Why do we need Transformers for language tasks?
Because they can understand long‑range relationships in sentences without reading one word at a time.
What is the main weakness of RNNs that Transformers fix?
RNNs process text one word at a time, which is slow and makes them lose long‑range information; Transformers process all the words in parallel.
What is “attention” in a Transformer?
A way for the model to focus on the most important words when understanding a sentence.
Why is attention useful?
It helps the model figure out which words relate to each other, even if they’re far apart.
What are Query, Key, and Value vectors?
Three vectors computed from each word: attention compares a word's Query against every other word's Key to score relevance, then blends the corresponding Values according to those scores.
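A minimal NumPy sketch of how Query, Key, and Value vectors combine in scaled dot-product attention; the function name, toy dimensions, and random inputs are illustrative, not from any particular model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K, V: (seq_len, d_k) arrays. Returns attended values and attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                        # how strongly each word relates to every other word
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)         # softmax over the keys
    return weights @ V, weights

# Toy example: 4 "words", each with an 8-dimensional Query/Key/Value
rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(4, 8)) for _ in range(3))
out, w = scaled_dot_product_attention(Q, K, V)
print(w.round(2))   # each row sums to 1: the attention one word pays to all words
```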
What is multi‑head attention?
Multiple attention mechanisms running at the same time, each focusing on different patterns or relationships.
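A rough sketch of the multi-head idea, assuming random matrices in place of learned projection weights: the model dimension is split across heads, each head runs its own attention, and the results are concatenated.

```python
import numpy as np

def attention(Q, K, V):
    # scaled dot-product attention (see the previous card)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

def multi_head_attention(X, num_heads=2, seed=0):
    """Run several attention heads in parallel, each with its own (here random)
    Query/Key/Value projections, then concatenate their outputs."""
    rng = np.random.default_rng(seed)
    d_model = X.shape[-1]
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
        heads.append(attention(X @ Wq, X @ Wk, X @ Wv))
    return np.concatenate(heads, axis=-1)            # (seq_len, d_model)

X = np.random.default_rng(1).normal(size=(4, 8))     # 4 words, 8-dim embeddings
print(multi_head_attention(X).shape)                 # (4, 8)
```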
Why do Transformers need positional encoding?
Because they read all words at once, they need a way to know the order of the words.
What does positional encoding represent?
It gives each word a “position tag” so the model knows where it appears in the sentence.
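The sinusoidal scheme from the original Transformer paper is one way to build these position tags; this small sketch uses toy sizes:

```python
import numpy as np

def sinusoidal_positional_encoding(seq_len, d_model):
    """Even dimensions use sine, odd dimensions use cosine, at geometrically
    spaced frequencies, so every position gets a unique pattern of values."""
    positions = np.arange(seq_len)[:, None]            # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]           # (1, d_model/2)
    angles = positions / (10000 ** (dims / d_model))
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# A word's embedding plus its position tag tells the model both *what*
# the word is and *where* it sits in the sentence.
print(sinusoidal_positional_encoding(seq_len=4, d_model=8).round(2))
```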
What do feed‑forward layers do in a Transformer?
They refine and transform the information after attention has figured out relationships.
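A position-wise feed-forward layer is just a small two-layer network applied to every word independently; the weights below are random stand-ins for learned parameters:

```python
import numpy as np

def feed_forward(X, d_hidden=32, seed=0):
    """Apply the same two-layer MLP to each word's vector: expand, apply a
    non-linearity, and project back to the model dimension."""
    rng = np.random.default_rng(seed)
    d_model = X.shape[-1]
    W1, b1 = rng.normal(size=(d_model, d_hidden)), np.zeros(d_hidden)
    W2, b2 = rng.normal(size=(d_hidden, d_model)), np.zeros(d_model)
    hidden = np.maximum(0, X @ W1 + b1)   # ReLU non-linearity
    return hidden @ W2 + b2               # back to the model dimension

X = np.random.default_rng(1).normal(size=(4, 8))   # 4 words after attention
print(feed_forward(X).shape)                        # (4, 8)
```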
Why do Transformers use residual connections?
To help information flow smoothly through the network and prevent it from getting lost.
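The residual (skip) connection itself is a one-line idea, sketched here:

```python
import numpy as np

def with_residual(sublayer, x):
    """Add the sub-layer's output back onto its input, so the original
    signal always has a direct path through the block."""
    return x + sublayer(x)

x = np.ones(4)
print(with_residual(lambda v: 0.1 * v, x))   # [1.1 1.1 1.1 1.1] -- the input survives
```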
What does layer normalization do?
It keeps the numbers stable during training so the model learns better.
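A simplified layer normalization, assuming the learned scale and shift parameters of real LayerNorm are omitted:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Normalize each word's vector to zero mean and unit variance,
    keeping activations in a stable range."""
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)

x = np.array([[1.0, 2.0, 3.0, 100.0]])   # one word with a wildly large activation
print(layer_norm(x).round(2))             # values pulled back to a stable range
```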
What does the final linear + softmax layer do?
It turns the model’s output into a probability distribution for the next word.
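A toy sketch of that last step, with a made-up 5-word vocabulary and random weights standing in for the learned output projection:

```python
import numpy as np

def next_word_distribution(hidden, W_vocab):
    """Project the final hidden state onto vocabulary-sized logits,
    then apply softmax so the outputs form a probability distribution."""
    logits = hidden @ W_vocab
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
hidden = rng.normal(size=8)                # final hidden state for the last word
W_vocab = rng.normal(size=(8, 5))          # toy vocabulary of 5 words
probs = next_word_distribution(hidden, W_vocab)
print(probs.round(3), probs.sum())         # one probability per word, summing to 1
```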
What is “sampling” in language models?
Choosing the next word based on the probabilities the model predicts.
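One common way to do this is temperature sampling; this sketch uses an invented 5-word vocabulary and probabilities purely for illustration:

```python
import numpy as np

def sample_next_word(probs, vocab, temperature=1.0, seed=0):
    """Sample the next word from the model's distribution. Temperature < 1
    sharpens the distribution (safer picks); > 1 flattens it (more variety)."""
    rng = np.random.default_rng(seed)
    scaled = np.log(probs) / temperature
    scaled = np.exp(scaled - scaled.max())
    scaled /= scaled.sum()
    return rng.choice(vocab, p=scaled)

vocab = ["cat", "dog", "sat", "mat", "the"]
probs = np.array([0.1, 0.05, 0.5, 0.25, 0.1])
print(sample_next_word(probs, vocab, temperature=0.7))
```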
What does GPT stand for?
Generative Pre‑trained Transformer.
What is a Vision Transformer (ViT)?
A Transformer that treats image patches like words so it can understand pictures.
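A minimal sketch of the patch-as-word idea: chop an image into non-overlapping patches and flatten each one so it can be fed to a Transformer like a word embedding (in a real ViT a learned linear projection and positional encodings follow):

```python
import numpy as np

def image_to_patch_tokens(image, patch_size=4):
    """Split an image into non-overlapping patches and flatten each patch
    into a vector, producing a sequence of 'visual words'."""
    H, W, C = image.shape
    patches = []
    for i in range(0, H, patch_size):
        for j in range(0, W, patch_size):
            patches.append(image[i:i+patch_size, j:j+patch_size].reshape(-1))
    return np.stack(patches)               # (num_patches, patch_size*patch_size*C)

image = np.random.default_rng(0).random((16, 16, 3))   # tiny 16x16 RGB image
tokens = image_to_patch_tokens(image)
print(tokens.shape)                                     # (16, 48): 16 "visual words"
```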
What are multimodal Transformers like VATT used for?
They process video, audio, and text together in one unified model.