Who introduced the Transformer model in 2017?
Vaswani et al., in the paper "Attention Is All You Need".
What is BERT used for?
Masked language modeling and classification tasks.
Name two families of large language models.
GPT and Llama families.
What is tokenization?
Breaking input text into subword units for efficient processing.
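A minimal sketch of the idea, using greedy longest-match over a hand-picked toy vocabulary; real subword tokenizers such as BPE or WordPiece learn their vocabularies from a corpus:

```python
# Greedy longest-match subword tokenization over a toy vocabulary.
# The vocabulary is illustrative; real tokenizers learn it from data.
VOCAB = {"un", "believ", "able", "token", "ization"}

def tokenize(word: str) -> list[str]:
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try the longest match first
            if word[i:j] in VOCAB:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append("<unk>")          # no vocabulary piece matched
            i += 1
    return pieces

print(tokenize("unbelievable"))   # ['un', 'believ', 'able']
print(tokenize("tokenization"))   # ['token', 'ization']
```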
How are tokens converted into vectors?
Via a learned embedding table that maps each token ID to a dense vector; standalone algorithms such as word2vec learn similar vectors for whole words.
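A sketch of the lookup itself, with randomly initialized weights standing in for trained ones and purely illustrative sizes:

```python
import numpy as np

# Embedding lookup: each token ID indexes one row of a weight matrix.
# In a trained model these rows are learned; here they are random.
vocab_size, d_model = 8, 4
rng = np.random.default_rng(0)
embedding = rng.normal(size=(vocab_size, d_model))  # one row per token ID

token_ids = np.array([3, 1, 5])     # e.g. IDs produced by a tokenizer
vectors = embedding[token_ids]      # shape (3, 4): one vector per token
print(vectors.shape)
```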
What is positional encoding?
A method to incorporate the order of tokens into a sequence model.
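One concrete scheme is the sinusoidal encoding from the original Transformer paper; the sketch below assumes an even d_model:

```python
import numpy as np

# Sinusoidal positional encoding:
#   PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
#   PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
# These vectors are added to token embeddings so the model sees order.
def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    pos = np.arange(seq_len)[:, None]          # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]      # (1, d_model / 2)
    angles = pos / np.power(10000.0, i / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)               # even dimensions
    pe[:, 1::2] = np.cos(angles)               # odd dimensions
    return pe

print(positional_encoding(seq_len=10, d_model=8).shape)  # (10, 8)
```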
How do RNNs handle sequential data?
Through recurrent (looped) connections that process the input one timestep at a time, carrying a hidden state forward.
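A minimal vanilla RNN cell unrolled over time; weights are randomly initialized here purely for illustration:

```python
import numpy as np

# The hidden state h is updated once per timestep, carrying information
# forward through the recurrence.
rng = np.random.default_rng(0)
d_in, d_h, T = 3, 5, 4                  # input size, hidden size, timesteps
W_xh = rng.normal(size=(d_in, d_h)) * 0.1
W_hh = rng.normal(size=(d_h, d_h)) * 0.1
b_h = np.zeros(d_h)

x = rng.normal(size=(T, d_in))          # one input vector per timestep
h = np.zeros(d_h)                       # initial hidden state
for t in range(T):
    h = np.tanh(x[t] @ W_xh + h @ W_hh + b_h)   # recurrence
print(h.shape)                          # (5,): final hidden state
```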
When do you compute loss only from the last unit of an RNN?
For tasks requiring a single summary output, such as sequence classification, where the final hidden state stands in for the whole input.
What is a bidirectional RNN?
An RNN that processes input in both forward and backward directions.
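A sketch of the idea: one pass reads left to right, a second pass reads right to left, and the two hidden states are concatenated per timestep. Weights are illustrative:

```python
import numpy as np

def rnn_pass(x, W_xh, W_hh):
    """Run a simple RNN over x, returning the hidden state at every step."""
    h, hs = np.zeros(W_hh.shape[0]), []
    for x_t in x:
        h = np.tanh(x_t @ W_xh + h @ W_hh)
        hs.append(h)
    return np.stack(hs)                           # (T, d_h)

rng = np.random.default_rng(1)
T, d_in, d_h = 4, 3, 5
x = rng.normal(size=(T, d_in))
Wf_xh, Wf_hh = rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h))
Wb_xh, Wb_hh = rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h))

h_fwd = rnn_pass(x, Wf_xh, Wf_hh)                 # left-to-right pass
h_bwd = rnn_pass(x[::-1], Wb_xh, Wb_hh)[::-1]     # right-to-left, re-aligned
h_bi = np.concatenate([h_fwd, h_bwd], axis=-1)    # both directions per step
print(h_bi.shape)                                 # (4, 10)
```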
How does a gated RNN differ from a standard one?
It uses gates to control which information is stored or discarded, enhancing memory capabilities.
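A minimal GRU cell, one common flavor of gated RNN: an update gate z decides how much old state to keep, and a reset gate r decides how much old state feeds the candidate. Weights below are illustrative:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_step(x_t, h, W):
    z = sigmoid(x_t @ W["xz"] + h @ W["hz"])           # update gate
    r = sigmoid(x_t @ W["xr"] + h @ W["hr"])           # reset gate
    h_cand = np.tanh(x_t @ W["xh"] + (r * h) @ W["hh"])  # candidate state
    return (1 - z) * h + z * h_cand                    # gated blend

rng = np.random.default_rng(2)
d_in, d_h = 3, 5
W = {k: rng.normal(size=(d_in if k[0] == "x" else d_h, d_h)) * 0.1
     for k in ["xz", "hz", "xr", "hr", "xh", "hh"]}

h = np.zeros(d_h)
for x_t in rng.normal(size=(4, d_in)):                 # 4 timesteps
    h = gru_step(x_t, h, W)
print(h.shape)
```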
What connects the encoder and decoder in a basic encoder-decoder (seq2seq) architecture?
A fixed-length context vector summarizing the input sequence.
Name a task where encoder-decoder architecture is used.
Language translation.
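A sketch of the idea behind the two cards above: the encoder's final hidden state becomes the fixed-length context vector that initializes the decoder. Both RNNs here are toy single-layer cells with illustrative random weights:

```python
import numpy as np

def rnn(x, h, W_xh, W_hh):
    """Run a simple RNN over x from initial state h; return the final state."""
    for x_t in x:
        h = np.tanh(x_t @ W_xh + h @ W_hh)
    return h

rng = np.random.default_rng(3)
d_in, d_h = 3, 5
We_xh, We_hh = rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h))
Wd_xh, Wd_hh = rng.normal(size=(d_in, d_h)), rng.normal(size=(d_h, d_h))

src = rng.normal(size=(6, d_in))                   # source sentence, 6 tokens
context = rnn(src, np.zeros(d_h), We_xh, We_hh)    # fixed-length summary

tgt = rng.normal(size=(4, d_in))                   # target tokens fed so far
h_dec = rnn(tgt, context, Wd_xh, Wd_hh)            # decoder starts from context
print(context.shape, h_dec.shape)
```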
What does the attention mechanism calculate?
A weighted sum of information from other tokens, weighted by similarity (restricted to previous tokens in auto-regressive models).
What are query, key, and value vectors in attention?
Linear projections of each token vector: queries are matched against keys to score relevance, and values carry the content that gets summed.
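The two cards above fit together in a few lines of scaled dot-product attention; shapes and weights below are illustrative:

```python
import numpy as np

def softmax(a, axis=-1):
    e = np.exp(a - a.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(X, W_q, W_k, W_v):
    Q, K, V = X @ W_q, X @ W_k, X @ W_v       # project tokens to q/k/v
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # query-key similarity, scaled
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ V                        # weighted sum of values

rng = np.random.default_rng(4)
T, d_model, d_k = 5, 8, 4                     # 5 tokens
X = rng.normal(size=(T, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
print(attention(X, W_q, W_k, W_v).shape)      # (5, 4)
```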
What is an attention head?
One of several parallel attention units, each with its own query/key/value projections, that can specialize in a different aspect of the input; the heads' outputs are concatenated.
What components are inside a transformer block?
Self-attention layers, feed-forward networks, residual connections, and layer normalization.
Why are residual connections used in transformer blocks?
To preserve the original input information alongside processed data.
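A minimal single-head sketch of such a block, with simplified layer norm and illustrative weights (a real block uses multiple heads and learned norm parameters):

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def block(X, W_q, W_k, W_v, W_o, W1, W2):
    # Self-attention sublayer with residual connection.
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1])) @ V
    X = layer_norm(X + A @ W_o)           # residual keeps original input info
    # Feed-forward sublayer with residual connection.
    F = np.maximum(0, X @ W1) @ W2        # two-layer MLP with ReLU
    return layer_norm(X + F)

rng = np.random.default_rng(5)
T, d = 5, 8
X = rng.normal(size=(T, d))
W_q, W_k, W_v, W_o = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
W1, W2 = rng.normal(size=(d, 4 * d)) * 0.1, rng.normal(size=(4 * d, d)) * 0.1
print(block(X, W_q, W_k, W_v, W_o, W1, W2).shape)   # (5, 8)
```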
What is the difference between masked and auto-regressive models?
Masked models (like BERT) attend to the full sequence at once and reconstruct hidden tokens, while auto-regressive models (like GPT) predict each token from only the tokens before it.
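A sketch of the causal mask used by GPT-style models: position i may only attend to positions at or before i, while BERT-style models attend over the whole sequence and hide a subset of tokens instead:

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

T = 5
scores = np.zeros((T, T))                           # stand-in attention scores
causal = np.triu(np.ones((T, T), dtype=bool), k=1)  # True above the diagonal
scores[causal] = -np.inf                            # block attention to future

print(np.round(softmax(scores), 2))                 # lower-triangular weights
```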
What is the purpose of fine-tuning an LLM?
To adapt the model for specific tasks using a pre-trained base.
What is prompt engineering?
Crafting inputs to guide LLMs toward desired outputs.
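An illustrative few-shot prompt (the wording is made up): examples placed in the prompt steer the model toward the desired format without any weight updates:

```python
# A few-shot classification prompt; the model is expected to continue
# the pattern after the final "Sentiment:" line.
prompt = """Classify the sentiment of each review as positive or negative.

Review: The battery lasts all day.
Sentiment: positive

Review: The screen cracked within a week.
Sentiment: negative

Review: Setup was quick and painless.
Sentiment:"""
print(prompt)
```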