Flashcards to review key terms and concepts related to Vision Transformers and deep learning.
LSTM
Long Short-Term Memory; a recurrent neural network architecture whose gating mechanism lets it retain information over long sequences, mitigating the vanishing gradients of plain RNNs.
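A minimal sketch, assuming PyTorch (not named on the card) and illustrative shapes:

```python
import torch
import torch.nn as nn

# Batch of 4 sequences, 10 time steps, 16 features per step (illustrative).
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
x = torch.randn(4, 10, 16)

# output holds the hidden state at every step; h_n and c_n are the final
# hidden and cell states. The cell state is the long-term "memory" channel.
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([4, 10, 32])
```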
CNN
Convolutional Neural Network; a neural network architecture for grid-structured data such as images, built from convolutional layers that slide shared filters across spatial positions.
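A toy convolutional stack, again a PyTorch sketch with made-up sizes:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3x3 filters shared across the image
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                     # global average pool
    nn.Flatten(),
    nn.Linear(32, 10),                           # 10-way classifier head
)
x = torch.randn(8, 3, 32, 32)  # batch of 8 RGB images
print(cnn(x).shape)            # torch.Size([8, 10])
```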
Attention Mechanism
A mechanism that lets a neural network focus on the most relevant parts of its input at each output step by weighting input representations, improving performance in tasks like image captioning.
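A NumPy sketch of dot-product attention (one of several scoring variants); it also illustrates the alignment-score, attention-weight, softmax, and context-vector cards below:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))  # 5 input positions, dimension 8
query = rng.normal(size=(8,))             # current decoder state

scores = encoder_states @ query       # alignment scores, one per input position
weights = softmax(scores)             # attention weights: non-negative, sum to 1
context = weights @ encoder_states    # context vector: weighted sum of inputs
print(weights.round(3), context.shape)
```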
Vision Transformer (ViT)
A model architecture that applies the transformer approach to vision tasks by splitting an image into fixed-size patches, embedding each patch, and processing the resulting sequence much as language models process token sequences.
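A sketch of the patch-splitting step (PyTorch assumed; ViT-Base-style 16x16 patches on a 224x224 image):

```python
import torch

def image_to_patches(img, patch=16):
    """Split a (C, H, W) image into a sequence of flattened patches."""
    c, h, w = img.shape
    # unfold carves out non-overlapping patch x patch tiles
    tiles = img.unfold(1, patch, patch).unfold(2, patch, patch)
    tiles = tiles.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)
    return tiles  # (num_patches, patch_dim), ready for a linear embedding

img = torch.randn(3, 224, 224)
print(image_to_patches(img).shape)  # torch.Size([196, 768]) -- 14x14 patches
```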
Context Vector
A vector that summarizes the encoded input sequence for the decoder in sequence models; with attention, a new context vector is computed at each decoding step as a weighted sum of encoder states.
Image Captioning
The task of generating a textual description of a given image using neural networks, typically a visual encoder paired with a language decoder.
Positional Embeddings
Additional information added to input embeddings to indicate the position of tokens or patches in a sequence or image; transformers need it because self-attention is otherwise permutation-invariant.
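A sketch of ViT-style learnable positional embeddings (PyTorch assumed, shapes illustrative):

```python
import torch
import torch.nn as nn

num_patches, dim = 196, 768
patch_embeddings = torch.randn(1, num_patches, dim)  # stand-in for real embeddings

# One learnable vector per position; without these, self-attention cannot
# tell where in the image each patch came from.
pos_embedding = nn.Parameter(torch.zeros(1, num_patches, dim))
tokens = patch_embeddings + pos_embedding
print(tokens.shape)  # torch.Size([1, 196, 768])
```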
Alignment Scores
Scores that measure how relevant each input position (e.g., each encoder hidden state) is to the current output step; they are normalized with softmax to produce attention weights.
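In equation form, with decoder state s_{t-1} and encoder states h_i (the scoring function varies by model; the dot product used in the sketch above is one common choice):

```latex
e_{t,i} = \operatorname{score}(s_{t-1}, h_i),
\qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_j \exp(e_{t,j})}
```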
Softmax Function
A mathematical function that converts a vector of raw scores into a probability distribution (non-negative values summing to 1); used for multi-class classification outputs and to normalize attention scores.
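The standard form, for a score vector z with K entries:

```latex
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
```

In practice the maximum of z is subtracted first for numerical stability, as in the attention sketch above; this leaves the result unchanged.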
GRU
Gated Recurrent Unit; a recurrent neural network unit similar to the LSTM but simpler: it merges the cell and hidden states and uses two gates (reset and update) instead of three.
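The same PyTorch sketch as on the LSTM card, swapped to a GRU, highlights the difference:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
x = torch.randn(4, 10, 16)

# Unlike the LSTM, the GRU returns a single final hidden state h_n:
# there is no separate cell state.
output, h_n = gru(x)
print(output.shape, h_n.shape)  # torch.Size([4, 10, 32]) torch.Size([1, 4, 32])
```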
Pre-trained Model
A model that has already been trained on a large dataset and can be fine-tuned for specific tasks.
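A common fine-tuning pattern, sketched with torchvision (an assumption; any pretrained backbone works similarly):

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone...
for p in model.parameters():
    p.requires_grad = False

# ...and replace the classification head for a hypothetical 5-class task.
model.fc = nn.Linear(model.fc.in_features, 5)
# Only the new head's parameters are updated during fine-tuning.
```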
Attention Weights
The normalized weights produced by the attention mechanism (a softmax over alignment scores) that determine how much each part of the input contributes to the output at each time step.
Encoder-Decoder Architecture
A model framework for sequence processing in which an encoder maps the input to an intermediate representation, such as a context vector, and a decoder generates the output from that representation.
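A minimal sequence-to-sequence sketch (PyTorch assumed): a GRU encoder compresses the input into a context vector that initializes a GRU decoder.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, in_dim=16, hidden=32, out_dim=16):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(out_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, src, tgt):
        _, context = self.encoder(src)           # final hidden state = context vector
        dec_out, _ = self.decoder(tgt, context)  # decoder starts from the context
        return self.head(dec_out)

model = EncoderDecoder()
src = torch.randn(4, 10, 16)  # input sequence
tgt = torch.randn(4, 7, 16)   # teacher-forced target sequence
print(model(src, tgt).shape)  # torch.Size([4, 7, 16])
```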