Flashcards to review key terms and concepts related to Vision Transformers and deep learning.
LSTM
Long Short-Term Memory; a recurrent neural network architecture whose gating mechanism lets it retain information over long sequences, mitigating the vanishing gradients of plain RNNs.
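A minimal sketch, assuming PyTorch (not named on the card) and illustrative shapes:

```python
import torch
import torch.nn as nn

# Batch of 4 sequences, 10 time steps, 16 features per step (illustrative).
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)
x = torch.randn(4, 10, 16)

# output holds the hidden state at every step; h_n and c_n are the final
# hidden and cell states. The cell state is the long-term "memory" channel.
output, (h_n, c_n) = lstm(x)
print(output.shape)  # torch.Size([4, 10, 32])
```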
CNN
Convolutional Neural Network; a neural network architecture for grid-structured data such as images, built from convolutional layers that slide shared filters across spatial positions.
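A toy convolutional stack, again a PyTorch sketch with made-up sizes:

```python
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),  # 3x3 filters shared across the image
    nn.ReLU(),
    nn.MaxPool2d(2),                             # downsample 32x32 -> 16x16
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),                     # global average pool
    nn.Flatten(),
    nn.Linear(32, 10),                           # 10-way classifier head
)
x = torch.randn(8, 3, 32, 32)  # batch of 8 RGB images
print(cnn(x).shape)            # torch.Size([8, 10])
```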
Attention Mechanism
A mechanism that lets a neural network focus on the most relevant parts of its input at each output step by weighting input representations, improving performance in tasks like image captioning.
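A NumPy sketch of dot-product attention (one of several scoring variants); it also illustrates the alignment-score, attention-weight, softmax, and context-vector cards below:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
encoder_states = rng.normal(size=(5, 8))  # 5 input positions, dimension 8
query = rng.normal(size=(8,))             # current decoder state

scores = encoder_states @ query       # alignment scores, one per input position
weights = softmax(scores)             # attention weights: non-negative, sum to 1
context = weights @ encoder_states    # context vector: weighted sum of inputs
print(weights.round(3), context.shape)
```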
Vision Transformer (ViT)
A model architecture that applies the transformer approach to vision tasks by splitting an image into fixed-size patches, embedding each patch, and processing the resulting sequence much as language models process token sequences.
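A sketch of the patch-splitting step (PyTorch assumed; ViT-Base-style 16x16 patches on a 224x224 image):

```python
import torch

def image_to_patches(img, patch=16):
    """Split a (C, H, W) image into a sequence of flattened patches."""
    c, h, w = img.shape
    # unfold carves out non-overlapping patch x patch tiles
    tiles = img.unfold(1, patch, patch).unfold(2, patch, patch)
    tiles = tiles.permute(1, 2, 0, 3, 4).reshape(-1, c * patch * patch)
    return tiles  # (num_patches, patch_dim), ready for a linear embedding

img = torch.randn(3, 224, 224)
print(image_to_patches(img).shape)  # torch.Size([196, 768]) -- 14x14 patches
```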
Context Vector
A vector that summarizes the encoded input sequence for the decoder in sequence models; with attention, a new context vector is computed at each decoding step as a weighted sum of encoder states.
Image Captioning
The task of generating a textual description of a given image using neural networks, typically a visual encoder paired with a language decoder.
Positional Embeddings
Additional information added to input embeddings to indicate the position of tokens or patches in a sequence or image; transformers need it because self-attention is otherwise permutation-invariant.
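A sketch of ViT-style learnable positional embeddings (PyTorch assumed, shapes illustrative):

```python
import torch
import torch.nn as nn

num_patches, dim = 196, 768
patch_embeddings = torch.randn(1, num_patches, dim)  # stand-in for real embeddings

# One learnable vector per position; without these, self-attention cannot
# tell where in the image each patch came from.
pos_embedding = nn.Parameter(torch.zeros(1, num_patches, dim))
tokens = patch_embeddings + pos_embedding
print(tokens.shape)  # torch.Size([1, 196, 768])
```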
Alignment Scores
Scores that measure how relevant each input position (e.g., each encoder hidden state) is to the current output step; they are normalized with softmax to produce attention weights.
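In equation form, with decoder state s_{t-1} and encoder states h_i (the scoring function varies by model; the dot product used in the sketch above is one common choice):

```latex
e_{t,i} = \operatorname{score}(s_{t-1}, h_i),
\qquad
\alpha_{t,i} = \frac{\exp(e_{t,i})}{\sum_j \exp(e_{t,j})}
```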
Softmax Function
A mathematical function that converts a vector of raw scores into a probability distribution (non-negative values summing to 1); used for multi-class classification outputs and to normalize attention scores.
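The standard form, for a score vector z with K entries:

```latex
\operatorname{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
```

In practice the maximum of z is subtracted first for numerical stability, as in the attention sketch above; this leaves the result unchanged.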
GRU
Gated Recurrent Unit; a recurrent neural network unit similar to the LSTM but simpler: it merges the cell and hidden states and uses two gates (reset and update) instead of three.
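The same PyTorch sketch as on the LSTM card, swapped to a GRU, highlights the difference:

```python
import torch
import torch.nn as nn

gru = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
x = torch.randn(4, 10, 16)

# Unlike the LSTM, the GRU returns a single final hidden state h_n:
# there is no separate cell state.
output, h_n = gru(x)
print(output.shape, h_n.shape)  # torch.Size([4, 10, 32]) torch.Size([1, 4, 32])
```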
Pre-trained Model
A model that has already been trained on a large dataset and can be fine-tuned for specific tasks.
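A common fine-tuning pattern, sketched with torchvision (an assumption; any pretrained backbone works similarly):

```python
import torch.nn as nn
from torchvision import models

# Load ImageNet-pretrained weights.
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone...
for p in model.parameters():
    p.requires_grad = False

# ...and replace the classification head for a hypothetical 5-class task.
model.fc = nn.Linear(model.fc.in_features, 5)
# Only the new head's parameters are updated during fine-tuning.
```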
Attention Weights
The normalized weights produced by the attention mechanism (a softmax over alignment scores) that determine how much each part of the input contributes to the output at each time step.
Encoder-Decoder Architecture
A model framework for sequence processing in which an encoder maps the input to an intermediate representation, such as a context vector, and a decoder generates the output from that representation.
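A minimal sequence-to-sequence sketch (PyTorch assumed): a GRU encoder compresses the input into a context vector that initializes a GRU decoder.

```python
import torch
import torch.nn as nn

class EncoderDecoder(nn.Module):
    def __init__(self, in_dim=16, hidden=32, out_dim=16):
        super().__init__()
        self.encoder = nn.GRU(in_dim, hidden, batch_first=True)
        self.decoder = nn.GRU(out_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, src, tgt):
        _, context = self.encoder(src)           # final hidden state = context vector
        dec_out, _ = self.decoder(tgt, context)  # decoder starts from the context
        return self.head(dec_out)

model = EncoderDecoder()
src = torch.randn(4, 10, 16)  # input sequence
tgt = torch.randn(4, 7, 16)   # teacher-forced target sequence
print(model(src, tgt).shape)  # torch.Size([4, 7, 16])
```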