Transformer and GPT Vocabulary Flashcards

Vocabulary-style flashcards covering key terms and concepts from the transformer and GPT lecture notes.

21 Terms

1. GPT

Generative Pretrained Transformer; a family of models trained on massive amounts of data that generate text by repeatedly predicting the next token in a sequence, producing output one token at a time.

2. Transformer

A neural network architecture built around attention and feed-forward (MLP) blocks that processes input tokens into context-rich representations; it is the architecture at the core of the modern AI boom.

3. Token

A piece of input (word, subword, punctuation, or patch) that is mapped to a vector in the model’s embedding space.

4. Embedding

The process of converting a token into a high-dimensional vector; the embedding space encodes semantic relationships between tokens.

5. Embedding matrix (W_E)

The matrix that maps each token in the vocabulary to its initial embedding vector; GPT-3 example: 50,257 tokens × 12,288 dimensions.
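
To make the lookup concrete, here is a minimal NumPy sketch, using toy sizes in place of GPT-3's 50,257 × 12,288; the variable names are illustrative, not from any particular library:

```python
import numpy as np

vocab_size, d_embed = 1_000, 64             # toy sizes; GPT-3: 50,257 x 12,288
W_E = np.random.randn(vocab_size, d_embed)  # one row per vocabulary token

token_id = 42                               # hypothetical token index
embedding = W_E[token_id]                   # embedding a token = selecting its row
print(embedding.shape)                      # (64,)
```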

6. Embedding dimension

The dimensionality of token vectors in the embedding space; GPT-3 uses 12,288 dimensions.

7. Unembedding matrix (W_U)

The final projection matrix that maps a context vector to logits over the vocabulary; it acts as the reverse of the embedding matrix, with one row per vocabulary token.
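
A minimal sketch of this final projection, again with toy sizes and illustrative names:

```python
import numpy as np

vocab_size, d_embed = 1_000, 64             # toy sizes; GPT-3: 50,257 and 12,288
W_U = np.random.randn(vocab_size, d_embed)  # one row per vocabulary token

context_vector = np.random.randn(d_embed)   # final-position vector from the network
logits = W_U @ context_vector               # one raw score per vocabulary token
print(logits.shape)                         # (1000,)
```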

8. Vocabulary / Tokens

The set of all possible tokens the model can recognize; GPT-3's vocabulary contains 50,257 tokens.

9. Context size

The number of tokens the model can process in one forward pass; GPT-3 has a context size of 2048.

10. Attention block

A mechanism that lets token representations communicate and update each other’s meanings based on context.
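
A single-head, scaled dot-product attention sketch; the names are illustrative, and real transformers add multiple heads, causal masking, and learned biases:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq_len, d_embed, d_head = 5, 64, 16
tokens = np.random.randn(seq_len, d_embed)  # one row per token
W_Q = np.random.randn(d_embed, d_head)      # query projection
W_K = np.random.randn(d_embed, d_head)      # key projection
W_V = np.random.randn(d_embed, d_head)      # value projection

Q, K, V = tokens @ W_Q, tokens @ W_K, tokens @ W_V
scores = Q @ K.T / np.sqrt(d_head)   # relevance of every token to every other
weights = softmax(scores)            # each row is a probability distribution
updated = weights @ V                # tokens updated by a weighted mix of values
```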

11. Multi-layer perceptron (MLP) / feed-forward layer

A per-token, parallel processing block that applies the same transformation to all tokens, without inter-token communication.
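
A sketch of such a block, assuming a ReLU nonlinearity for simplicity (GPT-style models typically use GELU); note that the same weights are applied to every token row:

```python
import numpy as np

d_embed, d_hidden = 64, 256              # hidden layer is typically ~4x wider
W1 = np.random.randn(d_embed, d_hidden)
W2 = np.random.randn(d_hidden, d_embed)

def mlp(x):
    h = np.maximum(0, x @ W1)            # linear map + nonlinearity
    return h @ W2                        # project back to embedding size

tokens = np.random.randn(5, d_embed)     # five tokens, processed independently
out = mlp(tokens)                        # shape (5, 64); no token-to-token mixing
```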

12. Weights / Parameters

Learned matrices and vectors that parameterize the model; GPT-3 has about 175 billion parameters organized into multiple matrix categories.

13. Matrix-vector multiplication

The fundamental computation in neural networks where a weight matrix multiplies an input vector to produce an output vector.
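
A worked example: each entry of the output is the dot product of one row of the matrix with the input vector.

```python
import numpy as np

W = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])   # 3x2 weight matrix
v = np.array([10.0, 1.0])    # input vector

print(W @ v)                 # [12. 34. 56.] -- row i dotted with v
```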

14. Dot product

A measure of alignment between two vectors; positive when directions align, zero when orthogonal, negative when opposite.
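
Three quick cases showing the sign behavior:

```python
import numpy as np

a = np.array([1.0, 2.0])
print(np.dot(a, np.array([2.0, 4.0])))    # 10.0 -> same direction: positive
print(np.dot(a, np.array([-2.0, 1.0])))   #  0.0 -> perpendicular: zero
print(np.dot(a, np.array([-1.0, -2.0])))  # -5.0 -> opposite direction: negative
```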

15. Softmax

A function that converts a vector of scores (logits) into a probability distribution by exponentiating and normalizing so all values sum to 1.
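
A minimal implementation; subtracting the max before exponentiating is a standard numerical-stability trick that does not change the result:

```python
import numpy as np

def softmax(logits):
    exps = np.exp(logits - logits.max())  # stability shift; output is unchanged
    return exps / exps.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
print(p)          # ~[0.659 0.242 0.099]
print(p.sum())    # 1.0
```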

16. Temperature

A knob in softmax that controls distribution sharpness; higher temperature yields a broader distribution, lower temperature a peakier one (as T approaches 0, the model always picks the most likely token).
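
Temperature simply divides the logits before the softmax; a sketch extending the function above:

```python
import numpy as np

def softmax_with_temperature(logits, T):
    scaled = logits / T                    # T < 1 sharpens, T > 1 flattens
    exps = np.exp(scaled - scaled.max())
    return exps / exps.sum()

logits = np.array([2.0, 1.0, 0.1])
print(softmax_with_temperature(logits, 0.5))  # peakier: ~[0.86 0.12 0.02]
print(softmax_with_temperature(logits, 2.0))  # broader: ~[0.50 0.30 0.19]
```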

17. Logits

Raw, unnormalized scores produced before applying softmax for the next-token prediction.

18. System prompt

Initial context that defines the role of a chatbot or AI assistant and guides its responses.

19. DALL-E / Midjourney

Transformer-based image-generation tools that take text descriptions and generate images.

20. Backpropagation

The training algorithm that propagates error gradients backward through the network, using the chain rule, to update the weights.
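
The idea in miniature, for a single weight and a squared-error loss; full backpropagation applies the same chain-rule step layer by layer:

```python
w = 0.5                               # a single weight
x, target = 2.0, 3.0                  # one training example

for _ in range(100):
    pred = w * x                      # forward pass
    loss = (pred - target) ** 2       # squared error
    grad_w = 2 * (pred - target) * x  # chain rule: d(loss)/d(w)
    w -= 0.01 * grad_w                # step the weight against the gradient

print(w)                              # approaches 1.5, since 1.5 * 2.0 == 3.0
```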

21. Next-token prediction

The primary objective of many language models: predict the most likely next token given the preceding context, then sample or select accordingly.
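
A generation-loop sketch with a stand-in model; the `model` function here is a hypothetical placeholder returning random logits, not a real network:

```python
import numpy as np

vocab_size = 10

def softmax(logits):
    exps = np.exp(logits - logits.max())
    return exps / exps.sum()

def model(context):
    # Hypothetical stand-in: a real model would compute logits from the context.
    rng = np.random.default_rng(len(context))
    return rng.standard_normal(vocab_size)

context = [0, 3, 7]                    # token ids seen so far
for _ in range(5):
    probs = softmax(model(context))    # distribution over the next token
    next_id = np.random.choice(vocab_size, p=probs)  # sample (argmax = greedy)
    context.append(int(next_id))

print(context)                         # original ids plus 5 generated ids
```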