Vocabulary flashcards covering essential terms from the Unit-5 lecture on Generative Models for Text, Transformers, BERT, GPT, prompt engineering, and LLM issues.
Large Language Model (LLM)
A deep-learning system, usually Transformer-based, trained on massive text corpora to understand, generate, and manipulate human language across many tasks.
Language Model
An AI model that assigns probabilities to sequences of words, enabling applications like text prediction, translation, summarization, and Q&A.
Statistical Language Model (SLM)
Early language model that estimates word sequence probabilities with statistical methods such as n-grams.
Neural Network-based Language Model
Model that uses neural networks (e.g., RNNs, LSTMs) to capture complex word dependencies beyond simple statistics.
Transformer-based Language Model
State-of-the-art model that employs the Transformer architecture and self-attention to process sequences in parallel and capture long-range context.
Recurrent Neural Network (RNN)
Neural architecture that processes sequences step-by-step, carrying hidden states but struggling with long-term dependencies.
Long Short-Term Memory (LSTM)
RNN variant with gating mechanisms designed to remember information over longer sequences.
Transformer
Neural architecture built on self-attention and feed-forward layers, enabling parallel processing of tokens and superior context handling.
GPT (Generative Pre-trained Transformer)
Decoder-only Transformer model trained with causal language modeling to predict the next token and generate coherent text.
BERT (Bidirectional Encoder Representations from Transformers)
Encoder-only Transformer pre-trained with Masked Language Modeling and Next Sentence Prediction for deep bidirectional context understanding.
Token
The basic unit (word, sub-word, or character) on which a language model operates after text is tokenized.
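A minimal sketch of sub-word tokenization, assuming the Hugging Face transformers library and the bert-base-uncased tokenizer are available; the exact sub-word splits shown depend on that tokenizer's learned vocabulary.

```python
# Sub-word tokenization sketch (assumes: pip install transformers)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A rare word is split into smaller sub-word units the model has seen before.
tokens = tokenizer.tokenize("Transformers handle unbelievably long contexts")
ids = tokenizer.convert_tokens_to_ids(tokens)

print(tokens)  # sub-word pieces such as 'un', '##believ', ... (exact splits vary by vocabulary)
print(ids)     # the integer IDs the model actually consumes
```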
Embedding
Dense vector representation of a token that captures semantic and syntactic information in continuous space.
Frequency-based Embedding
Vector derived from word occurrence statistics, e.g., Bag of Words, TF-IDF, or co-occurrence matrices.
Prediction-based Embedding
Vector learned by training a model to predict a word from its context (or vice versa); examples include Word2Vec and FastText.
Contextualized Word Embedding
Dynamic vector that changes with sentence context, produced by models like ELMo, BERT, or GPT.
Word2Vec
Prediction-based embedding model using CBOW or Skip-gram to learn word vectors from local context.
GloVe (Global Vectors)
Embedding method that factorizes a global word co-occurrence matrix to capture statistical information.
FastText
Embedding model that represents words as bags of character n-grams, improving handling of rare or morphologically rich words.
ELMo (Embeddings from Language Models)
Deep bidirectional LSTM model that produces context-sensitive word embeddings by considering entire sentence context.
Bag of Words (Count Vectorization)
Frequency-based representation where each document is a vector of raw word counts.
TF-IDF
Weighting scheme that scales word counts by inverse document frequency to emphasize informative terms.
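As a rough illustration of the two frequency-based schemes above, a short sketch assuming scikit-learn is installed: CountVectorizer produces raw Bag-of-Words counts, and TfidfVectorizer rescales them by inverse document frequency.

```python
# Bag of Words vs. TF-IDF sketch (assumes: pip install scikit-learn)
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

bow = CountVectorizer().fit_transform(corpus)     # raw word counts per document
tfidf = TfidfVectorizer().fit_transform(corpus)   # counts reweighted by rarity across documents

print(bow.toarray())    # integer count matrix (documents x vocabulary)
print(tfidf.toarray())  # floats; frequent words like "the" are down-weighted
```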
Co-occurrence Matrix
Square matrix counting how often words appear near each other within a selected window size.
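A toy sketch of counting co-occurrences with a window of one word on each side; the tiny corpus and window size are illustrative choices, not values prescribed by the lecture.

```python
# Toy word co-occurrence counting with a symmetric window of 1
from collections import defaultdict

tokens = "the cat sat on the mat".split()
window = 1
counts = defaultdict(int)

for i, w in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            counts[(w, tokens[j])] += 1   # count each neighbouring pair

print(counts[("the", "cat")])  # 1: "the" and "cat" are adjacent once
print(counts[("the", "mat")])  # 1: adjacent at the end of the sentence
```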
Continuous Bag of Words (CBOW)
Word2Vec variant that predicts a target word from its surrounding context words.
Skip-gram
Word2Vec variant that predicts surrounding context words from a single target word.
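A minimal Word2Vec sketch, assuming gensim 4.x is installed; the sg flag switches between the two variants above (sg=0 for CBOW, sg=1 for Skip-gram), and the toy corpus is only there to make the call runnable.

```python
# Word2Vec sketch: CBOW vs. Skip-gram (assumes: pip install gensim, version 4.x)
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # predict word from context
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # predict context from word

print(cbow.wv["cat"].shape)                      # (50,) dense vector for "cat"
print(skipgram.wv.most_similar("cat", topn=2))   # nearest neighbours in the tiny toy space
```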
Self-Attention
Mechanism where each token attends to every token in the same sequence (including itself) to compute contextual representations.
Cross-Attention
Decoder mechanism that attends to encoder outputs, aligning generated tokens with source inputs.
Query-Key-Value (QKV)
Triplet of vectors used in attention: queries compare with keys to produce weights that are applied to values.
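A bare-bones numpy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, to make the QKV mechanics concrete; the random matrices stand in for learned projections, and real Transformers run many such heads in parallel (multi-head attention) plus masking in decoders.

```python
# Scaled dot-product attention sketch: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                      # 4 tokens, 8-dimensional queries/keys/values

Q = rng.normal(size=(seq_len, d_k))      # queries: what each token is looking for
K = rng.normal(size=(seq_len, d_k))      # keys: what each token offers
V = rng.normal(size=(seq_len, d_k))      # values: the content that gets mixed

scores = Q @ K.T / np.sqrt(d_k)          # similarity of every query with every key
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax -> attention weights
output = weights @ V                     # each token's output is a weighted mix of values

print(weights.shape, output.shape)       # (4, 4) attention map, (4, 8) contextual vectors
```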
Multi-Head Attention
Parallel set of attention heads allowing a model to capture different relationship types simultaneously.
Encoder (in Transformers)
Stack of self-attention and feed-forward layers that converts input sequence into contextual representations.
Decoder (in Transformers)
Stack that generates output tokens using masked self-attention plus cross-attention to encoder representations.
Causal Language Modeling (CLM)
Training objective where a model predicts the next token using only left-context; basis for GPT-style models.
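A small sketch of how causal language modeling frames its training data: inputs and targets are the same sequence shifted by one position, so each prediction only ever sees earlier tokens (the token IDs here are made-up placeholders).

```python
# Causal LM sketch: next-token targets are the input shifted by one position
token_ids = [101, 2057, 2293, 17061, 102]   # placeholder IDs for a tokenized sentence

inputs = token_ids[:-1]    # the model reads tokens 0..n-1
targets = token_ids[1:]    # and must predict tokens 1..n

for context_end, target in zip(range(1, len(inputs) + 1), targets):
    context = inputs[:context_end]
    print(f"given {context} -> predict {target}")   # only left context is visible
```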
Masked Language Modeling (MLM)
Pre-training task where random tokens are masked and the model predicts them from both left and right context.
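A quick illustration of masked language modeling at inference time, assuming the Hugging Face transformers library; a BERT-style model fills the [MASK] slot using both left and right context.

```python
# Fill-mask sketch with a BERT-style model (assumes: pip install transformers)
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Both "Paris is the" (left) and "of France" (right) inform the prediction.
for candidate in fill_mask("Paris is the [MASK] of France.")[:3]:
    print(candidate["token_str"], round(candidate["score"], 3))
```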
Next Sentence Prediction (NSP)
BERT pre-training task where the model classifies whether two sentences are sequential in the original text.
ChatGPT
OpenAI's conversational system based on GPT-3.5/GPT-4, further tuned with supervised fine-tuning and RLHF (Reinforcement Learning from Human Feedback) for dialogue.
Prompt
User-provided text or instruction that guides an AI model to perform a specific task.
Prompt Engineering
Process of designing, testing, and refining prompts to elicit desired responses from language models.
Zero-Shot Learning Prompt
Prompting method where the model receives only task instructions with no examples.
One-Shot Learning Prompt
Prompting method that includes a single example to illustrate the desired output format.
Few-Shot Learning Prompt
Prompting method that supplies several examples (typically 2–5) to guide the model’s response.
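To contrast the three prompting styles above, a sketch of the same sentiment task phrased zero-shot, one-shot, and few-shot; the wording is illustrative, not a prescribed template.

```python
# Prompt-style sketch: the same task as zero-, one-, and few-shot prompts
zero_shot = (
    "Classify the sentiment of this review as positive or negative:\n"
    "'The battery dies in an hour.'"
)

one_shot = (
    "Review: 'Great screen, fast shipping.' -> positive\n"
    "Review: 'The battery dies in an hour.' ->"
)

few_shot = (
    "Review: 'Great screen, fast shipping.' -> positive\n"
    "Review: 'Broke after two days.' -> negative\n"
    "Review: 'Sound quality is amazing.' -> positive\n"
    "Review: 'The battery dies in an hour.' ->"
)

print(few_shot)  # the examples set the format; the model continues with a label
```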
Chain-of-Thought Prompting
Technique that asks the model to reveal intermediate reasoning steps to improve complex task performance.
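A minimal chain-of-thought prompt sketch; the exact cue phrase varies, but the idea is to show or request intermediate steps before the final answer.

```python
# Chain-of-thought prompt sketch: ask for the reasoning, not just the answer
cot_prompt = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: Let's think step by step. 12 pens is 4 groups of 3 pens. "
    "Each group costs $2, so 4 x $2 = $8. The answer is $8.\n\n"
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step."
)
print(cot_prompt)  # the worked example nudges the model to reason before answering
```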
Iterative Prompting
Strategy of refining a prompt over multiple rounds based on previous outputs to converge on better results.
Negative Prompting
Prompting approach that explicitly states what content or style should be avoided in the response.
Hybrid Prompting
Combining multiple prompting techniques (e.g., few-shot + chain-of-thought) to optimize outputs.
Prompt Chaining
Breaking a complex task into sequential prompts where each output feeds into the next step.
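A sketch of prompt chaining where each step's output feeds the next prompt; call_llm is a hypothetical stand-in for whichever model API is actually used.

```python
# Prompt chaining sketch; call_llm() is a hypothetical placeholder for a real model API
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider of choice")

def summarize_then_translate(article: str) -> str:
    # Step 1: condense the source text.
    summary = call_llm(f"Summarize the following article in three sentences:\n{article}")
    # Step 2: the first output becomes the input of the second prompt.
    return call_llm(f"Translate this summary into French:\n{summary}")
```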
Hallucination (in LLMs)
Phenomenon where a language model generates confident but factually incorrect or fabricated content.
Bias (in LLMs)
Tendency of a model’s outputs to reflect societal or data-driven stereotypes and unfair representations.
Language AI
Subfield of artificial intelligence dedicated to understanding, processing, and generating human language.
Foundation Model
Large, versatile model pre-trained on broad data that can be fine-tuned for many downstream tasks.
Residual Connection
Shortcut that adds a layer's input to its output, stabilizing the training of deep Transformer stacks.
Layer Normalization
Normalization technique applied across features in each layer to improve training stability in Transformers.
GeLU Activation
Gaussian Error Linear Unit, a smooth non-linear activation commonly used in Transformer feed-forward networks.
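To tie the last three cards together, a numpy sketch of a Transformer feed-forward sub-layer with layer normalization, GeLU, and a residual connection; the tanh-based GeLU approximation, the pre-norm ordering, and the random weights are illustrative stand-ins for learned components.

```python
# Feed-forward sub-layer sketch: LayerNorm, GeLU, residual connection
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector across its feature dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # Common tanh approximation of the Gaussian Error Linear Unit.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 16, 64, 4
W1, W2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))

x = rng.normal(size=(seq_len, d_model))     # token representations entering the sub-layer
ffn_out = gelu(layer_norm(x) @ W1) @ W2     # pre-norm feed-forward network
y = x + ffn_out                             # residual connection keeps the original signal

print(y.shape)  # (4, 16): same shape in and out, ready for the next layer
```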