Vocabulary flashcards covering essential terms from the Unit-5 lecture on Generative Models for Text, Transformers, BERT, GPT, prompt engineering, and LLM issues.
Large Language Model (LLM)
A deep-learning system, usually Transformer-based, trained on massive text corpora to understand, generate, and manipulate human language across many tasks.
Language Model
An AI model that assigns probabilities to sequences of words, enabling applications like text prediction, translation, summarization, and Q&A.
Statistical Language Model (SLM)
Early language model that estimates word sequence probabilities with statistical methods such as n-grams.
Neural Network-based Language Model
Model that uses neural networks (e.g., RNNs, LSTMs) to capture complex word dependencies beyond simple statistics.
Transformer-based Language Model
State-of-the-art model that employs the Transformer architecture and self-attention to process sequences in parallel and capture long-range context.
Recurrent Neural Network (RNN)
Neural architecture that processes sequences step-by-step, carrying hidden states but struggling with long-term dependencies.
Long Short-Term Memory (LSTM)
RNN variant with gating mechanisms designed to remember information over longer sequences.
Transformer
Neural architecture built on self-attention and feed-forward layers, enabling parallel processing of tokens and superior context handling.
GPT (Generative Pre-trained Transformer)
Decoder-only Transformer model trained with causal language modeling to predict the next token and generate coherent text.
BERT (Bidirectional Encoder Representations from Transformers)
Encoder-only Transformer pre-trained with Masked Language Modeling and Next Sentence Prediction for deep bidirectional context understanding.
Token
The basic unit (word, sub-word, or character) on which a language model operates after text is tokenized.
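A minimal sketch of sub-word tokenization, assuming the Hugging Face transformers library and the bert-base-uncased tokenizer are available; the exact sub-word splits shown depend on that tokenizer's learned vocabulary.

```python
# Sub-word tokenization sketch (assumes: pip install transformers)
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# A rare word is split into smaller sub-word units the model has seen before.
tokens = tokenizer.tokenize("Transformers handle unbelievably long contexts")
ids = tokenizer.convert_tokens_to_ids(tokens)

print(tokens)  # sub-word pieces such as 'un', '##believ', ... (exact splits vary by vocabulary)
print(ids)     # the integer IDs the model actually consumes
```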
Embedding
Dense vector representation of a token that captures semantic and syntactic information in continuous space.
Frequency-based Embedding
Vector derived from word occurrence statistics, e.g., Bag of Words, TF-IDF, or co-occurrence matrices.
Prediction-based Embedding
Vector learned by training a model to predict a word from its context (or vice versa); examples include Word2Vec and FastText.
Contextualized Word Embedding
Dynamic vector that changes with sentence context, produced by models like ELMo, BERT, or GPT.
Word2Vec
Prediction-based embedding model using CBOW or Skip-gram to learn word vectors from local context.
GloVe (Global Vectors)
Embedding method that factorizes a global word co-occurrence matrix to capture statistical information.
FastText
Embedding model that represents words as bags of character n-grams, improving handling of rare or morphologically rich words.
ELMo (Embeddings from Language Models)
Deep bidirectional LSTM model that produces context-sensitive word embeddings by considering entire sentence context.
Bag of Words (Count Vectorization)
Frequency-based representation where each document is a vector of raw word counts.
TF-IDF
Weighting scheme that scales word counts by inverse document frequency to emphasize informative terms.
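As a rough illustration of the two frequency-based schemes above, a short sketch assuming scikit-learn is installed: CountVectorizer produces raw Bag-of-Words counts, and TfidfVectorizer rescales them by inverse document frequency.

```python
# Bag of Words vs. TF-IDF sketch (assumes: pip install scikit-learn)
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

bow = CountVectorizer().fit_transform(corpus)     # raw word counts per document
tfidf = TfidfVectorizer().fit_transform(corpus)   # counts reweighted by rarity across documents

print(bow.toarray())    # integer count matrix (documents x vocabulary)
print(tfidf.toarray())  # floats; frequent words like "the" are down-weighted
```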
Co-occurrence Matrix
Square matrix counting how often words appear near each other within a selected window size.
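A toy sketch of counting co-occurrences with a window of one word on each side; the tiny corpus and window size are illustrative choices, not values prescribed by the lecture.

```python
# Toy word co-occurrence counting with a symmetric window of 1
from collections import defaultdict

tokens = "the cat sat on the mat".split()
window = 1
counts = defaultdict(int)

for i, w in enumerate(tokens):
    for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
        if j != i:
            counts[(w, tokens[j])] += 1   # count each neighbouring pair

print(counts[("the", "cat")])  # 1: "the" and "cat" are adjacent once
print(counts[("the", "mat")])  # 1: adjacent at the end of the sentence
```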
Continuous Bag of Words (CBOW)
Word2Vec variant that predicts a target word from its surrounding context words.
Skip-gram
Word2Vec variant that predicts surrounding context words from a single target word.
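A minimal Word2Vec sketch, assuming gensim 4.x is installed; the sg flag switches between the two variants above (sg=0 for CBOW, sg=1 for Skip-gram), and the toy corpus is only there to make the call runnable.

```python
# Word2Vec sketch: CBOW vs. Skip-gram (assumes: pip install gensim, version 4.x)
from gensim.models import Word2Vec

sentences = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "log"],
]

cbow = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # predict word from context
skipgram = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # predict context from word

print(cbow.wv["cat"].shape)                      # (50,) dense vector for "cat"
print(skipgram.wv.most_similar("cat", topn=2))   # nearest neighbours in the tiny toy space
```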
Self-Attention
Mechanism where each token attends to every token in the same sequence (including itself) to compute contextual representations.
Cross-Attention
Decoder mechanism that attends to encoder outputs, aligning generated tokens with source inputs.
Query-Key-Value (QKV)
Triplet of vectors used in attention: queries compare with keys to produce weights that are applied to values.
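A bare-bones numpy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k))V, to make the QKV mechanics concrete; the random matrices stand in for learned projections, and real Transformers run many such heads in parallel (multi-head attention) plus masking in decoders.

```python
# Scaled dot-product attention sketch: softmax(Q K^T / sqrt(d_k)) V
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_k = 4, 8                      # 4 tokens, 8-dimensional queries/keys/values

Q = rng.normal(size=(seq_len, d_k))      # queries: what each token is looking for
K = rng.normal(size=(seq_len, d_k))      # keys: what each token offers
V = rng.normal(size=(seq_len, d_k))      # values: the content that gets mixed

scores = Q @ K.T / np.sqrt(d_k)          # similarity of every query with every key
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax -> attention weights
output = weights @ V                     # each token's output is a weighted mix of values

print(weights.shape, output.shape)       # (4, 4) attention map, (4, 8) contextual vectors
```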
Multi-Head Attention
Parallel set of attention heads allowing a model to capture different relationship types simultaneously.
Encoder (in Transformers)
Stack of self-attention and feed-forward layers that converts input sequence into contextual representations.
Decoder (in Transformers)
Stack that generates output tokens using masked self-attention plus cross-attention to encoder representations.
Causal Language Modeling (CLM)
Training objective where a model predicts the next token using only left-context; basis for GPT-style models.
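A small sketch of how causal language modeling frames its training data: inputs and targets are the same sequence shifted by one position, so each prediction only ever sees earlier tokens (the token IDs here are made-up placeholders).

```python
# Causal LM sketch: next-token targets are the input shifted by one position
token_ids = [101, 2057, 2293, 17061, 102]   # placeholder IDs for a tokenized sentence

inputs = token_ids[:-1]    # the model reads tokens 0..n-1
targets = token_ids[1:]    # and must predict tokens 1..n

for context_end, target in zip(range(1, len(inputs) + 1), targets):
    context = inputs[:context_end]
    print(f"given {context} -> predict {target}")   # only left context is visible
```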
Masked Language Modeling (MLM)
Pre-training task where random tokens are masked and the model predicts them from both left and right context.
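A quick illustration of masked language modeling at inference time, assuming the Hugging Face transformers library; a BERT-style model fills the [MASK] slot using both left and right context.

```python
# Fill-mask sketch with a BERT-style model (assumes: pip install transformers)
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# Both "Paris is the" (left) and "of France" (right) inform the prediction.
for candidate in fill_mask("Paris is the [MASK] of France.")[:3]:
    print(candidate["token_str"], round(candidate["score"], 3))
```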
Next Sentence Prediction (NSP)
BERT pre-training task where the model classifies whether two sentences are sequential in the original text.
ChatGPT
OpenAI's conversational system based on GPT-3.5/GPT-4, further tuned with supervised fine-tuning and RLHF (Reinforcement Learning from Human Feedback) for dialogue.
Prompt
User-provided text or instruction that guides an AI model to perform a specific task.
Prompt Engineering
Process of designing, testing, and refining prompts to elicit desired responses from language models.
Zero-Shot Learning Prompt
Prompting method where the model receives only task instructions with no examples.
One-Shot Learning Prompt
Prompting method that includes a single example to illustrate the desired output format.
Few-Shot Learning Prompt
Prompting method that supplies several examples (typically 2–5) to guide the model’s response.
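To contrast the three prompting styles above, a sketch of the same sentiment task phrased zero-shot, one-shot, and few-shot; the wording is illustrative, not a prescribed template.

```python
# Prompt-style sketch: the same task as zero-, one-, and few-shot prompts
zero_shot = (
    "Classify the sentiment of this review as positive or negative:\n"
    "'The battery dies in an hour.'"
)

one_shot = (
    "Review: 'Great screen, fast shipping.' -> positive\n"
    "Review: 'The battery dies in an hour.' ->"
)

few_shot = (
    "Review: 'Great screen, fast shipping.' -> positive\n"
    "Review: 'Broke after two days.' -> negative\n"
    "Review: 'Sound quality is amazing.' -> positive\n"
    "Review: 'The battery dies in an hour.' ->"
)

print(few_shot)  # the examples set the format; the model continues with a label
```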
Chain-of-Thought Prompting
Technique that asks the model to reveal intermediate reasoning steps to improve complex task performance.
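A minimal chain-of-thought prompt sketch; the exact cue phrase varies, but the idea is to show or request intermediate steps before the final answer.

```python
# Chain-of-thought prompt sketch: ask for the reasoning, not just the answer
cot_prompt = (
    "Q: A shop sells pens at 3 for $2. How much do 12 pens cost?\n"
    "A: Let's think step by step. 12 pens is 4 groups of 3 pens. "
    "Each group costs $2, so 4 x $2 = $8. The answer is $8.\n\n"
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step."
)
print(cot_prompt)  # the worked example nudges the model to reason before answering
```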
Iterative Prompting
Strategy of refining a prompt over multiple rounds based on previous outputs to converge on better results.
Negative Prompting
Prompting approach that explicitly states what content or style should be avoided in the response.
Hybrid Prompting
Combining multiple prompting techniques (e.g., few-shot + chain-of-thought) to optimize outputs.
Prompt Chaining
Breaking a complex task into sequential prompts where each output feeds into the next step.
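A sketch of prompt chaining where each step's output feeds the next prompt; call_llm is a hypothetical stand-in for whichever model API is actually used.

```python
# Prompt chaining sketch; call_llm() is a hypothetical placeholder for a real model API
def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your LLM provider of choice")

def summarize_then_translate(article: str) -> str:
    # Step 1: condense the source text.
    summary = call_llm(f"Summarize the following article in three sentences:\n{article}")
    # Step 2: the first output becomes the input of the second prompt.
    return call_llm(f"Translate this summary into French:\n{summary}")
```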
Hallucination (in LLMs)
Phenomenon where a language model generates confident but factually incorrect or fabricated content.
Bias (in LLMs)
Tendency of a model’s outputs to reflect societal or data-driven stereotypes and unfair representations.
Language AI
Subfield of artificial intelligence dedicated to understanding, processing, and generating human language.
Foundation Model
Large, versatile model pre-trained on broad data that can be fine-tuned for many downstream tasks.
Residual Connection
Shortcut that adds a layer's input to its output, stabilizing the training of deep Transformer stacks.
Layer Normalization
Normalization technique applied across features in each layer to improve training stability in Transformers.
GeLU Activation
Gaussian Error Linear Unit, a smooth non-linear activation commonly used in Transformer feed-forward networks.
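To tie the last three cards together, a numpy sketch of a Transformer feed-forward sub-layer with layer normalization, GeLU, and a residual connection; the tanh-based GeLU approximation, the pre-norm ordering, and the random weights are illustrative stand-ins for learned components.

```python
# Feed-forward sub-layer sketch: LayerNorm, GeLU, residual connection
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize each token vector across its feature dimension.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def gelu(x):
    # Common tanh approximation of the Gaussian Error Linear Unit.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
d_model, d_ff, seq_len = 16, 64, 4
W1, W2 = rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model))

x = rng.normal(size=(seq_len, d_model))     # token representations entering the sub-layer
ffn_out = gelu(layer_norm(x) @ W1) @ W2     # pre-norm feed-forward network
y = x + ffn_out                             # residual connection keeps the original signal

print(y.shape)  # (4, 16): same shape in and out, ready for the next layer
```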