Vocabulary flashcards covering key Domain 2 Generative AI concepts, AWS services, model architectures, security, RAG, evaluation metrics, and CAF-AI perspectives.
Generative AI
A subset of deep learning in which models create new, original content (text, images, audio, code) by learning patterns from large datasets.
Foundation Model (FM)
An extremely large, pre-trained neural network with billions of parameters that acts as a base for many downstream tasks.
Parameters
The internal variables a model learns during training; more parameters generally mean greater capacity and capability.
Prompt
User-supplied input (instructions, context, questions, examples) that tells a generative model what to do.
Completion
The output text (or image, etc.) that a generative AI model returns in response to a prompt.
Inference
The run-time process where a trained model uses its knowledge to generate a completion from a prompt.
Prompt Engineering
The skill of designing, structuring, and refining prompts to obtain the desired model output.
In-Context Learning
Technique of providing task examples inside the prompt so the model can mimic them without retraining.
Zero-Shot Learning
Asking the model to perform a task with no examples in the prompt.
One-Shot Learning
Supplying exactly one example of the task inside the prompt.
Few-Shot Learning
Including multiple examples of the task inside the prompt to guide the model.
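For illustration, a few-shot prompt might look like the sketch below; the sentiment-classification task and its wording are assumptions, not part of the deck:

```python
# A minimal few-shot prompt sketch: two labeled examples guide the model,
# then the final line asks it to complete the pattern (assumed task).
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day." -> Positive
Review: "The screen cracked after a week." -> Negative
Review: "Setup took five minutes and everything just worked." ->"""
```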
Transformer Architecture
State-of-the-art neural network design (introduced in “Attention Is All You Need,” 2017) that processes sequences in parallel using self-attention.
Tokenizer
Component that splits human text into tokens and converts them to numeric IDs the model can process.
Token
Basic data unit for an LLM (roughly a word or sub-word) used to measure context window size and pricing.
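A toy sketch of the tokenizer/token idea above; the tiny vocabulary is made up, and real LLM tokenizers use sub-word schemes such as BPE rather than whitespace splitting:

```python
# Toy whitespace tokenizer -- maps words to numeric token IDs.
vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}

def tokenize(text: str) -> list[int]:
    """Split text on whitespace and map each piece to a token ID."""
    return [vocab.get(word, vocab["<unk>"]) for word in text.lower().split()]

print(tokenize("The cat sat"))  # [1, 2, 3]
```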
Vector
Ordered list of numbers representing features of a concept; enables mathematical comparison of similarity.
Embedding
Dense vector representation of a token or item that captures its semantic meaning and context.
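A minimal sketch of how embeddings enable similarity math; the three-dimensional vectors are invented stand-ins for real high-dimensional embeddings:

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Values near 1 mean the underlying items are semantically close."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

king  = np.array([0.8, 0.3, 0.1])   # made-up embedding
queen = np.array([0.7, 0.4, 0.1])   # made-up embedding
print(cosine_similarity(king, queen))  # close to 1.0
```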
Transformer Network
Neural network built from stacked encoder/decoder blocks using self-attention and positional embeddings.
Self-Attention Mechanism
Process that lets a Transformer weigh the importance of every token relative to every other token when generating output.
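A minimal NumPy sketch of scaled dot-product self-attention; the random matrix stands in for learned query/key/value projections:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(Q, K, V):
    """Each token's output is a weighted sum of all value vectors,
    with weights derived from query-key similarity."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # token-to-token relevance
    weights = softmax(scores, axis=-1)   # normalize per query token
    return weights @ V

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 4))              # 3 tokens, embedding dim 4
print(self_attention(x, x, x).shape)     # (3, 4)
```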
Positional Embeddings
Extra vectors added to token embeddings to convey each token’s position in the sequence so order is preserved.
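A sketch of the sinusoidal positional encodings described in the original Transformer paper; the dimensions below are arbitrary:

```python
import numpy as np

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    """Each position gets a unique vector, added to its token embedding
    so the model can distinguish token order."""
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angle = pos / np.power(10000, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angle), np.cos(angle))

print(sinusoidal_positions(seq_len=4, d_model=8).shape)  # (4, 8)
```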
Context Window
Maximum number of tokens (prompt + completion) an LLM can handle in a single request.
Encoder (Transformer)
Half of a Transformer that reads the entire input and produces a contextual representation of it.
Decoder (Transformer)
Half of a Transformer that takes encoder context (or previous outputs) and generates output tokens one by one.
Softmax Output Layer
Final function that converts raw model scores into a probability distribution over possible next tokens.
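A tiny worked example of softmax turning raw next-token scores into a probability distribution; the logits are made up:

```python
import numpy as np

logits = np.array([2.0, 1.0, 0.1])   # raw next-token scores
probs = np.exp(logits - logits.max()) / np.exp(logits - logits.max()).sum()
print(probs, probs.sum())            # ~[0.659 0.242 0.099], sums to 1.0
```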
Pre-Training
Compute-intensive initial training phase where the model learns statistical patterns from large, unlabeled data.
Self-Supervised Learning
Training method in which the model generates its own labels (e.g., predicting the next word) instead of using human-labeled data.
Unimodal Model
Generative model that accepts and outputs only one data type (e.g., text-to-text).
Multimodal Model
Model capable of processing and/or generating multiple data types, such as text, images, or audio.
Diffusion Model
Generative model that creates content by reversing a stepwise noising process, refining random noise into coherent output.
Stable Diffusion
Efficient diffusion architecture that performs denoising in a low-dimensional latent space to generate images from text.
Forward Diffusion
Training process of progressively adding noise to data so the model learns to predict, and later remove, that noise.
Reverse Diffusion
Generative process of starting with noise and iteratively removing it to create new content.
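A simplified sketch of forward diffusion with an assumed linear noise schedule (real models use carefully tuned schedules); reverse diffusion runs this process backwards:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(x0: np.ndarray, t: float) -> np.ndarray:
    """Blend clean data with Gaussian noise at noise level t in [0, 1]."""
    noise = rng.normal(size=x0.shape)
    return np.sqrt(1 - t) * x0 + np.sqrt(t) * noise  # simplified schedule

clean = np.ones((4, 4))                  # stand-in "image"
print(add_noise(clean, t=0.9).round(2))  # mostly noise near t = 1
```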
Latent Space
Compressed, abstract feature space where models operate to represent data more efficiently than raw pixels or text.
Retrieval-Augmented Generation (RAG)
Technique that enriches a prompt with retrieved, authoritative data before generation to reduce hallucinations and add freshness.
Knowledge Bases for Amazon Bedrock
Fully managed AWS feature that automates RAG: ingesting data, creating embeddings, storing them, and retrieving context for prompts.
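A minimal sketch of querying a Bedrock knowledge base via boto3; the knowledge base ID and model ARN are placeholders:

```python
import boto3

client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "What is our refund policy?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "KB123EXAMPLE",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-v2",  # placeholder
        },
    },
)
print(response["output"]["text"])  # answer grounded in retrieved context
```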
Vector Database
Specialized store that indexes embeddings and returns semantically similar vectors for a query vector.
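What a vector database does conceptually, shown as a brute-force sketch; production stores use approximate indexes such as HNSW, and all vectors here are made up:

```python
import numpy as np

stored = np.array([[0.9, 0.1], [0.1, 0.9], [0.7, 0.3]])  # document embeddings
query = np.array([0.8, 0.2])                              # query embedding

# Cosine similarity of the query against every stored embedding.
sims = stored @ query / (np.linalg.norm(stored, axis=1) * np.linalg.norm(query))
print(np.argsort(sims)[::-1])  # indices of the nearest documents first
```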
Ingestion (Knowledge Base)
Process of chunking source documents, generating embeddings, and loading them into a vector database.
Amazon OpenSearch Serverless
Fully managed, pay-per-use AWS vector database option ideal for quick, low-overhead RAG setups.
Pinecone
Purpose-built, high-performance vector database suited for large-scale, low-latency semantic search workloads.
Redis Enterprise Cloud
In-memory database choice for real-time, ultra-low-latency vector search, often used when Redis is already in use.
Amazon Aurora (pgvector)
Relational database (PostgreSQL) with vector search extension, ideal when structured data already resides in Aurora.
MongoDB Atlas
Document database offering vector search; chosen when data is stored in MongoDB JSON documents.
Amazon S3 (RAG Data Source)
Primary storage location where Bedrock Knowledge Bases ingest supported text-centric documents.
Hallucination (LLM)
Model output that is plausible-sounding but factually incorrect or fabricated.
Prompt Injection
Attack where a malicious input causes the model to ignore original instructions and perform unintended actions.
Data Poisoning
Attack that corrupts training data to bias or compromise a model’s behavior.
Model Inversion
Attack attempting to reconstruct private training data by repeatedly querying a model.
ROUGE
Metric that evaluates automatic text summarization quality by comparing model output to reference summaries.
BLEU
Metric that measures machine-translation quality by comparing model output to human reference translations.
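A toy unigram-overlap function to convey the flavor of these metrics; real ROUGE and BLEU use n-grams, count clipping, and (for BLEU) a brevity penalty:

```python
def rouge1_recall(candidate: str, reference: str) -> float:
    """Fraction of unique reference words that appear in the candidate."""
    cand, ref = candidate.lower().split(), set(reference.lower().split())
    return sum(1 for w in ref if w in cand) / len(ref)

print(rouge1_recall("the cat sat on the mat", "the cat lay on the mat"))  # 0.8
```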
Generative Adversarial Network (GAN)
Model consisting of competing generator and discriminator networks that produce high-fidelity synthetic data, especially images.
Variational Autoencoder (VAE)
Encoder–decoder model that learns a latent space to generate new data and allows controlled attribute manipulation.
Reinforcement Learning from Human Feedback (RLHF)
Fine-tuning approach where human-ranked outputs create a reward model to align an LLM with human preferences.
Amazon Bedrock
AWS fully managed service giving API access to multiple foundation models with usage-based pricing.
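A minimal Bedrock inference sketch via boto3; the model ID and request schema follow the Titan text format and should be treated as assumptions that vary by model and region:

```python
import json
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.invoke_model(
    modelId="amazon.titan-text-express-v1",  # assumed model ID
    body=json.dumps({
        "inputText": "Summarize what a foundation model is in one sentence.",
        "textGenerationConfig": {"maxTokenCount": 100, "temperature": 0.5},
    }),
)
print(json.loads(response["body"].read())["results"][0]["outputText"])
```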
Amazon SageMaker JumpStart
SageMaker hub offering pre-trained models, notebooks, and one-click deployments to accelerate ML and generative AI projects.
Amazon Titan
AWS family of foundation models (text and embeddings) available exclusively through Bedrock.
Amazon Q Developer (CodeWhisperer)
Generative AI coding assistant that produces code suggestions directly in an IDE from natural-language comments.
PartyRock
Playground built on Bedrock that lets users experiment with prompt engineering by rapidly creating small AI apps.
AWS Nitro System
Hardware foundation of modern EC2 instances providing isolated, hardware-enforced security for customer workloads.
AWS Trainium
AWS-designed chip optimized for high-performance, cost-efficient training of large ML models.
AWS Inferentia
AWS-designed chip optimized for high-throughput, low-cost inference of ML models.
Transfer Learning
Method of starting with a pre-trained model and fine-tuning it on a smaller, domain-specific dataset.
CAF-AI (Cloud Adoption Framework for AI)
AWS strategic framework guiding organizations across six perspectives to scale AI responsibly and effectively.
CAF-AI Business Perspective
Focuses on aligning AI initiatives with measurable business outcomes and ROI.
CAF-AI People Perspective
Addresses workforce skills, culture, and change management for AI adoption.
CAF-AI Governance Perspective
Ensures responsible, ethical, and compliant AI through policies and risk management.
CAF-AI Platform Perspective
Covers technology architecture, MLOps pipelines, and scalable infrastructure for AI workloads.
CAF-AI Security Perspective
Protects data, models, and intellectual property against threats unique to AI systems.
CAF-AI Operations Perspective
Defines processes for running, monitoring, and continuously improving AI systems in production.
High Availability (HA)
Design goal of minimizing downtime so a system stays accessible (e.g., 99.99% uptime).
Fault Tolerance (FT)
Capability of a system to keep operating without interruption even when components fail.
AWS Region
Geographically isolated AWS area containing multiple Availability Zones; key to disaster-recovery strategies.
Availability Zone (AZ)
Physically separate data-center cluster within a Region; applications spanning multiple AZs gain HA and FT.
Edge Location
AWS Point of Presence used by CloudFront and Global Accelerator to cache or route traffic closer to users.
Vector Embeddings Model
Model (e.g., Amazon Titan Text Embeddings) that converts text into high-dimensional numeric vectors for similarity search.
Token-Based Pricing
Pay-per-use cost model where charges depend on the number of tokens processed (input + output).
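Illustrative cost arithmetic; the per-token rates below are hypothetical:

```python
input_tokens, output_tokens = 1_200, 300
price_per_1k_in, price_per_1k_out = 0.0005, 0.0015  # hypothetical USD rates

cost = (input_tokens / 1000 * price_per_1k_in
        + output_tokens / 1000 * price_per_1k_out)
print(f"${cost:.6f}")  # $0.001050
```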
Self-Hosting (LLM)
Running a model on your own EC2/GPU infrastructure, incurring 24/7 compute costs and operational overhead.
Temperature (LLM Parameter)
Inference setting controlling randomness of output; higher values yield more creative but less deterministic text.
Top-p (Nucleus) Sampling
Decoding method where the model samples from the smallest set of top probable tokens whose cumulative probability exceeds p.
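A sketch combining both decoding parameters above (temperature scaling, then nucleus filtering); the tokens and logits are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
tokens = np.array(["cat", "dog", "car", "sky"])
logits = np.array([2.0, 1.5, 0.3, 0.1])

def sample(logits, temperature=1.0, top_p=0.9):
    scaled = logits / temperature                 # higher T flattens the distribution
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]               # most probable first
    cum = np.cumsum(probs[order])
    cutoff = np.searchsorted(cum, top_p) + 1      # smallest set whose mass exceeds p
    nucleus = order[:cutoff]
    p = probs[nucleus] / probs[nucleus].sum()     # renormalize within the nucleus
    return rng.choice(nucleus, p=p)

print(tokens[sample(logits, temperature=0.7, top_p=0.9)])
```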
Embedding Layer
First neural-network layer that maps discrete token IDs to learned dense vectors.
Statelessness (LLM)
Property that the model does not retain conversational memory between separate calls unless explicitly provided.
Grounding (RAG)
Supplying external, authoritative data to an LLM so it can generate fact-based, context-relevant answers.
Chunking (RAG)
Splitting large documents into smaller text pieces before embedding and storing them for retrieval.
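A fixed-size character chunker sketch with overlap so context is not cut mid-thought; real pipelines often chunk by tokens and tune sizes empirically:

```python
def chunk(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Slice text into overlapping windows of `size` characters."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

doc = "word " * 200
print(len(chunk(doc)), "chunks")  # 7 chunks
```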
Soft Prompt Tuning
Lightweight fine-tuning approach that learns a small set of prompt tokens instead of updating the entire model.