Modality (FM Selection Criterion)
The type(s) of data a foundation model can process or generate – for example, text only, images only, or multiple types simultaneously (multimodal). A primary selection criterion because it determines whether a model is architecturally capable of handling the application's input and output requirements.
Latency (FM Selection Criterion)
The time delay between sending a request to a foundation model and receiving its response. A critical selection criterion for real-time or interactive applications; higher-latency models may be acceptable for batch or asynchronous workloads.
Model Complexity (FM Selection Criterion)
A characteristic of a foundation model reflecting the number of parameters, depth of architecture, and computational demands. More complex models generally achieve higher accuracy on difficult tasks but require more compute, increase latency, and cost more to run.
Inference Parameters
Configuration settings that control the behavior of a foundation model at inference time (i.e., when generating a response). Key parameters include Temperature, Top-p, Top-k, Maximum Length, Stop Sequences, and Response Length. These parameters are adjusted to balance diversity, coherence, and output length.
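As a concrete illustration, these parameters might appear together in the JSON body of a Bedrock InvokeModel call. The key names below follow the Anthropic Claude Messages API on Bedrock (other model providers use different key names for the same concepts), and the values are arbitrary examples, not recommendations:

```python
import json

# Inference parameters for a single Bedrock InvokeModel call.
# Key names follow the Anthropic Claude Messages API on Bedrock;
# other providers expose the same concepts under different names.
body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,                 # Maximum Length / Response Length
    "temperature": 0.2,                # low = focused, high = creative
    "top_p": 0.9,                      # nucleus-sampling threshold
    "top_k": 50,                       # consider only the 50 most likely tokens
    "stop_sequences": ["\n\nHuman:"],  # stop generating when this string appears
    "messages": [{"role": "user", "content": "Summarize our refund policy."}],
}
payload = json.dumps(body)  # serialized request body for the API call
```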
Temperature (Inference Parameter)
A randomness and diversity parameter (range 0–1) that controls how deterministic or creative a model's output is. Low values (e.g., 0.2) produce more focused, consistent responses; high values (e.g., 1.0) produce more diverse, creative responses. Modifies the probability distribution over the model's token choices.
Top-p (Nucleus Sampling)
An inference parameter (range 0–1) that limits the model's token choices to the smallest set of tokens whose cumulative probability meets the threshold p. Low values (e.g., 0.25) restrict word choices; high values (e.g., 0.99) allow a wide range of word choices, increasing diversity.
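A minimal sketch of how temperature and top-p interact during sampling, using toy logits for four candidate tokens (real models operate over vocabularies of tens of thousands of tokens):

```python
import math

def apply_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax into probabilities."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability >= p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalized distribution

logits = [2.0, 1.0, 0.5, -1.0]  # toy scores for 4 candidate tokens

# Low temperature sharpens the distribution; high temperature flattens it.
sharp = apply_temperature(logits, 0.2)
flat = apply_temperature(logits, 1.0)

# A low top-p keeps only the most probable token(s) before sampling.
nucleus = top_p_filter(flat, 0.5)
```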
Amazon Bedrock Knowledge Bases
An Amazon Bedrock feature that implements a fully managed RAG workflow. It connects to structured data sources via SQL and ingests unstructured data from sources such as Amazon S3, Confluence, Microsoft SharePoint, Salesforce, and Web Crawler. It automatically creates embeddings and stores them in a supported vector search database (Aurora, OpenSearch Serverless, Neptune Analytics, MongoDB, Pinecone, Redis Enterprise Cloud).
Vector Database (Vector Store)
A specialized database designed to store, index, and query high-dimensional vector embeddings efficiently. Used in RAG pipelines and semantic search applications. AWS vector-capable services include Amazon OpenSearch Service, Amazon OpenSearch Serverless, Amazon Aurora PostgreSQL-Compatible Edition, Amazon RDS for PostgreSQL, Amazon Neptune ML, Vector search for Amazon MemoryDB, and Amazon DocumentDB.
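The core query a vector store answers can be sketched as brute-force cosine-similarity search. Production stores use approximate-nearest-neighbor indexes (e.g., HNSW) rather than this linear scan, and the document ids and 3-dimensional vectors here are made up for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def knn(query, store, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    ranked = sorted(store, key=lambda doc_id: cosine(query, store[doc_id]), reverse=True)
    return ranked[:k]

# Toy 3-dimensional "embeddings"; real embeddings have hundreds
# or thousands of dimensions.
store = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-faq":  [0.1, 0.9, 0.2],
    "returns-guide": [0.8, 0.2, 0.1],
}
neighbors = knn([1.0, 0.0, 0.0], store, k=2)
```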
Amazon OpenSearch Service
A fully managed AWS search and analytics service based on OpenSearch. Supports vector search via built-in k-nearest neighbor (k-NN) and semantic search capabilities, making it suitable as a vector store for RAG implementations and ML-augmented search experiences.
Amazon OpenSearch Serverless
A deployment option for Amazon OpenSearch Service. Provides the same vector search and indexing capabilities as Amazon OpenSearch Service without requiring infrastructure management – suited for variable or unpredictable workloads.
Amazon Aurora PostgreSQL-Compatible Edition
A fully managed, high-performance relational database service compatible with PostgreSQL. Supports vector search functionality through the pgvector extension, enabling vector-similarity search combined with traditional relational data operations in the same database.
Amazon RDS for PostgreSQL
Amazon's managed relational database service running the PostgreSQL engine. Supports the pgvector extension, which provides vector-similarity search capabilities, making it a suitable vector store for RAG implementations.
Amazon Bedrock Agents
An Amazon Bedrock feature that enables foundation models to perform multi-step, autonomous tasks by orchestrating a sequence of actions – such as calling APIs, querying knowledge bases, and executing code – to complete complex business workflows. Can understand the user's goal, break it into steps, and execute them with minimal human intervention.
FM Customization – Tradeoffs (4)
The cost-capability tradeoffs between the four main approaches to customizing a foundation model's behavior, in increasing order of cost and control: (1) Prompt Engineering – fastest, cheapest, no retraining; (2) RAG – adds external knowledge at inference time, no weight changes; (3) Fine-tuning – modifies model weights on custom data, higher cost; (4) Pre-training – trains from scratch or continues pre-training, highest cost and control.
Single-Shot Prompting
A prompting technique where exactly one input-output example is included in the prompt to demonstrate the desired pattern before asking the model to perform the task. Sits between zero-shot (no examples) and few-shot (multiple examples) in terms of guidance provided.
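A sketch of how the same task prompt grows from zero-shot to single-shot; the task wording and the example review are invented for illustration:

```python
def build_prompt(task, examples, query):
    """Assemble a prompt with 0 (zero-shot), 1 (single-shot),
    or N (few-shot) worked input-output examples."""
    parts = [task]
    for inp, out in examples:
        parts.append(f"Input: {inp}\nOutput: {out}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)

task = "Classify the sentiment of each review as positive or negative."
one_example = [("Great battery life, totally worth it.", "positive")]

zero_shot = build_prompt(task, [], "The screen cracked within a week.")
single_shot = build_prompt(task, one_example, "The screen cracked within a week.")
```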
Negative Prompting
A prompt engineering technique that explicitly instructs the model about what NOT to include, generate, or do – constraining the output by specifying undesired content, formats, or behaviors. Helps prevent hallucinations, off-topic content, and undesired styles.
Guardrails (Prompt Engineering)
Safeguards or constraints built into prompts or the model deployment layer to prevent the generation of undesirable, harmful, or biased content. On AWS, Amazon Bedrock supports configurable guardrails as a managed capability.
Adversarial Prompting
A category of security risks associated with prompt engineering where malicious actors craft inputs designed to manipulate, deceive, or exploit a foundation model. Includes exposure, prompt injection, jailbreaking, hijacking, and poisoning attacks.
Exposure (Adversarial Prompting Risk)
An adversarial prompt engineering risk where a crafted prompt causes the model to reveal sensitive information, trade secrets, system instructions, or confidential data it should not disclose – resulting in data breaches or intellectual property theft.
Prompt Injection
An adversarial attack where a malicious actor embeds harmful or manipulative instructions within user-provided input, causing the model to execute unintended commands or generate harmful content by overriding the original prompt's intent.
Jailbreaking (Prompt Attack)
An adversarial technique that exploits vulnerabilities in a model's safety constraints to bypass its ethical safeguards or content policies, causing it to produce content it was designed to refuse.
Prompt Hijacking
An adversarial prompt technique where a malicious actor crafts inputs that steer the model's responses in a desired (often harmful or off-brand) direction, effectively taking control of the model's output for malicious purposes.
Poisoning (Prompt / Training Attack)
An attack that introduces corrupted, biased, or adversarial data into the model's training dataset, manipulating the model's learned behavior and outputs – leading to systematically biased or unreliable results in production.
Amazon Nova
A family of AWS foundation models available through Amazon Bedrock that provide pre-trained generative capabilities and can be customized and controlled through prompt engineering. Used alongside other Amazon Bedrock features for building generative AI solutions.
Continuous Pre-training
An ongoing FM training approach that further pre-trains an already pre-trained model on additional diverse data to continually expand its knowledge base and adaptability – while retaining existing knowledge. Produces progressively more knowledgeable and versatile models over time. Mitigates the risk of catastrophic forgetting associated with fine-tuning.
Catastrophic Forgetting
A phenomenon that can occur during fine-tuning where the model loses (forgets) knowledge it acquired during pre-training as its weights are adjusted to optimize for the new, narrower task. Mitigated by continuous pre-training and careful fine-tuning data design.
Instruction Fine-tuning
A fine-tuning method that trains a pre-trained FM using examples of instructions paired with the desired model responses – teaching the model how to follow a specific type of instruction. Prompt tuning is a variant of instruction fine-tuning.
Reinforcement Learning from Human Feedback (RLHF)
A fine-tuning method that incorporates human feedback data to align a foundation model's behavior with human preferences – making outputs more helpful, accurate, honest, and safe. A human evaluator rates or ranks model outputs, and the model is trained to maximize the behaviors humans prefer.
Holdout Validation (Fine-tuning)
A model evaluation approach used during fine-tuning where a separate validation dataset (not used during training) is used to assess model performance on unseen data during or after the fine-tuning process â informing decisions about further training or deployment.
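A minimal holdout split might look like this; the record contents, the 80/20 ratio, and the fixed seed are illustrative choices:

```python
import random

def holdout_split(records, val_fraction=0.2, seed=7):
    """Shuffle records, then split into (training, holdout validation) sets.
    The validation set is never used for weight updates, only for evaluation."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

records = list(range(100))  # stand-ins for fine-tuning examples
train, val = holdout_split(records)
```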
Amazon SageMaker Canvas
A SageMaker feature that provides a visual, low-code/no-code interface for creating ML data preprocessing flows and building ML models. Enables feature engineering workflows and model development with minimal coding, accessible to non-developers.
Amazon SageMaker Clarify
A SageMaker tool that analyzes training data and ML model outputs to detect and measure potential bias across multiple dimensions (such as gender, race, or age). Also provides model explainability capabilities to help developers understand and address fairness and transparency issues in their ML models.
Amazon SageMaker Ground Truth
A SageMaker data labeling service that manages human-in-the-loop data labeling workflows for training datasets. Provides a comprehensive set of capabilities to help users build and manage labeled datasets at scale, ensuring high-quality labeled data for model training across the ML lifecycle.
Amazon EMR (Elastic MapReduce)
An AWS managed big data processing service that runs open-source frameworks such as Apache Spark, Apache Hive, and Presto at scale. Amazon SageMaker Studio provides built-in integration with Amazon EMR for scalable data preparation tasks.
Human Evaluation (FM)
A foundation model evaluation method where human raters assess the quality, coherence, accuracy, and usefulness of the model's outputs – providing a gold-standard signal that automated metrics may miss, especially for open-ended or creative tasks.
Probing Tasks (FM Evaluation)
Diagnostic evaluation tasks designed to systematically analyze a foundation model's capabilities and limitations in specific areas – for example, arithmetic reasoning, factual recall, or logical inference – by testing targeted sub-skills.
Robustness Testing (FM Evaluation)
An FM evaluation approach that assesses the model's ability to handle edge cases, adversarial inputs, or distribution shifts without significant performance degradation. Tests whether the model generalizes reliably beyond its training distribution.
ROUGE (Recall-Oriented Understudy for Gisting Evaluation)
A set of evaluation metrics used to assess the quality of automatically generated summaries and machine translations by comparing them to one or more reference texts. Measures overlap (recall-oriented) between generated and reference content.
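A toy implementation of the recall-oriented idea behind ROUGE-1. Real ROUGE also covers bigrams (ROUGE-2), longest common subsequence (ROUGE-L), and stemming; this sketch only counts clipped unigram overlap:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """Fraction of reference unigrams also present in the candidate,
    with counts clipped so a repeated word cannot be over-credited."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values())

score = rouge1_recall(
    "the cat sat on the mat",   # generated summary
    "the cat lay on the mat",   # reference summary
)
```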
BLEU (Bilingual Evaluation Understudy)
A metric used to evaluate the quality of machine-generated text – particularly machine translation – by measuring similarity between the generated text and one or more reference translations. Considers both precision and brevity (brevity penalty).
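A simplified, single-sentence version of the BLEU idea. Real BLEU geometrically averages 1- to 4-gram precisions over a whole corpus; this sketch uses clipped unigram precision plus the brevity penalty:

```python
import math
from collections import Counter

def bleu1(candidate, reference):
    """Clipped unigram precision times the brevity penalty.
    (A simplification: real BLEU combines 1- to 4-gram precisions.)"""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_counts, ref_counts = Counter(cand), Counter(ref)
    clipped = sum(min(cand_counts[w], ref_counts[w]) for w in cand_counts)
    precision = clipped / len(cand)
    # Penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision

score = bleu1("the cat sat on the mat", "the cat lay on the mat")
```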
BERTScore
A semantic similarity metric for evaluating generated text that uses pre-trained bidirectional encoder representations from transformers (BERT) models to compute contextualized embeddings for both the generated and reference texts, then calculates cosine similarity between them. More robust to paraphrasing than n-gram metrics like BLEU and ROUGE.
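The greedy-matching step behind BERTScore recall can be sketched with made-up 2-dimensional vectors standing in for real contextual BERT embeddings (which have hundreds of dimensions and come from a pre-trained encoder):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def bertscore_recall(cand_embs, ref_embs):
    """For each reference token embedding, take its best cosine match
    among candidate token embeddings, then average (BERTScore recall)."""
    return sum(max(cosine(r, c) for c in cand_embs) for r in ref_embs) / len(ref_embs)

# Hypothetical per-token "embeddings" for a 2-token reference and a
# paraphrase-like candidate: close in direction but not identical.
reference = [[1.0, 0.0], [0.0, 1.0]]
candidate = [[0.9, 0.1], [0.1, 0.9]]
score = bertscore_recall(candidate, reference)
```

Because matching happens in embedding space rather than on surface n-grams, a paraphrase scores high here even when BLEU/ROUGE overlap would be low.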