SA

AWS AI & Machine Learning – Vocabulary Flashcards

AWS Cloud Adoption Framework for AI (CAF-AI)

The Framework – Purpose & Scope

• Provides a structured roadmap and strategic guidance to organisations looking to accelerate their adoption of Artificial Intelligence (AI) and Machine Learning (ML) technologies, translating these advancements into tangible, measurable business value such as increased revenue, reduced costs, or improved customer experience.
• Outlines best practices, frameworks, and identifies the essential organisational capabilities across various domains (e.g., people, process, technology) required for successful and sustainable AI/ML implementation.

AI Transformation Journey – 4 Iterative Stages

Envision
– Work backward from desired business outcomes, starting with the business problem rather than the technology. This involves identifying ambitious, yet achievable, AI-driven goals.
– Identify and prioritise specific AI opportunities and use cases (e.g., predictive maintenance, personalized marketing) that directly align with overarching strategic business goals and create a compelling vision for AI impact.
Align
– Secure robust stakeholder buy-in from all levels, including executive leadership, technical teams, and business units. This stage involves mapping inter-departmental dependencies and proactively mitigating potential organisational, technical, or cultural concerns.
– Develop a comprehensive organisational-readiness strategy, addressing critical factors like upskilling workforces, fostering an "AI-first" culture, allocating appropriate funding, and defining the organisation's risk appetite for AI initiatives.
Launch
– Focuses on the hands-on delivery and execution of initial pilots or Proof-of-Concepts (POCs), quickly bringing them into a production environment. This involves developing, testing, and deploying the first AI/ML models.
– The primary goal is to prove real, measurable value quickly and demonstrate the feasibility and benefits of AI/ML solutions with minimal initial investment.
• **Scale
** – Expand successful pilots and POCs across different business units, geographies, or customer segments to maximise their impact.
– Address all underlying technology, process, and cultural scaling factors, ensuring the AI/ML solutions can operate reliably and efficiently at enterprise scale.

Six Perspectives & Example Capabilities

Business – Focuses on defining business value: outcome alignment with strategic goals, robust AI/ML portfolio management to track initiatives, and establishing innovation programs to continuously explore new AI applications.
People – Centers on human capital: building enterprise-wide AI fluency through training, strategic talent acquisition to bring in AI/ML expertise, and fostering an "AI-first" culture that encourages experimentation and data-driven decision-making.
Governance – Ensures responsible and compliant AI adoption: strong program management for AI initiatives, comprehensive risk management to identify and mitigate AI-specific risks, and crucial Responsible AI practices encompassing fairness (reducing bias), explainability (understanding model decisions), and accountability (establishing clear responsibility for AI outcomes).
Platform – Relates to the technological foundation: establishing a scalable, secure, and cloud-native AI stack encompassing robust Data architecture (e.g., data lakes, data warehouses), MLOps (Machine Learning Operations) practices for automation, and leveraging services like Amazon SageMaker for end-to-end ML lifecycle management.
Security – Addresses protection of AI assets: maintaining confidentiality, integrity, and availability of data and models, identifying and mitigating new AI attack vectors (e.g., adversarial attacks), and utilising fine-grained access controls through AWS IAM.
Operations – Ensures reliability and performance: implementing reliable ML services, establishing comprehensive monitoring (e.g., using Amazon CloudWatch for metrics and logs), and defining clear incident response procedures for operational issues or model degradation.

Core AI & Machine-Learning Concepts

What is AI?

• AI is a broad and interdisciplinary field of computer science dedicated to building intelligent systems that can sense their environment (e.g., via cameras, microphones), reason about the information they perceive, act autonomously or assist humans to achieve goals, and adapt their behavior over time through learning, thereby performing tasks traditionally requiring human intelligence.

Key Definitions

Machine Learning (ML) – A subfield of AI where algorithms learn patterns and make predictions from data without being explicitly programmed for specific tasks. This learning enables systems to improve their performance over time. (Core AWS service for ML → Amazon SageMaker.)
Deep Learning (DL) – A specialised subset of ML that uses multi-layer artificial neural networks with many hidden layers (hence "deep") to learn hierarchical representations from large amounts of data, excelling in complex perception tasks like image recognition, speech processing, and natural language understanding.
Generative AI – A type of AI model capable of creating new, original content such as text, images, audio, or video, often in response to a given prompt. These models learn patterns and structures from existing data to generate novel outputs (e.g., Amazon Bedrock provides access to various Foundation Models (FMs)).

AI vs ML

• The relationship is hierarchical: "All ML is AI, but not all AI is ML." AI encompasses a wider range of techniques to simulate human intelligence, while ML is a specific approach within AI focused on learning from data.
AI Goal: The broader objective is to enable machines to simulate human thinking, problem-solving, and general intelligence across various tasks.
ML Goal: The specific objective is to enable systems to learn patterns from data, make accurate predictions or decisions, and improve performance on a specific task without being explicitly programmed for every scenario.

Training vs Inference

Training – Involves feeding a large, diverse, and high-quality dataset into an ML model. During this phase, the model iteratively adjusts its internal parameters (weights and biases) based on discrepancies between its predictions and the actual labels in the data, typically through optimisation algorithms like gradient descent. The goal is for the model to learn underlying patterns and relationships.
Inference – Refers to the process of using a pre-trained ML model to make predictions or generate outputs on new, unseen data. Once a model is trained and its weights are optimised, it can be deployed to apply its learned knowledge to real-world inputs.

Common AI Use-Cases

Innovation acceleration (e.g., speeding up drug discovery by predicting molecular interactions, or materials science research).
Customer experience (e.g., powering intelligent chatbots for instant support, enabling hyper-personalisation of product recommendations or content delivery based on user preferences).
Better decisions (e.g., enhancing financial accuracy through sophisticated fraud detection algorithms, optimising complex logistics and supply chain management for efficiency).
Process automation (e.g., automating repetitive data entry tasks from documents using services like Amazon Textract for optical character recognition and intelligent document processing).

Four Types of ML

Supervised Learning – Involves training models on a labeled dataset, meaning each data point includes input features and a corresponding correct output (label). The model learns to map inputs to outputs.
Regression → A type of supervised learning used to predict a continuous numerical value (e.g., predicting house prices based on features like size and location, forecasting stock prices, or predicting temperature).
Classification → A type of supervised learning used to predict a discrete category or class (e.g., classifying emails as spam or not spam, identifying images as cat or dog, or diagnosing diseases).
Unsupervised Learning – Deals with unlabeled data and focuses on discovering hidden patterns, structures, or relationships within the data without explicit guidance. The goal is to gain insights or reduce dimensionality.
Clustering (grouping similar data points together, e.g., customer segmentation).
Association (finding relationships between variables, e.g., market basket analysis).
Reinforcement Learning (RL) – An agent learns to make optimal decisions by interacting with an dynamic environment, receiving rewards for desirable actions and penalties for undesirable ones. The agent's goal is to maximise the cumulative reward over time, often expressed as the discounted sum of future rewards: Rt = \sum{k=0}^{\infty} \gamma^k r{t+k+1}, where Rt is the total discounted reward from time t, \gamma is the discount factor (0 to 1), and r_{t+k+1} is the reward at future time step t+k+1 (e.g., training autonomous systems like robotic control or game playing, or AWS DeepRacer for autonomous driving).
Self-Supervised Learning – A hybrid approach where models generate their own labels from the input data itself, allowing them to learn powerful representations without requiring explicit human-annotated labels. This approach is key for training large modern foundation models (FMs) by tasks like predicting missing words in a sentence or generating the next token.

Model Learning & Performance

Data Splits – Essential for evaluating model generalisation: data is typically divided into Training (for model learning), Validation (for hyperparameter tuning and model selection), and Test sets (for final, unbiased evaluation of performance on unseen data).
Bias vs Variance
Bias → Occurs when a model is too simple or consistently makes systematic errors, leading to underfitting the training data and failing to capture the underlying patterns. High bias results in high error rates on both training and test data.
Variance → Occurs when a model is too complex and learns the noise in the training data rather than the true patterns, leading to overfitting. High variance models perform well on training data but poorly on unseen test data.
Hyperparameters – Configuration values that are set before the training process begins and remain constant during training. They control the learning process itself (e.g., learning-rate, number of epochs, network depth, batch size, regularisation strength).

Model Evaluation

RLHF – Reinforcement Learning from Human Feedback is a technique used to align large language models (LLMs) with human preferences, values, and instructions by training a reward model on human-ranked outputs and then optimising the LLM using reinforcement learning based on this reward signal.
Classification metrics → Used for models that predict categories:
Accuracy: The proportion of total predictions that were correct ((TP+TN)/(TP+TN+FP+FN)).
Precision: The proportion of positive identifications that were actually correct (TP/(TP+FP)).
Recall: The proportion of actual positives that were correctly identified (TP/(TP+FN)).
F1-Score: The harmonic mean of Precision and Recall, providing a balance between the two (2 * (Precision * Recall) / (Precision + Recall)), useful when classes are imbalanced.
Regression metric → Used for models predicting continuous values:
RMSE (Root Mean Squared Error): A commonly used metric that measures the average magnitude of the errors between predicted and actual values. It's the square root of the average of squared differences: RMSE = \sqrt{\frac{1}{n}\sum{i=1}^{n}(yi-\hat yi)^2} where yi are observed values, \hat y_i are predicted values, and n is the number of observations. It penalises larger errors more heavily.

ML Lifecycle

Business Problem (Define the problem, objectives, and success metrics) → Data Collection & Preparation (Gather relevant data, clean, transform, and format it for analysis) → Feature Engineering (Select, create, and transform variables into features that improve model performance) → Training (Develop and train ML models using chosen algorithms and prepared data) → Evaluation (Assess model performance using appropriate metrics and tune hyperparameters) → Deployment & Monitoring (Integrate the trained model into production systems and continuously monitor its performance and drift).

When NOT to Use ML

Simple rule-based logic suffices: If a problem can be adequately solved with a set of deterministic if-then-else rules, ML might introduce unnecessary complexity and overhead.
Insufficient high-quality data: ML models heavily rely on large volumes of clean, relevant, and representative data for effective training. Without it, model performance will be poor.
Need 100% explainability per decision: For domains requiring absolute transparency and audibility for every decision (e.g., certain legal or medical contexts), the inherent "black box" nature of some complex ML models can be problematic.
No learnable pattern exists: If the relationship between inputs and outputs is purely random or chaotic, an ML model will not be able to find any meaningful patterns to learn from.

Data Types for AI

Structured Data – Organised in a tabular format with a rigid schema, rows, and columns, making it easily searchable and analyzable (e.g., relational databases like Amazon RDS, CSV files, excel spreadsheets). It's ideal for traditional numerical analysis and standard ML algorithms.
Unstructured Data – Lacks a pre-defined schema, making it free-form and less organised. It constitutes >80\% of enterprise data and includes text documents, images, audio, video files, and emails (often stored efficiently in object storage like Amazon S3 and analysed by services like Amazon Rekognition for images/videos or Amazon Comprehend for text).
Semi-Structured Data – Contains tags or markers to organise data elements but does not conform to a fixed schema (e.g., JSON and XML documents, NoSQL databases like Amazon DynamoDB or DocumentDB). It offers more flexibility than structured data but retains some inherent organisation.
Time-Series Data – A sequence of data points indexed in time order, making it ideal for tracking changes over periods. Each data point corresponds to a specific timestamp (e.g., sensor data, stock prices, IoT device metrics, stored in specialised databases like Amazon Timestream).

Model Inferencing Paradigms

Real-Time / Online Inference – Involves making predictions with very low latency (milliseconds) on single or small batches of records as soon as new data arrives. It's crucial for applications requiring immediate responses (e.g., real-time fraud detection, personalised website recommendations, dynamic pricing). This is typically handled by SageMaker Real-Time Endpoints which maintain a running model ready for requests.
Batch / Offline Inference – Involves making predictions on large datasets where high latency is acceptable, often processed in a scheduled or asynchronous manner (e.g., nightly forecasts, processing customer churn predictions for an entire database, image processing for a vast archive). This is efficiently handled by SageMaker Batch Transform, which does not require a persistent endpoint.

Generative AI & Amazon Bedrock

Generative AI Basics

• Generative AI models are designed to produce novel outputs (e.g., text, images) in response to input prompts, generating "completions" that are creative and coherent. They learn to understand the underlying distributions and patterns of their training data to create new, similar, but non-identical data.
Prompt Engineering – The art and science of carefully crafting inputs (prompts) to guide a generative AI model to produce optimal, desired, and contextually relevant responses. It involves experimenting with phrasing, examples, and instructions to elicit the best possible output.

Amazon Bedrock – Fundamentals

• Amazon Bedrock is a fully managed service that provides easy access to high-performing foundation models (FMs) from Amazon and leading AI startups through a single API. Its core emphasis is on Choice & Simplicity, offering flexibility in model selection and streamlining model deployment and management.
• It acts as a serverless service, meaning users don't need to provision or manage any infrastructure, significantly simplifying the development and deployment of generative AI applications.

Available FMs

Amazon Titan Models – A family of FMs developed by AWS, including Titan Text (for text generation, summarisation, Q&A) and Titan Embeddings (for converting text into numerical representations for search, recommendations, and personalisation).
Anthropic Claude – Known for its Constitutional AI approach, focusing on responsible AI development by training models to follow a set of principles rather than human review, ideal for safe and helpful conversations.
AI21 Labs Jurassic – A powerful family of large language models designed for complex language tasks, offering strong performance in areas like writing, summarisation, and question answering.
Cohere Command / Embed – Command is a powerful LLM optimised for business use cases, while Embed provides text embeddings for deep semantic search and retrieval applications.
Meta Llama – An open-source collection of FMs that allows developers and researchers to build upon and fine-tune models for custom applications, promoting innovation in the AI community.
Stability AI Stable Diffusion – A prominent text-to-image model capable of generating high-quality images from natural language descriptions, widely used for creative content generation.

Customizing FMs – Fine-Tuning vs RAG

Fine-Tuning
– Involves continuing the training of a pre-trained foundation model with a smaller, labeled dataset specific to a particular domain or task. This process modifies the model's internal weights and biases.
– It typically requires substantial data, higher computational cost, more time, and greater complexity compared to RAG. It's great for adapting a model's style, tone, or specific factual knowledge deeply within its parameters.
Retrieval Augmented Generation (RAG)
– Augments the prompt sent to the LLM with relevant external information retrieved from a real-time knowledge-base (e.g., proprietary documents, databases), without changing the underlying model weights.
– It's generally faster, cheaper, and more effective at reducing "hallucinations" (fabricating information) by grounding the model's responses in specific, verifiable data. This is the default and recommended approach when using Bedrock Knowledge Bases, allowing FMs to leverage current and private data sources.

Model Evaluation in Bedrock

• Bedrock supports both automated metrics (e.g., accuracy, toxicity scores, fluency, relevance) and human review workflows. This allows for comprehensive assessment of model performance, quality, and alignment with safety guidelines, ensuring generated content meets desired standards.

Responsible AI – Bedrock Guardrails

• Bedrock Guardrails allow users to configure safety measures for their generative AI applications. This includes setting up Denied Topics to prevent discussion of specific subjects, configuring Content Filters to detect and filter out categories of harmful content (e.g., hate speech, insults, violence, sexual content), implementing Word Filters for specific prohibited terms, and enabling PII Redaction to automatically remove sensitive Personally Identifiable Information from outputs.

Other GenAI Concepts

Hallucination – A common issue in generative AI, referring to instances where the model produces fabricated, nonsensical, or factually incorrect output that is presented as truthful.
Embeddings – Dense numerical vectors that represent text, images, or other data in a multi-dimensional space. They are crucial because they capture the semantic meaning and relationships between data points, allowing for efficient similarity searches and clustering.
Vector Database – A specialised database designed to store, manage, and efficiently search embeddings (vectors). They are essential for RAG architectures, enabling quick retrieval of semantically similar information from vast knowledge bases (e.g., Amazon Kendra offers intelligent search with vector capabilities, OpenSearch now has a built-in vector engine).

Bedrock Features & Prompt Engineering

Agents for Amazon Bedrock

• A managed capability within Amazon Bedrock that enables foundation models to autonomously execute complex, multi-step tasks by orchestrating interactions with company systems. They can call external APIs, invoke AWS Lambda functions, and retrieve information from knowledge bases.
• Goes beyond simple Q&A by empowering FMs to "get things done" in the real world, such as booking flights, processing orders, or completing forms.

Monitoring & Pricing

CloudWatch Metrics – Provides key operational metrics for Bedrock usage, including Invocations (number of times the model was called), InvocationLatency (time taken for a response), and InvocationErrors (number of failed requests). These metrics are crucial for performance tracking and troubleshooting.
Pricing (token-based) – Bedrock pricing is primarily based on the number of input and output tokens processed by the models (1 token is approximately 4 characters for English text).
On-Demand: A flexible pay-as-you-go pricing model, where you pay only for the tokens you consume, suitable for variable workloads and initial experimentation.
Provisioned Throughput: Allows commitment to a specific level of model throughput for a discounted rate over a committed period. Ideal for predictable, high-volume workloads, ensuring dedicated capacity and lower costs per token.

Prompt Engineering Techniques

Zero-Shot Prompting – Providing a prompt to the model without any examples. The model relies solely on its pre-trained knowledge to generate a response.
Few-Shot Prompting – Including a small number of exemplar input-output pairs within the prompt to guide the model's pattern recognition, leading to significantly better quality and more aligned responses than zero-shot.
Chain-of-Thought (CoT) Prompting – Encouraging the model to generate intermediate reasoning steps before reaching a final answer by adding phrases like “Let’s think step by step” or showing explicit reasoning in few-shot examples. This improves complex reasoning and reduces errors.
Prompt Templates – Pre-defined, reusable patterns or structures for prompts with placeholders that can be filled in with different inputs. They ensure consistency, reduce manual effort, and improve the reliability of model responses across various applications.

Amazon Q & PartyRock

Amazon Q – Enterprise GenAI Assistant

• Amazon Q is a generative AI-powered assistant specifically designed for enterprise use, with a strong emphasis on security-first principles, ensuring that data is protected and privacy is maintained. It securely connects to an organisation's various data sources (e.g., Amazon S3, Microsoft SharePoint, Salesforce, Confluence) to provide comprehensive and contextually relevant answers.
Q Business – Tailored for business users, providing a managed RAG solution for secure, accurate answers from internal company data. It includes Q Apps, which are no-code generative AI application builders that allow non-developers to create AI tools.
Q Developer – Designed for engineers and developers, integrating directly into Integrated Development Environments (IDEs) like VS Code, AWS Management Console, and other developer tools. It assists with code generation, debugging, troubleshooting, and provides expert AWS technical documentation and best practices.

PartyRock

• PartyRock is a free, no-code generative AI playground powered by Amazon Bedrock, providing an interactive environment for users to experiment with various foundation models and prompt engineering techniques. It allows users to quickly build simple generative AI applications without writing any code.
• It is specifically intended for learning, experimentation, and rapidly prototyping ideas – it is not designed or suitable for production-grade applications.

AWS Managed AI Services (Pre-Trained API)

These are fully managed, pre-trained AI services that offer ready-to-use API endpoints, allowing developers to easily integrate AI capabilities into applications without needing deep ML expertise or data to train models.

Amazon Comprehend – A Natural Language Processing (NLP) service that uncovers insights and relationships in text. It can identify Entities (people, places, organisations), Sentiment (positive, negative, mixed, neutral), Key Phrases, and PII (Personally Identifiable Information).
Amazon Translate – Provides fast, high-quality, and affordable neural machine translation (NMT) for text, supporting multiple languages.
Amazon Transcribe – An Automatic Speech Recognition (ASR) service that converts speech to text, supporting multiple languages and distinguishing between multiple speakers in an audio file.
Amazon Polly – A text-to-speech service that turns text into lifelike speech, offering a wide selection of voices and languages.
Amazon Rekognition – A computer vision service to identify objects, people, text, scenes, and activities in images and videos, as well as detect inappropriate content.
Amazon Lex – A service for building conversational AI interfaces (chatbots and voice bots). It powers the underlying technology for Amazon Alexa, enabling natural language understanding and speech recognition.
Amazon Personalize – A machine learning service that enables developers to build sophisticated recommendation capabilities into their applications, delivering highly personalised user experiences based on user activity and item attributes.

Specialized AI Services & Hardware

Amazon Textract – An intelligent document processing service that automatically extracts text, handwriting, and data from scanned documents beyond optical character recognition (OCR). It understands the layout of forms and tables, maintaining structural context.
Amazon Kendra – An intelligent enterprise search service powered by machine learning, designed to provide more accurate answers to complex natural language queries by searching across disparate content repositories. It excels at delivering specific answers-over-documents, not just links.
Amazon Mechanical Turk – A crowdsourcing marketplace for businesses to outsource small tasks that require human intelligence (Human Intelligence Tasks, HITs) to a distributed workforce, commonly used for data labeling, content moderation, and verification processes for AI/ML.
Amazon Augmented AI (A2I) – Makes it easy to build the human review workflows required for machine learning applications. It provides built-in human review for common use cases like content moderation and document processing, allowing humans to step in for low-confidence predictions or auditing.
Amazon Comprehend Medical / Transcribe Medical – HIPAA-eligible versions of Comprehend and Transcribe, specifically trained to extract and process health information from unstructured medical text and speech, respectively.
AWS Trainium – An AWS-designed machine learning accelerator specifically optimised for high-performance deep learning training of models, offering superior performance and cost efficiency compared to general-purpose GPUs for training complex models.
AWS Inferentia – An AWS-designed machine learning chip built specifically for high-performance and cost-effective deep learning inference, providing significant improvements in throughput and lower latency for deploying models in production.

Amazon SageMaker – End-to-End ML Platform

Amazon SageMaker is a fully managed service that provides every developer and data scientist with the ability to build, train, and deploy machine learning models quickly. It offers a comprehensive set of tools and integrated workflows.

Data Prep

SageMaker Data Wrangler – A visual and low-code service that provides a single pane of glass to prepare data for ML. It allows users to aggregate and prepare data from different sources into a clean, single dataset, apply over 300 pre-built data transformations, and generate analyses to detect data quality issues.
SageMaker Feature Store – A centralised repository to store, share, and manage curated ML features for training and inference. It ensures consistency, reusability, and low-latency access to features, preventing data leakage and improving MLOps efficiency.

Build & Train

SageMaker Studio Notebooks – Managed Jupyter notebooks that provide an integrated development environment (IDE) for ML, offering elastic compute for interactive data exploration, model prototyping, and code development. They auto-save work and can be scaled up or down as needed.
SageMaker Built-in Algorithms – Pre-optimised, scalable, and highly performant algorithms provided by SageMaker (e.g., XGBoost, K-Means, Linear Learner) that users can directly apply to their data without managing underlying infrastructure.
SageMaker Ground Truth – A fully managed data labeling service that makes it easy to build high-quality training datasets for machine learning. It can automate a significant portion of the labeling process and seamlessly integrate human reviewers to ensure accuracy and handle complex cases.

Deploy & Manage

SageMaker Real-Time Endpoints – Provides HTTPS endpoints for deploying ML models for low-latency, real-time inference. They are generally always-on and scalable to handle varying request volumes.
SageMaker Batch Transform – Used for making offline predictions on large datasets. It processes data in batches, suitable for scenarios where real-time responses are not required.

Governance / Responsible AI

SageMaker Model Cards – Provide a "nutrition label" for ML models, documenting essential information such as model purpose, training data, performance metrics, ethical considerations, and recommended uses. This enhances transparency and accountability.
SageMaker Model Dashboard – Offers a fleet-wide monitoring solution for all deployed ML models. It provides a central view to track model performance, detect data drift (changes in input data characteristics) and concept drift (changes in the relationship between input and target), and identify potential issues like bias, enabling proactive model maintenance.

Unified IDE

SageMaker Studio – A web-based, unified integrated development environment (IDE) that brings together all the tools needed for ML development. It provides a single pane of glass for building, training, debugging, deploying, and monitoring ML models, streamlining the entire ML lifecycle.

AI Challenges & Responsibilities

Responsible AI Pillars (AWS)

AWS adheres to six pillars for Responsible AI, ensuring ethical and robust AI development:

Fairness & Bias: Ensuring AI systems do not perpetuate or amplify unfair biases against certain groups, often addressing issues stemming from biased training data.
Explainability: Making AI model decisions understandable and interpretable to humans, especially for critical applications.
Robustness & Reliability: Designing AI systems that perform consistently and dependably even when faced with unexpected inputs or adversarial attacks.
Privacy & Security: Protecting sensitive data used by AI systems and safeguarding against malicious attacks like data exfiltration.
Governance: Establishing clear policies, processes, and oversight for the responsible development and deployment of AI.
Transparency: Clearly communicating how AI systems work, their capabilities, and their limitations to users and stakeholders.

Key Challenges

Bias: Often originates from unrepresentative or skewed training data, potentially leading to discriminatory or unfair outcomes when amplified by the model. Mitigation strategies involve careful data collection, pre-processing, and monitoring.
Hallucinations: Generative AI models can produce factually incorrect or nonsensical output, especially for less-represented topics or complex queries. This is typically mitigated with techniques like Retrieval Augmented Generation (RAG) by grounding responses in verified external data.
Toxicity / Misuse: AI models can generate harmful, offensive, or otherwise undesirable content, or be misused for malicious purposes (e.g., creating deepfakes, phishing). Mitigation includes content filters, ethical guardrails, and robust moderation policies.

Governance & Compliance

• Implementing clear policies and controls is crucial for responsible AI adoption, supported by tools like SageMaker Model Cards (for documentation) and Model Dashboard (for monitoring compliance and performance).
• While AWS provides secure and compliant services, the customer remains responsible for ensuring application-level compliance with industry-specific regulations (e.g., GDPR for data privacy, HIPAA for healthcare data) and their own internal policies.

Security & Privacy Threats

AI systems face specific security and privacy vulnerabilities:

Poisoning Attacks: Malicious data injected into the training set to corrupt or manipulate the model's learned behaviour, leading to flawed predictions or backdoors.
Evasion Attacks: Carefully crafted inputs (e.g., small, imperceptible changes to an image) designed to fool a trained model into making incorrect predictions during inference.
Prompt Injection: A type of attack against LLMs where malicious instructions or data are inserted into user prompts, overriding the model's original instructions or causing it to reveal sensitive information/perform unintended actions.
• AWS provides tools like Amazon Comprehend PII detection and Bedrock PII Redaction to help identify and protect sensitive information.

MLOps

• MLOps (Machine Learning Operations) is a set of practices that applies DevOps principles and practices to the entire machine learning lifecycle (MLLC). Its goal is to automate, manage, and monitor ML systems in production, focusing on reproducibility, governance, and rapid iteration.
SageMaker Pipelines orchestrates CI/CD (Continuous Integration/Continuous Delivery) workflows specifically for ML models, automating every step from data pre-processing and model training to deployment and monitoring, ensuring efficient and reliable MLOps.

Core AWS Service Integration (AI Stack)

AWS IAM (Identity and Access Management) – Provides fine-grained control over who can access your AWS resources and what actions they can perform. IAM Roles are crucial for services to assume permissions securely, allowing one AWS service (e.g., SageMaker) to interact with another (e.g., S3) without hardcoding credentials.
Amazon S3 (Simple Storage Service) – Acts as the primary data lake for AI/ML workloads due to its scalability, durability, and cost-effectiveness. It is used for storing raw training data, model artifacts, inference results, and serves as the knowledge base for Bedrock RAG (Knowledge Bases for Amazon Bedrock).
Amazon EC2 (Elastic Compute Cloud) – Provides secure and resizable compute capacity in the cloud. It serves as the underlying compute infrastructure for services like SageMaker for both model training and hosting, allowing users to choose instance types optimised for ML workloads (e.g., with GPUs).
AWS Lambda – A serverless, event-driven compute service that executes code in response to events (e.g., new data in S3, API calls). It often serves as serverless "glue" for orchestrating event-driven AI workflows, triggering ML inference, or pre-processing data without provisioning servers.

Broader Security & Governance Tooling

Amazon Macie – A machine learning-powered data security service that discovers, classifies, and protects sensitive data stored in Amazon S3. It uses ML to identify PII and other confidential information, providing visibility into data access and security posture.
AWS CloudTrail – A service that enables governance, compliance, operational auditing, and risk auditing of your AWS account. It logs all AWS API calls made by or on behalf of your account, providing a complete history of actions taken.
AWS Artifact – A central resource for compliance-related information. It provides on-demand access to AWS security and compliance reports (e.g., SOC 1, SOC 2, ISO certifications) and agreements, useful for meeting audit requirements.
AWS Trusted Advisor – An online tool that provides real-time guidance to help you provision your resources following AWS best practices. It offers recommendations across five categories: Cost Optimisation, Performance, Security, Fault Tolerance, and Service Limits, helping to identify and address potential issues.