Artificial Intelligence (AI)
The ability of machines or computer systems to mimic human-like intelligence and perform tasks that typically require human cognitive abilities. AI encompasses a broad range of technologies (computer vision and smart devices) and techniques, including machine learning and deep learning.
Machine Learning (ML)
A subset of AI that focuses on developing algorithms and statistical models that can learn from data and improve their performance over time without being explicitly programmed. ML models are trained on large datasets to identify patterns, make predictions, and optimize decision-making.
Deep Learning (DL)
An advanced form of ML inspired by the structure and function of the human brain. Deep learning models are composed of interconnected artificial neural networks and excel at tasks such as image and speech recognition, NLP, and generating creative content.
Generative AI (Gen AI)
A category of AI that builds on deep learning to create new content—text, images, audio, code, and more—based on patterns learned from training data. It sits at the top of the AI/ML/DL hierarchy.
Neural Networks
Interconnected layers of artificial nodes (modeled on the human brain) that form the foundation of deep learning models. They process data through multiple layers to recognize complex patterns.
Large Language Model (LLM)
Advanced deep learning models based on transformer architectures. Trained on massive datasets with hundreds of billions of parameters, they can understand and generate human-like text and perform tasks such as classification, translation, code generation, and question answering.
Foundation Model (FM)
A deep learning model pre-trained on large, diverse datasets. LLMs are a type of foundation model. FMs can be fine-tuned for a variety of downstream tasks.
Challenges w/ Generative AI Models (4)
Hallucinations: Outputs look plausible and correct but are factually wrong or fabricated
Toxicity: Includes offensive content
Bias: Perpetuate or amplify societal bias present in training data
Illegality: Content generated that resembles protected intellectual property (IP)
Classes of Generative AI Models (4)
Diffusion Models
Generative Adversarial Networks (GANs)
Variational Autoencoders (VAEs)
Transformer-based Models
Diffusion Models
Excel at producing high-quality images.
Trained by adding noise to a clean input (forward diffusion) and then learning to remove that noise (reverse diffusion) to generate new output.
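One forward-diffusion step can be sketched as mixing a small amount of Gaussian noise into a clean signal (beta is the per-step noise level; this is an illustrative simplification of the full multi-step noise schedule a real diffusion model uses):

```python
import math
import random

def forward_step(x, beta, rng):
    """One forward-diffusion step: scale the signal down and add
    Gaussian noise, so repeated steps turn x into pure noise."""
    return [math.sqrt(1 - beta) * v + math.sqrt(beta) * rng.gauss(0, 1)
            for v in x]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
clean = [1.0, 0.5, -0.5, -1.0]
noisy = forward_step(clean, beta=0.1, rng=rng)
print(noisy)
```

The reverse-diffusion network is then trained to undo exactly this kind of step, which is what lets it generate new samples starting from pure noise.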
Generative Adversarial Networks (GANs)
Consists of two neural networks trained competitively.
Generator: generates data samples intended to fool the discriminator.
Discriminator: classifies inputs as real or fake.
Variational Autoencoders (VAEs)
More efficient processing through data compression: the input is encoded into a compact latent representation.
Extracts the essential information from input data to generate new samples that resemble the original.
Transformer Architecture
The underlying model architecture used by LLMs. Transformers use attention mechanisms to process sequences of data in parallel.
Algorithm
A set of rules or instructions that a machine learning model follows to learn from data and produce outputs or predictions.
Inference
The output the model produces by applying what it learned during training to new input data.
Inferencing: Using a trained ML model to make predictions or generate outputs on new, unseen data. The two main types are batch (offline) and real-time (online).
Prompt
An instruction or question written in natural language text that requests the generative AI model to perform a specific task.
A well-constructed prompt may include instructions, context, input data, and an output indicator.
Token
Words, parts of words, or single characters: the units into which input text is broken down before being converted into vectors.
Embedding
Converting a token into a numerical representation (a vector, or embedding) that can be mapped or clustered.
The vector captures the token's meaning and its semantic relationships with other tokens.
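The idea can be sketched with toy vectors (the 4-dimensional values below are made up for illustration; real embeddings have hundreds or thousands of learned dimensions):

```python
import math

# Hypothetical embeddings: related tokens get nearby vectors.
embeddings = {
    "king":  [0.90, 0.80, 0.10, 0.30],
    "queen": [0.88, 0.82, 0.12, 0.28],
    "apple": [0.10, 0.20, 0.90, 0.70],
}

def cosine_similarity(a, b):
    """Similarity of two vectors: ~1.0 = same direction, ~0.0 = unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Semantically related tokens map to nearby vectors.
print(cosine_similarity(embeddings["king"], embeddings["queen"]) >
      cosine_similarity(embeddings["king"], embeddings["apple"]))  # True
```

This "nearby in vector space means related in meaning" property is what makes embeddings useful for clustering, search, and retrieval.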
Model
The artifact produced after training an ML algorithm on data. A model encodes learned patterns and is used during inference to generate predictions or outputs.
Labeled Data
Training data in which each example is tagged with the correct output or category, enabling supervised learning.
Unlabeled Data
Data without pre-assigned output tags, used in unsupervised learning where the model must discover its own patterns.
Structured Data
Data organized in a predefined, consistent format (e.g., relational databases, CSV files).
Unstructured Data
Data with no predefined format (e.g., images, audio, free-form text, video).
Basic ML Training Patterns (3)
Supervised Learning: Labeled Data > Classification, Regression
Unsupervised Learning: Unlabeled Data > Clustering, Anomaly Detection
Reinforcement Learning: Learn from environment > Positive/Negative Reinforcement
Supervised Learning
An ML approach where the model is trained on labeled data, learning to map inputs to known outputs (e.g., regression and classification tasks).
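As a minimal sketch of supervised learning, a regression line can be fit to labeled examples with closed-form least squares (the data points are toy values invented for illustration):

```python
# Labeled training data: each input x is tagged with a known output y.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 4.0, 6.2, 7.9]          # roughly y = 2x

# "Training": solve for the line y = w*x + b that minimizes squared error.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
w = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
     / sum((x - mean_x) ** 2 for x in xs))
b = mean_y - w * mean_x

# "Inference": apply the learned mapping to new, unseen input.
def predict(x):
    return w * x + b

print(predict(5.0))
```

Real models use many features and iterative optimization, but the shape is the same: labeled examples in, a learned input-to-output mapping out.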
Unsupervised Learning
An ML approach where the model is trained on unlabeled data and must discover hidden patterns or groupings on its own (e.g., clustering).
Reinforcement Learning
An ML approach where an agent learns by interacting with an environment, receiving rewards or penalties based on its actions, and adjusting its behavior to maximize cumulative reward.
Anomaly Detection
An ML capability that learns expected patterns from historical data and then flags significant deviations as anomalies. Used in fraud detection, system monitoring, and quality control.
Bias (Model)
Systematic errors or skewed assumptions in a model that result from flawed training data or algorithm design, potentially leading to unfair or inaccurate outputs.
Fairness
The principle that an AI/ML model should not discriminate against individuals or groups based on protected attributes such as race, gender, or age.
Fit
How well a trained model's predictions match the actual data. Underfitting occurs when the model is too simple; overfitting when it memorizes training data but fails on new data.
Accuracy
A model performance metric that measures the proportion of correct predictions out of all predictions made.
F1 Score
A model performance metric that balances precision (correct positive predictions) and recall (actual positives correctly identified). Useful when class distributions are unequal.
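Accuracy, precision, recall, and F1 can all be computed directly from a confusion matrix; the counts below are hypothetical:

```python
# Hypothetical confusion-matrix counts for a binary classifier.
tp, fp, fn, tn = 40, 10, 20, 30

accuracy  = (tp + tn) / (tp + fp + fn + tn)   # correct / all predictions
precision = tp / (tp + fp)                    # of predicted positives, how many were right
recall    = tp / (tp + fn)                    # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print(accuracy, precision, round(recall, 3), round(f1, 3))
```

Note how accuracy (0.70) can look fine while recall (about 0.67) reveals the model misses a third of the actual positives, which is why F1 matters for imbalanced classes.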
Data Drift
A change in the statistical properties of input data over time compared to the data the model was originally trained on, which can degrade model performance.
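A naive version of a drift check can be sketched in a few lines (the threshold here is an arbitrary choice; production tooling such as SageMaker Model Monitor automates much richer statistics):

```python
import statistics

def drifted(train_values, live_values, threshold=0.5):
    """Flag drift when the live-data mean moves more than `threshold`
    training standard deviations away from the training mean."""
    mu = statistics.mean(train_values)
    sigma = statistics.stdev(train_values)
    return abs(statistics.mean(live_values) - mu) > threshold * sigma

train = [10, 11, 9, 10, 12, 10, 11]          # data the model was trained on
print(drifted(train, [10, 11, 10, 9]))       # similar distribution -> False
print(drifted(train, [15, 16, 14, 17]))      # shifted distribution -> True
```

When a check like this fires, the usual remedy is retraining the model on data that reflects the new distribution.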
Exploratory Data Analysis (EDA)
An early ML pipeline step where data scientists analyze and visualize data to understand its structure, distribution, and key characteristics before preprocessing or modeling.
ETL (Extract, Transform, Load)
A data pipeline process that extracts data from sources, transforms it into the appropriate format, and loads it into a destination (e.g., a data lake) for analysis or ML training.
Data Lake
A centralized storage repository that holds large volumes of raw, structured, and unstructured data until it is needed for processing or analytics.
Interpretability
The degree to which a human can understand and explain how and why a model makes specific predictions. Complex neural networks often have low interpretability.
Cost-Benefit Analysis
An evaluation conducted before adopting an AI/ML solution that weighs projected costs (data management, training, deployment) against potential gains (efficiency, revenue, savings).
Criteria for Responsible AI (3)
1. Accurate, fair, safe, and secure: keep data safe and outputs valid/fair
2. Explainable and transparent: can be audited/validated
3. Governed and controlled: best practices for governance throughout system
Natural Language Processing (NLP)
A field of AI that enables systems to understand, interpret, and generate human language. Core capabilities include text analysis, sentiment analysis, language translation, and conversational interfaces.
Sentiment Analysis
An NLP technique that identifies and classifies the emotional tone of text as positive, negative, or neutral.
Intent (NLP)
In a conversational AI system, the user's desired action derived from their spoken or typed input (e.g., booking a hotel room).
Fulfillment (NLP)
The step in an NLP workflow where the system carries out the requested action after identifying the intent and collecting all required slots.
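A toy sketch of intent recognition and fulfillment (the intent names, keyword rules, and slot values below are illustrative inventions; a service like Amazon Lex learns intents from sample utterances rather than keyword lists):

```python
# Hypothetical intents with keyword hints.
INTENTS = {
    "BookHotel": ["book", "hotel", "room"],
    "CancelBooking": ["cancel", "booking"],
}

def recognize_intent(utterance):
    """Pick the intent whose keywords overlap most with the utterance."""
    words = utterance.lower().split()
    return max(INTENTS, key=lambda i: sum(k in words for k in INTENTS[i]))

def fulfill(intent, slots):
    """Fulfillment: carry out the action once intent and slots are known."""
    if intent == "BookHotel":
        return f"Booked a room in {slots['city']} for {slots['nights']} nights."
    return "Nothing to do."

intent = recognize_intent("I want to book a hotel room")
print(fulfill(intent, {"city": "Seattle", "nights": 2}))
```

The flow matches the cards above: intent first, then slot collection, then fulfillment as the final step.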
Computer Vision
An AI field that enables machines to interpret and analyze images and videos—including object detection, facial recognition, text detection, and defect identification.
Speech Recognition (ASR)
Automatic Speech Recognition technology that converts spoken audio into text.
Text-to-Speech (TTS)
Technology that converts written text into natural-sounding spoken audio.
Predictive Analytics
The use of ML algorithms to identify patterns in historical data and generate forecasts or predictions about future events or behaviors.
GDPR
General Data Protection Regulation — a European Union law that sets strict requirements for the collection, storage, and use of personal data.
HIPAA
Health Insurance Portability and Accountability Act of 1996 — a US law that mandates privacy and security protections for protected health information.
Amazon Bedrock
A fully managed AWS service for building and scaling generative AI applications using existing foundation models, with integration into other AWS services.
Customizable AWS AI service that requires no investment in infrastructure or in training new models.
On-demand pricing charges by the number of input AND output tokens processed.
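The on-demand cost arithmetic can be sketched as follows; the per-1K-token rates below are hypothetical placeholders, not actual Bedrock prices (real rates vary by model and are listed on the pricing page):

```python
# HYPOTHETICAL rates for illustration only, not real Bedrock pricing.
PRICE_PER_1K_INPUT = 0.003    # USD per 1,000 input tokens
PRICE_PER_1K_OUTPUT = 0.015   # USD per 1,000 output tokens

def request_cost(input_tokens, output_tokens):
    """On-demand billing charges for input AND output tokens."""
    return (input_tokens / 1000 * PRICE_PER_1K_INPUT +
            output_tokens / 1000 * PRICE_PER_1K_OUTPUT)

# A request with 2,000 input tokens and 500 output tokens:
print(round(request_cost(2000, 500), 4))  # 0.0135
```

Output tokens typically cost more per token than input tokens, so long generated responses dominate the bill.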
Amazon SageMaker AI
AWS's fully managed end-to-end ML platform. It provides tools for every stage of the ML pipeline: data preparation (Data Wrangler), feature management (Feature Store), model training, hyperparameter tuning, deployment, and monitoring (Model Monitor).
Amazon SageMaker Inference Mode: Batch Inference
Running a trained model against a large dataset offline, when real-time results are not required. Suited for datasets in the gigabyte range and long processing windows.
Amazon SageMaker Inference Mode: Real-Time Inference
Using a deployed model to generate live predictions with low latency for sustained or interactive traffic. Requires a persistent, fully managed endpoint.
Amazon SageMaker Inference Mode: Asynchronous Inference
An inference mode for requests with large payloads or long processing times. The endpoint automatically scales down to zero when there are no requests.
Amazon SageMaker Inference Mode: Serverless Inference
An inference mode in which the cloud provider (AWS Lambda) handles compute automatically; suited for intermittent or unpredictable traffic with no manual scaling required.
Amazon SageMaker Data Wrangler
A SageMaker feature that simplifies data preparation and feature engineering for ML. It enables importing, transforming, and analyzing data from multiple sources without writing code.
Amazon SageMaker Feature Store
A SageMaker feature that provides a centralized repository for storing, sharing, and reusing ML features across teams and models, ensuring consistency between training and inference.
Amazon SageMaker Model Monitor
A SageMaker feature that continuously monitors deployed ML models in production. It detects data quality issues, model quality degradation, bias drift, and feature attribution drift, and generates alerts via Amazon CloudWatch.
Amazon SageMaker Inference Recommender
A SageMaker tool that provides recommendations for the optimal instance type and configuration for hosting a model, based on performance and cost requirements.
Amazon SageMaker Studio
The integrated development environment (IDE) within SageMaker that provides a unified interface for accessing all SageMaker tools, including monitoring results, experiment tracking, and model management.
Amazon Rekognition
An AWS computer vision service that enables facial comparison and analysis, object detection and labeling (including custom labeling), text detection in images/videos, and content moderation.
Amazon Textract
An AWS AI service that uses computer vision and ML to extract text, handwriting, data, and layout elements (forms, tables) from scanned documents and images—going beyond standard OCR.
Amazon Comprehend
An AWS NLP service that analyzes text to extract key phrases, entities, sentiment, and topics. It can also detect Personal Identifiable Information (PII), supporting data privacy and compliance use cases.
Amazon Lex
An AWS service for building conversational interfaces (chatbots and voice assistants) using voice and text. It uses natural language understanding and automatic speech recognition to power customer service bots and IVR systems.
Amazon Transcribe
An AWS automatic speech recognition (ASR) service that converts speech to text, transcribing audio files or live audio streams. Supports batch and streaming transcription jobs, and custom vocabularies for domain-specific terms.
Amazon Polly
An AWS text-to-speech service that converts written text into lifelike speech across multiple languages and voices. Used for e-learning, audiobook narration, and IVR prompting.
Amazon Kendra
An AWS intelligent document search service powered by ML. It understands natural language queries and retrieves the most relevant results from document repositories, improving information discovery.
Amazon Personalize
An AWS service that delivers personalized product recommendations by analyzing customer behavior and preferences—the same technology used by Amazon.com for product recommendations.
Amazon Translate
An AWS neural machine translation service that translates text between 75+ languages, enabling multilingual communication and localized content delivery.
AWS Glue
A cloud-optimized, fully managed ETL service on AWS. It includes its own AWS Glue Data Catalog (centralized metadata repository) and built-in transformations for cleaning, enriching, and reshaping data for analytics or ML workloads.
AWS Glue Data Catalog
A centralized metadata repository integrated within AWS Glue that stores and manages information about data sources, schemas, and transformations, making data easier to discover and govern.
Amazon Kinesis Data Streams
An AWS real-time data streaming service that captures and processes large streams of data records, often used as a data source feeding into ETL pipelines (e.g., via AWS Glue).
Amazon S3 (Simple Storage Service)
AWS's core object storage service. In ML workflows, S3 is used to store model artifacts (trained model files), training datasets, and processed data lakes.
Amazon ECR (Elastic Container Registry)
An AWS fully managed container image registry. In SageMaker deployments, ECR stores the container image that packages a model's code and dependencies.
AWS Lambda
A serverless compute service on AWS. Used in SageMaker Serverless Inference to execute model inference on-demand without provisioning or managing servers.
Amazon EC2 (Elastic Compute Cloud)
AWS's virtual server service. While EC2 provides infrastructure for custom solutions, it requires manual management—contrasting with fully managed AI/ML services that minimize operational overhead.
Purpose-built EC2 instances are also available, custom designed to optimize performance, cost, and sustainability for AI workloads.
AWS Cost Explorer
An AWS tool for monitoring and analyzing costs associated with cloud services. In AI/ML contexts, it uses cost allocation tags to track and attribute expenses to specific ML projects for ROI analysis.
Amazon CloudWatch
AWS's monitoring and observability service. SageMaker Model Monitor integrates with CloudWatch to trigger alerts and automated corrective actions (such as model retraining) when violations are detected.
ML Pipeline
A structured, end-to-end process that guides the development and deployment of ML models. The stages are: (1) Identify business goal → (2) Frame ML problem → (3) Collect data → (4) Pre-process data → (5) Engineer features → (6) Train, tune & evaluate → (7) Deploy → (8) Monitor.
ML Development Lifecycle
The overarching framework covering all stages of planning, building, deploying, and maintaining ML solutions. It ensures all necessary steps are taken and best practices followed throughout the ML project.
Problem Definition (ML Pipeline Stage)
The first stage of the ML pipeline. It involves identifying the business goal, setting clear success criteria, aligning stakeholders, framing the ML task (inputs, desired outputs, evaluation metrics), and conducting a feasibility and cost-benefit analysis.
Data Collection (ML Pipeline Stage)
The stage where relevant and high-quality training data is gathered from various sources through an ETL process and properly labeled for supervised learning tasks.
Data Pre-Processing (ML Pipeline Stage)
The stage that begins with EDA (Exploratory Data Analysis) to understand the data, followed by data cleaning (handling missing/inconsistent values) and splitting data into training, validation, and test sets.
Feature Engineering (ML Pipeline Stage)
The stage focused on selecting the most relevant variables from the data and transforming or creating new features to maximize ML model performance and minimize error rate.
Model Training (ML Pipeline Stage)
The stage where ML algorithms are applied to the prepared, feature-engineered data. The algorithm iteratively adjusts parameters to identify patterns and minimize prediction error.
Model Evaluation (ML Pipeline Stage)
The stage where a trained model's performance is measured using defined metrics (accuracy, AUC, F1 score). Models are optimized using techniques such as hyperparameter tuning.
Model Deployment (ML Pipeline Stage)
The stage where a satisfactory model is moved to a production environment, where it can serve real-time predictions or batch outputs integrated with applications or systems.
Model Monitoring (ML Pipeline Stage)
The final stage of the ML pipeline, involving continuous tracking of a deployed model's accuracy, data drift, bias drift, and feature attribution drift to ensure ongoing reliability and performance.
Train/Validate/Test Split
A data preprocessing step that partitions a dataset into three subsets: training (used to fit the model), validation (used to tune hyperparameters), and test (used for final performance evaluation on unseen data).
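A common 70/15/15 split can be sketched in plain Python (those fractions are a convention, not a fixed rule):

```python
import random

def split_dataset(data, train_frac=0.70, val_frac=0.15, seed=42):
    """Shuffle a copy of the data, then partition it into three subsets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)   # fixed seed for reproducibility
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    train = shuffled[:n_train]                      # fit the model
    val = shuffled[n_train:n_train + n_val]         # tune hyperparameters
    test = shuffled[n_train + n_val:]               # final unseen evaluation
    return train, val, test

train, val, test = split_dataset(list(range(100)))
print(len(train), len(val), len(test))  # 70 15 15
```

Keeping the test set untouched until the very end is what makes the final evaluation an honest estimate of performance on unseen data.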