Yacine AWS AI Practitioner - Section 5: Amazon SageMaker


Last updated 10:29 AM on 4/7/26


35 Terms

1. What is Amazon SageMaker?

Amazon SageMaker is a fully managed machine learning platform that allows teams to build, train, tune, and deploy custom ML models at scale. It provides end-to-end tooling across the entire ML lifecycle (collect and prep data, build and train, deploy).

2. What problem does SageMaker solve?

SageMaker reduces the operational complexity of running ML workloads by providing managed infrastructure, built-in algorithms, and integrated tools for data preparation, training, deployment, and monitoring.

3. What is Amazon SageMaker Studio Data Wrangler?

Prepare tabular and image data for ML.

Data preparation, transformation, and feature engineering in a single interface: import, view, and visualize data (graphs), then select, cleanse, explore, transform, process, and export it. Includes SQL support and a Data Quality tool.

Fix bias by balancing the dataset. Example: augment the data (generate new instances of data for underrepresented groups).
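
The balancing idea above can be sketched in plain Python. This is only an illustration, not Data Wrangler's own transform: it uses random oversampling (duplicating existing rows) as a simple stand-in for generating new instances, and the `oversample` function, the `age_group` column, and the sample rows are all hypothetical.

```python
import random
from collections import Counter, defaultdict


def oversample(rows, group_key, seed=0):
    """Duplicate rows from smaller groups until every group matches the largest one."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for row in rows:
        by_group[row[group_key]].append(row)

    target = max(len(members) for members in by_group.values())
    balanced = []
    for members in by_group.values():
        balanced.extend(members)
        # Sample with replacement to fill the gap for underrepresented groups.
        balanced.extend(rng.choice(members) for _ in range(target - len(members)))
    return balanced


# Hypothetical dataset skewed towards one age group (8 rows versus 2 rows).
rows = [{"age_group": "middle"}] * 8 + [{"age_group": "senior"}] * 2
balanced = oversample(rows, "age_group")
counts = Counter(row["age_group"] for row in balanced)
```

After the call, both groups have the same number of rows. Duplicating rows is the crudest form of augmentation; generating genuinely new instances would need a more involved technique.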

4. What is SageMaker Studio Feature Store?

Ingests features from a variety of sources. You can define the transformations that turn data into features from within the Feature Store.

Can publish directly from SageMaker Data Wrangler into SageMaker Feature Store. Features are discoverable within SageMaker Studio.

5. When should Amazon SageMaker be used?

Use SageMaker when you need full control over model training, custom algorithms, custom architectures, or large-scale ML pipelines that go beyond simple model consumption.

6. What is SageMaker Clarify?

- Evaluate Foundation Models.

- FM evaluation on accuracy, robustness, toxicity

- Bias detection (ex: data skewed towards middle-aged people)

- Evaluating human factors such as friendliness or humor

- Leverage an AWS-managed team or bring your own employees

- Use built-in datasets or bring your own dataset

- Built-in metrics and algorithms

- Part of SageMaker Studio

7. What is SageMaker Clarify Model Explainability?

A set of tools to help explain how ML models make predictions. Debug predictions provided by the model after it's deployed. Helps increase the trust and understanding of the model.

8. What is SageMaker Clarify Detect Bias?

Ability to detect and explain human biases in your datasets and models. Measure bias using statistical metrics. Specify input features, what “bias” means for you, and bias will be automatically detected.

1. Sampling bias: the training data does not represent the full population fairly, which disproportionately affects certain groups

2. Measurement bias: tools or measurements used in data collection are flawed

3. Observer bias: the person collecting or interpreting the data has personal biases

4. Confirmation bias: individuals interpret or favor information that confirms their preconceptions

You need to perform data augmentation for imbalanced classes.
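
To make "measure bias using statistical metrics" concrete, here is a minimal sketch of two pre-training metrics, class imbalance (CI) and difference in proportions of labels (DPL), computed by hand. The function names and the loan-style counts are hypothetical, and this is plain Python rather than a call into Clarify itself.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d); 0 means both groups are equally represented."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(pos_a, n_a, pos_d, n_d):
    """DPL = share of positive labels in group a minus the share in group d."""
    return pos_a / n_a - pos_d / n_d


# Hypothetical dataset: group a = middle-aged applicants, group d = everyone else.
n_a, n_d = 800, 200      # number of rows per group
pos_a, pos_d = 480, 60   # rows labeled "approved" per group

ci = class_imbalance(n_a, n_d)
dpl = difference_in_proportions_of_labels(pos_a, n_a, pos_d, n_d)
```

A CI well above 0 says the dataset is skewed towards group a, and a positive DPL says group a also receives the favorable label more often. Both are signals to investigate, not proof of unfairness on their own.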

9. What is SageMaker Ground Truth?

SageMaker Ground Truth helps you label training data at scale using human labelers, automated labeling, or a combination of both. The human feedback it collects can also be used for RLHF (Reinforcement Learning from Human Feedback).

Automated labeling:

1. You provide raw unlabeled data

2. Humans label a small subset

3. Ground Truth trains a model

4. The model labels easy examples automatically

5. Humans review hard or low-confidence cases
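
Steps 4 and 5 above amount to routing each prediction by confidence. The sketch below assumes a hypothetical `route_predictions` helper, made-up item IDs, and an arbitrary 0.9 threshold; it is not how Ground Truth is implemented internally.

```python
def route_predictions(predictions, threshold=0.9):
    """Split model predictions into auto-accepted labels and items for human review."""
    auto_labeled = {}
    needs_review = []
    for item_id, (label, confidence) in predictions.items():
        if confidence >= threshold:
            # Easy example: keep the model's label.
            auto_labeled[item_id] = label
        else:
            # Hard or low-confidence example: send it to a human labeler.
            needs_review.append(item_id)
    return auto_labeled, needs_review


# Hypothetical output of the labeling model: item -> (label, confidence).
predictions = {
    "img-001": ("cat", 0.98),
    "img-002": ("dog", 0.55),
    "img-003": ("cat", 0.91),
}
auto_labeled, needs_review = route_predictions(predictions)
```

Raising the threshold sends more items to humans, which costs more but lowers the risk of wrong automatic labels.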

10. What is SageMaker Ground Truth Plus?

SageMaker Ground Truth Plus provides expert-managed data labeling projects, where AWS handles workforce selection, workflow design, quality control, and delivery of labeled data.

11. What is SageMaker ML Governance?

SageMaker Model Cards

- Essential model information

- Example: intended uses, risk ratings, and training details

SageMaker Model Dashboard

- Centralized portal where you can view, search, and explore all of your models

- Accessible through the SageMaker Console

- Helps you find models that violate thresholds you set for data quality, model quality, bias, explainability…

SageMaker Model Monitor (part of Model Dashboard)

- Monitor the quality of your model in production: continuous or on-schedule

- Alerts for deviations in the model quality: fix data & retrain model

- Example: loan model starts giving loans to people who don’t have the correct credit score (drift)

SageMaker Role Manager

- Define roles for personas

- Example: data scientists, MLOps engineers

SageMaker Model Registry

- Centralized repository allows you to track, manage, and version ML models

- Catalog models, manage model versions, associate metadata with a model

- Manage approval status of a model, automate model deployment, share models…
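
The Model Registry ideas above (versions, metadata, approval status) can be pictured with a tiny in-memory stand-in. The `ModelRegistry` class, the group name, and the S3-style URIs below are hypothetical and only illustrate the concept; the real service is accessed through the SageMaker console or API.

```python
class ModelRegistry:
    """Toy registry: each model group holds an ordered list of versions."""

    def __init__(self):
        self._groups = {}

    def register(self, group, artifact_uri, metadata=None):
        versions = self._groups.setdefault(group, [])
        entry = {
            "version": len(versions) + 1,
            "artifact_uri": artifact_uri,
            "metadata": metadata or {},
            # New versions wait for a reviewer before they can be deployed.
            "status": "PendingManualApproval",
        }
        versions.append(entry)
        return entry["version"]

    def approve(self, group, version):
        self._groups[group][version - 1]["status"] = "Approved"

    def latest_approved(self, group):
        approved = [v for v in self._groups.get(group, []) if v["status"] == "Approved"]
        return approved[-1] if approved else None


registry = ModelRegistry()
v1 = registry.register("loan-model", "s3://example-bucket/loan/v1/model.tar.gz", {"auc": 0.91})
v2 = registry.register("loan-model", "s3://example-bucket/loan/v2/model.tar.gz", {"auc": 0.93})
registry.approve("loan-model", v1)
deployable = registry.latest_approved("loan-model")
```

Here version 2 exists but is not yet approved, so an automated deployment that only picks approved versions would still use version 1.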

12. What are Amazon SageMaker Pipelines?

A workflow that automates the process of building, training, and deploying an ML model. It is a Continuous Integration and Continuous Delivery (CI/CD) service for ML.

Supported Step Types:

- Processing – for data processing (e.g., feature engineering)

- Training – for training a model

- Tuning – for hyperparameter tuning (e.g., Hyperparameter Optimization)

- AutoML – to automatically train a model

- Model – to create or register a SageMaker model

- ClarifyCheck – perform drift checks against baselines (Data bias, Model bias, Model explainability)

- QualityCheck – perform drift checks against baselines (Data quality, Model quality)
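
The step types above chain together, with each step consuming what earlier steps produced. The sketch below mimics that flow with ordinary functions. It does not use the SageMaker Pipelines SDK: `run_pipeline`, the three step functions, and all the numbers are hypothetical stand-ins.

```python
def run_pipeline(steps, context):
    """Run (name, function) steps in order; each step adds its outputs to the context."""
    for name, step in steps:
        context.update(step(context))
        context.setdefault("completed", []).append(name)
    return context


def processing(ctx):
    # Feature engineering stand-in: scale raw values relative to the largest one.
    top = max(ctx["raw"])
    return {"features": [value / top for value in ctx["raw"]]}


def training(ctx):
    # "Model" stand-in: it simply predicts the mean of the training features.
    return {"model": sum(ctx["features"]) / len(ctx["features"])}


def quality_check(ctx):
    # Stop the run when the model moves too far from the recorded baseline.
    drift = abs(ctx["model"] - ctx["baseline"])
    if drift > ctx["tolerance"]:
        raise ValueError(f"quality check failed: drift {drift:.3f}")
    return {"drift": drift}


steps = [("Processing", processing), ("Training", training), ("QualityCheck", quality_check)]
result = run_pipeline(steps, {"raw": [2, 4, 6, 8], "baseline": 0.6, "tolerance": 0.1})
```

Because the check raises an error on excessive drift, later steps (such as registering the model) would never run for a bad model, which is the point of placing checks inside the pipeline.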

13. What are Amazon SageMaker built-in algorithms?

- Supervised Algorithms (linear regressions and classifications, KNN Algorithms)

- Unsupervised Algorithms (K-means, Anomaly Detection)

- Textual Algorithms (NLP, summarization)

- Image Processing (classification, detection)

14. When should SageMaker NOT be the first choice?

SageMaker is not ideal when you only need to consume or lightly customize foundation models. In those cases, Amazon Bedrock is simpler, faster, and more cost-effective.

15. What is the relationship between SageMaker and Amazon Bedrock?

Bedrock focuses on using and customizing foundation models, while SageMaker focuses on building and training models from scratch or with deep customization.

16. What is model training in SageMaker?

Model training in SageMaker involves running training jobs on managed compute instances using your data and algorithms.

SageMaker handles provisioning, scaling, and teardown of infrastructure.

17. What is inference in SageMaker?

Inference is the process of using a trained model to make predictions. SageMaker supports:

- Real-time inference: latency of milliseconds to seconds, payload up to 6 MB, processing time up to 60 seconds; for near-instant predictions

- Serverless inference: latency of milliseconds to seconds, payload up to 4 MB, processing time up to 60 seconds; for intermittent traffic with idle periods between requests, where cold starts can be tolerated

- Asynchronous inference: near real-time, payload up to 1 GB, processing time up to 1 hour; for large payloads or workloads that require longer processing times

- Batch inference: minutes to hours, up to 100 MB per invocation (mini-batch), processing time up to 1 hour; for bulk processing of large datasets, with concurrent processing
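
One way to memorize the limits above is to read them as a decision rule. The function below is a study aid only: `choose_inference_option` and its parameters are hypothetical, and the thresholds are the ones quoted on this card, not values fetched from AWS.

```python
MB = 1024 * 1024
GB = 1024 * MB


def choose_inference_option(payload_bytes, processing_seconds,
                            bulk_dataset=False, intermittent_traffic=False):
    """Pick an inference option using the payload and time limits from this card."""
    if bulk_dataset:
        # Nothing is waiting on a single answer: process the whole dataset in bulk.
        return "batch"
    if processing_seconds <= 60:
        if intermittent_traffic and payload_bytes <= 4 * MB:
            return "serverless"
        if payload_bytes <= 6 * MB:
            return "real-time"
    if payload_bytes <= 1 * GB and processing_seconds <= 3600:
        return "asynchronous"
    # Anything larger is treated here as bulk work to be split into mini-batches.
    return "batch"


small_request = choose_inference_option(1 * MB, 1)
spiky_request = choose_inference_option(1 * MB, 1, intermittent_traffic=True)
large_request = choose_inference_option(500 * MB, 600)
nightly_job = choose_inference_option(1 * MB, 1, bulk_dataset=True)
```

The ordering matters: the cheaper, lower-latency options are tried first, and asynchronous inference is the fallback for requests that are too large or too slow for them.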

18. What is a SageMaker endpoint?

A SageMaker endpoint is a managed, scalable HTTPS endpoint used to host models for real-time inference. It can scale automatically based on traffic when auto scaling is configured.

19. What is batch inference in SageMaker?

Batch inference processes large volumes of data asynchronously. It is used when real-time responses are not required, such as nightly predictions or large data transformations.

20. What is SageMaker Studio?

SageMaker Studio provides end-to-end ML development from a unified interface: team collaboration, tuning and debugging ML models, deploying ML models, and automated workflows.

Amazon SageMaker

- SageMaker Studio

1. Data Wrangler

2. Notebooks

3. Experiments

4. Pipelines

5. Feature Store

21. What is SageMaker Autopilot?

SageMaker Autopilot automatically builds, trains, and tunes ML models from tabular data. It is designed for users who want ML results without deep algorithm knowledge.

22. What is Automatic Model Tuning (AMT) in SageMaker?

Define the Objective Metric. AMT automatically chooses hyperparameter ranges, search strategy, maximum runtime of a tuning job, and early stop condition.

Saves you time and money.

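Hyperparameter != Objective metric. Hyperparameters are settings chosen before training; the objective metric (accuracy, F1, validation loss) is what the tuning job measures and optimizes.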
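
The difference between hyperparameters and the objective metric is easiest to see in a small search loop. The sketch below uses plain random search as a stand-in for whatever strategy a tuning job applies, and `validation_accuracy` is a made-up formula standing in for a real training job; none of this is the AMT API.

```python
import random


def validation_accuracy(learning_rate, depth):
    """Hypothetical stand-in for a training job; best at learning_rate=0.1, depth=6."""
    return 1.0 - abs(learning_rate - 0.1) - 0.02 * abs(depth - 6)


def random_search(ranges, max_jobs, seed=0):
    """Try random hyperparameter combinations and keep the best objective metric."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(max_jobs):
        # Hyperparameters: the inputs we are free to choose within the ranges.
        params = {
            "learning_rate": rng.uniform(*ranges["learning_rate"]),
            "depth": rng.randint(*ranges["depth"]),
        }
        # Objective metric: the output we measure and want to maximize.
        score = validation_accuracy(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


ranges = {"learning_rate": (0.01, 0.3), "depth": (2, 10)}
best_params, best_score = random_search(ranges, max_jobs=50)
```

`max_jobs` plays the role of a budget: more jobs cost more time and money but explore more of the ranges.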

23. What is SageMaker JumpStart?

SageMaker JumpStart provides pre-built models, solution templates, and notebooks. You can customize the models. Models are deployed on SageMaker directly.

ML Hub: A catalog of ready-to-use ML models, algorithms, and foundation models that you can deploy or fine-tune. Think: models you start from.

ML Solutions: End-to-end, opinionated ML application templates that solve a specific business problem. Think: complete solutions, not just models.

24. What is SageMaker Canvas?

Build ML models using a visual interface (no coding required).

Access to ready-to-use models from Bedrock or JumpStart.

Build your own custom model using AutoML powered by SageMaker Autopilot.

Part of SageMaker Studio.

Leverage Data Wrangler for data preparation/transformation.

Ready-to-use models leveraging Amazon Rekognition, Amazon Comprehend, Amazon Textract.

Makes it easy to build a full ML pipeline without writing code, while leveraging various AWS AI Services.

24.1 What is the difference between SM Canvas, PartyRock, and Q Apps?

Canvas builds models, not UI apps.

Canvas: “Low-code data scientist for tabular ML tasks.”

PartyRock: “Playground to vibe-code AI apps.”

Q Apps: “Company-specific AI assistant apps built for real use.”

25. What is MLflow?

MLflow is an open-source tool that helps ML teams manage the entire ML lifecycle. It is available on Amazon SageMaker.

MLflow Tracking Servers:

- Used to track runs (each run records a model training attempt) and experiments

- Launch on SageMaker with a few clicks

26. What are some SageMaker extra features?

Network Isolation mode:

- Run SageMaker job containers without any outbound internet access

- Can’t even access Amazon S3

SageMaker DeepAR forecasting algorithm:

- Used to forecast time series data

- Leverages a Recurrent Neural Network (RNN)

27. BONUS SageMaker

- SageMaker: end-to-end ML service

- SageMaker Automatic Model Tuning: tune hyperparameters

- SageMaker Deployment & Inference: real-time, serverless, batch, async

- SageMaker Studio: unified interface for SageMaker

- SageMaker Data Wrangler: explore and prepare datasets, create features

- SageMaker Feature Store: store feature metadata in a central place

- SageMaker Clarify: compare models, explain model outputs, detect bias

- SageMaker Ground Truth: RLHF, humans for model grading and data labeling

- SageMaker Model Cards: ML model documentation

- SageMaker Model Dashboard: view all your models in one place

- SageMaker Model Monitor: monitoring and alerts for your model

- SageMaker Model Registry: centralized repository to manage ML model versions

- SageMaker Pipelines: CI/CD for Machine Learning

- SageMaker Role Manager: access control

- SageMaker JumpStart: ML model hub & pre-built ML solutions

- SageMaker Canvas: no-code interface for SageMaker

- MLflow on SageMaker: use MLflow tracking servers on AWS

28. How does SageMaker handle scaling?

SageMaker automatically scales training jobs and inference endpoints. Users can define minimum and maximum instance counts to control performance and cost.
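
The role of the minimum and maximum instance counts can be shown with a little arithmetic. The function below is a simplified sketch, assuming a hypothetical per-instance request target; it is not the actual auto scaling algorithm.

```python
import math


def desired_instance_count(requests_per_minute, target_per_instance,
                           min_instances, max_instances):
    """Instances needed for the load, clamped to the configured minimum and maximum."""
    needed = math.ceil(requests_per_minute / target_per_instance)
    return max(min_instances, min(max_instances, needed))


# Hypothetical endpoint: each instance should handle about 1000 requests per minute.
normal_load = desired_instance_count(2500, 1000, min_instances=1, max_instances=4)
quiet_period = desired_instance_count(100, 1000, min_instances=2, max_instances=4)
traffic_spike = desired_instance_count(9000, 1000, min_instances=1, max_instances=4)
```

The minimum protects performance during quiet periods (capacity is already warm), and the maximum caps cost during spikes, at the price of possible throttling or higher latency.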

29. How does SageMaker integrate with S3?

Amazon S3 is used to store training data, model artifacts, and inference outputs.

SageMaker reads from and writes to S3 as part of the ML workflow.

30. What level of ML expertise is required for SageMaker?

SageMaker typically requires moderate to advanced ML knowledge, especially when building custom models or pipelines.

It is more complex than managed AI services or Bedrock.

31. What is MLOps in the context of SageMaker?

MLOps refers to practices for deploying, monitoring, retraining, and managing ML models in production.

SageMaker supports MLOps through pipelines, monitoring, and versioning.

32. What is SageMaker Model Monitor?

Model Monitor detects data drift and model quality issues by comparing live inference data with baseline training data.
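
Comparing live data with a baseline can be as simple as asking how far the live average has moved. The sketch below uses a mean-shift score measured in baseline standard deviations. This is one simple drift measure chosen for illustration, not the statistics Model Monitor itself computes, and the credit-score values and the 2.0 threshold are hypothetical.

```python
from statistics import mean, stdev


def drift_score(baseline, live):
    """How many baseline standard deviations the live mean has moved."""
    return abs(mean(live) - mean(baseline)) / stdev(baseline)


def has_drifted(baseline, live, threshold=2.0):
    return drift_score(baseline, live) > threshold


# Hypothetical credit scores seen at training time versus in production.
baseline_scores = [700, 710, 690, 705, 695]
stable_live = [702, 698, 707]
drifted_live = [610, 600, 620]

stable_alert = has_drifted(baseline_scores, stable_live)
drift_alert = has_drifted(baseline_scores, drifted_live)
```

When the alert fires, the usual response is the one named in the governance card: fix the data and retrain the model.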

33. Is SageMaker serverless?

Parts of SageMaker are serverless, but many components require provisioning instances.

It offers more control but also more operational responsibility than fully serverless AI services.

34. Why is SageMaker considered “heavier” than Bedrock?

SageMaker exposes more of the ML lifecycle, infrastructure choices, and configuration options. This power comes with increased complexity and operational overhead.