Flashcards from lecture notes on AWS SageMaker and Machine Learning
Amazon SageMaker Pipelines
A workflow orchestrator for managing and automating ML pipelines, offering built-in integration with SageMaker features, scalability, monitoring, and versioning.
AWS Glue
Service used to create an ETL job for exporting data from DynamoDB to Amazon S3, ensuring scalability and seamless integration with SageMaker.
Amazon EFS
An ideal solution for providing a scalable and shared file system that supports distributed ML training jobs, integrating seamlessly with AWS services.
Data Augmentation, Early Stopping, and Ensembling
Techniques that address overfitting and limited data by increasing data diversity, preventing overfitting during training, and enhancing robustness through multiple model predictions.
AWS Step Functions
Service used to orchestrate the translation process with Amazon Translate, managing complex workflows and ensuring each step is correctly handled.
Amazon SageMaker endpoints
Managed infrastructure for real-time, low-latency inference, ideal for latency-sensitive applications.
Bias versus Variance Trade-off
A fundamental concept in machine learning involving balancing bias with variance to ensure models generalize well to unseen data.
Amazon Comprehend
Built-in capabilities for processing and analyzing text at scale, including sentiment analysis, key phrase extraction, and entity recognition.
"class_weights" Parameter
A parameter in the Linear Learner algorithm in Amazon SageMaker that can be used to adjust the importance of different classes, suitable for handling imbalanced datasets.
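As an illustrative sketch (not the SageMaker API itself), balanced weights of the kind passed via such a parameter are commonly computed as N / (k · count_c), so the rare class receives the larger weight:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency class weights: N / (k * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}

# An imbalanced binary dataset: the minority class gets the larger weight.
weights = balanced_class_weights([0] * 90 + [1] * 10)
```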
AWS CloudTrail
Service used to log all SageMaker API calls, ensuring comprehensive traceability and security for model re-training activities.
Amazon SageMaker Autopilot
Service from Amazon SageMaker to automatically explore different algorithms and hyperparameters to improve model performance.
AWS CodePipeline with SageMaker model building pipelines
An automated, scalable, and efficient solution for implementing CI/CD workflows in machine learning projects, seamlessly integrating training, validation, and deployment.
Target Tracking Scaling
Policy for handling dynamic workloads by continuously monitoring metrics and adjusting resources accordingly, ensuring a balance between performance and cost optimization.
p4d Instances
Amazon SageMaker instances powered by NVIDIA A100 GPUs, offering high computational power needed for training large-scale deep learning models.
ml.inf1 Instances
Amazon SageMaker instances equipped with AWS Inferentia chips, providing low-latency inference, critical for real-time clinical applications.
Precision and Recall
Metrics that highlight the model's performance on the minority class in imbalanced datasets, computed from the counts of true positives, false positives, and false negatives.
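Both metrics follow directly from those counts; a minimal sketch:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Minority-class example: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)  # p = 0.8, r ≈ 0.667
```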
Amazon EC2 P3 instances
Amazon EC2 instances that provide powerful GPU acceleration, which can significantly reduce training times and improve model performance for compute-intensive tasks.
Amazon SageMaker Model Monitor
Automated tools in Amazon SageMaker that identify performance issues such as data and model drift.
Amazon Augmented AI (A2I)
AWS service that facilitates human review for tasks requiring additional validation or judgment, enhancing the reliability and accountability of ML systems.
Pipe Input Mode
An input mode that streams data directly from Amazon S3 to the training instances, reducing storage costs and I/O bottlenecks.
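A minimal configuration sketch using the SageMaker Python SDK; the bucket name and the commented-out estimator are placeholders, not values from the original notes:

```python
# Sketch only: assumes the SageMaker Python SDK is installed and configured.
from sagemaker.inputs import TrainingInput

train_input = TrainingInput(
    s3_data="s3://my-bucket/train/",  # hypothetical bucket/prefix
    input_mode="Pipe",                # stream from S3 instead of downloading first
)
# estimator.fit({"train": train_input})
```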
AWS CodePipeline
Service to automate the build, test, and deployment stages, ensuring a robust and automated CI/CD pipeline for machine learning models.
AWS CodeBuild
Service used with a custom Docker image to ensure a robust and automated CI/CD pipeline for machine learning models.
Amazon Kinesis Data Streams and Kinesis Data Analytics
Solutions for scalable, near-real-time querying with minimal data loss, providing durable, scalable ingestion, and real-time querying and analysis of ingested data.
Amazon SageMaker Data Wrangler
Service used to ingest and transform data, integrating seamlessly with SageMaker Feature Store for storing and managing engineered features for future model training.
High Recall
Ensures that most of the actual positive cases are identified, minimizing the risk of missing patients who need treatment.
Amazon Comprehend
A managed natural language processing service that enables the analysis of text for sentiment, entities, and key topics, scaling to handle large datasets.
Amazon FSx for Lustre
Provides high-throughput, low-latency access to datasets, especially for compute-intensive ML training.
SageMaker Clarify
Service used to measure and mitigate bias in both input data and predictions, ensuring that the model meets fairness requirements.
AWS Compute Optimizer
Identifies inefficiently used resources and delivers actionable recommendations to optimize costs with minimal development effort.
Storing model artifacts and data in Amazon S3 with versioning, implementing a microservices-based architecture
Combined with Amazon SageMaker endpoints and infrastructure as code (IaC), creates a maintainable, scalable, and cost-effective ML infrastructure.
AWS Lake Formation
Service used to define fine-grained access control, tagging datasets with category-specific metadata and assigning permissions based on those tags.
AWS Inferentia Accelerators
Provide low-latency, high-throughput inference for production at a lower cost compared to GPUs.
Amazon EventBridge
Service used, together with Amazon S3 Event Notifications, to automatically trigger the SageMaker pipeline.
K-Means
Unsupervised learning algorithm used for clustering data points into groups based on their similarities.
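A bare-bones 1-D sketch of the algorithm (illustrative only, not the SageMaker built-in): assign each point to its nearest center, then move each center to its cluster mean.

```python
def kmeans_1d(points, centers, iters=10):
    """Plain k-means on 1-D data."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean (keep it if the cluster is empty).
        centers = [sum(v) / len(v) if v else centers[i]
                   for i, v in clusters.items()]
    return centers

# Two obvious groups around 1 and 10.
centers = kmeans_1d([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], centers=[0.0, 5.0])
```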
K-Nearest Neighbors (KNN)
Supervised learning algorithm used for classification, where data points are classified based on their proximity to labeled examples.
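A minimal sketch of the voting rule (1-D distances for brevity; the data and labels are made up for illustration):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest labeled points."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

labeled = [(1.0, "a"), (1.2, "a"), (1.1, "a"), (8.0, "b"), (8.2, "b")]
label = knn_predict(labeled, query=1.05, k=3)  # majority of neighbors are "a"
```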
Amazon SageMaker Clarify
Service designed to provide explanations for model predictions, helping interpret how the model reached its decisions.
Tagging the SageMaker User profiles and using AWS Budgets
Ensure transparency and proactive cost management.
Bias versus Variance Trade-Off
Involves balancing the error due to the model's complexity (variance) and the error due to incorrect assumptions in the model (bias).
Network Access Control List (NACL)
A network ACL attached to the subnet hosting the SageMaker training job; adding a deny rule blocks traffic from the malicious IP address.
Amazon SageMaker Serverless Inference with provisioned concurrency
The most cost-effective deployment solution for the logistics company, eliminating idle charges while provisioned concurrency avoids cold-start latency.
L1 Regularization
Also known as Lasso, is the preferred method in this scenario because it can shrink some feature coefficients to zero, effectively performing feature selection.
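The feature-zeroing effect can be seen in the soft-thresholding operator at the heart of Lasso coordinate descent (an illustrative sketch, not a full solver):

```python
def soft_threshold(w, lam):
    """L1 proximal step: shrink w toward zero by lam,
    setting it exactly to zero when |w| <= lam."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

coeffs = [0.05, -0.8, 0.3, -0.02]
shrunk = [soft_threshold(w, lam=0.1) for w in coeffs]
# Small coefficients are driven exactly to zero (feature selection).
```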
Data Drift
Refers to changes in the distribution of input data over time, which can impact model predictions.
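One simple, illustrative way to flag such a shift (far cruder than what Model Monitor does) is to compare the live mean against the baseline distribution:

```python
def mean_shift_drift(baseline, live, threshold=0.5):
    """Flag drift when the live mean moves more than `threshold`
    baseline standard deviations away from the baseline mean."""
    n = len(baseline)
    mu = sum(baseline) / n
    sigma = (sum((x - mu) ** 2 for x in baseline) / n) ** 0.5
    shift = abs(sum(live) / len(live) - mu)
    return shift > threshold * sigma

stable = mean_shift_drift([10, 11, 9, 10, 10], [10, 10, 10])
drifted = mean_shift_drift([10, 11, 9, 10, 10], [15, 16, 14])
```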
Model Drift
Occurs when a model's performance degrades due to outdated assumptions or parameters.
Amazon EMR with Apache Spark in Amazon SageMaker Studio
Processes and transforms large-scale distributed datasets and integrates seamlessly with SageMaker Studio.
Amazon ECS
The best choice for deploying containerized ML models for batch inference due to its simplicity, managed environment, and deep integration with other AWS services like S3.
F1 Score
Balances the trade-off between precision and recall, which is important for considering both false positives and false negatives.
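The balance is the harmonic mean of the two metrics; a minimal sketch:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A model with good precision but mediocre recall scores between the two.
score = f1_score(precision=0.8, recall=0.5)  # ≈ 0.615
```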
Amazon ECR
Service used to ensure the security, version control, and accessibility of container images.
DeepAR Algorithm
Designed for forecasting time series data.
Storing the API tokens in AWS Secrets Manager
Service that securely stores API tokens, supports automatic rotation, and lets applications retrieve credentials at runtime instead of hardcoding them.
Switching to XGBoost
Combined with Bayesian optimization for hyperparameter tuning to improve model performance efficiently.
AWS SDK for Python (Boto3)
Used to query DynamoDB directly, the most efficient method for accessing data in real time from an Amazon SageMaker notebook.
Model parameters
Internal values that the model learns during the training process to adapt its behavior to the data.
Hyperparameters
External values set before training begins that control the learning process (e.g., learning rate, batch size).
AWS CodeBuild
Service used to automate builds and artifact storage.
Hybrid Model
Maximizes the strengths of each component model, ensuring accurate and personalized recommendations across all customer segments.
Attaching an IAM role with the required permissions or explicitly including the IAM role's ARN in the KMS key policy
Ensures secure and straightforward access to encrypted S3 data.
Amazon Kinesis Data Streams
Used with Amazon Kinesis Data Firehose and AWS Lambda to ingest real-time data, load it into Amazon S3, and process it to generate recommendations.
Deploying Amazon CloudFront
Ensures content can be served with minimal latency.
Increasing the batch size
Improves computational efficiency by leveraging hardware optimizations and reduces the number of gradient updates per epoch.
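The effect on update count is simple arithmetic; `updates_per_epoch` below is an illustrative helper, not part of any framework:

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Each epoch requires ceil(N / batch_size) gradient updates,
    so larger batches mean fewer (but bigger) updates per epoch."""
    return math.ceil(num_samples / batch_size)

small = updates_per_epoch(10_000, batch_size=32)   # 313 updates per epoch
large = updates_per_epoch(10_000, batch_size=256)  # 40 updates per epoch
```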
Creating a private repository in Amazon ECR,
Provides a scalable and secure environment for storing and managing container images.
Amazon Transcribe
Service whose custom vocabularies improve transcription accuracy for industry-specific terms.
F1 score and AUC-ROC
Provide a balanced assessment of model performance, particularly for imbalanced datasets.
Amazon Personalize
Purpose-built recommendation service that dynamically adapts to user interactions.
Amazon EMR with Apache Spark Streaming and Amazon Managed Service for Apache Flink
Provides low-latency solutions while reducing operational overhead, making them ideal choices for generating real-time engagement metrics from streaming data.
Redshift Spectrum
Enables Redshift to query data directly in Amazon S3, ensuring optimal performance and efficient data preparation.
AWS Step Functions
Manages complex, multi-step workflows with built-in error handling, retries, and state tracking.
XGBoost
Balances accuracy and interpretability.
A combination of pruning and quantization.
Pruning eliminates less significant components of the model, reducing its complexity, while quantization decreases the precision of the weights.
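A minimal sketch of the quantization half (symmetric 8-bit, per-tensor scale); this is illustrative, not any specific toolkit's API:

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats onto integers in
    [-127, 127] using a single per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]

q, scale = quantize_int8([0.5, -1.27, 0.02])
approx = dequantize(q, scale)  # close to the originals, at 8-bit precision
```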
Configuring Amazon EFS with General Purpose performance mode and Bursting Throughput mode
Provides scalable, shared file storage with optimal performance for high IOPS and throughput when multiple SageMaker instances access large genomic datasets concurrently.
Lowering the temperature and top-K parameters in the Amazon Bedrock API
Reduces randomness during token selection. Additionally, fine-tuning the model with domain-specific data improves its accuracy and contextual relevance.
Amazon Polly Neural TTS
Produces more natural and lifelike speech, while custom lexicons allow precise pronunciation customization.
Seamless updates, robust security, and efficient management of a containerized ML solution
Approach combining Amazon ECR for secure, scalable container image storage, Amazon EKS for container orchestration across environments, and AWS CodePipeline for automating the CI/CD pipeline.
Deploying a lightweight model using AWS Lambda with API Gateway
Scalable architecture that minimizes costs by eliminating idle resource charges and reduces infrastructure management overhead.
Using Amazon Kinesis Data Firehose in combination with AWS Lambda
Provides seamless integration of AWS services to handle large-scale, real-time data processing without managing infrastructure directly.
Amazon SageMaker, Amazon Cognito, and Amazon API Gateway
Robust platform for secure data handling, authenticated access, and efficient, scalable real-time predictions required in healthcare applications.
AWS Lambda, triggered by Amazon S3 event notifications, and Amazon Comprehend Medical for medical data extraction
Services that enable a fully managed, serverless workflow that automatically scales, ensuring efficient processing and storage of extracted data in DynamoDB.
Amazon Textract, AWS Lambda, Amazon DynamoDB and AWS Step Functions
Effectively automates the workflow, ensuring smooth data processing and management.
Model Compression
Helps balance the model's complexity against computational limitations, ensuring the model remains usable for real-time inference.
Automating the data preprocessing workflow with AWS Glue DataBrew, handling event-driven updates, and integrating directly with Amazon QuickSight
Effective approach for prompt and efficient data analysis.
Amazon CloudWatch
Monitors metrics and logs, facilitating timely notifications about issues through alarms.
Leveraging serverless computing
Architectural setup that efficiently handles real-time, scalable translations for the online news portal.
Amazon Redshift Spectrum
Combines the power of Redshift's query processing with the ability to query data stored in S3 directly, managed under Lake Formation's security and compliance policies.
Amazon CloudWatch
Offers extensive metrics and logs that help identify and address deployment issues and performance bottlenecks, ensuring the application meets its performance expectations.
Amazon SageMaker endpoint variants
Support model management, A/B testing, and gradual updates, ensuring the model adapts in real time to new user behaviors and seasonal trends; pairing them with a lightweight model such as logistic regression keeps computational costs down, meeting requirements for both real-time adaptation and cost efficiency.
Amazon Redshift, AWS Batch, and Amazon Forecast
Combined to automate retail inventory forecasting tasks.
DynamoDB
Offers quick, low-latency read and write capabilities at any scale.