Flashcards from lecture notes on AWS SageMaker and Machine Learning
Amazon SageMaker Pipelines
A workflow orchestrator for managing and automating ML pipelines, offering built-in integration with SageMaker features, scalability, monitoring, and versioning.
AWS Glue
Service used to create an ETL job for exporting data from DynamoDB to Amazon S3, ensuring scalability and seamless integration with SageMaker.
Amazon EFS
An ideal solution for providing a scalable and shared file system that supports distributed ML training jobs, integrating seamlessly with AWS services.
Data Augmentation, Early Stopping, and Ensembling
Techniques that address overfitting and limited data by increasing data diversity, preventing overfitting during training, and enhancing robustness through multiple model predictions.
AWS Step Functions
Service used to orchestrate the translation process with Amazon Translate, managing complex workflows and ensuring each step is correctly handled.
Amazon SageMaker endpoints
Managed infrastructure for real-time, low-latency inference, ideal for latency-sensitive applications.
Bias versus Variance Trade-off
A fundamental concept in machine learning involving balancing bias with variance to ensure models generalize well to unseen data.
Amazon Comprehend
Built-in capabilities for processing and analyzing text at scale, including sentiment analysis, key phrase extraction, and entity recognition.
"class_weights" Parameter
A parameter in the Linear Learner algorithm in Amazon SageMaker that can be used to adjust the importance of different classes, suitable for handling imbalanced datasets.
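As an illustrative sketch (not the SageMaker API itself), balanced weights of the kind passed via such a parameter are commonly computed as N / (k · count_c), so the rare class receives the larger weight:

```python
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency class weights: N / (k * count_c)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * count) for c, count in counts.items()}

# An imbalanced binary dataset: the minority class gets the larger weight.
weights = balanced_class_weights([0] * 90 + [1] * 10)
```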
AWS CloudTrail
Service used to log all SageMaker API calls, ensuring comprehensive traceability and security for model re-training activities.
Amazon SageMaker Autopilot
Service from Amazon SageMaker to automatically explore different algorithms and hyperparameters to improve model performance.
AWS CodePipeline with SageMaker model building pipelines
An automated, scalable, and efficient solution for implementing CI/CD workflows in machine learning projects, seamlessly integrating training, validation, and deployment.
Target Tracking Scaling
Policy for handling dynamic workloads by continuously monitoring metrics and adjusting resources accordingly, ensuring a balance between performance and cost optimization.
p4d Instances
Amazon SageMaker instances powered by NVIDIA A100 GPUs, offering high computational power needed for training large-scale deep learning models.
ml.inf1 Instances
Amazon SageMaker instances equipped with AWS Inferentia chips, providing low-latency inference, critical for real-time clinical applications.
Precision and Recall
Metrics that highlight the model's performance on the minority class in imbalanced datasets, computed from the counts of true positives, false positives, and false negatives.
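Both metrics follow directly from those counts; a minimal sketch:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP); Recall = TP/(TP+FN)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Minority-class example: 8 true positives, 2 false positives, 4 false negatives.
p, r = precision_recall(tp=8, fp=2, fn=4)  # p = 0.8, r ≈ 0.667
```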
Amazon EC2 P3 instances
Amazon EC2 instances that provide powerful GPU acceleration, which can significantly reduce training times and improve model performance for compute-intensive tasks.
Amazon SageMaker Model Monitor
Automated tools in Amazon SageMaker that identify performance issues such as data and model drift.
Amazon Augmented AI (A2I)
AWS service that facilitates human review for tasks requiring additional validation or judgment, enhancing the reliability and accountability of ML systems.
Pipe Input Mode
An input mode that streams data directly from Amazon S3 to the training instances, reducing storage costs and I/O bottlenecks.
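A minimal configuration sketch using the SageMaker Python SDK; the bucket name and the commented-out estimator are placeholders, not values from the original notes:

```python
# Sketch only: assumes the SageMaker Python SDK is installed and configured.
from sagemaker.inputs import TrainingInput

train_input = TrainingInput(
    s3_data="s3://my-bucket/train/",  # hypothetical bucket/prefix
    input_mode="Pipe",                # stream from S3 instead of downloading first
)
# estimator.fit({"train": train_input})
```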
AWS CodePipeline
Service to automate the build, test, and deployment stages, ensuring a robust and automated CI/CD pipeline for machine learning models.
AWS CodeBuild
Service used with a custom Docker image to ensure a robust and automated CI/CD pipeline for machine learning models.
Amazon Kinesis Data Streams and Kinesis Data Analytics
Solutions for scalable, near-real-time querying with minimal data loss, providing durable, scalable ingestion, and real-time querying and analysis of ingested data.
Amazon SageMaker Data Wrangler
Service used to ingest and transform data, integrating seamlessly with SageMaker Feature Store for storing and managing engineered features for future model training.
High Recall
Ensures that most of the actual positive cases are identified, minimizing the risk of missing patients who need treatment.
Amazon Comprehend
A managed natural language processing service that enables the analysis of text for sentiment, entities, and key topics, scaling to handle large datasets.
Amazon FSx for Lustre
Provides high-throughput, low-latency access to datasets, especially for compute-intensive ML training.
SageMaker Clarify
Service used to measure and mitigate bias in both input data and predictions, ensuring that the model meets fairness requirements.
AWS Compute Optimizer
Identifies inefficiently used resources and delivers actionable recommendations to optimize costs with minimal development effort.
Storing model artifacts and data in Amazon S3 with versioning, implementing a microservices-based architecture
Combined with Amazon SageMaker endpoints and infrastructure as code (IaC), creates a maintainable, scalable, and cost-effective ML infrastructure.
AWS Lake Formation
Service used to define fine-grained access control, tagging datasets with category-specific metadata and assigning permissions based on those tags.
AWS Inferentia Accelerators
Provide low-latency, high-throughput inference for production at a lower cost compared to GPUs.
Amazon EventBridge
Service used, together with Amazon S3 Event Notifications, to automatically trigger the SageMaker pipeline.
K-Means
Unsupervised learning algorithm used for clustering data points into groups based on their similarities.
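A bare-bones 1-D sketch of the algorithm (illustrative only, not the SageMaker built-in): assign each point to its nearest center, then move each center to its cluster mean.

```python
def kmeans_1d(points, centers, iters=10):
    """Plain k-means on 1-D data."""
    for _ in range(iters):
        clusters = {c: [] for c in range(len(centers))}
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean (keep it if the cluster is empty).
        centers = [sum(v) / len(v) if v else centers[i]
                   for i, v in clusters.items()]
    return centers

# Two obvious groups around 1 and 10.
centers = kmeans_1d([0.9, 1.0, 1.1, 9.9, 10.0, 10.1], centers=[0.0, 5.0])
```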
K-Nearest Neighbors (KNN)
Supervised learning algorithm used for classification, where data points are classified based on their proximity to labeled examples.
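A minimal sketch of the voting rule (1-D distances for brevity; the data and labels are made up for illustration):

```python
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among the k nearest labeled points."""
    nearest = sorted(train, key=lambda xy: abs(xy[0] - query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

labeled = [(1.0, "a"), (1.2, "a"), (1.1, "a"), (8.0, "b"), (8.2, "b")]
label = knn_predict(labeled, query=1.05, k=3)  # majority of neighbors are "a"
```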
Amazon SageMaker Clarify
Service designed to provide explanations for model predictions, helping interpret how the model reached its decisions.
Tagging the SageMaker User profiles and using AWS Budgets
Ensure transparency and proactive cost management.
Bias versus Variance Trade-Off
Involves balancing the error due to the model's complexity (variance) and the error due to incorrect assumptions in the model (bias).
Network Access Control List (NACL)
A network ACL attached to the subnet hosting the SageMaker training job; adding a deny rule blocks traffic from the malicious IP address.
Amazon SageMaker Serverless Inference with provisioned concurrency
The most cost-effective deployment solution for the logistics company, eliminating idle charges while provisioned concurrency avoids cold-start latency.
L1 Regularization
Also known as Lasso, is the preferred method in this scenario because it can shrink some feature coefficients to zero, effectively performing feature selection.
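The feature-zeroing effect can be seen in the soft-thresholding operator at the heart of Lasso coordinate descent (an illustrative sketch, not a full solver):

```python
def soft_threshold(w, lam):
    """L1 proximal step: shrink w toward zero by lam,
    setting it exactly to zero when |w| <= lam."""
    if w > lam:
        return w - lam
    if w < -lam:
        return w + lam
    return 0.0

coeffs = [0.05, -0.8, 0.3, -0.02]
shrunk = [soft_threshold(w, lam=0.1) for w in coeffs]
# Small coefficients are driven exactly to zero (feature selection).
```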
Data Drift
Refers to changes in the distribution of input data over time, which can impact model predictions.
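One simple, illustrative way to flag such a shift (far cruder than what Model Monitor does) is to compare the live mean against the baseline distribution:

```python
def mean_shift_drift(baseline, live, threshold=0.5):
    """Flag drift when the live mean moves more than `threshold`
    baseline standard deviations away from the baseline mean."""
    n = len(baseline)
    mu = sum(baseline) / n
    sigma = (sum((x - mu) ** 2 for x in baseline) / n) ** 0.5
    shift = abs(sum(live) / len(live) - mu)
    return shift > threshold * sigma

stable = mean_shift_drift([10, 11, 9, 10, 10], [10, 10, 10])
drifted = mean_shift_drift([10, 11, 9, 10, 10], [15, 16, 14])
```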
Model Drift
Occurs when a model's performance degrades due to outdated assumptions or parameters.
Amazon EMR with Apache Spark in Amazon SageMaker Studio
Processes and transforms large-scale distributed datasets and integrates seamlessly with SageMaker Studio.
Amazon ECS
The best choice for deploying containerized ML models for batch inference due to its simplicity, managed environment, and deep integration with other AWS services like S3.
F1 Score
Balances the trade-off between precision and recall, which is important for considering both false positives and false negatives.
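The balance is the harmonic mean of the two metrics; a minimal sketch:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# A model with good precision but mediocre recall scores between the two.
score = f1_score(precision=0.8, recall=0.5)  # ≈ 0.615
```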
Amazon ECR
Service used to ensure the security, version control, and accessibility of container images.
DeepAR Algorithm
Designed for forecasting time series data.
Storing the API tokens in AWS Secrets Manager
Service that securely stores API tokens, supports automatic rotation, and lets applications retrieve credentials at runtime instead of hardcoding them.
Switching to XGBoost
Combined with Bayesian optimization for hyperparameter tuning to improve model performance efficiently.
AWS SDK for Python (Boto3)
Used to query DynamoDB directly, the most efficient method for accessing data in real time from an Amazon SageMaker notebook.
Model parameters
Internal values that the model learns during the training process to adapt its behavior to the data.
Hyperparameters
External values set before training begins that control the learning process (e.g., learning rate, batch size).
AWS CodeBuild
Service used to automate builds and artifact storage.
Hybrid Model
Maximizes the strengths of each component model, ensuring accurate and personalized recommendations across all customer segments.
Attaching an IAM role with the required permissions or explicitly including the IAM role's ARN in the KMS key policy
Ensures secure and straightforward access to encrypted S3 data.
Amazon Kinesis Data Streams
Used with Amazon Kinesis Data Firehose and AWS Lambda to ingest real-time data, load it into Amazon S3, and process it to generate recommendations.
Deploying Amazon CloudFront
Ensures content can be served with minimal latency.
Increasing the batch size
Improves computational efficiency by leveraging hardware optimizations and reduces the number of gradient updates per epoch.
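The effect on update count is simple arithmetic; `updates_per_epoch` below is an illustrative helper, not part of any framework:

```python
import math

def updates_per_epoch(num_samples, batch_size):
    """Each epoch requires ceil(N / batch_size) gradient updates,
    so larger batches mean fewer (but bigger) updates per epoch."""
    return math.ceil(num_samples / batch_size)

small = updates_per_epoch(10_000, batch_size=32)   # 313 updates per epoch
large = updates_per_epoch(10_000, batch_size=256)  # 40 updates per epoch
```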
Creating a private repository in Amazon ECR,
Provides a scalable and secure environment for storing and managing container images.
Amazon Transcribe
Service whose custom vocabularies improve transcription accuracy for industry-specific terms.
F1 score and AUC-ROC
Provide a balanced assessment of model performance, particularly for imbalanced datasets.
Amazon Personalize
Purpose-built recommendation service that dynamically adapts to user interactions.
Amazon EMR with Apache Spark Streaming and Amazon Managed Service for Apache Flink
Provides low-latency solutions while reducing operational overhead, making them ideal choices for generating real-time engagement metrics from streaming data.
Redshift Spectrum
Enables Redshift to query data directly in Amazon S3, ensuring optimal performance and efficient data preparation.
AWS Step Functions
Manages complex, multi-step workflows with built-in error handling, retries, and state tracking.
XGBoost
Balances accuracy and interpretability.
A combination of pruning and quantization.
Pruning eliminates less significant components of the model, reducing its complexity, while quantization decreases the precision of the weights.
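A minimal sketch of the quantization half (symmetric 8-bit, per-tensor scale); this is illustrative, not any specific toolkit's API:

```python
def quantize_int8(weights):
    """Symmetric 8-bit quantization: map floats onto integers in
    [-127, 127] using a single per-tensor scale factor."""
    scale = max(abs(w) for w in weights) / 127
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]

q, scale = quantize_int8([0.5, -1.27, 0.02])
approx = dequantize(q, scale)  # close to the originals, at 8-bit precision
```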
Configuring Amazon EFS with General Purpose performance mode and Bursting Throughput mode
Provides scalable, shared file storage with optimal performance for high IOPS and throughput when multiple SageMaker instances access large genomic datasets concurrently.
Lowering the temperature and top-K parameters in the Amazon Bedrock API
Reduces randomness during token selection. Additionally, fine-tuning the model with domain-specific data improves its accuracy and contextual relevance.
Amazon Polly Neural TTS
Produces more natural and lifelike speech, while custom lexicons allow precise pronunciation customization.
Seamless updates, robust security, and efficient management of a containerized ML solution
Approach combining Amazon ECR for secure, scalable container image storage, Amazon EKS for container orchestration across environments, and AWS CodePipeline for automating the CI/CD pipeline.
Deploying a lightweight model using AWS Lambda with API Gateway
Scalable architecture that minimizes costs by eliminating idle resource charges and reduces infrastructure management overhead.
Using Amazon Kinesis Data Firehose in combination with AWS Lambda
Provides seamless integration of AWS services to handle large-scale, real-time data processing without managing infrastructure directly.
Amazon SageMaker, Amazon Cognito, and Amazon API Gateway
Robust platform for secure data handling, authenticated access, and efficient, scalable real-time predictions required in healthcare applications.
AWS Lambda, triggered by Amazon S3 event notifications, and Amazon Comprehend Medical for medical data extraction
Services that enable a fully managed, serverless workflow that automatically scales, ensuring efficient processing and storage of extracted data in DynamoDB.
Amazon Textract, AWS Lambda, Amazon DynamoDB and AWS Step Functions
Effectively automates the workflow, ensuring smooth data processing and management.
Model Compression
Helps balance the model's complexity against computational limitations, ensuring the model remains usable for real-time inference.
Automating the data preprocessing workflow with AWS Glue DataBrew, handling event-driven updates, and integrating directly with Amazon QuickSight
Effective approach for prompt and efficient data analysis.
Amazon CloudWatch
Monitors metrics and logs, facilitating timely notifications about issues through alarms.
Leveraging serverless computing
Architectural setup that efficiently handles real-time, scalable translations for the online news portal.
Amazon Redshift Spectrum
Combines the power of Redshift's query processing with the ability to query data stored in S3 directly, managed under Lake Formation's security and compliance policies.
Amazon CloudWatch
Offers extensive metrics and logs that help identify and address deployment issues and performance bottlenecks, ensuring the application meets its performance expectations.
Amazon SageMaker endpoint variants
Support model management, A/B testing, and gradual updates, ensuring the model adapts in real time to new user behaviors and seasonal trends; pairing them with a lightweight model such as logistic regression keeps computational costs down, meeting requirements for both real-time adaptation and cost efficiency.
Amazon Redshift, AWS Batch, and Amazon Forecast
Combined to automate retail inventory forecasting tasks.
DynamoDB
Offers quick, low-latency read and write capabilities at any scale.