ML exam
A healthcare organization is developing an ML model to predict whether patients are at risk of developing a rare but critical disease. Missing a patient with a disease (a false negative) could delay treatment, potentially leading to severe consequences. However, incorrectly predicting a disease (a false positive) may result in unnecessary follow-up tests, which are less costly compared to missing an actual case. Which model characteristic should the healthcare organization prioritize to best meet its requirements?
Prioritize a model with high recall.
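For intuition, a minimal scikit-learn sketch (hypothetical labels, not from the exam) showing that recall counts the fraction of actual disease cases the model catches, which is exactly the false-negative cost this scenario cares about:

```python
from sklearn.metrics import precision_score, recall_score

y_true = [1, 1, 1, 0, 0, 0, 0, 0]   # 1 = patient has the disease
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]   # one missed case (FN), one false alarm (FP)

# recall = TP / (TP + FN): one missed case out of three actual cases -> 2/3
print("recall:   ", recall_score(y_true, y_pred))
# precision = TP / (TP + FP)
print("precision:", precision_score(y_true, y_pred))
```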
A technology company manages an application that performs data enrichment by integrating with various external APIs. To enhance security, the company must implement a solution to automatically rotate the API tokens used by the application every 90 days. Which approach should the company use to meet this requirement?
Store the API tokens in AWS Secrets Manager. Configure an AWS Lambda function to perform the token rotation.
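A minimal boto3 sketch of that setup, assuming the secret and a rotation Lambda (which must implement the createSecret/setSecret/testSecret/finishSecret rotation steps) already exist; all names and ARNs are hypothetical:

```python
import boto3

sm = boto3.client("secretsmanager")

sm.rotate_secret(
    SecretId="prod/data-enrichment/api-token",  # hypothetical secret name
    RotationLambdaARN="arn:aws:lambda:us-east-1:123456789012:function:rotate-api-token",
    RotationRules={"AutomaticallyAfterDays": 90},  # rotate every 90 days
)
```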
You are a machine learning engineer working for a telecommunications company that needs to develop a predictive maintenance model. The goal is to predict when network equipment is likely to fail based on historical sensor data. The data includes features such as temperature, pressure, usage, and error rates recorded over time. The company wants to avoid unplanned downtime and optimize maintenance schedules by predicting failures just in time. Given the nature of the data and the business objective, which Amazon SageMaker built-in algorithm is the MOST SUITABLE for this use case?
DeepAR Algorithm to forecast future equipment failures based on historical data.
You are a machine learning engineer at a logistics company responsible for deploying models as containerized applications. The company has decided to use Amazon Elastic Container Registry (ECR) for storing and managing container images. They need to ensure that the images are secure, version-controlled, and easily accessible by their Amazon SageMaker and Amazon ECS services. Which of the following best practices should you implement to meet these requirements?
Enable image scanning on push to identify vulnerabilities and set up lifecycle policies to manage the lifecycle of images in the repository.
You are a Machine Learning Engineer managing a pipeline for training and deploying ML models on AWS. The company needs to ensure that all actions taken on Amazon SageMaker resources, including model training, endpoint creation, and deletion, are logged for compliance and audit purposes. Additionally, the solution must enable tracking of any unauthorized or accidental changes to these resources. Which solution should you implement to meet these requirements?
Use AWS CloudTrail to log all API actions related to SageMaker resources and enable insights to detect unusual activity
You are a Machine Learning Engineer deploying a containerized ML model for batch inference on a large dataset. The batch inference jobs need to run on a fully managed container orchestration service with the flexibility to control resource allocation and scaling. The solution must also integrate seamlessly with other AWS services, such as S3 for input and output data storage.
Use Amazon ECS to deploy the containerized model and manage batch inference jobs, with custom resource allocation and integration with S3 for data storage.
An organization is working on a machine learning pipeline that requires preprocessing and transforming large-scale, distributed datasets for training. The data is stored in Amazon S3 and comes from various sources, including IoT devices and application logs. The organization needs a scalable solution that integrates seamlessly with the pipeline and supports distributed data processing frameworks.
Use Amazon EMR with Apache Spark in Amazon SageMaker Studio to preprocess and transform the data at scale.
You are a machine learning engineer at an e-commerce company that uses a recommendation model to suggest products to customers. After months in production, the model's performance has degraded, prompting concerns about data drift or model drift. Which of the following statements BEST describes the difference between data drift and model drift, and the appropriate tools to address them using Amazon SageMaker?
Data drift occurs when the distribution of the input data changes over time, while model drift happens when the model's underlying assumptions or parameters become outdated. To address data drift, you should use SageMaker Model Monitor to track changes in input data distribution. For model drift, you should periodically retrain the model using the latest data.
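As a sketch of the data-drift half, here is roughly how a Model Monitor baseline is created with the SageMaker Python SDK (role, paths, and instance type are hypothetical); a monitoring schedule would then compare live endpoint traffic against this baseline:

```python
from sagemaker.model_monitor import DefaultModelMonitor
from sagemaker.model_monitor.dataset_format import DatasetFormat

monitor = DefaultModelMonitor(
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # hypothetical role
    instance_count=1,
    instance_type="ml.m5.xlarge",
)

# Compute baseline statistics and constraints from the training data;
# live input distributions are later checked against these.
monitor.suggest_baseline(
    baseline_dataset="s3://my-bucket/train/baseline.csv",  # hypothetical path
    dataset_format=DatasetFormat.csv(header=True),
    output_s3_uri="s3://my-bucket/monitoring/baseline",
)
```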
You are a data scientist working on a regression model to predict housing prices in a large metropolitan area. The dataset contains many features, including location, square footage, number of bedrooms, and amenities. After initial testing, you notice that some features have very high variance, leading to overfitting. To address this, you are considering applying regularization to your model. You need to choose between L1 (Lasso) and L2 (Ridge) regularization. Given the goal of reducing overfitting while also simplifying the model by eliminating less important features, which regularization method should you choose and why?
L1 regularization, because it can shrink some feature coefficients to zero, effectively performing feature selection.
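A small scikit-learn sketch (synthetic data) illustrating why L1 performs feature selection while L2 does not:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge

# 20 features, only 5 of which actually drive the target
X, y = make_regression(n_samples=200, n_features=20, n_informative=5,
                       noise=10.0, random_state=0)

lasso = Lasso(alpha=1.0).fit(X, y)
ridge = Ridge(alpha=1.0).fit(X, y)

# L1 typically drives many uninformative coefficients exactly to zero;
# L2 only shrinks them toward zero.
print("Lasso zero coefficients:", np.sum(lasso.coef_ == 0))
print("Ridge zero coefficients:", np.sum(ridge.coef_ == 0))
```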
A logistics company needs to deploy a custom ML model for daily demand forecasting. The workload is predictable and occurs within a 90-minute window each day. During this period, multiple concurrent invocations are expected, requiring low-latency responses. The company wants AWS to manage the underlying infrastructure and auto scaling, with minimal involvement in infrastructure maintenance or configuration. Which is a cost-effective solution to meet these requirements?
Deploy the model using Amazon SageMaker Serverless Inference with provisioned concurrency.
A healthcare company needs to fine-tune a large language model (LLM) for a medical document summarization application. The dataset includes anonymized patient records stored in Amazon S3. The company seeks a low-code/no-code (LCNC) solution to simplify the fine-tuning process while ensuring scalability and fast deployment using Amazon SageMaker capabilities. Which services should be utilized to implement this solution?
Amazon SageMaker Autopilot to automate the process of building and tuning ML models with limited manual intervention. AND
Amazon SageMaker Canvas to build ML models with a no-code interface, enabling non-technical users to participate in the process. AND
Amazon SageMaker JumpStart to access pre-built models and fine-tuning templates that simplify the process for LLMs.
A company operates an Amazon SageMaker training job in a public subnet within an Amazon VPC. The network is properly configured, allowing seamless data transfer between the SageMaker training job and Amazon S3. Recently, the company identified malicious traffic originating from a specific IP address, targeting the resources within the VPC. The company needs to block all traffic from the suspicious IP address while ensuring legitimate traffic remains unaffected. What solution would you recommend to address this requirement?
Create a network ACL (NACL) for the subnet hosting the SageMaker training job and add a deny rule to block traffic from the specific IP address.
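A minimal boto3 sketch of that deny rule (NACL ID and IP are hypothetical); NACL rules are evaluated in ascending rule-number order, so the deny must use a lower number than the broader allow rules:

```python
import boto3

ec2 = boto3.client("ec2")

ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",  # hypothetical NACL ID
    RuleNumber=90,               # lower than the allow rules, so evaluated first
    Protocol="-1",               # all protocols
    RuleAction="deny",
    Egress=False,                # inbound rule
    CidrBlock="203.0.113.7/32",  # the suspicious IP
)
```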
What is the bias versus variance trade-off in machine learning?
The bias versus variance trade-off refers to the challenge of balancing the error due to the model's complexity (variance) against the error due to incorrect assumptions in the model (bias): high bias can cause underfitting, while high variance can cause overfitting.
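For reference, the standard decomposition behind this trade-off, for squared error with irreducible noise variance σ²:

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\Big[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\Big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```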
A company uses Amazon SageMaker Studio to develop its ML models. A team of ten developers are working on a proof-of-concept model with all the users linked to a single SageMaker Studio domain. The company needs an automated alert system that notifies them when the SageMaker compute costs exceed a specific threshold.
Add tags to the user profile in the SageMaker domain. Configure AWS Budgets to send an alert when the threshold is breached.
You are a Cloud Financial Manager at a technology company that uses a variety of AWS services to run its applications and machine learning workloads. Your management team has asked you to optimize AWS spending while ensuring that critical applications remain highly available and performant. To achieve this, you need to use AWS cost management and optimization tools to monitor spending, identify cost-saving opportunities, and optimize resource utilization across the organization. Which of the following actions can you perform using the appropriate AWS cost management tools to achieve your goal of optimizing costs and resource utilization?
Use AWS Cost Explorer to analyze historical spending patterns, identify cost trends, and forecast future costs to help with budgeting and planning. AND
Leverage AWS Trusted Advisor to receive recommendations for cost optimization, such as identifying underutilized or idle resources, and reserved instance purchasing opportunities
A company's machine learning engineer has deployed a fraud detection model to an Amazon SageMaker endpoint. To meet regulatory requirements and provide transparency to stakeholders, the engineer must generate explanations for the model's predictions and understand the key factors influencing its decision-making process.
Use SageMaker Clarify to analyze the endpoint's predictions and provide feature attributions for each input.
How would you differentiate between K-Means and K-Nearest Neighbors (KNN) algorithms in machine learning?
K-Means is an unsupervised learning algorithm used for clustering data points into groups, while KNN is a supervised learning algorithm used for classifying data points based on their proximity to labeled examples.
You are a data scientist working for a healthcare company building a machine learning model to predict patient readmissions. The dataset contains demographic, medical history, and treatment details. Your team wants to ensure that the model is free from biases against certain demographic groups, such as age or gender. To address this, you need a solution that identifies potential bias in both the training data and the model predictions
Use Amazon SageMaker Clarify to analyze the training dataset for bias and compute bias metrics on the model's predictions.
A retail company uses a machine learning model to predict customer demand. Cleaned and prepared training data is supplied every 3-4 days and uploaded to an Amazon S3 bucket. The company has an Amazon SageMaker pipeline for retraining the model, which is currently triggered manually. How can the ML engineer automate triggering the pipeline with the LEAST operational effort? (Select two)
Use Amazon S3 Event Notifications to configure a notification for S3 upload events and trigger an AWS Lambda function, which in turn invokes the SageMaker pipeline. AND
Enable Amazon EventBridge for the S3 bucket and create an EventBridge rule with an event pattern that matches the S3 upload event. Set the SageMaker pipeline as the target for the rule.
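A boto3 sketch of the EventBridge option (bucket, pipeline, and role names are hypothetical); this assumes EventBridge notifications are enabled on the bucket:

```python
import boto3
import json

events = boto3.client("events")

events.put_rule(
    Name="retrain-on-s3-upload",
    EventPattern=json.dumps({
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["training-data-bucket"]}},  # hypothetical
    }),
)

# EventBridge can invoke a SageMaker pipeline directly as a rule target.
events.put_targets(
    Rule="retrain-on-s3-upload",
    Targets=[{
        "Id": "retraining-pipeline",
        "Arn": "arn:aws:sagemaker:us-east-1:123456789012:pipeline/retraining-pipeline",
        "RoleArn": "arn:aws:iam::123456789012:role/EventBridgeInvokePipeline",
        "SageMakerPipelineParameters": {"PipelineParameterList": []},
    }],
)
```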
You are an ML Engineer at a financial services company tasked with deploying a machine learning model for real-time fraud detection. The model needs low-latency inference in production while requiring a cost-effective test environment for experimentation and validation. What strategies should you use to provision compute resources for production and testing environments using Amazon SageMaker?
Provision CPU-based instances in the test environment to reduce costs during experimentation and validation. AND Leverage AWS Inferentia accelerators in the production environment to meet high throughput and low latency requirements.
A retail company uses an Amazon S3 data lake to store customer purchase histories and product metadata. Data scientists frequently query the data for training machine learning models and performing analytics using Amazon Athena. To ensure compliance with internal data governance policies, the company needs to enforce fine-grained access control, ensuring that data scientists can only access data relevant to their assigned product categories, such as electronics or clothing. The solution must minimize operational overhead and integrate seamlessly with the company’s existing AWS infrastructure. Which solution will meet these requirements?
Use AWS Lake Formation to assign tags to datasets and define Lake Formation permissions based on those tags. Assign data scientists the specific tags that match their assigned product categories.
You are a Machine Learning Engineer at a healthcare company that uses machine learning models for patient risk prediction, treatment recommendation, and anomaly detection. The company is expanding, and you are tasked with building a maintainable, scalable, and cost-effective ML infrastructure to handle increasing data volumes and evolving requirements. Which of the following strategies should you implement using AWS services? (Select three)
Utilize infrastructure as code (IaC) with AWS CloudFormation to automate the deployment and management of ML resources, making it easy to replicate and scale infrastructure across regions.
AND Store all model artifacts and data in Amazon S3, and use versioning to manage changes over time, ensuring that models can be easily rolled back if needed.
AND Implement a microservices-based architecture with Amazon SageMaker endpoints, where each model is deployed independently, allowing for isolated scaling and updates.
A healthcare company is running containerized ML applications to process patient data and generate predictive insights. These applications are deployed across Amazon EC2 instances, an Amazon Elastic Container Service (Amazon ECS) cluster, and AWS Lambda functions. The EC2 and ECS workloads store predictions and artifacts on Amazon Elastic Block Store (Amazon EBS) volumes. To optimize costs, the company wants to identify inefficiently used resources and receive actionable recommendations to reduce expenses. What is the most efficient solution to achieve this goal with minimal development effort?
Use AWS Compute Optimizer to analyze the specifications and utilization metrics of your AWS compute resources.
A financial institution has developed and deployed a machine learning model to predict credit scores. The institution is required to comply with regulatory guidelines for transparency, fairness, and security in its ML workflows. The team needs a solution that will enable them to track and audit the model's decisions, ensure model explainability, and monitor compliance
Amazon SageMaker Clarify
You are a machine learning engineer at a fintech company tasked with creating a system that can analyze text-based customer feedback, extract sentiment, and identify key topics for actionable insights. The system must be scalable to handle real-time analysis of millions of feedback entries and should integrate easily with other AWS services for further data processing. Which AWS service should you choose to meet these requirements?
Amazon Comprehend
As a Machine Learning Engineer, you are assessing an e-commerce company's product recommendation system hosted on AWS. Your task is to ensure that the machine learning models deployed provide accurate and timely product recommendations, enhancing user experience and boosting sales. What measures should you implement to optimize the performance and reliability of these models?
Monitor model performance using Amazon CloudWatch, implement A/B testing using AWS Lambda for deployment variants, and utilize Amazon RDS to manage user data efficiently.
As a Machine Learning Engineer, you are developing a predictive maintenance model for industrial equipment. The model should predict failures before they occur, aiming to minimize downtime. Your model needs to perform real-time inference on the edge devices installed in the equipment itself. However, the computational resources on these devices are limited. Which of the following approaches would be most suitable for deploying your model efficiently on these edge devices?
Simplify the model architecture and apply model compression techniques such as pruning and quantization.
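A minimal PyTorch sketch of both techniques on a toy model; the layer sizes are arbitrary:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))

# Pruning: zero out the 40% smallest-magnitude weights in each Linear layer.
for module in model.modules():
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.4)
        prune.remove(module, "weight")   # bake the pruning mask in permanently

# Dynamic quantization: store Linear weights as int8, shrinking the model
# and speeding up inference on CPU-bound edge devices.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)
```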
You are designing a machine learning solution to process and analyze scanned financial documents. You plan to use Amazon Textract to extract text and data from these documents. Additionally, your requirement includes triggering an AWS Lambda function to perform custom data transformation and load the processed data into Amazon DynamoDB for further querying and analysis. Which combination of AWS services should you use to build an efficient and automated workflow?
Amazon Textract, AWS Lambda, Amazon DynamoDB, AWS Step Functions.
You are a machine learning engineer tasked with optimizing a resource-intensive image processing model on AWS. The model currently runs on a single, high-memory EC2 instance, which results in high costs and does not scale well under varying workloads. What would be the most cost-effective and scalable way to handle variable demand for this machine learning model?
Shift to using AWS Lambda for model execution, allowing it to scale automatically with the number of requests and minimize costs by charging only for the compute time used.
As a machine learning engineer working on a natural language processing project, you are tasked with deploying a real-time inference solution that scales automatically according to the request load. Which AWS service would be the best choice for deploying and managing such a solution?
Amazon SageMaker Endpoints
A manufacturing company wants to use Amazon Lookout for Equipment to detect anomalies in their factory's heavy machinery. They have historical sensor data stored in Amazon S3 and want to ensure that their machine learning model is continuously updated with new sensor data as it becomes available. Which AWS service combination would be most effective for implementing a real-time data processing and model updating pipeline in this scenario?
Use AWS Step Functions to orchestrate data ingestion from Amazon S3 and call Amazon Lookout for Equipment for model re-training.
A global company requires a system to translate large volumes of customer reviews from multiple languages into English. These reviews are stored in Amazon S3 and need to be processed daily. Which AWS service combination should be used to automate this translation process efficiently while ensuring scalability and minimal manual handling?
Use AWS Lambda to trigger on S3 events, perform translations using Amazon Translate, and store the results back into S3.
You are a machine learning engineer tasked with improving the prediction accuracy of a model that forecasts electricity demand for a utility company. To ensure more robust predictions, you decide to incorporate weather data into your model. Using AWS services, which approach best integrates weather data for enhanced predictive performance?
Deploy the weather data ingestion pipeline using AWS Glue, and use the processed data in Amazon SageMaker to train your model.
As a data scientist working on a smart healthcare application, you need to create a model that can predict patient readmission based on various features such as age, medical history, and hospital stay information. Due to the secure nature of medical data, you are required to follow strict privacy regulations. Which AWS service is the BEST to use to develop and train your model while ensuring data privacy and compliance?
Employ Amazon SageMaker with data privacy controls for secure model training.
An AWS Machine Learning Engineer is tasked with utilizing Amazon Comprehend Medical to extract medical information from unstructured clinical text stored in Amazon S3. The engineer plans to process this data daily and feed extracted entities into a DynamoDB table. Given the need to automate this workflow, which set of services should be used to efficiently manage the data extraction and storage?
AWS Lambda, Amazon S3 event notifications, and Amazon Comprehend Medical.
A financial company is using Amazon SageMaker to develop a fraud detection model. They are concerned about the security and privacy of their data when training and deployment processes take place. What combination of AWS services should they implement to ensure enhanced security for their ML solution?
Use Amazon SageMaker with AWS KMS for data encryption and Amazon Macie for data classification and protection.
A healthcare organization is leveraging Amazon SageMaker to develop an ML model that predicts patient readmissions. They require a secure, scalable, and efficient system for data ingestion, processing, and real-time predictions. Which combination of AWS services should be implemented to optimize and secure this ML workflow?
Amazon SageMaker, Amazon Cognito, Amazon API Gateway
As a Machine Learning Engineer, you are tasked with setting up a log analytics solution using Amazon OpenSearch Service to monitor and analyze machine learning model performance logs. You decide to use AWS Lambda for preprocessing logs and Amazon S3 for storing raw log data before indexing them in OpenSearch Service. Which setup would optimize the integration and flow of logs from Amazon S3 to OpenSearch Service?
Use Amazon Kinesis Data Firehose to stream logs from S3 to AWS Lambda for processing, and then to OpenSearch for indexing.
In a retail environment, an AWS Certified Machine Learning Engineer needs to monitor financial metrics effectively with Amazon Lookout for Metrics (LFM) and automate the process of investigating anomalies using AWS Lambda and Amazon SNS. Which setup would correctly configure these services to enhance business efficiency?
Configure LFM to send anomalies directly to Amazon SNS, which then triggers an AWS Lambda function subscribed to the SNS topic to investigate anomalies.
A data science team is using Amazon SageMaker to train a model on time-series data that is ingested daily into Amazon S3. They need to preprocess this data each day before training, ensuring the data is partitioned by date and then automatically initiate the training process. Which combination of AWS services would optimize this workflow?
Amazon EventBridge and AWS Step Functions
A team working on a sentiment analysis project requires human input to validate the sentiment tagged by their machine learning model onto large datasets. They decided to use Amazon Mechanical Turk (MTurk) to handle this task. Alongside MTurk, they plan to deploy an AWS service that could automate the task allocation, result collection, and reassignment of uncompleted or wrongly done tasks to other workers. Which AWS service should be used in conjunction with MTurk to effectively manage this process?
AWS Step Functions
You are using Amazon Bedrock to orchestrate machine learning workflows, which includes processing large datasets stored in Amazon S3 and subsequently training models. Considering performance optimization and cost reduction, which AWS service would most effectively handle the seamless transition from data preprocessing to model training?
AWS Batch
A company utilizing AWS wants to create an interactive voice response system for their customer service, leveraging Amazon Polly for speech synthesis. They need the system to dynamically generate and store logs of customer interactions for compliance and analysis purposes. Which service should be integrated with Amazon Polly to efficiently manage and analyze these logs?
Amazon DynamoDB.
As a machine learning engineer at a retail company, you are tasked with optimizing an ML system that predicts inventory needs based on seasonal trends. The system must handle large datasets efficiently and update predictions frequently to respond to real-time sales data. What is the most appropriate AWS service combination and strategy to achieve high performance and scalability for this ML workload?
Utilize Amazon Redshift for data warehousing, AWS Batch for managing batch processing jobs, and Amazon Forecast for demand forecasting based on machine learning
As an ML engineer, you are working on enhancing a recommendation system for an e-commerce platform. The system needs to dynamically adapt to user preferences and seasonal trends while ensuring cost efficiency in computation and quick adaptation to new data. What approach would best optimize for real-time learning and cost-efficiency in this scenario?
Deploy a lightweight machine learning model that updates incrementally, such as online learning with logistic regression.
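A scikit-learn sketch of that pattern (random data standing in for the event stream): SGDClassifier with a logistic loss is logistic regression trained incrementally, so each mini-batch updates the model without a full retrain:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="log_loss")  # logistic regression via SGD (sklearn >= 1.1)
classes = np.array([0, 1])            # e.g. 1 = user clicked the recommendation

for _ in range(10):                   # stand-in for a stream of mini-batches
    X_batch = np.random.rand(32, 8)
    y_batch = np.random.randint(0, 2, size=32)
    clf.partial_fit(X_batch, y_batch, classes=classes)

print(clf.predict_proba(np.random.rand(1, 8)))
```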
A machine learning team at a media analysis company uses Amazon Mechanical Turk to annotate images for a computer vision model. They need to efficiently process large image datasets, requiring temporary storage and automated pre-processing before annotation. Which combination of AWS services should they use along with Amazon Mechanical Turk to streamline this workflow?
Amazon S3 and AWS Lambda.
In a scenario where you are using Amazon Bedrock to train machine learning models on datasets stored within Amazon S3, and the trained model artifacts need to be automatically deployed and made accessible through AWS Lambda functions, which service would optimally handle the deployment automation and management?
AWS CodeDeploy
You are implementing a machine learning model for predicting customer churn and have deployed it into production using AWS services. To ensure that the model continues to perform as expected while adapting to changes in data, you need a reliable method for updating the model. Which of the following options would you use to efficiently manage model updates?
Use Amazon SageMaker endpoint variants to gradually shift traffic to new model versions for performance comparison before full deployment.
A data science team is deploying a recommender system on Amazon ECS and using Amazon CodeGuru Reviewer to optimize the code performance. Which AWS service should they use in conjunction with CodeGuru Reviewer and ECS to automate the application's deployment with monitoring and analysis of deployment-based metrics?
Amazon CloudWatch
You are tasked with setting up a data lake using AWS Lake Formation, followed by running complex SQL queries to analyze this data for future machine learning projects. What AWS service combination would be most effective for running these SQL queries on the data stored within Lake Formation?
Amazon Redshift Spectrum
An online news portal wants to enhance user engagement by providing real-time article translations in multiple languages. The news articles are stored in Amazon S3 and need translation on-demand as users request them in different languages. Which architecture would best serve this purpose using Amazon Translate integrated with other AWS services?
Integrate Amazon Translate with Amazon API Gateway and AWS Lambda, storing the translated articles in Amazon DynamoDB and using cache control with Amazon CloudFront
You are developing a healthcare application that must securely process and analyze sensitive handwritten medical forms using Machine Learning. You decide to use Amazon Textract for text extraction, followed by further analysis and storage. Considering the sensitive nature of data, which AWS services should be integrated into the solution to ensure security and compliance?
AWS Lambda, AWS KMS, Amazon S3.
You are deploying a document search application using Amazon Kendra for a legal firm. To manage and automate the deployment and updates of your Kendra indexes, you decide to integrate AWS CloudFormation. Additionally, you want to ensure the indexing operation triggers a notification when certain error conditions are met. Which AWS service should you integrate for this purpose?
Amazon CloudWatch
As an AWS Machine Learning Engineer, you are tasked to deploy a fraud detection model for a financial institution using Amazon SageMaker. The model needs to quickly adapt to emerging fraud patterns, and it is vital to roll out updates safely without causing downtime or degradation in performance. What deployment strategy should you employ to ensure both high availability and a reliable update mechanism?
Use A/B testing to gradually introduce the new version by splitting traffic between the old and new model version, monitoring performance before fully transitioning.
A machine learning team is using AWS Glue DataBrew to preprocess and transform datasets for predictive analysis. The datasets are stored in Amazon S3 and must be joined with transactional data updated frequently in Amazon RDS. Completed datasets should be directly analyzed using Amazon QuickSight for business insights. Which approach best handles this data flow while maintaining efficiency and automation?
Use AWS Lambda to trigger DataBrew jobs on data update events in S3, join with RDS using DataBrew, and automatically export processed datasets to Amazon QuickSight SPICE for analysis.
In a scenario where a machine learning model is deployed on Amazon EC2 instances to analyze financial transactions for fraudulent activities in real-time, which combination of services would be the MOST effective for dynamically adjusting the compute capacity based on the fluctuating load of incoming transactions?
Amazon EC2 with Auto Scaling and Amazon SNS.
You are a DevOps engineer at a technology company that uses Amazon SageMaker to develop and deploy machine learning models. The company wants to streamline the deployment process by containerizing the models and storing the container images securely. They decide to use Amazon Elastic Container Registry (ECR) for this purpose. Which of the following steps should you take to securely store and manage the container images in Amazon ECR?
Create a private repository in Amazon ECR and configure repository policies to control access. Enable image scanning to detect vulnerabilities.
You are a Machine Learning Engineer at a media production company that wants to use Amazon Transcribe to convert audio recordings of interviews into text for analysis. The company needs to ensure high accuracy in the transcriptions and manage a large volume of audio files efficiently. Which combination of Amazon Transcribe features should you use to achieve these goals?
Use Amazon Transcribe with custom vocabularies to improve the accuracy of transcriptions for industry-specific terms and set up batch transcription jobs to process large volumes of audio files efficiently.
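A boto3 sketch of one such batch job (bucket, job, and vocabulary names are hypothetical; the custom vocabulary must be created beforehand):

```python
import boto3

transcribe = boto3.client("transcribe")

transcribe.start_transcription_job(
    TranscriptionJobName="interview-0042",
    Media={"MediaFileUri": "s3://interviews-bucket/raw/ep42.mp3"},
    MediaFormat="mp3",
    LanguageCode="en-US",
    OutputBucketName="interviews-bucket",
    Settings={"VocabularyName": "media-production-terms"},  # custom vocabulary
)
```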
You are working as a data scientist at a company that specializes in predictive analytics. You are tasked with training a deep learning model using Amazon SageMaker to predict customer churn. The dataset you have is large and contains millions of records. The training process is taking longer than expected, and you suspect that the hyperparameters need fine-tuning. You want to balance the training time while ensuring the model converges effectively. You have set the batch size to 256, epochs to 50, and learning rate to 0.01. However, the training job is still not performing as expected. Given this scenario, which of the following adjustments is MOST LIKELY to reduce the training time without compromising model performance?
Increase the batch size and decrease the number of epochs.
You are a Machine Learning Engineer at a social media company that needs to process and analyze real-time data streams from user interactions to provide personalized content recommendations. The company uses Amazon Kinesis to handle the data streams. Which combination of Amazon Kinesis features should you use to build an efficient and scalable real-time data processing pipeline for this use case?
Use Amazon Kinesis Data Streams to ingest real-time data, Amazon Kinesis Data Firehose to load the data into Amazon S3, and AWS Lambda to process the data and generate recommendations.
A pharmaceutical research team stores encrypted molecular simulation data in an Amazon S3 bucket with server-side encryption using AWS KMS keys (SSE-KMS). A data analyst needs to access this data using an Amazon SageMaker notebook instance. The solution must ensure that the notebook instance can access and decrypt the data securely while adhering to AWS best practices. Which options can meet these requirements independently?
Attach an IAM role to the SageMaker notebook instance with s3:GetObject permissions for the S3 bucket and kms:Decrypt permissions for the KMS key. AND Grant the SageMaker notebook instance's IAM role s3:GetObject permissions for the S3 bucket and add the role's ARN to the KMS key policy for kms:Decrypt permissions.
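A sketch of the first option as an inline IAM policy (role, bucket, and key ARNs are hypothetical):

```python
import boto3
import json

iam = boto3.client("iam")

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Action": "s3:GetObject",
         "Resource": "arn:aws:s3:::simulation-data-bucket/*"},
        {"Effect": "Allow",
         "Action": "kms:Decrypt",
         "Resource": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"},
    ],
}

iam.put_role_policy(
    RoleName="SageMakerNotebookRole",          # the notebook's execution role
    PolicyName="ReadEncryptedSimulationData",
    PolicyDocument=json.dumps(policy),
)
```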
You are a data scientist at an e-commerce company working to develop a recommendation system for customers. After building several models, including collaborative filtering, content-based filtering, and a deep learning model, you find that each model excels in different scenarios. For example, the collaborative filtering model works well for returning customers with rich interaction data, while the content-based filtering model performs better for new customers with little interaction history. Your goal is to combine these models to create a recommendation system that provides more accurate and personalized recommendations across all customer segments. Which of the following strategies is the MOST LIKELY to achieve this goal?
Implement a hybrid model that combines the predictions of collaborative filtering, content-based filtering, and deep learning using a weighted average, where weights are based on model performance for different customer segments.
You are a Machine Learning Engineer at an e-commerce company that wants to automate the build process for machine learning model artifacts using AWS CodeBuild. The company needs to ensure that the build process is efficient, reliable, and integrates well with other AWS services. Which combination of AWS CodeBuild features and best practices should you implement to achieve these goals?
Use AWS CodeBuild with a buildspec file to define the build commands and settings, and integrate with Amazon S3 to store the build artifacts.
Which of the following highlights the differences between model parameters and hyperparameters in the context of machine learning?
Model parameters are learned during training and define the model's behavior on input data. Hyperparameters are set before training and control aspects of the learning process, such as learning rate and batch size
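A small scikit-learn illustration of the distinction:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# C and max_iter are hyperparameters: set before training.
model = LogisticRegression(C=0.1, max_iter=200).fit(X, y)

# coef_ and intercept_ are model parameters: learned during training.
print(model.coef_, model.intercept_)
```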
Your company has a significant amount of data stored in an Amazon DynamoDB database. As part of a new machine learning project, you need to access this data from an Amazon SageMaker notebook for analysis and model development. The goal is to ensure that the data is efficiently accessible from the notebook while maintaining performance and minimizing potential delays. Which approach is the MOST EFFECTIVE for accessing the data?
Use the AWS SDK for Python (Boto3) to query DynamoDB directly from the SageMaker notebook.
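A minimal boto3 sketch runnable from the notebook (table and key names are hypothetical; the notebook's execution role needs dynamodb:Query permission):

```python
import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("CustomerEvents")  # hypothetical table

response = table.query(
    KeyConditionExpression=Key("customer_id").eq("C-1001")  # hypothetical key
)
items = response["Items"]   # e.g. load into a pandas DataFrame for analysis
```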
A retail company uses a customer service chatbot powered by a large language model (LLM) through Amazon Bedrock to handle product inquiries and returns. Customers have reported that the chatbot gives slightly different answers when asked the same questions about return policies or product details. The company needs to ensure the chatbot provides consistent responses. How should they achieve this?
Adjust the temperature parameter in the Amazon Bedrock API to a lower value and decrease the top-K parameter to limit the number of possible tokens the model can choose from, ensuring consistent and controlled responses. AND Retrain the LLM with a retail-specific dataset to improve consistency in responses related to product information and return policies.
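A sketch of the first option with the Bedrock InvokeModel API; the body shown uses Anthropic-style fields (parameter names differ by model provider), and the model ID should be checked for availability in your Region:

```python
import boto3
import json

bedrock = boto3.client("bedrock-runtime")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 512,
    "temperature": 0.1,   # low temperature: near-deterministic sampling
    "top_k": 5,           # small top-K: restrict the candidate token pool
    "messages": [{"role": "user",
                  "content": "What is the return policy for electronics?"}],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    body=json.dumps(body),
)
print(json.loads(response["body"].read()))
```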
You are a data scientist at a financial services company tasked with deploying a lightweight machine learning model that predicts creditworthiness based on a customer's transaction history. The model needs to provide real-time predictions with minimal latency, and the traffic pattern is unpredictable, with occasional spikes during business hours. The company is cost-conscious and prefers a serverless architecture to minimize infrastructure management overhead. Which approach is the MOST SUITABLE for deploying this solution, and why?
Deploy the model directly within AWS Lambda as a function, and expose it through an API Gateway endpoint, allowing the function to scale automatically with traffic and provide real-time predictions.
You are a data engineer responsible for monitoring the performance of a suite of machine learning models deployed across multiple environments at an e-commerce company. The models are used for various tasks, including recommendation engines, demand forecasting, and customer segmentation. To ensure that these models are performing optimally, you need to set up a centralized dashboard that allows stakeholders to monitor key performance metrics such as latency, accuracy, throughput, and resource utilization. The dashboard should be user-friendly, provide insights at a glance, and support both technical and non-technical users. Which approach is MOST SUITABLE for setting up a dashboard to monitor the performance metrics of these ML models?
Use Amazon QuickSight to create a visual dashboard that integrates data from Amazon CloudWatch Logs via Amazon S3, providing interactive charts and graphs that allow stakeholders to drill down into specific metrics as needed.
You are a Machine Learning Engineer at a financial services company. The company has deployed several machine learning models using Amazon SageMaker, and it is critical to monitor the performance and health of these models to ensure they are operating correctly. Which combination of Amazon CloudWatch features should you use to effectively monitor the performance and health of your SageMaker models?
Use CloudWatch Logs to collect and store logs from SageMaker endpoints and create CloudWatch Dashboards to visualize key metrics
You are a data scientist at a healthcare company tasked with deploying a machine learning model that predicts patient outcomes based on real-time data from wearable devices. The model must be containerized for deployment and scaling across different environments, including development, testing, and production. The company wants to ensure efficient, secure, and consistent management of container images across all environments. Which combination of AWS services is the MOST SUITABLE for building, storing, deploying, and maintaining this containerized ML solution?
Use Amazon ECR to store the container images, Amazon EKS for orchestrating the containers, and AWS CodePipeline for automating the CI/CD pipeline, ensuring that updates to the model are seamlessly deployed.
You are a Machine Learning Engineer at a media company that wants to enhance its accessibility features by converting online articles into lifelike speech using Amazon Polly. The company needs to customize the pronunciation of specific words and phrases to ensure they are spoken correctly and naturally. Additionally, the solution should handle a high volume of articles with minimal latency. Which combination of Amazon Polly features and configurations should you use?
Use Amazon Polly Neural TTS voices and create a custom lexicon to define the pronunciation of specific words and phrases
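A boto3 sketch combining a neural voice with a custom pronunciation lexicon (the lexicon content is a toy example in the W3C PLS format):

```python
import boto3

polly = boto3.client("polly")

pls = """<?xml version="1.0" encoding="UTF-8"?>
<lexicon version="1.0"
         xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"
         alphabet="ipa" xml:lang="en-US">
  <lexeme><grapheme>W3C</grapheme><alias>World Wide Web Consortium</alias></lexeme>
</lexicon>"""
polly.put_lexicon(Name="articleLexicon", Content=pls)

audio = polly.synthesize_speech(
    Engine="neural",                  # lifelike neural TTS voice
    VoiceId="Joanna",
    OutputFormat="mp3",
    LexiconNames=["articleLexicon"],  # apply the custom pronunciations
    Text="The W3C publishes web standards.",
)
```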
You are an MLOps Engineer responsible for automating the CI/CD pipeline for machine learning models deployed on Amazon SageMaker. Your organization has a requirement to automate the entire workflow, from data preprocessing to model deployment, while ensuring traceability and rapid iteration of changes. AWS CodePipeline has been chosen to orchestrate the CI/CD workflow. How can you ensure a seamless and efficient implementation of the pipeline for your ML models?
Integrate AWS CodePipeline with AWS CodeBuild to preprocess data and train models, and use SageMaker Model Registry for tracking model versions before deploying.
You are training a machine learning model using Amazon SageMaker. Your training job needs to handle imbalanced data, where one class significantly outnumbers the others.
Which built-in SageMaker algorithm should you consider to address this issue?
Linear Learner
You are training a convolutional neural network (CNN) model for image classification using PyTorch. The dataset is stored in Amazon S3, and your team wants to use Amazon SageMaker to manage the training process. You decide to use SageMaker script mode to customize the training job. What is the MOST EFFECTIVE way to achieve this?
Write a training script compatible with PyTorch, upload it to Amazon S3, and use SageMaker's built-in PyTorch container with script mode to execute the training job.
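A sketch of script mode with the SageMaker Python SDK, assuming train.py is your own PyTorch training script (role, paths, and versions are hypothetical):

```python
from sagemaker.pytorch import PyTorch

estimator = PyTorch(
    entry_point="train.py",            # your PyTorch training script
    source_dir="src",                  # hypothetical code directory
    role="arn:aws:iam::123456789012:role/SageMakerRole",
    framework_version="2.1",
    py_version="py310",
    instance_count=1,
    instance_type="ml.g4dn.xlarge",
    hyperparameters={"epochs": 20, "batch-size": 64},
)

# SageMaker runs train.py inside its managed PyTorch container.
estimator.fit({"training": "s3://my-bucket/images/train"})
```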
You are an ML engineer at a startup that is building a recommendation engine for an e-commerce platform. Training jobs on large datasets are sporadic but compute-intensive, while the inference endpoint must handle variable traffic throughout the day. The company is cost-conscious and requires a solution that balances cost efficiency, scalability, and performance.
Which resource allocation approach is the MOST SUITABLE for training and inference, and why?
Use on-demand instances for training, allowing flexibility to scale resources as needed, and use provisioned inference instances with auto-scaling to handle varying traffic while controlling costs.
You are a data engineer at a healthcare company responsible for ensuring that sensitive patient data stored in Amazon S3 is securely encrypted. The company requires a solution that provides fine-grained access control, automated key rotation, and detailed logging of key usage for auditing purposes.
Which of the following approaches using AWS Key Management Service (KMS) is the MOST SUITABLE for meeting these requirements?
Use AWS KMS to create customer managed keys (CMKs) and enable automatic key rotation. Configure IAM policies to control access to the keys and enable AWS CloudTrail to log all key usage.
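A minimal boto3 sketch of the key setup; key usage is recorded in AWS CloudTrail once trails are enabled in the account:

```python
import boto3

kms = boto3.client("kms")

key = kms.create_key(Description="Patient-data encryption key")
key_id = key["KeyMetadata"]["KeyId"]

# Enable automatic rotation of the key material.
kms.enable_key_rotation(KeyId=key_id)
```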
You are a data scientist working for a customer service company. The company wants to analyze customer feedback stored as plain text in Amazon S3 to extract key phrases, sentiment, and entities for a customer-satisfaction dashboard. The analysis must scale and minimize development overhead. Which approach is MOST suitable for implementing this solution by using Amazon Comprehend?
Use Amazon Comprehend asynchronous batch processing to analyze the feedback text directly from the Amazon S3 bucket.
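A boto3 sketch of one asynchronous job (bucket paths and the data-access role are hypothetical); a single job scans the whole S3 prefix, so there is no per-document plumbing to build. Key phrases and entities use the parallel start_key_phrases_detection_job and start_entities_detection_job calls:

```python
import boto3

comprehend = boto3.client("comprehend")

comprehend.start_sentiment_detection_job(
    InputDataConfig={
        "S3Uri": "s3://feedback-bucket/raw/",   # hypothetical input prefix
        "InputFormat": "ONE_DOC_PER_FILE",
    },
    OutputDataConfig={"S3Uri": "s3://feedback-bucket/comprehend-output/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/ComprehendS3Access",
    LanguageCode="en",
)
```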
You are a data scientist working on a deep-learning model to classify medical images for disease detection. The model initially shows high accuracy on the training data but performs poorly on the validation set, indicating signs of overfitting. The dataset is limited in size, and the model is complex, with many parameters. To improve generalization and reduce overfitting, you need to implement appropriate techniques while balancing model complexity and performance.
Which combination of techniques is MOST LIKELY to help prevent overfitting and improve the model’s performance on unseen data?
Combine data augmentation to increase the diversity of the training data with early stopping to prevent overfitting, and use ensembling to average predictions from multiple models.
You are a data scientist working for a healthcare company that develops predictive models for diagnosing diseases based on patient data. Due to regulatory requirements and the critical nature of healthcare decisions, model interpretability is a top priority. The company needs to ensure that the predictions made by the model can be explained to both medical professionals and regulatory bodies. You are evaluating different algorithms in Amazon SageMaker for your model, balancing the trade-off between accuracy and interpretability. The initial trials show that more complex models like deep neural networks (DNNs) yield higher accuracy but are less interpretable, whereas simpler models like logistic regression provide clearer insights but may not perform as well on the dataset. Given these considerations, which of the following approaches is MOST APPROPRIATE for achieving both interpretability and acceptable performance?
Use a tree-based algorithm like XGBoost, which offers a balance between accuracy and interpretability with feature importance.
You are a machine learning engineer at a fintech company tasked with developing and deploying an end-to-end machine-learning workflow for fraud detection. The workflow involves multiple steps, including data extraction, preprocessing, feature engineering, model training, hyperparameter tuning, and deployment. The company requires the solution to be scalable, support complex dependencies between tasks, and provide robust monitoring and versioning capabilities. Additionally, the workflow needs to integrate seamlessly with existing AWS services. Which deployment orchestrator is the MOST SUITABLE for managing and automating your ML workflow?
Use Amazon SageMaker Pipelines to orchestrate the entire ML workflow, leveraging its built-in integration with SageMaker features like training, tuning, and deployment.
A company wants to use ML to categorize products that are sold in stores. An ML engineer is building an image classification model by using convolutional neural networks (CNN) on Amazon SageMaker. The company will use the model to classify images of laptops and mobile phones. During model training, the precision for the mobile phones image class was lower than expected. The ML engineer wants to understand why some images of mobile phones were misclassified. Additionally, the ML engineer wants to improve model performance for the mobile phones class.
Which solution will meet these requirements?
Use SageMaker with TensorBoard to analyze intermediate tensors.
A financial services company has an ML model for near real-time, low-latency fraud detection. The model is less than 1 GB in size, receives no more than 10 concurrent requests, and is currently hosted on premises. The company wants to host the model on AWS but does not want to manage or optimize the infrastructure that is servicing the models. The company wants a hosting solution with minimal overhead.
Which solution will deploy the ML model with the LEAST effort?
Configure the model as an Amazon SageMaker Model. Serve the model on a serverless SageMaker endpoint
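A boto3 sketch of a serverless endpoint (model and endpoint names are hypothetical); ServerlessConfig replaces instance selection, and AWS scales capacity with request volume:

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_endpoint_config(
    EndpointConfigName="fraud-serverless-config",
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "fraud-detection-model",  # an existing SageMaker Model
        "ServerlessConfig": {
            "MemorySizeInMB": 2048,  # sized for a model under 1 GB
            "MaxConcurrency": 10,    # matches the expected concurrent requests
        },
    }],
)
sm.create_endpoint(
    EndpointName="fraud-endpoint",
    EndpointConfigName="fraud-serverless-config",
)
```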
A data scientist is working with an ML model and needs to use new data to update the model. The model was previously tuned by Amazon SageMaker automatic model tuning (AMT). The data scientist wants to use previous tuning job results for more efficient training and to save compute time if the training accuracy is not improving.
Which hyperparameter tuning job will meet these requirements?
A tuning job that uses warm start with the results of the previous tuning job and has early stopping enabled.
Identify the difference in the predicted outcome as an input feature changes.
Partial dependence plots (PDPs)
Quantify the contribution of each feature in a prediction.
Shapley values
Measure the imbalance of positive outcomes between different facet values.
Difference in proportions of labels (DPL)
A company wants to implement an internal employee chatbot that can answer questions that are related to internal company topics and processes. The chatbot will use a generative AI approach and large language models (LLMs) to interact with users. The relevant information is currently held in an unstructured format in thousands of PDF documents in the company's document management system. An ML engineer must implement the chatbot solution.
The ML engineer has set up a knowledge base with the company documents for retrieval augmented generation (RAG).
What should the ML engineer do to find documents that are relevant to a user question?
Use an embeddings model to embed the question. Query the knowledge base by using the output from the embeddings model. Configure the query to return the contents of the desired number of nearest neighboring documents.
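A sketch of the embedding step with Amazon Titan Text Embeddings on Bedrock, plus a brute-force cosine nearest-neighbor search standing in for the knowledge base query (the document vectors are random placeholders here):

```python
import json

import boto3
import numpy as np

bedrock = boto3.client("bedrock-runtime")

def embed(text: str) -> np.ndarray:
    resp = bedrock.invoke_model(
        modelId="amazon.titan-embed-text-v2:0",
        body=json.dumps({"inputText": text}),
    )
    return np.array(json.loads(resp["body"].read())["embedding"])

question_vec = embed("How do I submit a travel expense report?")

# Placeholder for precomputed embeddings of the PDF chunks (1024-dim for Titan v2).
doc_vecs = np.random.rand(1000, 1024)

# Cosine similarity between the question and every chunk.
scores = doc_vecs @ question_vec / (
    np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(question_vec)
)
top_5 = np.argsort(scores)[-5:][::-1]   # indices of the 5 nearest chunks
```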
A company wants to implement an internal employee chatbot that can answer questions that are related to internal company topics and processes. The chatbot will use a generative AI approach and large language models (LLMs) to interact with users. The relevant information is currently held in an unstructured format in thousands of PDF documents in the company's document management system. An ML engineer must implement the chatbot solution.
The ML engineer has set up a retrieval system for augmented generation.
Which solution will meet the requirements to provide an answer to a user question based on the identified, relevant documents?
Engineer a prompt that contains the context (relevant document chunks) and the user question. Engineer the prompt to make the system answer the question based on the context. Send the prompt to an LLM on Amazon Bedrock and return the answer to the user.
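A sketch of that prompt assembly using the Bedrock Converse API (model ID and chunk contents are placeholders):

```python
import boto3

bedrock = boto3.client("bedrock-runtime")

retrieved_chunks = ["<chunk 1 text>", "<chunk 2 text>"]   # from the retriever
question = "How many vacation days do new employees get?"

prompt = (
    "Answer the question using only the context below. If the answer is "
    "not in the context, say you don't know.\n\n"
    "Context:\n" + "\n---\n".join(retrieved_chunks) +
    f"\n\nQuestion: {question}"
)

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user", "content": [{"text": prompt}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```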