ML Exam Review

Flashcards for reviewing ML exam topics

New cards

A company wants to read, write, and transform Apache Parquet files from an Amazon S3 bucket. An ML engineer has been asked to automate this process by creating a custom transform script in Python. Which solution meets these requirements with the LEAST operational effort?

AWS Glue

New cards

A company is building a fraudulent transaction detection solution on Amazon SageMaker. The company wants to use a SageMaker built-in algorithm for the fraud detection model by using customer transaction data. Which algorithm should the company use?

Random Cut Forest (RCF)

New cards

An agriculture company plans to use drone imagery to count their flock of sheep. The company is experimenting with Amazon SageMaker built-in algorithms, and needs advice on what type of ML model to use. Which ML model should the company use?

Object detection model

New cards

A global company wants to build an internal chatbot that employees can use to answer questions about company-relevant information. The chatbot will use a retrieval augmented generation (RAG) approach to retrieve relevant information from internal documents. The chatbot will use a large language model (LLM) to answer the employees' questions. The company expects the chatbot to consistently have a high number of queries. The chatbot must be available 24 hours a day, 7 days a week. The company wants to use a fully managed RAG solution. Which solution will meet these requirements MOST cost-effectively?

Amazon Bedrock Knowledge Bases with provisioned throughput

New cards

A retail company is using an Amazon SageMaker recommendation model to generate personalized in-application notifications. The product catalog changes on a weekly basis, and the merchandise categories are highly imbalanced. The company is concerned that data drift is too high and introduces significant bias, despite regular model re-training. Which action can the company take to detect bias in production?

SageMaker Clarify

New cards

A company designed a classification system. The system uses an ML model deployed on an Amazon SageMaker endpoint. The company wants to assess system performance by implementing a feedback mechanism to track the model's performance. Which solution will meet these requirements with the LEAST development effort?

Use SageMaker Model Monitor to ingest and merge captured data from the endpoint and the processed feedback. Create and schedule a baseline job and a model quality monitoring job.

New cards

A library designed a book recommendation system. The system was deployed by using an Amazon SageMaker endpoint. The endpoint has a target tracking scaling policy to auto-scale based on the number of invocations metric. After system deployment, traffic has seen intermittent spikes that caused over-scaling. An ML engineer must implement a solution to handle the spike in traffic. Which solution will meet these requirements with the LEAST operational overhead?

Specify a cooldown period in the target tracking scaling policy.
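A minimal sketch of what such a policy configuration could look like. It only builds the configuration dictionary and does not call AWS. The field names follow the Application Auto Scaling target tracking configuration; the target value and cooldown durations are illustrative assumptions, not recommendations.

```python
# Sketch only: builds a target tracking configuration; no AWS call is made.
# The target value and cooldown durations below are illustrative assumptions.

def build_target_tracking_config(target_invocations, scale_in_cooldown_s, scale_out_cooldown_s):
    """Return a TargetTrackingScalingPolicyConfiguration-style dictionary."""
    return {
        "TargetValue": float(target_invocations),
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        # Seconds to wait after a scaling activity before another one can start.
        "ScaleInCooldown": scale_in_cooldown_s,
        "ScaleOutCooldown": scale_out_cooldown_s,
    }

config = build_target_tracking_config(70, 600, 300)
print(config)
```

A longer scale-out cooldown makes the policy less reactive to short spikes, which is the effect the answer relies on.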

New cards

A company is developing a TensorFlow model by using Amazon SageMaker framework estimators. The model is experiencing heavy system utilization. An ML engineer must identify system utilization bottlenecks in real time. Which solution will meet these requirements?

Use Amazon CloudWatch to monitor SageMaker instance metrics that are used by the model.

New cards

A company is using Amazon SageMaker to train and evaluate an ML model. The company will use the model to predict if an email is spam or not. Because the model will be used for internal emails, the company wants to ensure that legitimate emails are not incorrectly flagged as spam. Which model evaluation metric will meet these requirements?

Precision
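As a quick illustration of why precision fits here: precision falls whenever legitimate emails are flagged as spam (false positives), while recall does not. The counts below are made-up numbers for the sketch.

```python
def precision(true_positives, false_positives):
    """Share of emails flagged as spam that really are spam."""
    flagged = true_positives + false_positives
    return true_positives / flagged if flagged else 0.0

def recall(true_positives, false_negatives):
    """Share of actual spam emails that were flagged."""
    actual_spam = true_positives + false_negatives
    return true_positives / actual_spam if actual_spam else 0.0

# Hypothetical confusion counts: 90 spam caught, 10 legitimate emails
# wrongly flagged, 30 spam emails missed.
p = precision(90, 10)
r = recall(90, 30)
print(p, r)
```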

New cards

A data scientist at a bank wants to train a model to predict loan approvals by using XGBoost on Amazon SageMaker. The training dataset is a tabular dataset. The dataset includes a column named 'approved' that indicates if the loan is approved, with 1 indicating approved and 0 indicating not approved. When setting up the hyperparameter tuning, the data scientist needs to provide an evaluation metric. Which evaluation metric is correct for this scenario?

validation:f1

New cards

An ML engineer uses Amazon SageMaker Data Wrangler to prepare a dataset for training an ML model to predict housing prices. The dataset consists of housing data from the past 10 years sorted by date and includes features such as location, size, and price. The ML engineer wants to reduce prediction bias and ensure that the model generalizes well on future, unseen data. Which SageMaker Data Wrangler split transform will meet this requirement?

Ordered split
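The idea behind an ordered split can be sketched in plain Python: sort by date, then cut once, so every test row is later than every training row. This is not Data Wrangler's implementation; the column name `sold_on`, the prices, and the 80/20 ratio are assumptions for the example.

```python
from datetime import date

def ordered_split(rows, key, train_fraction=0.8):
    """Sort rows by key and cut once, keeping the latest rows for testing."""
    ordered = sorted(rows, key=key)
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]

# Hypothetical housing records, one per year.
rows = [
    {"sold_on": date(2015 + i, 1, 1), "price": 300_000 + 10_000 * i}
    for i in range(10)
]
train, test = ordered_split(rows, key=lambda r: r["sold_on"])
print(len(train), len(test))
```

Because no future row leaks into training, the evaluation resembles how the model will be used on later data.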

New cards

An ML engineer uses Amazon SageMaker Data Wrangler to pre-process a housing dataset for a Support Vector Machine (SVM) regression model. The dataset includes a Property_Age feature with values ranging from 1-10. The dataset also includes a Property_Price feature with most values around 300,000 and several outliers with values up to 15,000,000. The model requires features to be on a similar scale for optimal performance. Which SageMaker Data Wrangler scaling function should be applied to the model?

Robust Scaler
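A rough sketch of what robust scaling does, assuming the common definition (subtract the median, divide by the interquartile range). The prices are invented to mirror the scenario; Data Wrangler's own implementation may differ in details such as the quantile method.

```python
import statistics

def robust_scale(values):
    """Center on the median and scale by the interquartile range (IQR)."""
    median = statistics.median(values)
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]

# Hypothetical prices: most near 300,000 plus one extreme outlier.
prices = [250_000, 280_000, 290_000, 300_000, 310_000, 320_000, 15_000_000]
scaled = robust_scale(prices)
print(scaled)
```

The median and IQR barely move when the outlier changes, so the typical prices stay on a small, comparable scale instead of being squashed toward zero as they would be with min-max scaling.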

New cards

An ML engineer must prepare data from car rental contracts for model training. The car rental contracts that are used to train the model are in plain text format and are stored in Amazon S3. The contracts include the renter's name, age, email address, driver's license ID, car model, and vehicle identification number. In preparation for model training, the contracts must be processed to detect personally identifiable information (PII). Which solution will meet these requirements with the LEAST operational overhead?

Use an Amazon SageMaker Canvas ready-to-use model to detect PII.

New cards

A data scientist successfully used Amazon Comprehend from an Amazon SageMaker notebook instance through the boto3 Python APIs. Later, a security team configures a VPC endpoint dedicated to Amazon Comprehend. Security requirements state that AWS services should not be reached through the public internet. The data scientist attempts to update the SageMaker notebook to reach the DNS entry of the VPC endpoint, but the service call fails. How can the data scientist resolve the error to access Amazon Comprehend from the SageMaker notebook instance?

Verify if the SageMaker notebook instance is configured to run inside the same VPC as the VPC endpoint.

New cards

A company built a deep learning model for climate modeling by using Amazon SageMaker. In each invocation, the model processes 400 MB of data for 30 minutes to return a prediction. The climate model is invoked automatically when a new climate event is detected. The company needs a deployment strategy to move the deep learning model to production. A cold start can be tolerated. What is the MOST cost-effective solution?

Deploy the model by using an asynchronous endpoint.

New cards

An ML engineer must maintain an existing Amazon SageMaker Pipelines pipeline to build an ML model. The ML engineer must modify the current pipeline to implement custom model training logic. The training code is written in Python. Which modification should the ML engineer make to meet these requirements?

Wrap the custom training logic into a function and use the @step decorator in the function. Add the function as a step in the current pipeline.

New cards

A social media company wants to build a content moderation system to detect inappropriate or offensive material in user-uploaded images. Which solution will meet this requirement?

Use Amazon Rekognition moderation APIs.

New cards

A data scientist is training a deep learning neural network by using Amazon SageMaker. The data scientist wants to debug the model to identify and address model convergence issues. The data scientist wants to use real-time monitoring to determine if there is a sampling imbalance between classes. Which solution will meet these requirements with the LEAST operational overhead?

Set up a SageMaker training job that is configured to include SageMaker Debugger. Start the training job and monitor for sampling imbalance by using SageMaker Debugger built-in rules.

New cards

A company is planning to develop an ML model by using Amazon SageMaker. The training dataset is sensitive and is stored in Amazon S3 in a different AWS Region from where the company plans to run SageMaker. The training dataset cannot be exposed to the public internet during processing. Which solution will meet these requirements?

Disable direct internet access for SageMaker instances. Enable an interface VPC endpoint within the VPC. Encrypt the S3 data by using AWS Key Management Service (AWS KMS).

New cards

An ML engineer is developing a semantic segmentation computer vision product. The ML engineer has an unlabeled image dataset that is stored in Amazon S3. The images must be labeled to prepare the dataset to train a built-in classification ML model on Amazon SageMaker. Which solution will meet these requirements with the LEAST operational overhead?

Create a SageMaker Ground Truth labeling job.

New cards

A financial services company created a feature group in Amazon SageMaker Feature Store. The feature group manages user-related ML features across different areas. Data has already been loaded into SageMaker Feature Store. An ML engineer needs to add a new feature to the feature group. The feature group will be used in several ML marketing models. The ML engineer must update the historical records to include the values of the new feature. Which step should the ML engineer take to add the new feature...

Use the UpdateFeatureGroup operation to add the new feature to the feature group. Specify the name and type.

New cards

A financial services company created a feature group in Amazon SageMaker Feature Store. The feature group manages user-related ML features across different areas. Data has already been loaded into SageMaker Feature Store. An ML engineer needs to add a new feature to the feature group. The feature group will be used in several ML marketing models. The ML engineer must update the historical records to include the values of the new feature. Which step should the ML engineer take to update the histo...

Use the PutRecord operation to overwrite the records that do not have data for the new feature.

New cards

An ML engineer built an ML solution that was deployed in an AWS account. The account was shared by the company's ML team, which is where additional projects are already running. The company needs to use AWS Cost Center to track costs across all the AWS resources that are used in the solution. These resources include training and batch inference workflows in Amazon SageMaker Pipelines, Amazon S3 buckets, and AWS Glue tables. Which solution will meet the requirements to group and track the project costs?

Assign a user-defined tag to the project AWS resources that includes a project identifier. Activate user-defined tags in the AWS Billing and Cost Management console and use AWS Cost Explorer to filter costs by the project identifier.

New cards

A car company wants to build an ML model by using Amazon SageMaker to predict the prices of pre-owned cars. The company provides a dataset to a data scientist that includes thousands of observations and 10 features based on past sales data. Which ML algorithm should the data scientist use to meet these requirements?

Linear learner algorithm

New cards

A research team collects data from 10 universities that are participating in a research study. The data consists of many large .csv files that are uploaded from each university into Amazon S3. An ML engineer notices that files are taking a long time to upload. The ML engineer needs to increase the upload speed. Which solution will meet these requirements?

Use Amazon S3 Transfer Acceleration.

New cards

A data scientist is exploring a dataset by using an Amazon SageMaker Studio notebook. The data scientist wants to visualize the correlation between different input features. Which correlation metric should the data scientist use to investigate non-linear relationships between numeric features?

Spearman
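A small sketch of the difference: Spearman correlation is Pearson correlation computed on ranks, so it scores a monotonic but non-linear relationship as perfect where Pearson does not. The data (y = x cubed) is invented, and the rank helper assumes there are no ties.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

def ranks(values):
    """Rank values from 1 (smallest); assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

x = list(range(1, 11))
y = [v ** 3 for v in x]  # monotonic but non-linear
pearson_r = pearson(x, y)
spearman_r = spearman(x, y)
print(pearson_r, spearman_r)
```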

New cards

An online retail company is using an Amazon SageMaker endpoint to deliver product recommendations to customers directly in a web application. An ML specialist needs to ensure that the ML model remains available during seasonal sale events. The ML model must be able to accommodate the expected increase in endpoint invocations. Which solution provides the HIGHEST scalability capabilities to meet these requirements?

Configure auto scaling on the SageMaker ML model endpoint.

New cards

A telecommunications company uses an Amazon SageMaker ML model to predict customer turnover. The model is an XGBoost tree-based model. The tabular dataset includes both nominal categorical variables and numerical variables. A data scientist must transform the variables so that the data can be analyzed in the SageMaker environment. Which solution should the data scientist use to help analyze the data?

Use SageMaker Data Wrangler to perform encoding on the categorical variables.
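Data Wrangler applies encodings without code, but a plain-Python sketch shows what one-hot encoding of a nominal column produces. The column values below are hypothetical.

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows

# Hypothetical nominal column from a customer dataset.
plan_types = ["prepaid", "postpaid", "prepaid", "business"]
categories, encoded = one_hot_encode(plan_types)
print(categories)
print(encoded)
```

Each category becomes its own column, so no artificial ordering is imposed on nominal values.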

New cards

How can the model engineer train the built-in SageMaker ML model in the MOST cost-effective manner?

In the SageMaker training job, set EnableManagedSpotTraining to True.
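A minimal sketch of the related training job settings, assuming the request is built as a dictionary for the CreateTrainingJob API. It only builds and validates the fragment; no AWS call is made, and the time limits are illustrative.

```python
# Sketch only: builds the spot-related part of a training job request.
# The runtime and wait limits passed in below are illustrative assumptions.

def spot_training_settings(max_runtime_s, max_wait_s):
    """Return the spot-related fields of a CreateTrainingJob request."""
    # The wait time covers waiting for Spot capacity plus the run itself,
    # so it cannot be shorter than the maximum runtime.
    if max_wait_s < max_runtime_s:
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
    }

settings = spot_training_settings(3600, 7200)
print(settings)
```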

New cards

An ML engineer wants to train a model to analyze customer turnover in a telecommunications company. The ML engineer created a script to fit a Cox model in an Amazon SageMaker training job by using a dataset available in Amazon S3. The training code requires access to third-party Python libraries including scikit-learn, NumPy, pandas, and a proprietary library. The proprietary library code cannot be modified and is available in a private artifact repository. Which solution will run the training job with the LEAST operational overhead?

Extend the prebuilt SageMaker scikit-learn framework container to include custom dependencies.

New cards

An ML engineer wants to create a text summarization model that is based on the Amazon SageMaker seq2seq algorithm. The ML model training data includes 1 TB of flat files. The ML engineer must convert the data to RecordIO-Protobuf format. Which solution will meet these requirements?

Launch an Apache Spark Amazon EMR cluster to transform the training data to RecordIO-Protobuf format on Amazon S3.

New cards

An ML engineer must monitor a production ML model that has an endpoint that is configured for real-time inference. Model training data and inference I/O data are stored in Amazon S3. The ML engineer needs to track data drift in production to see if the quality of predictions changes from when the model was trained. Which solution should the ML engineer use to create a baseline of the training data?

Use an Amazon SageMaker Model Monitor prebuilt container with SageMaker Python SDK to generate statistics from the training data.

New cards

A data scientist is developing a forecasting model by using Amazon SageMaker. The data scientist has 3 years of daily time series data, including days with missing data. The data is stored in Amazon S3. The data scientist wants to perform feature engineering by filling in missing values with various substitutes. What is the MOST operationally efficient method to fill in missing values?

Use SageMaker Data Wrangler within the SageMaker Canvas environment to fill missing values.
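Data Wrangler offers these fills as point-and-click transforms. As a rough illustration of two common substitutes for gaps in a daily series, here is a plain-Python sketch of forward fill and mean fill; the series values are invented.

```python
def forward_fill(series):
    """Replace each missing value (None) with the last observed value."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def mean_fill(series):
    """Replace each missing value (None) with the mean of observed values."""
    observed = [v for v in series if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in series]

# Hypothetical daily readings with gaps.
daily = [10.0, None, 12.0, None, None, 15.0]
print(forward_fill(daily))
print(mean_fill(daily))
```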

New cards

A data scientist must train an ML model in Amazon SageMaker. The model should be trained with customer purchasing data to classify customer segments based on behavior. The data scientist must evaluate multiple algorithms and track model performance. Which solution will meet these requirements with the LEAST effort?

Use SageMaker built-in algorithms to train the model. Use SageMaker Experiments to track model runs and results.

New cards

A company is using an ML model that runs inferences in real time in Amazon SageMaker as part of an online application. Lately, the accuracy of the model has been decreasing. The company has developed three new versions of the model. The company wants to perform A/B testing on the new versions of the model and deploy the model that has the highest accuracy. Which solution will meet these requirements with the LEAST operational overhead?

Deploy the three new versions of the model behind a single SageMaker endpoint. Define a traffic percentage for each version.
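A minimal sketch of how the traffic split could be expressed, assuming production variants in an endpoint configuration where each variant's share is its weight divided by the sum of all weights. The variant names, model names, instance type, and weights are hypothetical, and nothing is sent to AWS.

```python
# Sketch only: variant and model names, instance type, and weights are
# hypothetical. No endpoint configuration is created.

def make_variant(name, weight):
    """Return one production variant entry for an endpoint configuration."""
    return {
        "VariantName": name,
        "ModelName": name,
        "InstanceType": "ml.m5.large",
        "InitialInstanceCount": 1,
        "InitialVariantWeight": weight,
    }

def traffic_shares(variants):
    """Share of requests per variant: its weight over the sum of weights."""
    total = sum(v["InitialVariantWeight"] for v in variants)
    return {v["VariantName"]: v["InitialVariantWeight"] / total for v in variants}

variants = [make_variant("model-v2", 2.0),
            make_variant("model-v3", 1.0),
            make_variant("model-v4", 1.0)]
shares = traffic_shares(variants)
print(shares)
```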

New cards

A data scientist needs to deploy an ML model. The model will be invoked every 24 hours. The model takes 30 minutes to process requests. Which solution will meet these requirements MOST cost-effectively?

Create an Amazon SageMaker batch transform job.

New cards

An ML engineer is developing a computer vision ML model to identify visual defects on the products. The engineer is using a dataset with 50,000 images of products that will be split for training and evaluation. During the validation step, the model is not accurately capturing the underlying relationship in the training dataset. Which approach will improve the model performance?

Increase the number of domain-specific features in the training dataset.

New cards

A data scientist wants to train a model to predict housing prices by using XGBoost on Amazon SageMaker. The training dataset is a tabular dataset. The dataset includes a column named 'price' that indicates the sales price of each house. The data scientist is setting up the hyperparameter tuning, and needs to provide an evaluation metric. Which evaluation metric is appropriate for this scenario?

validation:mse
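For reference, mean squared error is the average of the squared differences between predicted and actual values, which is why it suits a regression target such as price. The numbers below are invented for the sketch.

```python
def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical sale prices and model predictions.
actual = [300_000, 450_000, 250_000]
predicted = [310_000, 440_000, 250_000]
mse = mean_squared_error(actual, predicted)
print(mse)
```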

New cards

A company wants to implement predictive maintenance for critical equipment by using ML algorithms. The company requires development, deployment, and management of the predictive maintenance solutions. Which action will help the company's own code and dependencies run feature engineering in Amazon SageMaker?

Build a Dockerfile and push the image to Amazon Elastic Container Registry (Amazon ECR).

New cards

An ML engineer must implement a solution that processes hundreds of thousands of text inputs once every 24 hours. Each of the inputs is inserted into a prompt and sent to a large language model (LLM) for inference. The LLM response must be stored in an Amazon S3 bucket. Which solution will meet these requirements with the LEAST operational overhead?

Create a batch inference job in Amazon Bedrock. Store the input file in an S3 bucket and specify the stored file as an input to a CreateModelInvocationJob request. Specify the output locations for the request as the target S3 bucket.
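A hedged sketch of preparing the input file for such a job. It assumes the JSON Lines layout used by Bedrock batch inference, with one record per line carrying a `recordId` and a `modelInput`. The body of `modelInput` depends on the chosen model, so the `prompt` key here is only a placeholder, and the prompt template and record ID format are assumptions.

```python
import json

def build_batch_records(texts, prompt_template):
    """Return JSON Lines content with one record per text input."""
    lines = []
    for index, text in enumerate(texts, start=1):
        record = {
            "recordId": f"REC{index:08d}",
            # Placeholder body: the real modelInput schema is model-specific.
            "modelInput": {"prompt": prompt_template.format(text=text)},
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)

# Hypothetical inputs and prompt template.
jsonl = build_batch_records(
    ["first document", "second document"],
    "Summarize the following text: {text}",
)
print(jsonl)
```

The resulting file would be uploaded to S3 and referenced as the input of the CreateModelInvocationJob request named in the answer.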

New cards

The company wants to deploy a new version of a model into production. The company wants to shift only a small portion of traffic at first. If the results are satisfactory, the company will shift the remainder of the traffic to the new version. Which solution will meet these requirements?

Implement a blue/green deployment strategy in canary mode.
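A rough sketch of what a canary deployment configuration for a SageMaker endpoint update could look like, expressed as the dictionary passed as a deployment configuration. It only builds the dictionary. The canary percentage, wait times, and alarm name are assumptions chosen for illustration.

```python
# Sketch only: builds a blue/green canary deployment configuration.
# The percentage, wait times, and alarm name are illustrative assumptions.

def canary_deployment_config(canary_percent, wait_s, alarm_names):
    """Shift canary_percent of capacity first, wait, then shift the rest."""
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "CANARY",
                "CanarySize": {"Type": "CAPACITY_PERCENT", "Value": canary_percent},
                "WaitIntervalInSeconds": wait_s,
            },
            # How long the old fleet is kept after the full shift.
            "TerminationWaitInSeconds": 300,
        },
        # Alarms that trigger an automatic rollback to the old fleet.
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": name} for name in alarm_names]
        },
    }

config = canary_deployment_config(10, 600, ["endpoint-5xx-errors"])
print(config)
```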

New cards

A company wants to build an ML model to predict future sales based on historical trends. The company has several years of sales data stored in an SQL server in an on-premises database. The company uses AWS and has dedicated network connectivity between AWS and the on-premises database. An ML engineer is using Amazon SageMaker pre-built Docker images to train the model. Which approach must the engineer use to ingest the data to train the model?

Use AWS Database Migration Service (AWS DMS) to export the data to Amazon S3. Provide the S3 location within the SageMaker notebook.

New cards

An ML engineer wants to use Amazon SageMaker to create a model that predicts whether a student will pass an exam. The ML engineer is developing a logistic regression model and needs to find an optimal model with the most accurate classification threshold. The ML engineer must select a model evaluation technique to analyze the performance of the model based on the defined threshold. The dataset contains an equal number of observations for passed and failed exam attempts. Which model evaluation technique meets the requirements?

Receiver operating characteristic (ROC) curve
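To make the threshold dependence concrete, this sketch computes ROC points (false positive rate and true positive rate) at a few thresholds. The labels and scores are invented, and a real ROC curve would sweep many more thresholds.

```python
def roc_points(labels, scores, thresholds):
    """Return (threshold, false positive rate, true positive rate) tuples."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for threshold in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
        fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
        points.append((threshold, fp / negatives, tp / positives))
    return points

# Hypothetical balanced data: 1 = passed, 0 = failed.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
points = roc_points(labels, scores, [0.0, 0.5, 1.0])
print(points)
```

Each threshold yields one point, so plotting all of them shows how the trade-off between true and false positives moves as the threshold changes.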

New cards

A company urgently wants to deploy a newly trained ML model to improve customer experience for an existing application that has a custom traffic pattern. An MLOps engineer must build a deployment pipeline to host the model on a persistent, scalable endpoint that provides consistently low latency. The MLOps engineer must identify the instance type to use to host the model. Which solution will meet these requirements with the LEAST operational overhead?

Use Amazon SageMaker Inference Recommender to run an inference recommendation job.

New cards

A financial company has a compliance policy that states that direct internet access from an Amazon SageMaker notebook instance is not allowed. An ML engineer disabled direct internet access on the SageMaker notebook instance and hosted the instance in a private subnet in a VPC. However, internet access is required to update the SageMaker instance. Which solution will meet these requirements?

Set up a NAT gateway within the VPC. Configure security groups and network access control lists (network ACLs) to allow outbound connections.

New cards

A global automotive company is managing a fleet of hundreds of thousands of vehicles. For each new vehicle, the company receives a scan of the vehicle registration information card. The company uses Amazon Textract to extract the text from the scans. An ML engineer must redact the vehicle identification number (VIN) from the extracted text before the text is used for modeling. Which solution will meet these requirements with the LEAST operational overhead?

Use Amazon Comprehend to run an asynchronous job to redact personally identifiable information (PII) entities. Set the redaction configuration to redact VINs from the text.

New cards

A data scientist created an Amazon SageMaker Processing job that processes CSV files in an Amazon S3 bucket. The SageMaker Processing job can access the S3 bucket. However, when the job tries to access the CSV files, the job receives a 403 error. What is the cause of the error?

The SageMaker Processing job execution role does not have the necessary permissions.

New cards

An ML engineer is experimenting with a large language model (LLM) for text generation on the Amazon Bedrock text playground. The ML engineer tests inference on different prompts and discovers high randomness and variation of responses to the same repeated questions. The ML engineer must change the inference parameters to standardize answers and generate more consistent responses. Which change to the inference parameters will meet these requirements?

Reduce the temperature parameter of the model.
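The effect of temperature can be sketched with a softmax over token scores: dividing the scores by a smaller temperature concentrates probability on the top token, so sampling becomes more repeatable. The logits below are made-up values, and real models apply this over a full vocabulary.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]

# Hypothetical scores for three candidate tokens.
logits = [2.0, 1.0, 0.5]
high_temp = softmax_with_temperature(logits, 1.0)
low_temp = softmax_with_temperature(logits, 0.2)
print(high_temp)
print(low_temp)
```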

New cards

A data science team has built over 50 models on Amazon SageMaker during the last several years. The models support different business groups within a company. The data science team wants to document the critical details of the models. The critical details include the background and purpose of the models. The critical details will give model consumers the ability to browse and understand available models within the company. Which solution will meet these requirements with the LEAST operational overhead?

Configure SageMaker Model Cards.

New cards

A financial services company is developing a new ML model to automatically assign credit limits to customers when they apply for a credit card. To train the model, the company has gathered a large dataset of customers and their history. The data includes credit transactions, credit scores, and other relevant financial and demographic information from the last year. The first results of the new model in training show that the model is returning inaccurate predictions for specific types of customer...

Configure a SageMaker Clarify processing job to identify bias in the training data.

New cards

An ML engineer must reuse features across different ML applications for training and low-latency inference. The ML engineer must ensure that the features can be shared with multiple team members in different accounts. Which solution will meet these requirements with the LEAST operational effort?

Use Amazon SageMaker Feature Store to store features for reuse and to provide access for team members across different accounts.

New cards

A company hosts many ML models that support unique use cases that have dynamic workloads. All the models were trained by using the same ML framework. The models are hosted on Amazon SageMaker dedicated endpoints that are underutilized. The company has a goal to optimize its environment for cost. Which solution will meet these requirements MOST cost-effectively?

Configure a SageMaker multi-model endpoint.

New cards

A company is training a model on 50,000 images. A first evaluation of the model shows that the training error rapidly decreases as the number of epochs increases, which causes the model to generalize poorly on the evaluation data. An ML engineer must improve the generalization of the model. Which method will meet the requirements?

Increase the value of the regularization hyperparameter.
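As a toy illustration of why a stronger penalty curbs overfitting, this sketch uses one-feature ridge regression without an intercept, where the fitted weight has the closed form w = sum(x*y) / (sum(x*x) + lambda). A larger lambda shrinks the weight toward zero. The data is invented, and an image model would use a different regularizer such as weight decay or dropout.

```python
def ridge_weight(xs, ys, lam):
    """Closed-form weight for one-feature ridge regression, no intercept."""
    numerator = sum(x * y for x, y in zip(xs, ys))
    denominator = sum(x * x for x in xs) + lam
    return numerator / denominator

# Hypothetical data generated by y = 2x.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
unregularized = ridge_weight(xs, ys, 0.0)
regularized = ridge_weight(xs, ys, 14.0)
print(unregularized, regularized)
```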