Pre-splitting Your Data
Use this approach when you need explicit control over the data in your training and evaluation datasources.
Sequentially Splitting Your Data
This approach is useful if you want to evaluate your ML models on data for a certain date or within a certain time range.
Randomly Splitting Your Data
This approach is useful to ensure that the distribution of the data is similar in the training and evaluation datasources.
True
It is important to use the same seed string value for both datasources and to set the complement flag for one datasource.
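The seed-and-complement rule above can be sketched as two rearrangement strings, one per datasource. This is a minimal sketch, assuming the Amazon ML `DataRearrangement` JSON keys (`percentBegin`, `percentEnd`, `strategy`, `randomSeed`, `complement`); the seed value and the helper name `random_split` are hypothetical.

```python
import json

# Hypothetical seed string; any value works as long as both datasources share it.
SEED = "s3://example-bucket/data.csv"


def random_split(percent_begin, percent_end, seed, complement=False):
    """Build a DataRearrangement JSON string for a random split (assumed key names)."""
    splitting = {
        "percentBegin": percent_begin,
        "percentEnd": percent_end,
        "strategy": "random",
        "randomSeed": seed,
    }
    if complement:
        # The complement flag selects the rows NOT covered by the percent range.
        splitting["complement"] = "true"
    return json.dumps({"splitting": splitting})


# Same seed and same range for both; only the evaluation datasource sets complement.
training_rearrangement = random_split(0, 70, SEED)
evaluation_rearrangement = random_split(0, 70, SEED, complement=True)

print(training_rearrangement)
print(evaluation_rearrangement)
```

Because both strings share the seed and the percent range, the evaluation datasource would receive exactly the rows the training datasource left out.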
True
A common pitfall in developing a high-quality ML model is evaluating the ML model on data that is not similar to the data used for training.
True
The model and evaluation are too dissimilar (have extremely different descriptive statistics) to be useful.
This can happen when input data is sorted by one of the columns in the dataset, and then split sequentially.
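The pitfall can be reproduced on a toy dataset. This is an illustrative sketch in plain Python, not Amazon ML code; the rows, the 70/30 cut, and the helper names `split` and `labels` are all hypothetical.

```python
import random

# Toy dataset that is already sorted by its label column (hypothetical data).
rows = (
    [("adventure", i) for i in range(40)]
    + [("comedy", i) for i in range(30)]
    + [("romance", i) for i in range(30)]
)


def split(data, train_percent=70):
    """Sequential split: the first train_percent of rows train, the rest evaluate."""
    cut = len(data) * train_percent // 100
    return data[:cut], data[cut:]


def labels(data):
    """Set of distinct labels present in a list of (label, value) rows."""
    return {label for label, _ in data}


# Sequential split of sorted data: evaluation sees only labels training never saw.
seq_train, seq_eval = split(rows)
print("sequential:", sorted(labels(seq_train)), "vs", sorted(labels(seq_eval)))

# Shuffling with a fixed seed before splitting mixes the labels across both parts.
shuffled = rows[:]
random.Random(42).shuffle(shuffled)
rnd_train, rnd_eval = split(shuffled)
print("random:    ", sorted(labels(rnd_train)), "vs", sorted(labels(rnd_eval)))
```

In the sequential case the training and evaluation label sets do not overlap at all, which is the "too dissimilar to be useful" situation the card describes.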
False
You need to use random splitting in Amazon ML if you have already randomized your input data
groupFiles
Set _ to inPartition to enable the grouping of files within an Amazon S3 data partition.
groupSize
Set _ to the target size of groups in bytes.
False
The groupSize property is required
recurse
Set _ to True to recursively read files in all subdirectories when specifying paths as an array of paths.
False
You need to set recurse if paths is an array of object keys in Amazon S3
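The three properties above can be combined into one set of AWS Glue S3 connection options. This is a minimal sketch that only builds the options dictionary, since the `awsglue` library is available only inside a Glue job; the bucket path and the 1 MiB group size are hypothetical values.

```python
# Connection options for reading many small files from Amazon S3 in AWS Glue.
# The S3 path below is a hypothetical placeholder.
connection_options = {
    "paths": ["s3://example-bucket/input/"],  # prefixes, so recurse is meaningful
    "recurse": True,                # read files in all subdirectories
    "groupFiles": "inPartition",    # group files within an S3 data partition
    "groupSize": "1048576",         # optional: target group size in bytes (1 MiB)
}

# Inside a Glue job this dictionary would typically be passed along these lines:
#   glueContext.create_dynamic_frame.from_options(
#       "s3", connection_options, format="json")

print(connection_options)
```

Omitting `groupSize` is valid, consistent with the card stating that the property is not required.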
Sequence-to-Sequence Algorithm
supervised learning algorithm where the input is a sequence of tokens (for example
Sequence-to-Sequence Algorithm
Algorithm to use for a:
machine translation
Sequence-to-Sequence Algorithm
Algorithm to use for a:
text summarization
True
Amazon SageMaker AI seq2seq uses Recurrent Neural Network (RNN) models
False
Amazon SageMaker AI seq2seq does not use Convolutional Neural Network (CNN) models
Data Flow
Create a _ to define a series of ML data prep steps.
Used to combine datasets from different data sources, identify the number and types of transformations you want to apply to datasets
Transform
Clean and transform your dataset using standard transforms like string
Examples in usage:
Generate Data Insights
Automatically verify data quality and detect abnormalities in your data with Data Wrangler Data Quality and Insights Report.
True
Amazon SageMaker Canvas supports training a range of model types
Amazon SageMaker Canvas
Canvas supports training custom models on the following types of datasets:
categorical
| Model type | Example use case | Supported data types | Data sources |
|---|---|---|---|
| Numeric prediction | Predicting house prices based on features like square footage | Numeric | Local upload, Amazon S3 |
| 2 category prediction | Predicting whether or not a customer is likely to churn | Binary or categorical | Local upload, Amazon S3 |
| 3+ category prediction | Predicting patient outcomes after being discharged from the hospital | Categorical | Local upload, Amazon S3 |
| Time series forecasting | Predicting your inventory for the next quarter | Timeseries | Local upload, Amazon S3 |
| Single-label image prediction | Predicting types of manufacturing defects in images | Image (JPG, PNG) | Local upload, Amazon S3 |
| Multi-category text prediction | Predicting categories of products | Target column: binary or categorical | Local upload, Amazon S3 |
True
In Amazon ML
Area Under the (Receiver Operating Characteristic) Curve (AUC)
Amazon ML provides an industry-standard accuracy metric for binary classification models called _
True
AUC values near 1 indicate an ML model that is highly accurate.
True
Values near 0.5 indicate an ML model that is no better than guessing at random.
True
Values near 0 are unusual to see
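The AUC ranges described in the cards above can be checked with a small pairwise calculation. This is an illustrative sketch in plain Python, not Amazon ML's implementation; the function name `auc` and the sample scores are hypothetical. It uses the standard interpretation of AUC as the probability that a randomly chosen positive example scores higher than a randomly chosen negative one, counting ties as half.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


labels = [0, 0, 1, 1]

print(auc(labels, [0.1, 0.2, 0.8, 0.9]))  # positives ranked above negatives: 1.0
print(auc(labels, [0.5, 0.5, 0.5, 0.5]))  # scores carry no information: 0.5
print(auc(labels, [0.9, 0.8, 0.2, 0.1]))  # ranking fully inverted: 0.0
```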