Vocabulary flashcards covering core machine-learning concepts, AWS AI/ML services, lifecycle phases, evaluation metrics, and key terminology needed for Domain 1 of the AWS Certified AI Practitioner (AIF) exam.
Machine Learning (ML)
The science of creating algorithms and statistical models that enable computers to learn patterns from data and perform tasks without explicit programming.
Training Process
The iterative procedure in which an algorithm adjusts its internal parameters using labeled data until it can accurately map inputs to outputs.
Inference
The act of using a trained model to generate predictions on new, unseen data.
Algorithm
A mathematical formula or procedure that defines the relationship between inputs (features) and outputs (labels).
Model Parameters
Internal numeric values (e.g., weights) that a model adjusts during training to minimize prediction error.
Model Artifacts
The outputs of training—including learned parameters, model definition, and metadata—typically stored in Amazon S3.
Inference Code
Software that loads model artifacts and executes the model’s logic to produce predictions.
Amazon S3
Low-cost, virtually unlimited object storage service; primary data source for ML model training in AWS.
Structured Data
Tabular data organized in rows and columns (e.g., CSV files, relational databases).
Semi-Structured Data
Data that doesn’t fit fixed tables but has identifiable fields, often key–value pairs (e.g., JSON).
Unstructured Data
Data that lacks a predefined schema (e.g., images, video, audio, free-form text).
Time-Series Data
Sequential data points each tagged with a timestamp, used to model temporal trends.
Feature
An individual measurable property or characteristic of the data used as model input.
Tokenization
Technique that splits text into smaller units (words, subwords, or characters) so they can serve as features.
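A minimal sketch of word-level tokenization in plain Python; real pipelines typically use library or subword tokenizers, and the sample sentence is invented for illustration:

```python
# Minimal sketch: whitespace/punctuation tokenization in plain Python.
# LLM pipelines usually rely on dedicated (often subword) tokenizers instead.
import re

text = "AWS makes machine learning accessible to every developer."
tokens = re.findall(r"[A-Za-z0-9']+", text)  # keep runs of word characters, drop punctuation
print(tokens)
# ['AWS', 'makes', 'machine', 'learning', 'accessible', 'to', 'every', 'developer']
```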
Real-Time Inference
Hosting a model on a persistent endpoint for low-latency, on-demand predictions.
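A minimal sketch of calling a deployed real-time endpoint with boto3; the endpoint name and CSV payload format are assumptions for illustration:

```python
# Minimal sketch: invoking a SageMaker real-time endpoint with boto3.
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="my-demo-endpoint",   # hypothetical endpoint name
    ContentType="text/csv",            # payload format the model container expects (assumption)
    Body="5.1,3.5,1.4,0.2",            # one row of feature values
)
print(response["Body"].read().decode("utf-8"))  # prediction returned by the endpoint
```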
Batch Inference (Batch Transform)
Processing large datasets in a single, non-persistent job for cost-effective offline predictions.
Supervised Learning
ML style that trains on data with known inputs and labeled outputs.
Unsupervised Learning
ML style that discovers hidden patterns in unlabeled data (e.g., clustering, anomaly detection).
Reinforcement Learning (RL)
An agent learns by taking actions in an environment to maximize cumulative rewards toward a goal.
Overfitting
A model that memorizes training data and performs poorly on unseen data because it fails to generalize.
Underfitting
A model that is too simple to capture underlying patterns, leading to poor performance on both training and new data.
Generalization
The ability of a model to perform well on new, unseen data.
Bias (Model)
Systematic disparity in model performance across demographic groups, often reflecting biased training data.
Neural Network
A set of interconnected nodes (neurons) organized in layers that learn complex patterns in data.
Deep Learning
A subset of ML using neural networks with multiple hidden layers to learn intricate representations, especially for unstructured data.
Transformer
A neural-network architecture that processes sequence elements in parallel, enabling large-scale generative AI models.
Large Language Model (LLM)
A transformer-based model with billions of parameters capable of advanced natural-language tasks.
Generative AI
Deep-learning approach focused on creating new content such as text, images, or code.
Amazon Bedrock
Fully managed service providing API access to multiple foundation models and tools for building generative-AI applications.
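A minimal sketch of a text-generation call through the Bedrock Converse API with boto3; the model ID is a placeholder for any text model enabled in the account:

```python
# Minimal sketch: one-turn text generation via the Bedrock Converse API.
import boto3

bedrock = boto3.client("bedrock-runtime")
response = bedrock.converse(
    modelId="amazon.titan-text-express-v1",   # placeholder model ID (assumption)
    messages=[{"role": "user",
               "content": [{"text": "Summarize what Amazon Bedrock does."}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.5},
)
print(response["output"]["message"]["content"][0]["text"])
```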
Interpretability
The extent to which a human can understand the reasoning behind a model’s prediction.
Deterministic System
Produces the exact same output for identical inputs every time (rule-based logic).
Probabilistic System
Produces outputs based on learned probabilities, so identical inputs may yield slightly different predictions.
Classification
Supervised-learning task that assigns inputs to discrete categories (binary or multiclass).
Regression
Supervised-learning task that predicts continuous numerical values.
Clustering
Unsupervised technique that groups similar data points without pre-existing labels.
Anomaly Detection
Unsupervised technique that identifies data points significantly different from the norm.
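A minimal sketch of the two unsupervised cards above, assuming scikit-learn and synthetic two-blob data: K-Means groups the points without labels, and an ad-hoc distance threshold flags outliers:

```python
# Minimal sketch: clustering plus a simple distance-based anomaly check (synthetic data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = rng.normal(loc=[[0, 0]] * 50 + [[5, 5]] * 50, scale=0.5)   # two blobs, no labels

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_[:5])                    # cluster assignment per point

# Flag points unusually far from their nearest cluster center as anomalies.
distances = np.min(kmeans.transform(X), axis=1)
threshold = distances.mean() + 3 * distances.std()
print(np.where(distances > threshold)[0])    # indices of anomalous points (likely empty here)
```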
Logistic Regression
Despite its name, a classification algorithm that outputs probabilities between 0 and 1 for binary classes.
Linear Regression
Statistical method modeling a linear relationship between one or more inputs and a continuous output.
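A minimal sketch contrasting the two cards above, assuming scikit-learn and tiny invented datasets: logistic regression predicts a class probability, linear regression predicts a continuous value:

```python
# Minimal sketch: logistic regression (classification) vs. linear regression (regression).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])

# Classification: predict a 0/1 label and a probability between 0 and 1.
y_class = np.array([0, 0, 0, 1, 1, 1])
clf = LogisticRegression().fit(X, y_class)
print(clf.predict_proba([[3.5]])[0, 1])   # probability of the positive class

# Regression: predict a continuous value.
y_cont = np.array([1.9, 4.1, 6.0, 8.1, 9.9, 12.2])
reg = LinearRegression().fit(X, y_cont)
print(reg.predict([[3.5]])[0])            # predicted continuous output
```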
Amazon Rekognition
Pre-trained AWS service for image and video analysis—faces, objects, scenes, text, moderation, and custom labels.
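A minimal sketch of label detection with boto3; the bucket and object names are placeholders:

```python
# Minimal sketch: detecting labels in an image stored in S3 with Amazon Rekognition.
import boto3

rekognition = boto3.client("rekognition")
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-images-bucket", "Name": "photo.jpg"}},  # hypothetical
    MaxLabels=5,
    MinConfidence=80.0,
)
for label in response["Labels"]:
    print(label["Name"], round(label["Confidence"], 1))
```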
Amazon Textract
Service that extracts text, forms, and tables from document images, going beyond basic OCR.
Amazon Comprehend
NLP service for sentiment analysis, entity recognition, language detection, and PII detection.
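A minimal sketch of sentiment detection with boto3; the sample sentence is invented:

```python
# Minimal sketch: sentiment analysis with Amazon Comprehend.
import boto3

comprehend = boto3.client("comprehend")
response = comprehend.detect_sentiment(
    Text="The new checkout flow is fast and easy to use.",
    LanguageCode="en",
)
print(response["Sentiment"])        # e.g., POSITIVE
print(response["SentimentScore"])   # per-class confidence scores
```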
Amazon Lex
Service for building conversational chatbots and IVR systems using natural language understanding.
Amazon Transcribe
Automatic speech-recognition (ASR) service that converts audio to text in real time or batch.
Amazon Polly
Text-to-speech (TTS) service that generates natural-sounding speech from text.
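A minimal sketch of speech synthesis with boto3; the voice ID is one of Polly's standard voices:

```python
# Minimal sketch: converting text to speech with Amazon Polly and saving the MP3.
import boto3

polly = boto3.client("polly")
response = polly.synthesize_speech(
    Text="Hello from Amazon Polly.",
    OutputFormat="mp3",
    VoiceId="Joanna",
)
with open("speech.mp3", "wb") as f:
    f.write(response["AudioStream"].read())
```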
Amazon Kendra
ML-powered enterprise search service that returns precise answers to natural-language queries.
Amazon Personalize
Service that delivers real-time personalized recommendations for products or content.
Amazon Translate
Neural machine-translation service supporting multilingual text translation.
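A minimal sketch of translating a sentence from English to Spanish with boto3:

```python
# Minimal sketch: text translation with Amazon Translate.
import boto3

translate = boto3.client("translate")
response = translate.translate_text(
    Text="Machine learning models require good data.",
    SourceLanguageCode="en",
    TargetLanguageCode="es",
)
print(response["TranslatedText"])
```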
Amazon Fraud Detector
Managed service that identifies potentially fraudulent online activities using ML.
Amazon SageMaker
Comprehensive ML platform for building, training, tuning, and deploying custom models.
SageMaker Ground Truth
Service for scalable data labeling with active learning and multiple workforce options.
SageMaker Data Wrangler
Low-code tool for connecting, exploring, and transforming data within SageMaker.
SageMaker Feature Store
Central repository for storing and sharing curated ML features across teams and models.
AWS Glue
Managed ETL service that discovers data schema, catalogs metadata, and transforms data at scale.
AWS Glue DataBrew
No-code visual data-preparation tool offering 250+ built-in transformations.
ETL (Extract, Transform, Load)
Process of pulling data from sources, converting it into usable format, and loading it to storage (e.g., S3).
Exploratory Data Analysis (EDA)
Visual and statistical examination of datasets to understand structure, spot anomalies, and guide feature engineering.
Feature Engineering
Selecting, creating, and transforming variables to improve model performance while reducing complexity.
Training-Validation-Test Split
Standard practice of dividing data (e.g., 80/10/10) for training, tuning, and unbiased final evaluation.
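A minimal sketch of an 80/10/10 split, assuming scikit-learn; train_test_split is applied twice because it only splits two ways at a time:

```python
# Minimal sketch: 80/10/10 train/validation/test split on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(100, 1)
y = np.arange(100)

# First split off 20% as a temporary holdout, then halve it into validation and test.
X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 80 10 10
```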
Hyperparameters
Model settings chosen before training (e.g., learning rate, number of layers) that influence learning but aren’t learned themselves.
SageMaker Training Job
Managed execution of user-supplied training code on a fleet of ML instances, outputting model artifacts to S3.
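A minimal sketch of launching a training job with the SageMaker Python SDK; the container image URI, IAM role, S3 paths, and hyperparameters are placeholders:

```python
# Minimal sketch: a SageMaker training job that writes model artifacts to S3.
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
estimator = Estimator(
    image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/my-training-image:latest",  # hypothetical
    role="arn:aws:iam::123456789012:role/MySageMakerRole",                              # hypothetical
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-ml-bucket/model-artifacts/",   # where model.tar.gz lands
    hyperparameters={"epochs": "10", "learning_rate": "0.01"},
    sagemaker_session=session,
)
# Each channel maps a name to an S3 prefix containing the training data.
estimator.fit({"train": "s3://my-ml-bucket/train/",
               "validation": "s3://my-ml-bucket/validation/"})
```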
SageMaker Experiments
Capability for tracking, comparing, and visualizing multiple training runs and their metrics.
SageMaker Automatic Model Tuning (AMT)
Service that automates hyperparameter optimization to find the best model configuration.
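A minimal sketch of automatic model tuning, reusing the hypothetical estimator from the training-job sketch above; the objective metric name and regex are assumptions about what the training code logs:

```python
# Minimal sketch: hyperparameter tuning wrapped around an existing Estimator.
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

tuner = HyperparameterTuner(
    estimator=estimator,                           # the Estimator sketched above
    objective_metric_name="validation:accuracy",   # metric the training code emits (assumption)
    objective_type="Maximize",
    hyperparameter_ranges={
        "learning_rate": ContinuousParameter(0.001, 0.1),
        "epochs": IntegerParameter(5, 50),
    },
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "validation accuracy: ([0-9\\.]+)"}],  # assumed log format
    max_jobs=20,
    max_parallel_jobs=2,
)
tuner.fit({"train": "s3://my-ml-bucket/train/",
           "validation": "s3://my-ml-bucket/validation/"})
```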
ML Lifecycle
Iterative sequence: define goal, collect & prepare data, train, deploy, monitor, and repeat.
ML Pipeline
Interconnected, often automated steps that implement the ML lifecycle from data ingestion to model monitoring.
Batch Transform (SageMaker)
SageMaker option to run offline predictions on large datasets without persistent endpoints.
Asynchronous Inference
Endpoint type that queues large or long-running requests, scaling down when idle for cost savings.
Serverless Inference
SageMaker endpoint option that automatically provisions and scales compute for intermittent traffic; no server management, and you pay only for the compute used to process requests.
Data Drift
Change in statistical properties of input data over time compared to training data.
Concept Drift
Shift in the underlying relationship between input features and target variable over time.
SageMaker Model Monitor
Feature that tracks deployed models for drift, bias, and quality issues, integrating with CloudWatch alerts.
MLOps
Application of DevOps best practices to ML—automation, versioning, and continuous monitoring for reliable pipelines.
Infrastructure as Code (IaC)
Practice of defining cloud resources (e.g., ML infrastructure) via code for repeatability and version control.
SageMaker Pipelines
Service that orchestrates repeatable, versioned ML workflows—including data prep, training, evaluation, and deployment.
SageMaker Model Registry
Central catalog for versioning, approving, and deploying trained models.
AWS Step Functions
Serverless workflow orchestrator for coordinating AWS services into applications, including ML pipelines.
Amazon MWAA (Managed Workflows for Apache Airflow)
Managed service running Apache Airflow for authoring and scheduling complex data/ML workflows.
Confusion Matrix
Table summarizing classification results into true/false positives and negatives.
True Positive (TP)
A correct positive prediction by the model.
True Negative (TN)
A correct negative prediction by the model.
False Positive (FP)
Model incorrectly predicts positive (Type I error).
False Negative (FN)
Model incorrectly predicts negative (Type II error).
Accuracy
(TP+TN)/(TP+TN+FP+FN); overall proportion of correct predictions.
Precision
TP/(TP+FP); reliability of positive predictions—important when false positives are costly.
Recall (Sensitivity)
TP/(TP+FN); ability to find all actual positives—important when missing positives is costly.
Precision-Recall Trade-off
Adjusting the classification threshold trades one for the other: improving precision typically lowers recall and vice versa, so the threshold is chosen based on which type of error is costlier.
F1 Score
Harmonic mean of precision and recall; balanced metric when both error types matter.
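A worked example of the four metrics above, using assumed confusion-matrix counts:

```python
# Worked example: accuracy, precision, recall, and F1 from assumed counts.
tp, tn, fp, fn = 80, 90, 10, 20

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 170/200 = 0.85
precision = tp / (tp + fp)                                  # 80/90  ~ 0.889
recall    = tp / (tp + fn)                                  # 80/100 = 0.80
f1        = 2 * precision * recall / (precision + recall)   # ~0.842

print(accuracy, precision, recall, f1)
```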
False Positive Rate (FPR)
FP/(FP+TN); proportion of negatives incorrectly labeled positive.
Specificity (True Negative Rate)
TN/(TN+FP); proportion of negatives correctly identified.
ROC Curve
Graph of TPR versus FPR across thresholds, depicting classifier performance.
AUC (Area Under ROC Curve)
Scalar summary of the ROC curve, ranging from 0.5 (random guessing) to 1 (perfect separation); higher indicates better class separation.
Mean Squared Error (MSE)
Average of squared prediction errors; penalizes large errors in regression.
Root Mean Squared Error (RMSE)
Square root of MSE; error measure in same units as target variable.
Mean Absolute Error (MAE)
Average of absolute prediction errors; less sensitive to outliers than MSE/RMSE.
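A worked example of the three regression metrics above, using a tiny set of invented predictions:

```python
# Worked example: MSE, RMSE, and MAE on four assumed predictions.
import math

y_true = [10.0, 12.0, 15.0, 20.0]
y_pred = [11.0, 11.0, 17.0, 18.0]

errors = [p - t for p, t in zip(y_pred, y_true)]       # [1, -1, 2, -2]
mse  = sum(e ** 2 for e in errors) / len(errors)       # (1+1+4+4)/4 = 2.5
rmse = math.sqrt(mse)                                  # ~1.58 (same units as the target)
mae  = sum(abs(e) for e in errors) / len(errors)       # (1+1+2+2)/4 = 1.5

print(mse, rmse, mae)
```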
Return on Investment (ROI)
Comparison of business value generated by an ML model to the total cost of developing and operating it.
Cost Allocation Tag
Key-value pair on AWS resources enabling detailed cost tracking for ML projects.
Transfer Learning
Technique of fine-tuning an existing pre-trained model on new, task-specific data to save time and resources.
Active Learning (Ground Truth)
Process where the system auto-labels confident data and sends uncertain examples to humans, reducing labeling cost.
Retrieval Augmented Generation (RAG)
Approach where a generative model retrieves external knowledge to enhance response accuracy and freshness.
SageMaker Inference Recommender
Tool that tests instance types and configurations to suggest optimal deployment settings for a model.
AWS AI Service Hierarchy
Guideline: use pre-trained AI services first, fine-tune models second, build custom models last.