AWS AIF – DOMAIN 1: Fundamentals of AI and ML

Vocabulary flashcards covering core machine-learning concepts, AWS AI/ML services, lifecycle phases, evaluation metrics, and key terminology needed for the AWS AIF Domain 1 exam.

1

Machine Learning (ML)

The science of creating algorithms and statistical models that enable computers to learn patterns from data and perform tasks without explicit programming.

2

Training Process

The iterative procedure in which an algorithm adjusts its internal parameters using labeled data until it can accurately map inputs to outputs.

3

Inference

The act of using a trained model to generate predictions on new, unseen data.

4

Algorithm

A mathematical formula or procedure that defines the relationship between inputs (features) and outputs (labels).

5

Model Parameters

Internal numeric values (e.g., weights) that a model adjusts during training to minimize prediction error.

6

Model Artifacts

The outputs of training—including learned parameters, model definition, and metadata—typically stored in Amazon S3.

7

Inference Code

Software that loads model artifacts and executes the model’s logic to produce predictions.

8

Amazon S3

Low-cost, virtually unlimited object storage service; primary data source for ML model training in AWS.
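
A minimal boto3 sketch of moving training data and model artifacts through S3; the bucket and key names are hypothetical, and configured AWS credentials are assumed.

```python
import boto3

s3 = boto3.client("s3")

# Upload a local training file to S3, where SageMaker training jobs read from.
s3.upload_file("train.csv", "my-ml-bucket", "datasets/train.csv")

# Download model artifacts produced by a training job.
s3.download_file("my-ml-bucket", "models/model.tar.gz", "model.tar.gz")
```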

9

Structured Data

Tabular data organized in rows and columns (e.g., CSV files, relational databases).

10

Semi-Structured Data

Data that doesn’t fit fixed tables but has identifiable fields, often key–value pairs (e.g., JSON).

11

Unstructured Data

Data that lacks a predefined schema (e.g., images, video, audio, free-form text).

12

Time-Series Data

Sequential data points each tagged with a timestamp, used to model temporal trends.

13

Feature

An individual measurable property or characteristic of the data used as model input.

14

Tokenization

Technique that splits text into smaller units (tokens such as words or subwords) so they can serve as features.
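
A minimal word-level sketch of the idea; production systems and LLMs typically use subword tokenizers (e.g., BPE or WordPiece) rather than a simple regex.

```python
import re

def tokenize(text: str) -> list[str]:
    # Lowercase, then pull out runs of letters, digits, and apostrophes as tokens.
    return re.findall(r"[a-z0-9']+", text.lower())

print(tokenize("Amazon SageMaker trains ML models."))
# ['amazon', 'sagemaker', 'trains', 'ml', 'models']
```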

15

Real-Time Inference

Hosting a model on a persistent endpoint for low-latency, on-demand predictions.
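
A hedged sketch of calling an already-deployed SageMaker real-time endpoint with boto3; the endpoint name and CSV payload are hypothetical.

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

response = runtime.invoke_endpoint(
    EndpointName="my-endpoint",   # hypothetical endpoint name
    ContentType="text/csv",
    Body="5.1,3.5,1.4,0.2",       # one row of input features
)
print(response["Body"].read().decode())  # low-latency prediction for this request
```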

16

Batch Inference (Batch Transform)

Processing large datasets in a single, non-persistent job for cost-effective offline predictions.

17

Supervised Learning

ML style that trains on data with known inputs and labeled outputs.

18

Unsupervised Learning

ML style that discovers hidden patterns in unlabeled data (e.g., clustering, anomaly detection).

19

Reinforcement Learning (RL)

An agent learns by taking actions in an environment to maximize cumulative rewards toward a goal.

20

Overfitting

A model that memorizes training data and performs poorly on unseen data because it fails to generalize.

21

Underfitting

A model that is too simple to capture underlying patterns, leading to poor performance on both training and new data.

22

Generalization

The ability of a model to perform well on new, unseen data.

23

Bias (Model)

Systematic disparity in model performance across demographic groups, often reflecting biased training data.

24

Neural Network

A set of interconnected nodes (neurons) organized in layers that learn complex patterns in data.

25

Deep Learning

A subset of ML using neural networks with multiple hidden layers to learn intricate representations, especially for unstructured data.

26

Transformer

A neural-network architecture that uses self-attention to process all elements of a sequence in parallel, enabling large-scale generative AI models.

27

Large Language Model (LLM)

A transformer-based model with billions of parameters capable of advanced natural-language tasks.

28

Generative AI

Deep-learning approach focused on creating new content such as text, images, or code.

29

Amazon Bedrock

Fully managed service providing API access to multiple foundation models and tools for building generative-AI applications.

30

Interpretability

The extent to which a human can understand the reasoning behind a model’s prediction.

31

Deterministic System

Produces the exact same output for identical inputs every time (rule-based logic).

32

Probabilistic System

Produces outputs based on learned probabilities, so identical inputs may yield slightly different predictions.

33

Classification

Supervised-learning task that assigns inputs to discrete categories (binary or multiclass).

34

Regression

Supervised-learning task that predicts continuous numerical values.

35

Clustering

Unsupervised technique that groups similar data points without pre-existing labels.
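
A minimal scikit-learn sketch (not AWS-specific) of k-means grouping unlabeled points into two clusters.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(kmeans.labels_)           # cluster assignment for each point
print(kmeans.cluster_centers_)  # learned centroids
```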

36

Anomaly Detection

Unsupervised technique that identifies data points significantly different from the norm.
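
A minimal scikit-learn sketch using an Isolation Forest, one common anomaly-detection algorithm; the data and contamination rate are illustrative.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

X = np.array([[0.1], [0.2], [0.15], [0.18], [9.5]])  # the last point is an outlier

clf = IsolationForest(contamination=0.2, random_state=0).fit(X)
print(clf.predict(X))  # 1 = normal, -1 = anomaly
```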

37

Logistic Regression

Despite its name, a classification algorithm that outputs probabilities between 0 and 1 for binary classes.

38

Linear Regression

Statistical method modeling a linear relationship between one or more inputs and a continuous output.
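
A short scikit-learn sketch contrasting the two preceding cards on toy data: linear regression predicts a continuous value, while logistic regression outputs class probabilities.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1], [2], [3], [4], [5]])

# Linear regression: continuous target (e.g., a price).
y_cont = np.array([1.1, 1.9, 3.2, 3.9, 5.1])
lin = LinearRegression().fit(X, y_cont)
print(lin.predict([[6]]))          # roughly 6.0

# Logistic regression: binary target, outputs probabilities between 0 and 1.
y_bin = np.array([0, 0, 0, 1, 1])
log = LogisticRegression().fit(X, y_bin)
print(log.predict_proba([[4.5]]))  # [P(class 0), P(class 1)]
```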

39

Amazon Rekognition

Pre-trained AWS service for image and video analysis—faces, objects, scenes, text, moderation, and custom labels.
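
A hedged boto3 sketch of label detection on an image stored in S3; the bucket and object names are hypothetical.

```python
import boto3

rekognition = boto3.client("rekognition")

response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-ml-bucket", "Name": "images/cat.jpg"}},
    MaxLabels=5,
)
for label in response["Labels"]:
    print(label["Name"], label["Confidence"])
```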

40

Amazon Textract

Service that extracts text, forms, and tables from document images, going beyond basic OCR.

41

Amazon Comprehend

NLP service for sentiment analysis, entity recognition, language detection, and PII detection.
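
A hedged boto3 sketch of sentiment and entity detection; the sample text is made up.

```python
import boto3

comprehend = boto3.client("comprehend")
text = "I love the new checkout flow, but shipping to Seattle was slow."

sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"])                     # e.g., MIXED

entities = comprehend.detect_entities(Text=text, LanguageCode="en")
print([e["Text"] for e in entities["Entities"]])  # e.g., ['Seattle']
```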

42

Amazon Lex

Service for building conversational chatbots and IVR systems using natural language understanding.

43

Amazon Transcribe

Automatic speech-recognition (ASR) service that converts audio to text in real time or batch.

44

Amazon Polly

Text-to-speech (TTS) service that generates natural-sounding speech from text.

45

Amazon Kendra

ML-powered enterprise search service that returns precise answers to natural-language queries.

46

Amazon Personalize

Service that delivers real-time personalized recommendations for products or content.

47

Amazon Translate

Neural machine-translation service supporting multilingual text translation.

48

Amazon Fraud Detector

Managed service that identifies potentially fraudulent online activities using ML.

49

Amazon SageMaker

Comprehensive ML platform for building, training, tuning, and deploying custom models.

50

SageMaker Ground Truth

Service for scalable data labeling with active learning and multiple workforce options.

51

SageMaker Data Wrangler

Low-code tool for connecting, exploring, and transforming data within SageMaker.

52

SageMaker Feature Store

Central repository for storing and sharing curated ML features across teams and models.

53

AWS Glue

Managed ETL service that discovers data schema, catalogs metadata, and transforms data at scale.

54

AWS Glue DataBrew

No-code visual data-preparation tool offering 250+ built-in transformations.

55

ETL (Extract, Transform, Load)

Process of pulling data from sources, converting it into a usable format, and loading it into storage (e.g., S3).

56

Exploratory Data Analysis (EDA)

Visual and statistical examination of datasets to understand structure, spot anomalies, and guide feature engineering.

57

Feature Engineering

Selecting, creating, and transforming variables to improve model performance while reducing complexity.

58

Training-Validation-Test Split

Standard practice of dividing data (e.g., 80/10/10) for training, tuning, and unbiased final evaluation.
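
A minimal sketch of an 80/10/10 split using two calls to scikit-learn's train_test_split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

X, y = np.arange(100).reshape(-1, 1), np.arange(100)

# Hold out 20%, then split that portion evenly into validation and test sets.
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

print(len(X_train), len(X_val), len(X_test))  # 80 10 10
```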

59

Hyperparameters

Model settings chosen before training (e.g., learning rate, number of layers) that influence learning but aren’t learned themselves.

60

SageMaker Training Job

Managed execution of user-supplied training code on a fleet of ML instances, outputting model artifacts to S3.

61

SageMaker Experiments

Capability for tracking, comparing, and visualizing multiple training runs and their metrics.

62

SageMaker Automatic Model Tuning (AMT)

Service that automates hyperparameter optimization to find the best model configuration.

63

ML Lifecycle

Iterative sequence: define goal, collect & prepare data, train, deploy, monitor, and repeat.

64

ML Pipeline

Interconnected, often automated steps that implement the ML lifecycle from data ingestion to model monitoring.

65

Batch Transform (SageMaker)

SageMaker option to run offline predictions on large datasets without persistent endpoints.

66

Asynchronous Inference

Endpoint type that queues large or long-running requests, scaling down when idle for cost savings.

67

Serverless Inference

Real-time prediction option using AWS Lambda—no server management, pay per invocation.

68

Data Drift

Change in statistical properties of input data over time compared to training data.

69

Concept Drift

Shift in the underlying relationship between input features and target variable over time.

70

SageMaker Model Monitor

Feature that tracks deployed models for drift, bias, and quality issues, integrating with CloudWatch alerts.

71

MLOps

Application of DevOps best practices to ML—automation, versioning, and continuous monitoring for reliable pipelines.

72

Infrastructure as Code (IaC)

Practice of defining cloud resources (e.g., ML infrastructure) via code for repeatability and version control.

73

SageMaker Pipelines

Service that orchestrates repeatable, versioned ML workflows—including data prep, training, evaluation, and deployment.

74

SageMaker Model Registry

Central catalog for versioning, approving, and deploying trained models.

75

AWS Step Functions

Serverless workflow orchestrator for coordinating AWS services into applications, including ML pipelines.

76

Amazon MWAA (Managed Workflows for Apache Airflow)

Managed service running Apache Airflow for authoring and scheduling complex data/ML workflows.

77

Confusion Matrix

Table summarizing classification results into true/false positives and negatives.

78

True Positive (TP)

A correct positive prediction by the model.

79

True Negative (TN)

A correct negative prediction by the model.

80

False Positive (FP)

Model incorrectly predicts positive (Type I error).

81

False Negative (FN)

Model incorrectly predicts negative (Type II error).

82

Accuracy

(TP+TN)/(TP+TN+FP+FN); overall proportion of correct predictions.

83

Precision

TP/(TP+FP); reliability of positive predictions—important when false positives are costly.

84

Recall (Sensitivity)

TP/(TP+FN); ability to find all actual positives—important when missing positives is costly.

85

Precision-Recall Trade-off

Improving precision typically lowers recall and vice versa; both cannot be maximized simultaneously.

86

F1 Score

Harmonic mean of precision and recall; balanced metric when both error types matter.

87

False Positive Rate (FPR)

FP/(FP+TN); proportion of negatives incorrectly labeled positive.

88

Specificity (True Negative Rate)

TN/(TN+FP); proportion of negatives correctly identified.
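
A small worked example with hypothetical counts, tying together the confusion-matrix metrics defined in the cards above.

```python
tp, fp, fn, tn = 80, 10, 20, 90   # hypothetical confusion-matrix counts

accuracy    = (tp + tn) / (tp + tn + fp + fn)                 # 0.85
precision   = tp / (tp + fp)                                  # ~0.889
recall      = tp / (tp + fn)                                  # 0.80
f1          = 2 * precision * recall / (precision + recall)   # ~0.842
fpr         = fp / (fp + tn)                                  # 0.10
specificity = tn / (tn + fp)                                  # 0.90

print(accuracy, precision, recall, f1, fpr, specificity)
```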

89

ROC Curve

Graph of TPR versus FPR across thresholds, depicting classifier performance.

90

AUC (Area Under ROC Curve)

Scalar summary of ROC performance between 0 and 1; 0.5 indicates random guessing and values near 1 indicate strong class separation.
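
A minimal scikit-learn sketch computing AUC from true labels and predicted positive-class scores; this toy data gives roughly 0.89.

```python
from sklearn.metrics import roc_auc_score

y_true   = [0, 0, 1, 1, 0, 1]
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]  # predicted probability of the positive class

print(roc_auc_score(y_true, y_scores))      # ~0.89; 1.0 = perfect, 0.5 = random
```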

91

Mean Squared Error (MSE)

Average of squared prediction errors; penalizes large errors in regression.

92

Root Mean Squared Error (RMSE)

Square root of MSE; error measure in same units as target variable.

93

Mean Absolute Error (MAE)

Average of absolute prediction errors; less sensitive to outliers than MSE/RMSE.
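
A small worked example computing MSE, RMSE, and MAE on toy predictions.

```python
import numpy as np

y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred            # [0.5, 0.0, -1.5, -1.0]
mse  = np.mean(errors ** 2)         # 0.875; squaring penalizes the larger errors
rmse = np.sqrt(mse)                 # ~0.935, in the same units as the target
mae  = np.mean(np.abs(errors))      # 0.75, less sensitive to outliers

print(mse, rmse, mae)
```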

94

Return on Investment (ROI)

Comparison of business value generated by an ML model to the total cost of developing and operating it.

95

Cost Allocation Tag

Key-value pair on AWS resources enabling detailed cost tracking for ML projects.

96

Transfer Learning

Technique of fine-tuning an existing pre-trained model on new, task-specific data to save time and resources.
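
One common (non-AWS) way to apply this, sketched with torchvision and assuming an image task: freeze a pre-trained backbone and retrain only a new output layer on the task-specific data.

```python
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # pre-trained backbone

for param in model.parameters():
    param.requires_grad = False                   # freeze the learned features

model.fc = nn.Linear(model.fc.in_features, 3)     # new head for 3 task-specific classes
# ...then train only model.fc on the new, smaller dataset.
```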

97

Active Learning (Ground Truth)

Process where the system auto-labels confident data and sends uncertain examples to humans, reducing labeling cost.

98

Retrieval Augmented Generation (RAG)

Approach where a generative model retrieves external knowledge to enhance response accuracy and freshness.
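
A framework-free sketch of the RAG flow; the keyword-overlap retrieval and the call_llm stub are hypothetical stand-ins for a real vector search and a foundation-model call (e.g., via Amazon Bedrock).

```python
documents = [
    "SageMaker endpoints serve real-time predictions.",
    "Amazon Bedrock provides API access to foundation models.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    # Toy retrieval: rank documents by word overlap with the query.
    q = set(query.lower().split())
    return sorted(documents,
                  key=lambda d: len(q & set(d.lower().split())),
                  reverse=True)[:k]

def call_llm(prompt: str) -> str:
    # Placeholder for a real foundation-model call (e.g., Bedrock InvokeModel).
    return f"[model response grounded in]\n{prompt}"

def answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Use this context to answer:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(answer("What does Amazon Bedrock provide?"))
```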

99

SageMaker Inference Recommender

Tool that tests instance types and configurations to suggest optimal deployment settings for a model.

100

AWS AI Service Hierarchy

Guideline: use pre-trained AI services first, fine-tune models second, build custom models last.