AWS Certified AI Practitioner (AIF-C01) – Key Vocabulary
Accuracy
- Definition: Measures the proportion of correct predictions made by a model out of all predictions.
- Formula: \text{Accuracy}=\frac{\text{True Positives}+\text{True Negatives}}{\text{Total Predictions}}
- Example: Model correctly classifies 95 of 100 cat-and-dog images → 95% accuracy.
- Need to Know: Can be misleading on imbalanced datasets—supplement with precision & recall.
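- Python Sketch: A minimal worked example of the formula above, using assumed counts (40 true positives, 55 true negatives out of 100 predictions) to reproduce the 95% figure.
# Accuracy = (TP + TN) / total predictions
true_positives = 40   # assumed counts, for illustration only
true_negatives = 55
total_predictions = 100
accuracy = (true_positives + true_negatives) / total_predictions
print(f"Accuracy: {accuracy:.0%}")  # -> Accuracy: 95%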
Activation Function
- Definition: Mathematical function inside each neural-network node that determines that node’s output, injecting non-linearity so the network can learn complex patterns.
- Common Types / Behaviors: ReLU (fast, sparsity), Sigmoid (probability-like output), Tanh (zero-centered).
- Need to Know: Choice of activation affects convergence, vanishing/exploding gradients, and model expressiveness.
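- Python Sketch: The three activations above implemented directly from their standard definitions; any deep-learning framework provides equivalents.
import math

def relu(x):     # 0 for negative inputs, x otherwise -> sparse activations
    return max(0.0, x)

def sigmoid(x):  # squashes any input into (0, 1), probability-like
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):     # squashes into (-1, 1), zero-centered
    return math.tanh(x)

print(relu(-2.0), sigmoid(0.0), tanh(0.5))  # 0.0 0.5 0.46...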
Agents for Amazon Bedrock
- Definition: Bedrock feature that lets foundation models execute multi-step workflows by calling APIs, reasoning over plans, and interacting with corporate systems.
- Example: User asks “What’s the status of my order?” → agent calls order-lookup API then shipping-status API, synthesizes answer.
- Need to Know: Used to orchestrate & automate business processes with natural-language interfaces.
Amazon Bedrock
- Definition: Fully managed service offering multiple third-party and first-party foundation models behind a single API.
- Example Uses: Build chatbot with Anthropic Claude; generate marketing copy with Amazon Titan Text.
- Need to Know: Primary AWS entry point for Generative AI; integrates features like Agents, Knowledge Bases, Provisioned Throughput.
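- Python Sketch: A hedged boto3 example of the single invoke_model API; the model ID and the request/response JSON shape assume a Titan Text model, and other providers use different payload formats.
import json
import boto3

bedrock = boto3.client("bedrock-runtime")  # assumes credentials and model access in the Region

body = json.dumps({
    "inputText": "Write a two-sentence product description for a hiking boot.",
    "textGenerationConfig": {"maxTokenCount": 200, "temperature": 0.5},
})
response = bedrock.invoke_model(
    modelId="amazon.titan-text-express-v1",  # assumed model ID
    body=body,
)
result = json.loads(response["body"].read())
print(result["results"][0]["outputText"])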
Amazon Comprehend
- Definition: Managed NLP service for extracting entities, key phrases, sentiment, language, and more from unstructured text.
- Example: Classify sentiment of customer reviews.
- Need to Know: Use when you need insights rather than search (Kendra) or generation (Bedrock).
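- Python Sketch: A small boto3 example of sentiment analysis on one review; assumes AWS credentials are configured.
import boto3

comprehend = boto3.client("comprehend")
resp = comprehend.detect_sentiment(
    Text="The checkout process was slow and the support team never replied.",
    LanguageCode="en",
)
print(resp["Sentiment"])       # e.g. NEGATIVE
print(resp["SentimentScore"])  # confidence per sentiment class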
Amazon Kendra
- Definition: ML-powered enterprise search that answers natural-language queries across many data sources.
- Example: Employee asks “What is the company’s remote-work policy?” → Kendra surfaces paragraph from HR docs in S3 & SharePoint.
- Need to Know: Optimized for retrieval/QA, not generic text analysis; excels at pinpointing answers in document corpora.
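- Python Sketch: A hedged boto3 query against an existing index; the index ID is a placeholder, and the connected data sources determine what comes back.
import boto3

kendra = boto3.client("kendra")
resp = kendra.query(
    IndexId="YOUR-KENDRA-INDEX-ID",  # placeholder
    QueryText="What is the company's remote-work policy?",
)
for item in resp["ResultItems"]:
    print(item["Type"], "-", item.get("DocumentExcerpt", {}).get("Text", "")[:120])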
Amazon Lex
- Definition: Service for building chatbots & voice bots using automatic speech recognition (ASR) and natural language understanding (NLU).
- Example: Voice bot in banking app to check balances.
- Need to Know: Central concept of intents & slots; integrates with Lambda for fulfillment.
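- Python Sketch: A hedged boto3 call sending one utterance to a deployed Lex V2 bot; the bot ID, alias ID, and session ID are placeholders.
import boto3

lex = boto3.client("lexv2-runtime")
resp = lex.recognize_text(
    botId="YOUR-BOT-ID",             # placeholder
    botAliasId="YOUR-BOT-ALIAS-ID",  # placeholder
    localeId="en_US",
    sessionId="user-123",
    text="What is my account balance?",
)
for message in resp.get("messages", []):
    print(message["content"])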
Amazon Polly
- Definition: Text-to-Speech (TTS) service with lifelike neural voices in many languages.
- Example: E-learning platform reads articles aloud for visually impaired users.
- Need to Know: “Text in, Voice out.” Complements Amazon Transcribe (opposite direction).
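- Python Sketch: "Text in, Voice out" with boto3; the voice and output file are example choices.
import boto3

polly = boto3.client("polly")
resp = polly.synthesize_speech(
    Text="Welcome to today's lesson on machine learning.",
    OutputFormat="mp3",
    VoiceId="Joanna",   # example neural voice
    Engine="neural",
)
with open("lesson.mp3", "wb") as f:
    f.write(resp["AudioStream"].read())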
Amazon Rekognition
- Definition: Computer-vision service for image & video analysis—object detection, moderation, facial analysis, text-in-image, activity recognition.
- Example: Social-media site flags inappropriate video content.
- Need to Know: First stop for AWS CV tasks; can analyze stored or streaming video.
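- Python Sketch: Label detection on an image stored in S3; bucket and key names are placeholders.
import boto3

rekognition = boto3.client("rekognition")
resp = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-media-bucket", "Name": "photos/beach.jpg"}},  # placeholders
    MaxLabels=5,
    MinConfidence=80,
)
for label in resp["Labels"]:
    print(f"{label['Name']}: {label['Confidence']:.1f}%")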
Amazon SageMaker
- Definition: End-to-end ML platform (data prep, notebooks, training, hyper-parameter tuning, hosting, MLOps).
- Example: Fraud-detection model—experiment in notebooks, train with managed clusters, deploy to real-time endpoints.
- Need to Know: Provides deeper control than AI services; supports built-in, BYO, or Bedrock-integrated models.
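- Python Sketch: A hedged call to an already-deployed real-time endpoint via the runtime API; the endpoint name and CSV feature payload are placeholders that depend on how the model was trained and deployed.
import boto3

runtime = boto3.client("sagemaker-runtime")
resp = runtime.invoke_endpoint(
    EndpointName="fraud-detection-endpoint",  # placeholder
    ContentType="text/csv",
    Body="1250.00,3,0,1",  # feature vector format assumed by the deployed model
)
print(resp["Body"].read().decode())  # model's prediction, e.g. a fraud score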
Amazon Titan
- Definition: AWS-built family of foundation models for text, images, embeddings; available only through Bedrock.
- Example: Titan Text Express for chat; Titan Image Generator for product mock-ups.
- Need to Know: Developed with responsible-AI safeguards; first-party alternative to third-party FMs.
Amazon Transcribe
- Definition: Automatic Speech Recognition (ASR) turning audio → text.
- Example: A call center transcribes every support call for analytics.
- Need to Know: “Voice in, Text out.” Complements Polly.
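- Python Sketch: "Voice in, Text out" with boto3; the job is asynchronous, and the S3 URI is a placeholder.
import boto3

transcribe = boto3.client("transcribe")
transcribe.start_transcription_job(
    TranscriptionJobName="support-call-0001",
    Media={"MediaFileUri": "s3://my-call-recordings/call-0001.mp3"},  # placeholder
    MediaFormat="mp3",
    LanguageCode="en-US",
)
# Poll later; the finished transcript is delivered as a JSON file.
job = transcribe.get_transcription_job(TranscriptionJobName="support-call-0001")
print(job["TranscriptionJob"]["TranscriptionJobStatus"])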
Amazon Translate
- Definition: Neural machine-translation service for real-time or batch language conversion.
- Example: News site offers articles in multiple languages instantly.
- Need to Know: Scalable & pay-as-you-go for multilingual apps.
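- Python Sketch: Real-time translation of one sentence with boto3; assumes AWS credentials are configured.
import boto3

translate = boto3.client("translate")
resp = translate.translate_text(
    Text="Breaking news: the festival starts tomorrow.",
    SourceLanguageCode="en",
    TargetLanguageCode="fr",
)
print(resp["TranslatedText"])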
Artificial Intelligence (AI)
- Definition: Broad field aiming to build systems that perform tasks requiring human intelligence (vision, language, reasoning).
- Hierarchy: AI ⊃ Machine Learning ⊃ Deep Learning.
Bias (AI)
- Definition: Systematic, unfair skew in model outputs due to non-representative data or flawed assumptions.
- Example: Hiring model trained solely on male résumés discriminates against female applicants.
- Need to Know: Detection & mitigation are essential elements of Responsible AI.
Chatbot
- Definition: Software that simulates human conversation via text or voice.
- Example: Retail-site bot answers order-status questions.
Classification
- Definition: Supervised learning task predicting a categorical label.
- Examples: Email → “spam / not-spam”; tumor → “benign / malignant”.
- Need to Know: Output is discrete class; evaluate with accuracy, precision, recall.
Clustering
- Definition: Unsupervised learning that groups similar items without pre-existing labels.
- Example: Segment customers into market groups based on purchasing behavior.
- Need to Know: Reveals natural structure; distance metrics & cluster validation matter.
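- Python Sketch: A minimal scikit-learn k-means example grouping customers by two behavioral features; the numbers are made up for illustration.
from sklearn.cluster import KMeans

# Each row: [monthly visits, average basket size] -- illustrative values only.
customers = [
    [2, 15.0], [3, 18.5], [25, 120.0],
    [30, 150.0], [1, 9.5], [28, 140.0],
]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(kmeans.labels_)           # cluster assignment per customer
print(kmeans.cluster_centers_)  # one centroid per discovered segment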
Computer Vision
- Definition: AI field enabling machines to understand images/video.
- Example Service: Amazon Rekognition.
Context Window
- Definition: Maximum number of tokens (input + output) an LLM can consider in one interaction.
- Example: A 2,000-token window can't process a 3,000-token document at once.
- Need to Know: Larger windows aid long-form summarization & multi-turn dialogue.
Deep Learning
- Definition: Sub-field of ML using multi-layer neural networks.
- Relationship: All deep learning is ML; not all ML is deep learning.
Domain-Adaptation Fine-Tuning
- Definition: Unsupervised adaptation of an FM to a domain’s language/style by training on a large unlabeled corpus from that domain.
- Example: Continue training on internal legal docs to make the model “legal-fluent.”
Embeddings
- Definition: Vector representations capturing semantic meaning of data (text, images, etc.).
- Example: Titan Multimodal Embeddings place picture of red shoe and phrase “red shoe” near each other in vector space.
- Need to Know: Power semantic search, clustering, recommendations.
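- Python Sketch: The core idea in miniature: vectors pointing in similar directions are semantically close. The three-dimensional vectors are made up; real embeddings from a model such as Titan Embeddings have hundreds of dimensions.
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

red_shoe_text  = [0.91, 0.10, 0.05]  # toy vectors standing in for model output
red_shoe_image = [0.88, 0.12, 0.07]
tax_form_text  = [0.02, 0.95, 0.30]

print(cosine_similarity(red_shoe_text, red_shoe_image))  # near 1.0 -> similar meaning
print(cosine_similarity(red_shoe_text, tax_form_text))   # much lower -> unrelated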
Epoch
- Definition: One full pass of the learning algorithm over the entire training dataset.
- Example: 10 epochs → dataset seen 10 times.
Explainability
- Definition: Ability to understand & articulate how/why a model made a prediction.
- Example: Loan model shows income & credit-score weights.
- Need to Know: Important for trust, debugging, compliance.
Fairness
- Definition: Responsible-AI pillar ensuring predictions are not biased against sub-groups (race, gender, age, etc.).
Fine-Tuning
- Definition: Train a pre-trained FM on a smaller labeled set to specialize it; updates the model’s weights.
- Example: Company chat logs → fine-tune a base LLM to match the brand voice.
Foundation Model (FM)
- Definition: Large, general-purpose, pre-trained deep-learning model adaptable to many downstream tasks.
- Examples: GPT-4, Llama 3, Amazon Titan.
Generative AI
- Definition: AI that produces new content (text, images, code) similar to its training data.
- Need to Know: Creates rather than classifies; powered by FMs.
Hallucination
- Definition: LLM outputs false or nonsensical content presented as fact.
- Example: Chatbot invents historical figures.
- Mitigation: Retrieval-Augmented Generation (RAG) grounds responses in factual data.
Hyperparameter
- Definition: Training-process variable set by practitioner, not learned (e.g., learning rate, batch size, epochs).
- Need to Know: “Knobs” for controlling training; tuned via validation data.
Inference
- Definition: Using a trained model to make predictions on new data.
- Example: Send dog image → model returns “dog.”
- Need to Know: Training is one-off; inference occurs per request.
Instruction-Based Fine-Tuning
- Definition: Supervised fine-tuning where each training sample is an instruction/command (prompt) with an expected completion.
- Example JSON:
{"prompt":"Translate to French: 'Hello, how are you?'","completion":"Bonjour, comment ça va?"}
- Need to Know: Highly effective for teaching models to follow explicit directions.
Knowledge Bases for Amazon Bedrock
- Definition: Managed RAG implementation connecting FMs to your S3 data for grounded, context-specific answers.
- Need to Know: Reduces hallucination; simplifies building RAG pipelines.
Labeled Data
- Definition: Dataset annotated with correct answers, required for supervised learning.
- Examples: Images tagged “cat/dog”; emails tagged “spam / not-spam.”
Large Language Model (LLM)
- Definition: Foundation model trained on vast text to understand & generate human-like language.
- Examples: Titan Text, Claude, Llama.
Latency
- Definition: Time between prompt arrival and model response.
- Need to Know: Low latency critical for real-time apps (chat, voice assistants).
Machine Learning (ML)
- Definition: Sub-field of AI where algorithms learn patterns from data to make predictions without hard-coded rules.
Modality
- Definition: Data type a model processes (text, image, audio, video).
- Example: Multimodal model handles text + images.
Multi-Turn Messaging
- Definition: Stateful dialogue with multiple back-and-forth exchanges; model must track context history.
- Limit: Bound by context window.
Natural Language Processing (NLP)
- Definition: AI field enabling computers to understand and generate human language.
- AWS Example: Amazon Comprehend.
Neural Network
- Definition: Computing system inspired by biological neurons; layers of interconnected nodes learn hierarchical representations.
Overfitting
- Definition: Model learns noise/details of training data, harming generalization to new data.
- Need to Know: Mitigate via regularization, early stopping, validation checks.
Parameter
- Definition: Model’s internal variable (weights/biases) learned during training.
- Contrast: Parameters are learned; hyperparameters are set.
Precision
- Definition: Fraction of predicted positives that are correct.
- Formula: \text{Precision}=\frac{\text{True Positives}}{\text{True Positives}+\text{False Positives}}
- Example: % of emails flagged as spam that truly are spam.
- Need to Know: High precision ⇒ low false-positive rate.
Prompt
- Definition: Input text/instruction fed to an FM. Drives behavior.
Prompt Engineering
- Definition: Crafting/refining prompts to steer FM toward desired output; techniques include few-shot, role-prompting, chain-of-thought.
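- Python Sketch: An illustrative few-shot prompt written as a Python string; the made-up examples steer the model toward the desired label format without any fine-tuning.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "Arrived quickly and works perfectly."
Sentiment: Positive

Review: "Stopped working after two days."
Sentiment: Negative

Review: "The battery life exceeded my expectations."
Sentiment:"""

# Send few_shot_prompt to an FM (e.g., via Bedrock); it should complete with "Positive".
print(few_shot_prompt)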
Provisioned Throughput (Bedrock)
- Definition: Purchase of dedicated inference capacity for a model, guaranteeing throughput & latency.
- Need to Know: Required for production use of custom fine-tuned models.
Recall
- Definition: Fraction of actual positives correctly identified.
- Formula: \text{Recall}=\frac{\text{True Positives}}{\text{True Positives}+\text{False Negatives}}
- Example: % of sick patients correctly diagnosed.
- Need to Know: High recall ⇒ low false-negative rate.
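- Python Sketch: Precision and recall computed from assumed confusion-matrix counts for a spam classifier, showing the different question each metric answers.
tp, fp, fn = 80, 20, 10  # assumed counts, for illustration only

precision = tp / (tp + fp)  # of everything flagged as spam, how much really was spam
recall    = tp / (tp + fn)  # of all real spam, how much was caught

print(f"Precision: {precision:.2f}")  # 0.80
print(f"Recall:    {recall:.2f}")     # 0.89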
Regression
- Definition: Supervised learning predicting a continuous value.
- Examples: House-price estimation; quarterly sales forecast.
Reinforcement Learning
- Definition: Agent learns optimal actions in an environment by maximizing cumulative reward.
- Examples: AWS DeepRacer, AlphaGo.
Responsible AI
- Definition: Designing & deploying AI that is safe, ethical, fair, transparent, and aligned with human values.
- Pillars: Fairness, Explainability, Governance, Transparency.
Retrieval-Augmented Generation (RAG)
- Definition: Framework that retrieves relevant knowledge and prepends it to the LLM prompt, grounding responses.
- Example: Chatbot pulls latest company news before answering.
- Need to Know: Reduces hallucinations without changing model weights; Bedrock Knowledge Bases provide managed RAG.
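- Python Sketch: A minimal, hedged view of the RAG flow: retrieve relevant passages, prepend them to the prompt, then call the FM. The retrieve function is a crude keyword-overlap stand-in for a real vector search or a Bedrock Knowledge Base.
def retrieve(question, documents, top_k=2):
    # Crude stand-in: rank documents by how many query words they contain.
    words = question.lower().split()
    return sorted(documents, key=lambda d: -sum(w in d.lower() for w in words))[:top_k]

documents = [
    "Press release (June): the company opened a new office in Berlin.",
    "HR policy: remote work is allowed up to three days per week.",
    "Cafeteria menu for Friday: vegetable curry and rice.",
]
question = "What is the latest company news?"
context = "\n".join(retrieve(question, documents))

prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
# prompt is then sent to the FM (e.g., via Bedrock), grounding the answer in retrieved facts.
print(prompt)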
Single-Turn Messaging
- Definition: Stateless interaction: one user prompt → one model response, no memory.
Supervised Learning
- Definition: ML where model learns from labeled data; includes classification & regression.
Token
- Definition: Smallest text unit processed by an LLM (word, sub-word, punctuation). Context window measured in tokens.
Training
- Definition: Process of teaching a model via data so it learns parameter values.
Transparency
- Definition: Responsible-AI principle of disclosing system purpose, workings, limitations, and data.
- Example: Titan Image Generator adds invisible watermark indicating AI origin.
Unlabeled Data
- Definition: Data with no annotations; used in unsupervised learning or domain-adaptation.
- Example: Huge collection of untagged web text.
Unsupervised Learning
- Definition: ML where model finds patterns/structure in unlabeled data.
- Primary Example: Clustering.
Validation Data
- Definition: Held-out subset of the original data used during training to tune hyperparameters & monitor overfitting.