AWS Certified AI Practitioner (AIF-C01) – Key Vocabulary

Accuracy

  • Definition: Measures the proportion of correct predictions made by a model out of all predictions.
    \text{Accuracy}=\frac{\text{True Positives}+\text{True Negatives}}{\text{Total Predictions}}
  • Example: Model correctly classifies 95 of 100 cat-and-dog images → 95% accuracy.
  • Need to Know: Can be misleading on imbalanced datasets—supplement with precision & recall.
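  • Code Sketch: a minimal Python illustration of the accuracy formula above, with made-up confusion-matrix counts that reproduce the 95-of-100 example.
    # Illustrative counts only; a real evaluation would come from a held-out test set.
    true_positives = 40    # e.g., cats correctly labeled "cat"
    true_negatives = 55    # e.g., dogs correctly labeled "dog"
    false_positives = 3
    false_negatives = 2

    total_predictions = true_positives + true_negatives + false_positives + false_negatives
    accuracy = (true_positives + true_negatives) / total_predictions
    print(f"Accuracy: {accuracy:.0%}")   # -> Accuracy: 95%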

Activation Function

  • Definition: Mathematical function inside each neural-network node that determines that node’s output, injecting non-linearity so the network can learn complex patterns.
  • Common Types / Behaviors: ReLU (fast, sparsity), Sigmoid (probability-like output), Tanh (zero-centered).
  • Need to Know: Choice of activation affects convergence, vanishing/exploding gradients, and model expressiveness.
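  • Code Sketch: a minimal NumPy illustration of the three activations listed above (input values are arbitrary).
    import numpy as np

    def relu(x):
        return np.maximum(0, x)        # 0 for negative inputs, x otherwise -> sparsity

    def sigmoid(x):
        return 1 / (1 + np.exp(-x))    # squashes values into (0, 1), probability-like

    x = np.array([-2.0, 0.0, 2.0])
    print(relu(x), sigmoid(x), np.tanh(x))   # tanh is zero-centered in (-1, 1)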

Agents for Amazon Bedrock

  • Definition: Bedrock feature that lets foundation models execute multi-step workflows by calling APIs, reasoning over plans, and interacting with corporate systems.
  • Example: User asks “What’s the status of my order?” → agent calls order-lookup API then shipping-status API, synthesizes answer.
  • Need to Know: Used to orchestrate & automate business processes with natural-language interfaces.

Amazon Bedrock

  • Definition: Fully managed service offering multiple third-party and first-party foundation models behind a single API.
  • Example Uses: Build chatbot with Anthropic Claude; generate marketing copy with Amazon Titan Text.
  • Need to Know: Primary AWS entry point for Generative AI; integrates features like Agents, Knowledge Bases, Provisioned Throughput.
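  • Code Sketch: a hedged boto3 example of calling a foundation model through the single Bedrock API; the model ID, region, and request/response fields are illustrative and vary by model family.
    import boto3, json

    bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

    response = bedrock.invoke_model(
        modelId="amazon.titan-text-express-v1",                            # assumed model ID
        body=json.dumps({"inputText": "Write a one-line product tagline."}),
    )
    print(json.loads(response["body"].read()))   # output shape depends on the chosen model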

Amazon Comprehend

  • Definition: Managed NLP service for extracting entities, key phrases, sentiment, language, and more from unstructured text.
  • Example: Classify sentiment of customer reviews.
  • Need to Know: Use when you need insights rather than search (Kendra) or generation (Bedrock).
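  • Code Sketch: classifying the sentiment of one customer review with boto3 (the text and region are illustrative).
    import boto3

    comprehend = boto3.client("comprehend", region_name="us-east-1")

    result = comprehend.detect_sentiment(
        Text="The checkout process was quick and painless.",
        LanguageCode="en",
    )
    print(result["Sentiment"], result["SentimentScore"])   # e.g., POSITIVE plus per-class scores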

Amazon Kendra

  • Definition: ML-powered enterprise search that answers natural-language queries across many data sources.
  • Example: Employee asks “What is the company’s remote-work policy?” → Kendra surfaces paragraph from HR docs in S3 & SharePoint.
  • Need to Know: Optimized for retrieval/QA, not generic text analysis; excels at pinpointing answers in document corpora.

Amazon Lex

  • Definition: Service for building chatbots & voice bots with ASR + NLU.
  • Example: Voice bot in banking app to check balances.
  • Need to Know: Central concept of intents & slots; integrates with Lambda for fulfillment.
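  • Code Sketch: sending a single text turn to a deployed Lex V2 bot with boto3; the bot and alias IDs are placeholders for values from your own bot.
    import boto3

    lex = boto3.client("lexv2-runtime", region_name="us-east-1")

    response = lex.recognize_text(
        botId="EXAMPLEBOTID",           # placeholder
        botAliasId="EXAMPLEALIAS",      # placeholder
        localeId="en_US",
        sessionId="user-123",
        text="What is my checking account balance?",
    )
    print(response.get("messages"))     # bot replies; matched intent and slots live in sessionState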

Amazon Polly

  • Definition: Text-to-Speech (TTS) service with lifelike neural voices in many languages.
  • Example: E-learning platform reads articles aloud for visually impaired users.
  • Need to Know: “Text in, Voice out.” Complements Amazon Transcribe (opposite direction).
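  • Code Sketch: “Text in, Voice out” with boto3; the voice ID and output file name are illustrative.
    import boto3

    polly = boto3.client("polly", region_name="us-east-1")

    response = polly.synthesize_speech(
        Text="Welcome to today's lesson.",
        OutputFormat="mp3",
        VoiceId="Joanna",               # one of many available voices
        Engine="neural",
    )
    with open("lesson.mp3", "wb") as f:
        f.write(response["AudioStream"].read())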

Amazon Rekognition

  • Definition: Computer-vision service for image & video analysis—object detection, moderation, facial analysis, text-in-image, activity recognition.
  • Example: Social-media site flags inappropriate video content.
  • Need to Know: First stop for AWS CV tasks; can analyze stored or streaming video.
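  • Code Sketch: label detection on an image stored in S3 with boto3; the bucket and object names are placeholders.
    import boto3

    rekognition = boto3.client("rekognition", region_name="us-east-1")

    result = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photo.jpg"}},   # placeholders
        MaxLabels=5,
    )
    for label in result["Labels"]:
        print(label["Name"], label["Confidence"])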

Amazon SageMaker

  • Definition: End-to-end ML platform (data prep, notebooks, training, hyper-parameter tuning, hosting, MLOps).
  • Example: Fraud-detection model—experiment in notebooks, train with managed clusters, deploy to real-time endpoints.
  • Need to Know: Provides deeper control than AI services; supports built-in, BYO, or Bedrock-integrated models.

Amazon Titan

  • Definition: AWS-built family of foundation models for text, images, embeddings; available only through Bedrock.
  • Example: Titan Text Express for chat; Titan Image Generator for product mock-ups.
  • Need to Know: Developed with responsible-AI safeguards; first-party alternative to third-party FMs.

Amazon Transcribe

  • Definition: Automatic Speech Recognition (ASR) turning audio → text.
  • Example: Call-center transcribes every support call for analytics.
  • Need to Know: “Voice in, Text out.” Complements Polly.
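  • Code Sketch: starting an asynchronous transcription job with boto3; the job name and S3 URI are placeholders.
    import boto3

    transcribe = boto3.client("transcribe", region_name="us-east-1")

    transcribe.start_transcription_job(
        TranscriptionJobName="support-call-0001",                       # placeholder
        Media={"MediaFileUri": "s3://my-example-bucket/call.mp3"},      # placeholder
        MediaFormat="mp3",
        LanguageCode="en-US",
    )
    # Poll get_transcription_job until the status is COMPLETED, then fetch the transcript URI.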

Amazon Translate

  • Definition: Neural machine-translation service for real-time or batch language conversion.
  • Example: News site offers articles in multiple languages instantly.
  • Need to Know: Scalable & pay-as-you-go for multilingual apps.
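  • Code Sketch: real-time translation of a short string with boto3 (the text is illustrative).
    import boto3

    translate = boto3.client("translate", region_name="us-east-1")

    result = translate.translate_text(
        Text="Breaking news: markets rally worldwide.",
        SourceLanguageCode="en",
        TargetLanguageCode="es",
    )
    print(result["TranslatedText"])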

Artificial Intelligence (AI)

  • Definition: Broad field aiming to build systems that perform tasks requiring human intelligence (vision, language, reasoning).
  • Hierarchy: AI ⊃ Machine Learning ⊃ Deep Learning.

Bias (AI)

  • Definition: Systematic, unfair skew in model outputs due to non-representative data or flawed assumptions.
  • Example: Hiring model trained solely on male résumés discriminates against female applicants.
  • Need to Know: Detection & mitigation are essential elements of Responsible AI.

Chatbot

  • Definition: Software that simulates human conversation via text or voice.
  • Example: Retail-site bot answers order-status questions.

Classification

  • Definition: Supervised learning task predicting a categorical label.
  • Examples: Email → “spam / not-spam”; tumor → “benign / malignant”.
  • Need to Know: Output is discrete class; evaluate with accuracy, precision, recall.

Clustering

  • Definition: Unsupervised learning that groups similar items without pre-existing labels.
  • Example: Market-segment customers by behavior.
  • Need to Know: Reveals natural structure; distance metrics & cluster validation matter.

Computer Vision

  • Definition: AI field enabling machines to understand images/video.
  • Example Service: Amazon Rekognition.

Context Window

  • Definition: Maximum number of tokens (input + output) an LLM can consider in one interaction.
  • Example: A 2,000-token window can’t process a 3,000-token doc at once.
  • Need to Know: Larger windows aid long-form summarization & multi-turn dialogue.

Deep Learning

  • Definition: Sub-field of ML using multi-layer neural networks.
  • Relationship: All deep learning is ML; not all ML is deep learning.

Domain-Adaptation Fine-Tuning

  • Definition: Unsupervised adaptation of an FM to a domain’s language/style by training on a large unlabeled corpus from that domain.
  • Example: Continue pre-training on internal legal docs to make the model “legal-fluent.”

Embeddings

  • Definition: Vector representations capturing semantic meaning of data (text, images, etc.).
  • Example: Titan Multimodal Embeddings place picture of red shoe and phrase “red shoe” near each other in vector space.
  • Need to Know: Power semantic search, clustering, recommendations.
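  • Code Sketch: how “nearness” in vector space is usually measured, using cosine similarity on tiny made-up vectors (real embeddings have hundreds or thousands of dimensions).
    import numpy as np

    def cosine_similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    red_shoe_image = np.array([0.9, 0.1, 0.4])   # made-up embedding
    red_shoe_text  = np.array([0.8, 0.2, 0.5])   # made-up embedding
    blue_hat_text  = np.array([0.1, 0.9, 0.2])   # made-up embedding

    print(cosine_similarity(red_shoe_image, red_shoe_text))   # high -> semantically close
    print(cosine_similarity(red_shoe_image, blue_hat_text))   # lower -> semantically distant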

Epoch

  • Definition: One full pass of the learning algorithm over the entire training dataset.
  • Example: 10 epochs → dataset seen 10 times.

Explainability

  • Definition: Ability to understand & articulate how/why a model made a prediction.
  • Example: Loan model shows income & credit-score weights.
  • Need to Know: Important for trust, debugging, compliance.

Fairness

  • Definition: Responsible-AI pillar ensuring predictions are not biased against sub-groups (race, gender, age, etc.).

Fine-Tuning

  • Definition: Train a pre-trained FM on a smaller labeled set to specialize it; updates the model’s weights.
  • Example: Company chat logs → fine-tune base LLM to brand voice.

Foundation Model (FM)

  • Definition: Large, general-purpose, pre-trained deep-learning model adaptable to many downstream tasks.
  • Examples: GPT-4, Llama 3, Amazon Titan.

Generative AI

  • Definition: AI that produces new content (text, images, code) similar to its training data.
  • Need to Know: Creates rather than classifies; powered by FMs.

Hallucination

  • Definition: LLM outputs false or nonsensical content presented as fact.
  • Example: Chatbot invents historical figures.
  • Mitigation: Retrieval-Augmented Generation (RAG) grounds responses in factual data.

Hyperparameter

  • Definition: Training-process variable set by practitioner, not learned (e.g., learning rate, batch size, epochs).
  • Need to Know: “Knobs” for controlling training; tuned via validation data.

Inference

  • Definition: Using a trained model to make predictions on new data.
  • Example: Send dog image → model returns “dog.”
  • Need to Know: Training is one-off; inference occurs per request.

Instruction-Based Fine-Tuning

  • Definition: Supervised fine-tuning where each training sample is an instruction/command (prompt) with an expected completion.
  • Example JSON: {"prompt":"Translate to French: 'Hello, how are you?'","completion":"Bonjour, comment ça va?"}
  • Need to Know: Highly effective for teaching models to follow explicit directions.

Knowledge Bases for Amazon Bedrock

  • Definition: Managed RAG implementation connecting FMs to your S3 data for grounded, context-specific answers.
  • Need to Know: Reduces hallucination; simplifies building RAG pipelines.

Labeled Data

  • Definition: Dataset annotated with correct answers, required for supervised learning.
  • Examples: Images tagged “cat/dog”; emails tagged “spam / not-spam.”

Large Language Model (LLM)

  • Definition: Foundation model trained on vast text to understand & generate human-like language.
  • Examples: Titan Text, Claude, Llama.

Latency

  • Definition: Time between prompt arrival and model response.
  • Need to Know: Low latency critical for real-time apps (chat, voice assistants).

Machine Learning (ML)

  • Definition: Sub-field of AI where algorithms learn patterns from data to make predictions without hard-coded rules.

Modality

  • Definition: Data type a model processes (text, image, audio, video).
  • Example: Multimodal model handles text + images.

Multi-Turn Messaging

  • Definition: Stateful dialogue with multiple back-and-forth exchanges; model must track context history.
  • Limit: Bound by context window.

Natural Language Processing (NLP)

  • Definition: AI field enabling computers to understand and generate human language.
  • AWS Example: Amazon Comprehend.

Neural Network

  • Definition: Computing system inspired by biological neurons; layers of interconnected nodes learn hierarchical representations.

Overfitting

  • Definition: Model learns noise/details of training data, harming generalization to new data.
  • Need to Know: Mitigate via regularization, early stopping, validation checks.

Parameter

  • Definition: Model’s internal variable (weights/biases) learned during training.
  • Contrast: Parameters are learned; hyperparameters are set.

Precision

  • Definition: Fraction of predicted positives that are correct.
    \text{Precision}=\frac{\text{True Positives}}{\text{True Positives}+\text{False Positives}}
  • Example: % of emails flagged as spam that truly are spam.
  • Need to Know: High precision ⇒ low false-positive rate.

Prompt

  • Definition: Input text or instruction fed to an FM; it drives the model’s behavior.

Prompt Engineering

  • Definition: Crafting/refining prompts to steer FM toward desired output; techniques include few-shot, role-prompting, chain-of-thought.
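  • Code Sketch: a few-shot prompt assembled in Python; the task and examples are illustrative.
    few_shot_prompt = """You are a support-ticket classifier.

    Ticket: "My invoice total is wrong."            -> Category: Billing
    Ticket: "The app crashes when I log in."        -> Category: Technical
    Ticket: "How do I change my shipping address?"  -> Category: Account

    Ticket: "I was charged twice this month."       -> Category:"""
    # Sending few_shot_prompt to an FM (e.g., via Bedrock) steers it toward answering "Billing".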

Provisioned Throughput (Bedrock)

  • Definition: Purchase of dedicated inference capacity for a model, guaranteeing throughput & latency.
  • Need to Know: Required for production use of custom fine-tuned models.

Recall

  • Definition: Fraction of actual positives correctly identified.
    \text{Recall}=\frac{\text{True Positives}}{\text{True Positives}+\text{False Negatives}}
  • Example: % of sick patients correctly diagnosed.
  • Need to Know: High recall ⇒ low false-negative rate.
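  • Code Sketch: computing the precision and recall formulas above from illustrative confusion-matrix counts.
    true_positives  = 80   # sick patients correctly diagnosed
    false_positives = 10   # healthy patients incorrectly flagged
    false_negatives = 20   # sick patients missed

    precision = true_positives / (true_positives + false_positives)
    recall    = true_positives / (true_positives + false_negatives)
    print(f"Precision: {precision:.2f}, Recall: {recall:.2f}")   # 0.89 and 0.80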

Regression

  • Definition: Supervised learning predicting a continuous value.
  • Examples: House-price estimation; quarterly sales forecast.

Reinforcement Learning

  • Definition: Agent learns optimal actions in an environment by maximizing cumulative reward.
  • Example: AWS DeepRacer, AlphaGo.

Responsible AI

  • Definition: Designing & deploying AI that is safe, ethical, fair, transparent, and aligned with human values.
  • Pillars: Fairness, Explainability, Governance, Transparency.

Retrieval-Augmented Generation (RAG)

  • Definition: Framework that retrieves relevant knowledge and prepends it to the LLM prompt, grounding responses.
  • Example: Chatbot pulls latest company news before answering.
  • Need to Know: Reduces hallucinations without changing model weights; Bedrock Knowledge Bases provide managed RAG.
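  • Code Sketch: the core RAG pattern in plain Python; the toy retriever and the final model call stand in for a vector store and a Bedrock/LLM invocation.
    def retrieve(question, documents, top_k=1):
        # Toy retriever: rank documents by word overlap with the question.
        words = set(question.lower().split())
        return sorted(documents, key=lambda d: len(words & set(d.lower().split())), reverse=True)[:top_k]

    documents = [
        "Q3 press release: the company opened a new office in Berlin.",
        "HR policy: employees may work remotely up to three days per week.",
    ]
    question = "Where did the company open a new office?"

    context = "\n".join(retrieve(question, documents))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
    # prompt is then sent to the LLM; the retrieved text grounds the answer without retraining.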

Single-Turn Messaging

  • Definition: Stateless interaction: one user prompt → one model response, no memory.

Supervised Learning

  • Definition: ML where model learns from labeled data; includes classification & regression.

Token

  • Definition: Smallest text unit processed by an LLM (word, sub-word, punctuation). Context window measured in tokens.

Training

  • Definition: Process of teaching a model via data so it learns parameter values.

Transparency

  • Definition: Responsible-AI principle of disclosing system purpose, workings, limitations, and data.
  • Example: Titan Image Generator adds invisible watermark indicating AI origin.

Unlabeled Data

  • Definition: Data with no annotations; used in unsupervised learning or domain-adaptation.
  • Example: Huge collection of untagged web text.

Unsupervised Learning

  • Definition: ML where model finds patterns/structure in unlabeled data.
  • Primary Example: Clustering.

Validation Data

  • Definition: Held-out subset of the original data used during training to tune hyperparameters & monitor overfitting.