Vocabulary flashcards covering key terms, methods, frameworks, architectures, and safety concepts from Lecture 12 on Reinforcement Learning and AI Agents.
Reinforcement Learning (RL)
Machine-learning paradigm where an agent learns to make sequential decisions by maximizing cumulative reward through trial and error.
Agent
Autonomous software entity that can perceive, decide, reason, and act within an environment to achieve goals.
Blackjack Policy
Mapping from game states (player sum, dealer card, usable ace) to actions (hit or stick) that guides play strategy.
First-Visit Monte Carlo (MC)
RL method that updates value estimates using the return from the first time a state is visited in an episode.
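A minimal sketch of the first-visit update, assuming each episode is given as a list of `(state, reward)` pairs (this episode format is an illustrative assumption, not a standard API):

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate V(s) by averaging the return from the FIRST visit
    to each state; later visits within the episode are ignored."""
    returns = defaultdict(list)
    for episode in episodes:
        # Discounted return G_t at every timestep, computed backwards.
        G, rets = 0.0, [0.0] * len(episode)
        for t in reversed(range(len(episode))):
            G = episode[t][1] + gamma * G
            rets[t] = G
        # Record the return only at each state's first occurrence.
        seen = set()
        for t, (state, _) in enumerate(episode):
            if state not in seen:
                seen.add(state)
                returns[state].append(rets[t])
    return {s: sum(g) / len(g) for s, g in returns.items()}
```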
Exploration/Exploitation Trade-off
Balancing trying new actions to gather information (explore) and using known rewarding actions (exploit).
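The classic way to implement this trade-off is an epsilon-greedy policy, sketched here over a list of action values:

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """With probability epsilon pick a random action (explore);
    otherwise pick the action with the highest value (exploit)."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```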
Zero-Shot Learning
Ability of a model or agent to perform a task without task-specific training, relying on prior knowledge.
Large Language Model (LLM)
Transformer-based neural network trained on massive text corpora capable of generating and understanding language.
Intelligent Agent
LLM-powered or otherwise advanced agent that leverages background knowledge for zero-shot task solving.
ReAct Agent
Agent architecture combining reasoning (Chain-of-Thought) and action steps, enabling tool use and interaction.
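The thought/action/observation cycle can be sketched as a loop; `llm` and `tools` below are hypothetical callables standing in for a language model and a tool registry, not a specific library's API:

```python
def react_loop(llm, tools, question, max_steps=5):
    """Interleave reasoning and acting: the LLM emits a thought and an
    action; each tool observation is appended back into the context."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, action, arg = llm(context)   # e.g. ("...", "search", "Go")
        if action == "finish":
            return arg                        # final answer
        observation = tools[action](arg)      # invoke the chosen tool
        context += (f"Thought: {thought}\nAction: {action}[{arg}]\n"
                    f"Observation: {observation}\n")
    return None
```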
Bellman Equation
Fundamental recursive relationship in dynamic programming expressing the value of a state as immediate reward plus discounted future value.
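For a policy $\pi$, the state-value form of this recursion is:

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a)
             \left[ R(s, a, s') + \gamma \, V^{\pi}(s') \right]
```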
Temporal Difference (TD) Learning
Family of RL methods (e.g., TD(0), Sarsa) that update value estimates using bootstrapped predictions from subsequent states.
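A one-step TD(0) update sketch, with `V` as a plain dict of state values (a minimal stand-in for any value-table representation):

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99):
    """Move V(s) toward the bootstrapped target r + gamma * V(s')."""
    target = reward + gamma * V.get(next_state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * (target - V.get(state, 0.0))
    return V
```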
Q-Learning
Off-policy TD algorithm that learns the optimal action-value function by minimizing the Bellman error.
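The off-policy character shows in the update rule: it bootstraps from the greedy (max) next action regardless of the action the behavior policy actually takes. A tabular sketch:

```python
def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.99):
    """One Q-learning step on a dict Q keyed by (state, action)."""
    best_next = max(Q.get((s_next, a2), 0.0) for a2 in actions)
    td_error = r + gamma * best_next - Q.get((s, a), 0.0)
    Q[(s, a)] = Q.get((s, a), 0.0) + alpha * td_error
    return Q
```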
Deep Q-Network (DQN)
Neural-network implementation of Q-learning that stabilizes training with experience replay and target networks; famously reached human-level play on Atari games.
AlphaGo
DeepMind system that combined deep neural networks and tree search to defeat professional Go players (2016).
AlphaZero
Generalized version of AlphaGo that mastered Go, Chess, and Shogi from self-play without human data.
AlphaStar
RL system that reached grandmaster level in StarCraft II, trained through a league of diverse competing agents.
AlphaFold
DeepMind model that predicts 3D protein structures with near-experimental accuracy using deep learning.
WebAgent
AI agent designed to navigate and interact with websites through browser actions like clicking and typing.
WebLINX
Benchmark for multi-turn dialogue agents that navigate real-world websites (2024).
WebVoyager
Tencent end-to-end multimodal web agent capable of planning and executing browsing tasks (2024).
Model Context Protocol (MCP)
Anthropic proposal for a client-server protocol that standardizes tool, resource, and prompt access for agents.
LangChain
2022 framework for chaining LLM calls, memory, and tools to build AI agents.
AutoGPT
2023 open-source project that automates multi-step tasks via iterative self-prompting and feedback loops.
AutoGen
Framework supporting multi-agent collaboration with optional human-in-the-loop interactions (2023).
Crew.ai
Toolkit that coordinates multiple specialized agents while allowing human oversight.
LangGraph
Graph-based extension of LangChain for stateful, multi-agent workflows.
TapeAgents
2024 ServiceNow framework that records agent sessions as structured "tapes", supporting debugging, prompt tuning, and distillation of complex agents.
LlamaIndex
Data framework connecting LLMs to external data for retrieval; its event-driven workflows support state management and cycles in agent architectures.
Reactive Agent
Simple stimulus-response agent with no internal state; acts immediately on environmental inputs.
Deliberative Agent
Goal-oriented agent that reasons using explicit beliefs, goals, and plans to find complex solutions.
Hybrid Agent
Architecture combining multiple AI methods or sub-agents, often mixing rule-based and learning components.
Learning Agent
Agent that improves performance over time by collecting data and updating its knowledge base via algorithms like RL.
Retrieval-Augmented Generation (RAG)
Technique where an LLM retrieves relevant documents and conditions its generation on them for up-to-date knowledge.
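The retrieve-then-generate flow can be sketched as below; `retrieve` and `generate` are hypothetical callables standing in for a vector store and an LLM, not a particular library:

```python
def rag_answer(query, retrieve, generate, k=3):
    """Fetch top-k documents for the query and condition the
    generator on them via an assembled prompt."""
    docs = retrieve(query, k)
    prompt = "Context:\n" + "\n".join(docs) + f"\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)
```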
API Agent
Agent that perceives via API responses and acts through API calls, offering lower latency and risk than UI automation.
Tool Use
Capability of an agent to invoke external functions (e.g., calculator, web search) during reasoning.
Agentic Programming
Software engineering paradigm focused on orchestrating, coordinating, and maintaining networks of AI agents.
Orchestrator
Supervisory component that decomposes tasks, assigns subtasks to specialist agents, and integrates results.
Planning (in agents)
Process by which an agent devises a sequence of actions or subgoals to achieve an objective.
Memory (Short-term / Long-term)
Stores for temporary context (short-term) and persistent knowledge (long-term) used by agents to inform actions.
AI Safety
Discipline concerned with preventing AI systems from causing harm to the external environment.
AI Security
Field focused on protecting AI systems themselves from attacks, exploitation, or misuse.
Adversarial Agent
Malicious or compromised agent that seeks to disrupt, deceive, or harm other systems or users.
AgentPoison
NeurIPS 2024 method for poisoning an agent’s memory/knowledge base to induce harmful behaviors.
Autonomy (in agents)
Ability to operate without continuous human guidance, pursuing goals independently.
Proactiveness
Agent trait of initiating actions or plans in anticipation of future needs.
Reactivity
Capacity to adapt behavior in real time in response to environmental changes.
Social Ability
Skill of interacting with humans or other agents through negotiation, collaboration, and natural language.
Zero-Shot Chain-of-Thought (CoT)
Prompting technique that elicits reasoning steps from an LLM without exemplars, improving performance.
Self-Consistency
Inference strategy where multiple reasoning paths are sampled and majority voting determines the final answer.
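The aggregation step reduces to a majority vote over the final answers extracted from each sampled reasoning path:

```python
from collections import Counter

def self_consistency(answers):
    """Return the most common final answer across sampled paths."""
    return Counter(answers).most_common(1)[0][0]
```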
Compound AI Systems
Interconnected networks of models, tools, and agents that work together to perform complex tasks beyond single-model capabilities.