AI Agents & Reinforcement Learning – Lecture 12

Description and Tags

Vocabulary flashcards covering key terms, methods, frameworks, architectures, and safety concepts from Lecture 12 on Reinforcement Learning and AI Agents.

50 Terms

1

Reinforcement Learning (RL)

Machine-learning paradigm where an agent learns to make sequential decisions by maximizing cumulative reward through trial and error.
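
As a rough illustration of "cumulative reward", the sketch below sums a reward sequence with a discount factor. The discount factor `gamma` and the function name are assumptions for illustration; the card itself does not mention discounting.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    # Work backwards so each step is one multiply and one add.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma=1.0` this reduces to the plain sum of rewards over the episode.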

2

Agent

Autonomous software entity that can perceive, decide, reason, and act within an environment to achieve goals.

3

Blackjack Policy

Mapping from game states (player sum, dealer card, usable ace) to actions (hit or stick) that guides play strategy.
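
A minimal sketch of such a mapping, assuming a simple threshold rule (stick on 20 or more, otherwise hit). The threshold and all names are illustrative assumptions, not a policy taken from the lecture.

```python
from typing import Tuple

# State layout follows the card: (player sum, dealer's showing card, usable ace).
State = Tuple[int, int, bool]

HIT, STICK = "hit", "stick"

def threshold_policy(state: State, stick_at: int = 20) -> str:
    """Return an action for a state; this toy rule looks only at the player sum."""
    player_sum, dealer_card, usable_ace = state
    return STICK if player_sum >= stick_at else HIT
```

A learned policy would usually also depend on the dealer card and the usable ace, which this toy rule ignores.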

4

First-Visit Monte Carlo (MC)

RL method that estimates a state's value by averaging the returns that follow only the first visit to that state in each episode.
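
A sketch of first-visit MC prediction under stated assumptions: each episode is a list of hypothetical `(state, reward)` pairs, where the reward is the one received after leaving that state, and `gamma` is an assumed discount factor.

```python
from collections import defaultdict

def first_visit_mc(episodes, gamma=1.0):
    """Estimate state values by averaging first-visit returns."""
    returns_sum = defaultdict(float)
    returns_count = defaultdict(int)
    for episode in episodes:
        # Record the time step at which each state first appears.
        first_visit = {}
        for t, (state, _) in enumerate(episode):
            first_visit.setdefault(state, t)
        # Walk backwards, accumulating the return from each step onward.
        G = 0.0
        for t in range(len(episode) - 1, -1, -1):
            state, reward = episode[t]
            G = reward + gamma * G
            if first_visit[state] == t:  # skip later visits to the same state
                returns_sum[state] += G
                returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

Every-visit MC would drop the `first_visit` check and average over all occurrences of a state instead.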

5

Exploration/Exploitation Trade-off

Balancing trying new actions to gather information (explore) and using known rewarding actions (exploit).
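
One common way to strike this balance is epsilon-greedy action selection, sketched below. The card does not name a specific strategy, so treat this as one illustrative option; `q_values` is an assumed list of estimated action values.

```python
import random

def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore uniformly; otherwise exploit the best estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: random action index
    # exploit: index of the highest estimated value
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

Setting `epsilon=0.0` gives pure exploitation, and `epsilon=1.0` gives pure exploration.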

6

Zero-Shot Learning

Ability of a model or agent to perform a task without task-specific training, relying on prior knowledge.

7

Large Language Model (LLM)

Transformer-based neural network that is trained on massive text corpora and can generate and interpret natural language.

8

Intelligent Agent

LLM-powered or otherwise advanced agent that leverages background knowledge for zero-shot task solving.

9

ReAct Agent

Agent architecture combining reasoning (Chain-of-Thought) and action steps, enabling tool use and interaction.
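
A toy sketch of the reason/act/observe loop. The `scripted_model` function is a hypothetical stand-in for an LLM call, and the `Action: tool[argument]` text format is an assumption chosen for illustration.

```python
def calculator(expression: str) -> str:
    """Toy tool: only handles 'a + b' with integers."""
    left, right = expression.split("+")
    return str(int(left) + int(right))

TOOLS = {"calculator": calculator}

def scripted_model(transcript: str) -> str:
    """Hypothetical stand-in for an LLM: reasons, then either acts or answers."""
    if "Observation:" not in transcript:
        return "Thought: I need to add the numbers.\nAction: calculator[2 + 3]"
    observation = transcript.rsplit("Observation: ", 1)[1].strip()
    return f"Thought: I have the result.\nFinal Answer: {observation}"

def react_loop(question, model=scripted_model, max_steps=5):
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = model(transcript)              # reasoning plus a proposed action
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        action = step.split("Action:", 1)[1].strip()
        tool_name, argument = action.split("[", 1)
        observation = TOOLS[tool_name](argument.rstrip("]"))  # act
        transcript += f"Observation: {observation}\n"          # observe
    return None  # no answer within the step budget
```

Calling `react_loop("What is 2 + 3?")` runs one tool call and then returns the final answer string.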

10

Bellman Equation

Fundamental recursive relationship in dynamic programming expressing the value of a state as immediate reward plus discounted future value.
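
In symbols, using standard textbook notation that is assumed here rather than taken from the card (policy \pi, discount factor \gamma, transition model p), the expectation and optimality forms can be written as:

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma V^{\pi}(s') \,\bigr]

V^{*}(s) = \max_{a} \sum_{s', r} p(s', r \mid s, a)\,\bigl[\, r + \gamma V^{*}(s') \,\bigr]
```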

11

Temporal Difference (TD) Learning

Family of RL methods (e.g., TD(0), Sarsa) that update value estimates using bootstrapped predictions from subsequent states.
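
A minimal sketch of a single TD(0) update, assuming a tabular value function stored in a dict `V`. The step size `alpha` and discount `gamma` are assumed hyperparameters.

```python
def td0_update(V, state, reward, next_state, alpha=0.1, gamma=0.99, terminal=False):
    """Move V[state] toward the bootstrapped target reward + gamma * V[next_state]."""
    next_value = 0.0 if terminal else V.get(next_state, 0.0)
    target = reward + gamma * next_value
    td_error = target - V.get(state, 0.0)
    V[state] = V.get(state, 0.0) + alpha * td_error
    return td_error
```

The target uses the current estimate of the next state rather than the full episode return, which is what "bootstrapped" means here.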

12

Q-Learning

Off-policy TD algorithm that learns the optimal action-value function by moving each estimate Q(s, a) toward the target r + γ max_a' Q(s', a'), which reduces the Bellman error.
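
A sketch of one tabular Q-learning update, assuming `Q` is a `defaultdict(float)` keyed by hypothetical `(state, action)` pairs. `alpha` and `gamma` are assumed hyperparameters.

```python
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.99, terminal=False):
    """Update Q[(state, action)] toward reward + gamma * max over next actions."""
    # The max over next actions is what makes the method off-policy:
    # the target ignores which action the behavior policy takes next.
    best_next = 0.0 if terminal else max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error
```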

13

Deep Q-Network (DQN)

Neural-network implementation of Q-learning that uses experience replay and target networks to stabilize training; first demonstrated on Atari games.
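
The experience-replay part can be sketched without any deep-learning library. The class below is an illustrative assumption rather than DeepMind's implementation; it stores transitions and samples random minibatches, and it omits the network and the target network.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size store of transitions; the oldest entries are evicted first."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling breaks the correlation between consecutive steps.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a full DQN, each sampled minibatch would be used to compute targets from a periodically updated copy of the network.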

14

AlphaGo

DeepMind system that combined deep neural networks and tree search to defeat professional Go players (2016).

15

AlphaZero

Generalized version of AlphaGo that mastered Go, Chess, and Shogi from self-play without human data.

16

AlphaStar

Multi-agent RL system that reached Grandmaster level in StarCraft II, trained through a league of agents that compete against one another.

17

AlphaFold

DeepMind model that predicts 3D protein structures with near-experimental accuracy using deep learning.

18

WebAgent

AI agent designed to navigate and interact with websites through browser actions like clicking and typing.

19

WebLINX

Benchmark and framework for multi-turn dialogue agents that navigate real-world websites (2024).

20

WebVoyager

Tencent end-to-end multimodal web agent capable of planning and executing browsing tasks (2024).

21

Model Context Protocol (MCP)

Anthropic proposal for a client-server protocol that standardizes tool, resource, and prompt access for agents.

22

LangChain

2022 framework for chaining LLM calls, memory, and tools to build AI agents.

23

AutoGPT

2023 open-source project that automates multi-step tasks via iterative self-prompting and feedback loops.

24

AutoGen

Framework supporting multi-agent collaboration with optional human-in-the-loop interactions (2023).

25

Crew.ai

Toolkit that coordinates multiple specialized agents while allowing human oversight.

26

LangGraph

Graph-based extension of LangChain for stateful, multi-agent workflows.

27

TapeAgents

2024 framework built around the tape, a structured log of an agent session, which supports debugging, prompt tuning, and distillation of complex agents.

28

LlamaIndex

Event-driven framework for state management, retrieval, and cycles in agent architectures.

29

Reactive Agent

Simple stimulus-response agent with no internal state; acts immediately on environmental inputs.

30

Deliberative Agent

Goal-oriented agent that reasons using explicit beliefs, goals, and plans to find complex solutions.

31

Hybrid Agent

Architecture combining multiple AI methods or sub-agents, often mixing rule-based and learning components.

32

Learning Agent

Agent that improves performance over time by collecting data and updating its knowledge base via algorithms like RL.

33

Retrieval-Augmented Generation (RAG)

Technique where relevant documents are retrieved from an external store and passed to an LLM, which conditions its generation on them to draw on up-to-date knowledge.
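
A toy sketch of the retrieve-then-condition pattern. Word overlap stands in for a real retriever (production systems typically use embedding similarity), and the prompt wording and function names are assumptions for illustration.

```python
import re

def tokenize(text):
    """Lowercase alphanumeric tokens, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query, documents, k=2):
    """Rank documents by how many query tokens they share; return the top k."""
    query_tokens = tokenize(query)
    ranked = sorted(documents,
                    key=lambda doc: len(query_tokens & tokenize(doc)),
                    reverse=True)
    return ranked[:k]

def build_prompt(query, documents, k=2):
    """Place the retrieved passages in the prompt so generation is conditioned on them."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only the context below.\nContext:\n{context}\nQuestion: {query}"
```

The string returned by `build_prompt` is what would be sent to the LLM in place of the bare question.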

34

API Agent

Agent that perceives via API responses and acts through API calls, offering lower latency and risk than UI automation.

35

Tool Use

Capability of an agent to invoke external functions (e.g., calculator, web search) during reasoning.

36

Agentic Programming

Software engineering paradigm focused on orchestrating, coordinating, and maintaining networks of AI agents.

37

Orchestrator

Supervisory component that decomposes tasks, assigns subtasks to specialist agents, and integrates results.
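
The three responsibilities (decompose, assign, integrate) can be sketched with plain functions. The specialist agents here are hypothetical stubs, and the fixed two-step decomposition is an assumption; a real orchestrator would usually plan with an LLM.

```python
def research_agent(subtask):
    """Hypothetical specialist: gathers material."""
    return f"notes on {subtask}"

def writer_agent(subtask):
    """Hypothetical specialist: produces text."""
    return f"draft of {subtask}"

SPECIALISTS = {"research": research_agent, "write": writer_agent}

def decompose(task):
    """Split a task into (role, subtask) pairs; fixed plan for illustration."""
    return [("research", task), ("write", task)]

def orchestrate(task):
    # Assign each subtask to the matching specialist, then integrate the results.
    results = [SPECIALISTS[role](subtask) for role, subtask in decompose(task)]
    return "; ".join(results)
```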

38

Planning (in agents)

Process by which an agent devises a sequence of actions or subgoals to achieve an objective.

39

Memory (Short-term / Long-term)

Stores for temporary context (short-term) and persistent knowledge (long-term) used by agents to inform actions.
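
A minimal sketch of the two stores, assuming a bounded window for short-term context and a key-value dict for long-term facts. The class and method names are illustrative assumptions; real agents often back long-term memory with a database or vector store.

```python
from collections import deque

class AgentMemory:
    def __init__(self, short_term_size=3):
        # Short-term: only the most recent messages are kept.
        self.short_term = deque(maxlen=short_term_size)
        # Long-term: persistent facts, looked up by key.
        self.long_term = {}

    def observe(self, message):
        self.short_term.append(message)

    def remember(self, key, fact):
        self.long_term[key] = fact

    def context(self, key=None):
        """Combine a recalled long-term fact (if any) with recent messages."""
        recalled = [self.long_term[key]] if key in self.long_term else []
        return recalled + list(self.short_term)
```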

40

AI Safety

Discipline concerned with preventing AI systems from causing harm to the external environment.

41

AI Security

Field focused on protecting AI systems themselves from attacks, exploitation, or misuse.

42

Adversarial Agent

Malicious or compromised agent that seeks to disrupt, deceive, or harm other systems or users.

43

AgentPoison

NeurIPS 2024 method for poisoning an agent’s memory/knowledge base to induce harmful behaviors.

44

Autonomy (in agents)

Ability to operate without continuous human guidance, pursuing goals independently.

45

Proactiveness

Agent trait of initiating actions or plans in anticipation of future needs.

46

Reactivity

Capacity to adapt behavior in real time in response to environmental changes.

47

Social Ability

Skill of interacting with humans or other agents through negotiation, collaboration, and natural language.

48

Zero-Shot Chain-of-Thought (CoT)

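Prompting technique that elicits intermediate reasoning steps from an LLM without exemplars, typically by appending a cue such as "Let's think step by step", which often improves accuracy on multi-step problems.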
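
A sketch of how such a prompt might be assembled. The "Q:/A:" layout and the function name are assumptions for illustration; the trigger phrase is the commonly cited one.

```python
def zero_shot_cot_prompt(question, trigger="Let's think step by step."):
    """Build a prompt with no worked examples, only a reasoning cue."""
    return f"Q: {question}\nA: {trigger}"
```

The model's continuation after the cue contains the reasoning, from which the final answer is then extracted.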

49

Self-Consistency

Inference strategy where multiple reasoning paths are sampled and majority voting determines the final answer.
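
The voting step can be sketched as follows, assuming the final answers have already been extracted from several sampled reasoning paths. The sampling itself is not shown, and the function name is illustrative.

```python
from collections import Counter

def majority_vote(answers):
    """Return the most frequent final answer among the sampled paths."""
    return Counter(answers).most_common(1)[0][0]
```

For example, if five sampled paths end in "18", "18", "26", "18", and "20", the vote selects "18".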

50

Compound AI Systems

Interconnected networks of models, tools, and agents that work together to perform complex tasks beyond single-model capabilities.