LLM Agents
Generative Model Agents
- Introduction to AI agents and large language model agents.
- Key components of large language model agents: planning, memory, and tool usage.
- Applications of large language model agents.
OpenAI Updates
- Free flow is 80-87% cheaper than GPT-3.
- Free flow's performance is stronger than GPT-3.
- Competitive landscape: OpenAI, Gemini, and Grok.
- OpenAI currently has the strongest model.
Reinforcement Pre-Training Paper (June 9th)
- Changes to traditional next token prediction pre-training.
- Standard approach: learning the likelihood of the next token.
- New approach: learning the whole reasoning process (chain of thought).
- Predicting the next token based on reasoning, not just likelihood.
- Reward for correct prediction via reinforcement learning.
- For every token, the model generates several choices or prediction paths.
- The correct prediction path gets a higher reward, incorrect ones get lower rewards.
- Retraining based on rewards for predicting the correct next token.
- Method shown to be more powerful than baselines.
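The reward step in the bullets above can be sketched roughly as follows. This is a toy illustration, not the paper's implementation: `sample_paths` is a hypothetical stand-in for the model and simply picks tokens at random, and the reward rule (1 for a match with the true next token, 0 otherwise) is an assumed simplification.

```python
import random


def sample_paths(context, vocabulary, num_paths, rng):
    """Hypothetical stand-in for the model.

    Each path is a (reasoning text, predicted next token) pair.
    A real model would generate the reasoning before committing to a token.
    """
    paths = []
    for i in range(num_paths):
        token = rng.choice(vocabulary)
        paths.append((f"path {i}: given '{context}', guess '{token}'", token))
    return paths


def next_token_rewards(paths, true_token):
    """Assumed reward rule: 1.0 if the path predicts the true token, else 0.0."""
    return [1.0 if predicted == true_token else 0.0 for _reasoning, predicted in paths]


rng = random.Random(0)
paths = sample_paths("2 + 2 =", ["3", "4", "5"], num_paths=4, rng=rng)
rewards = next_token_rewards(paths, true_token="4")

for (reasoning, _token), reward in zip(paths, rewards):
    print(reward, reasoning)
```

In a reinforcement learning setup these per-path rewards would then drive the parameter update, so that paths ending in the correct token become more likely.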
AI Agents
- An AI agent acts on behalf of human beings.
- Interacts with the environment and takes actions.
- A system that can perceive the environment, make decisions, and take actions.
- Perceives from the environment and takes actions accordingly.
- Decisions based on the underlying AI backbone model (e.g., large language models).
- Tasks are sent to the AI agent, which interacts with tools and environments to complete them.
- Components:
- Environment
- Perception
- Action-taking
- Planning
- User and tool interaction
- Memory
- Cooperation with other agents (multi-agent system).
- Example: Agents in a sandbox village perceiving and acting toward goals.
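The perceive, decide, act cycle described above can be illustrated with a very small sketch. Everything here is hypothetical: `LineWorld` is a toy environment, and the rule-based `decide` method stands in for the backbone model that a real agent would query.

```python
from dataclasses import dataclass, field


@dataclass
class LineWorld:
    """Toy environment: the agent moves along a line toward a goal position."""
    position: int = 0
    goal: int = 3

    def observe(self):
        return {"position": self.position, "goal": self.goal}

    def apply(self, action):
        if action == "right":
            self.position += 1
        elif action == "left":
            self.position -= 1


@dataclass
class Agent:
    memory: list = field(default_factory=list)

    def decide(self, observation):
        # Rule-based stand-in for the backbone model (an assumption for this sketch).
        if observation["position"] < observation["goal"]:
            return "right"
        if observation["position"] > observation["goal"]:
            return "left"
        return "stop"


def run_episode(agent, env, max_steps=10):
    for _ in range(max_steps):
        observation = env.observe()                  # perception
        action = agent.decide(observation)           # decision
        agent.memory.append((observation, action))   # memory
        if action == "stop":
            break
        env.apply(action)                            # action
    return env.position


agent = Agent()
final_position = run_episode(agent, LineWorld())
print(final_position, len(agent.memory))
```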
Embodied AI
- Agent can be physical or symbolic.
- Perceives through sensors and acts through actuators.
- Transforms industries and improves lives.
- Interacts with an environment, learns from it, and takes actions.
- Usually has a world model (an understanding of the environment).
Large Language Model Agents
- AI agent based on large language models.
- Key features:
- Reasoning power
- Multi-step planning (chain of thought)
- Reflection (learning from mistakes)
- Memory (data, signals, short-term/long-term memory)
- Tool usage (APIs, internet search, etc.)
- Interaction with users, perceptions (text, images, video, audio), reasoning, planning steps, tool usage, and returning answers.
Lilian Weng
- Explanation of large language model agents by a former OpenAI vice president.
AI Agent Tools
- Calendars, calculators, coding interpreters, web searches.
- Long and short-term memory.
- Reasoning and planning to decompose tasks.
- Reflection, self-critique, chain of thought, subgoal decomposition.
- Actions: tool-based or non-tool-based (e.g., textual output).
Key Component 1: Planning
- Task Decomposition: Breaking down complex tasks into subgoals.
- Chain of thought, Tree of thought
- Reflection and Refinement: Self-criticism and learning from mistakes.
Key Component 2: Memory
- Short-term memory: Contextual information within the current time window.
- Long-term memory: Data stored externally in a database (user preferences, chat history).
Key Component 3: Tool Usage
- Using external sources/systems for information not in model weights.
- Web search, code executors, calculators
- Needed for real-time information.
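A minimal sketch of how a tool call might be dispatched is shown here. The request format (`{"tool": ..., "input": ...}`) and the `TOOLS` registry are assumptions made for illustration; real agent frameworks define their own schemas. The calculator evaluates arithmetic with Python's `ast` module instead of `eval`, so only numbers and the four basic operators are accepted.

```python
import ast
import operator

# Supported binary operators for the toy calculator.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate a simple arithmetic expression without using eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")

    return evaluate(ast.parse(expression, mode="eval").body)


# Hypothetical tool registry: tool name -> Python callable.
TOOLS = {"calculator": calculator}


def call_tool(request):
    """Dispatch a tool request of the assumed form {'tool': name, 'input': text}."""
    name = request["tool"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](request["input"])


result = call_tool({"tool": "calculator", "input": "12 * (3 + 4)"})
print(result)
```

In a full agent the request dictionary would be produced by the model, and the result would be appended to the prompt for the next step.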
Large Language Model Agents Map
- Planning + Memory + Tool Usage = More Power
Planning In Detail
Solving a task sequentially (one step at a time).
Task decomposition is fundamental.
Chain of Thought: Question, Reasoning, Multi-step Reasoning, Answer
Tree of Thought: Generate many intermediate steps and select a good path.
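The idea of generating several intermediate steps and keeping the promising ones can be sketched as a small beam search. This is a toy under stated assumptions: the "thoughts" are arithmetic moves (`+3` or `*2`) toward a target number, and `propose` and `score` are hypothetical stand-ins for what would be LLM calls in a real Tree of Thought setup.

```python
def propose(state):
    """Generate candidate next thoughts from a (value, steps) state."""
    value, steps = state
    return [(value + 3, steps + ["+3"]), (value * 2, steps + ["*2"])]


def score(state, target):
    """Heuristic evaluation of a thought: closer to the target is better."""
    return -abs(target - state[0])


def tree_of_thought(start, target, beam_width=2, max_depth=5):
    frontier = [(start, [])]
    for _ in range(max_depth):
        # Expand every state in the frontier into its candidate thoughts.
        candidates = [child for state in frontier for child in propose(state)]
        for value, steps in candidates:
            if value == target:
                return steps
        # Keep only the best-scoring thoughts for the next level.
        candidates.sort(key=lambda s: score(s, target), reverse=True)
        frontier = candidates[:beam_width]
    return None


path = tree_of_thought(start=1, target=14)
print(path)
```

Chain of thought corresponds to the special case `beam_width=1` with a single proposal per step: one linear path with no branching.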
Large Language Model + P (Planning)
- Planning Domain Definition Language (PDDL):
- Formal language to define problems and domains of planning.
- Domain File: Defines available objects, actions, preconditions, and effects.
- Problem File: Specifies objects, initial state, and target goal state.
- Planner computes a valid plan (sequence of actions).
- Neural Symbolic Architecture: Neural (backbone large language models) + Symbolic (reasoning).
Traditional Approach: Problem + Domain = Plan
- In-context learning; Example problems and solutions.
PDDL Approach: Two-turn reasoning process
- LLM + PDDL.
- Example problems + example PDDL + current problem = new problem PDDL.
Formal PDDL plan interpreted by LLM into natural language.
LLM+P: LLM + PDDL; planner fed to LLMs in a few-shot way.
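The planner step (domain + problem = plan) can be illustrated with a small breadth-first search. Note the assumptions: the domain and problem below are plain Python dictionaries that mimic the roles of the domain file and problem file, not real PDDL syntax, and the block-and-shelf scenario is invented for the example. In the approach described above, the LLM would produce the problem description and a classical planner would do the search.

```python
from collections import deque

# Domain: available actions with preconditions and effects (add / delete facts).
DOMAIN = {
    "pick_up": {
        "pre": {"hand_empty", "block_on_table"},
        "add": {"holding_block"},
        "delete": {"hand_empty", "block_on_table"},
    },
    "put_on_shelf": {
        "pre": {"holding_block"},
        "add": {"block_on_shelf", "hand_empty"},
        "delete": {"holding_block"},
    },
}

# Problem: initial state and target goal state.
PROBLEM = {
    "init": {"hand_empty", "block_on_table"},
    "goal": {"block_on_shelf"},
}


def plan(domain, problem):
    """Breadth-first search for a sequence of actions that reaches the goal."""
    start = frozenset(problem["init"])
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if problem["goal"] <= state:
            return steps
        for name, action in domain.items():
            if action["pre"] <= state:  # preconditions hold in this state
                next_state = frozenset((state - action["delete"]) | action["add"])
                if next_state not in seen:
                    seen.add(next_state)
                    queue.append((next_state, steps + [name]))
    return None


steps = plan(DOMAIN, PROBLEM)
print(steps)
```

The returned action sequence is the formal plan that the LLM would then translate back into natural language.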
Self-Reflection:
- ReAct combines tool usage with a self-reflection process.
- Thought, observation, and action.
- Uses observations from tools to refine future reasoning.
- Chain of Hindsight: Supervised fine-tuning.
- Training data includes past outputs and human feedback.
- Sequentially evolves from bad to good answers.
- Data formatted as a sequence: answer, feedback, answer, feedback, etc.
- Learning incrementally how answers evolve from bad to good.
- Either the model generates all answers at once, from bad to good, or a multi-turn dialog gradually refines the answers step by step.
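The answer/feedback training sequence described above might be assembled along these lines. The field labels (`Prompt:`, `Answer:`, `Feedback:`), the numeric ratings, and the sample answers are all assumptions made for illustration; the notes do not specify the exact template.

```python
def build_hindsight_sequence(prompt, rated_answers):
    """Order past answers from worst to best and interleave them with feedback.

    rated_answers: list of (answer, rating, feedback) tuples, where a higher
    rating means a better answer.
    """
    ordered = sorted(rated_answers, key=lambda item: item[1])
    parts = [f"Prompt: {prompt}"]
    for answer, _rating, feedback in ordered:
        parts.append(f"Answer: {answer}")
        parts.append(f"Feedback: {feedback}")
    return "\n".join(parts)


# Invented example data: two past outputs with human ratings and feedback.
sequence = build_hindsight_sequence(
    "Summarize the meeting.",
    [
        ("The team agreed to ship on Friday after fixing two bugs.", 5, "Accurate and concise."),
        ("There was a meeting.", 1, "Too vague; no decisions mentioned."),
    ],
)
print(sequence)
```

Fine-tuning on sequences like this exposes the model to the progression from a weak answer to a strong one, with the feedback in between.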
Memory Components
- Sensory memory: Input signals from the environment (visual, auditory, tactile).
- Short-term (working) memory: Contextual information (in-context examples).
- Long-term memory: Explicit (life events, facts, concepts) or Implicit (external stored data).
Memory Types in Detail
- Sensory memories: Embeddings in vector space of raw inputs (images, text).
- Short-term memory: Context within a window (in-context learning).
- Long-term memory: External vector stores (database, chat history).
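A rough sketch of the short-term and long-term split follows. It rests on simplifying assumptions: short-term memory is a fixed-size window over recent turns, and long-term memory is an in-memory list searched by cosine similarity over bag-of-words counts. A real system would use learned embeddings and an external vector database; the sample turns are invented.

```python
import math
from collections import Counter, deque


def embed(text):
    """Toy embedding: word counts (a real system would use a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class LongTermMemory:
    """Stand-in for an external vector store."""

    def __init__(self):
        self.items = []

    def store(self, text):
        self.items.append((embed(text), text))

    def recall(self, query, k=1):
        query_vector = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[0]), reverse=True)
        return [text for _vector, text in ranked[:k]]


short_term = deque(maxlen=3)   # only the most recent turns fit in the window
long_term = LongTermMemory()   # everything is kept and retrieved by similarity

for turn in [
    "user prefers vegetarian food",
    "user lives in Seattle",
    "user asked about the weather",
    "user likes jazz music",
]:
    short_term.append(turn)
    long_term.store(turn)

recalled = long_term.recall("what food does the user like")
print(list(short_term))
print(recalled)
```

The oldest turn has already dropped out of the short-term window, but it can still be recovered from long-term memory when a related query arrives.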
Sensory Input:
- Visual
- Auditory
- Tactile
Tool Usage
- Program-Aided Language Models (PAL):
- Solving math problems by writing and running Python programs.
- ChatGPT plugins: Automatically decide which tools to use.
- VAT: Decomposes tasks into smaller steps, each with a program.
- ReAct (tool usage + self-reflection).
- Hugging Face Models:
- Provides a list of model options, and the model learns which one to use.
- Task Planning: Analyze user request, break down into tasks, select a model.
- Execution: Distribute tasks to expert ML models.
- Unification: Combine results and return to users. (HuggingGPT)
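The ReAct-style combination of tool usage and reflection (thought, action, observation) could look roughly like the loop sketched here. The `scripted_policy` function is a hypothetical stand-in for the LLM and returns hard-coded steps for the question "what is (3 + 4) * 5"; the two tools are trivial on purpose.

```python
# Toy tools available to the agent.
TOOLS = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
}


def scripted_policy(observations):
    """Hypothetical stand-in for the LLM: returns (thought, action, arguments)."""
    if len(observations) == 0:
        return ("First add 3 and 4.", "add", (3, 4))
    if len(observations) == 1:
        return ("Now multiply the sum by 5.", "multiply", (observations[0], 5))
    return ("I have the result.", "finish", (observations[-1],))


def react(policy, max_steps=5):
    observations, trace = [], []
    for _ in range(max_steps):
        thought, action, args = policy(observations)
        trace.append(f"Thought: {thought}")
        if action == "finish":
            return args[0], trace
        observation = TOOLS[action](*args)   # run the chosen tool
        trace.append(f"Action: {action}{args}")
        trace.append(f"Observation: {observation}")
        observations.append(observation)     # feed the result into the next step
    return None, trace


answer, trace = react(scripted_policy)
print("\n".join(trace))
print(answer)
```

The key point is the feedback path: each observation from a tool becomes input to the next round of reasoning.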
Learning-Based Tool Usage
- TALM: Fine-tune with handcrafted examples.
- Generate more data based on fine-tuned models.
- Recursively perform multi-turn self-play iterations.
- Single round of fine-tuning based on generated data (Toolformer).
Applications of Large Language Model Agents
- Types: Single-agent, Multi-agent systems, Agent-human cooperation.
- Single-agent: AI assistant accomplishing everything alone (chatbots).
- Multi-agent systems: Agents collaborate, compete, or debate.
- Human-agent cooperation: AI agents play different roles (program managers, engineers), with human beings supervising.
Multi-Agent Collaborative Frameworks
- Humans more involved: supervise, provide feedback, refine the plan.
- Examples:
- GPT framework: The large language model repeatedly decides what to do, while the results of its actions are fed back into the prompt.
- Large language model agent playing Minecraft with the goal of building skill trees: it receives input from the game environment, writes programs to define the agent's behaviors, and develops skills based on feedback.
Model-Driven Robots / Autonomous Vehicles
- Users input instructions; the underlying large language models process the information and take embodied actions.
- Track traffic lights, crosswalks, four-way intersections.
- Take commands from human beings, process them, and generate code to control the vehicle.
- Analyze images and understand traffic lines.
Envisioned Society of Large Language Models
- All participants are large language model agents playing different roles, dealing with different tasks, and working together.
- Users can participate in any of those stages for social activities by selecting and using tools.
Additional Notes
- AlphaStar: AI that plays StarCraft II.
- Measures actions per minute (APM); up to 23,000 or 24,000.