LLM Agents

Generative Model Agents

  • Introduction to AI agents and large language model agents.
  • Key components of large language model agents: planning, memory, and tool usage.
  • Applications of large language model agents.

OpenAI Updates

  • Free flow is 80-87% cheaper than GPT-3.
  • Free flow's performance is stronger than GPT-3.
  • Competitive landscape: OpenAI, Gemini, and Grok.
  • OpenAI currently has the strongest model.

Reinforcement Pre-Training Paper (June 9th)

  • Changes to traditional next token prediction pre-training.
  • Standard approach: learning the likelihood of the next token.
  • New approach: learning the whole reasoning process (chain of thought).
  • Predicting the next token based on reasoning, not just likelihood.
  • Reward for correct prediction via reinforcement learning.
  • For every token, the model generates several choices or prediction paths.
  • The correct prediction path gets a higher reward, incorrect ones get lower rewards.
  • Retraining based on rewards for predicting the correct next token.
  • Method shown to be more powerful than baselines.
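
The reward idea above can be sketched as follows. This is a toy illustration, not the paper's implementation: the rollouts are hand-written stand-ins for sampled reasoning paths, and subtracting the group mean as a baseline is an assumption added here to show how higher and lower rewards could be turned into a training signal.

```python
from statistics import mean

def next_token_reward(predicted, truth):
    """Reward 1.0 when the predicted next token matches the ground truth, else 0.0."""
    return 1.0 if predicted == truth else 0.0

def score_rollouts(rollouts, truth):
    """Score each (reasoning, predicted_token) rollout for one token position.

    Returns tuples of (reasoning, token, reward, advantage), where the
    advantage is the reward minus the group mean (an assumed baseline).
    """
    rewards = [next_token_reward(token, truth) for _, token in rollouts]
    baseline = mean(rewards)
    return [
        (reasoning, token, reward, reward - baseline)
        for (reasoning, token), reward in zip(rollouts, rewards)
    ]

# Hypothetical rollouts for the context "The capital of France is"
rollouts = [
    ("The sentence asks for a capital city of France.", "Paris"),
    ("Large French cities are likely continuations.", "Lyon"),
    ("France -> capital -> Paris.", "Paris"),
    ("After 'is', a common function word may follow.", "the"),
]
scored = score_rollouts(rollouts, truth="Paris")
for reasoning, token, reward, advantage in scored:
    print(f"{token!r}: reward={reward}, advantage={advantage:+.2f}")
```

Paths that end in the correct token get a positive advantage and the others a negative one, which is the signal a reinforcement learning update would then use.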

AI Agents

  • An AI agent acts on behalf of human beings.
  • Interacts with the environment and takes actions.
  • A system that can perceive the environment, make decisions, and take actions.
  • Perceives from the environment and takes actions accordingly.
  • Decisions based on the underlying AI backbone model (e.g., large language models).
  • Tasks are sent to the AI agent, which interacts with tools and environments to complete them.
  • Components:
    • Environment
    • Perception
    • Action-taking
    • Planning
    • User and tool interaction
    • Memory
    • Cooperation with other agents (multi-agent system).
  • Example: Agents in a sandbox village perceiving and acting toward goals.
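
A minimal sketch of the perceive, decide, act cycle described above. The `LineWorld` environment and the rule-based `decide` function are hypothetical stand-ins chosen for illustration; in an LLM agent the decision step would be a call to the backbone model.

```python
class LineWorld:
    """Toy environment: the agent stands on a number line and wants to reach a goal."""

    def __init__(self, start, goal):
        self.position = start
        self.goal = goal

    def observe(self):
        # Perception: what the agent can see of the environment.
        return {"position": self.position, "goal": self.goal}

    def step(self, action):
        # Action-taking: the environment changes in response to the action.
        self.position += {"left": -1, "right": 1}[action]

def decide(observation):
    """Stand-in for the backbone model: choose an action from the observation."""
    if observation["position"] < observation["goal"]:
        return "right"
    if observation["position"] > observation["goal"]:
        return "left"
    return "stop"

def run_agent(env, max_steps=20):
    """Repeat perceive -> decide -> act until the goal is reached or steps run out."""
    history = []
    for _ in range(max_steps):
        observation = env.observe()
        action = decide(observation)
        if action == "stop":
            break
        env.step(action)
        history.append((observation["position"], action))
    return history

env = LineWorld(start=0, goal=3)
history = run_agent(env)
print(history, env.position)
```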

Embodied AI

  • Agent can be physical or symbolic.
  • Perceives through sensors and acts through actuators.
  • Transforms industries and improves lives.
  • Interacts with an environment, learns from it, and takes actions.
  • Usually has a world model: an understanding of the environment.

Large Language Model Agents

  • AI agent based on large language models.
  • Key features:
    • Reasoning power
    • Multi-step planning (chain of thought)
    • Reflection (learning from mistakes)
    • Memory (data, signals, short-term/long-term memory)
    • Tool usage (APIs, internet search, etc.)
    • Workflow: interact with users, perceive inputs (text, images, video, audio), reason, plan steps, use tools, and return answers.

Lilian Weng

  • Explanation of large language model agents, by OpenAI's former vice president.

AI Agent Tools

  • Calendars, calculators, code interpreters, web searches.
  • Long and short-term memory.
  • Reasoning and planning to decompose tasks.
  • Reflections, critiques, training, subgoal decomposition.
  • Actions: tool-based or non-tool-based (e.g., textual output).

Key Component 1: Planning

  • Task Decomposition: Breaking down complex tasks into subgoals.
    • Chain of thought, Tree of thought
  • Reflection and Refinement: Self-criticism and learning from mistakes.
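
To make the tree-of-thought idea concrete, here is a small sketch on a toy puzzle: reach a target number from a start value using the steps "+3" and "*2". The `propose` and `score` functions are hypothetical stand-ins for what would be LLM calls (generating candidate next thoughts and evaluating them); the search keeps only the most promising partial paths at each depth.

```python
def propose(state):
    """Generate candidate next steps (stand-in for an LLM proposing thoughts)."""
    value, steps = state
    return [(value + 3, steps + ["+3"]), (value * 2, steps + ["*2"])]

def score(state, target):
    """Rate a partial solution (stand-in for an LLM or heuristic evaluator)."""
    return -abs(target - state[0])

def tree_of_thought(start, target, beam_width=2, max_depth=5):
    """Expand several intermediate steps per level and keep the best few paths."""
    frontier = [(start, [])]
    for _ in range(max_depth):
        candidates = [child for state in frontier for child in propose(state)]
        for value, steps in candidates:
            if value == target:
                return steps
        frontier = sorted(candidates, key=lambda s: score(s, target), reverse=True)[:beam_width]
    return None

steps = tree_of_thought(start=1, target=14)
print(steps)
```

A chain of thought corresponds to the special case `beam_width=1`, where a single path is followed without comparing alternatives.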

Key Component 2: Memory

  • Short-term memory: Contextual information within the current time window.
  • Long-term memory: Data stored externally in a database (user preferences, chat history).
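
One way to picture the two memory types is the sketch below. The `AgentMemory` class and its method names are hypothetical: a bounded deque plays the role of the context window, and a plain dictionary stands in for an external database.

```python
from collections import deque

class AgentMemory:
    """Short-term memory as a sliding window, long-term memory as a key-value store."""

    def __init__(self, window_size=3):
        # Only the most recent turns fit in the window; older ones fall out.
        self.short_term = deque(maxlen=window_size)
        # Stand-in for an external database of user preferences and history.
        self.long_term = {}

    def add_turn(self, role, text):
        self.short_term.append((role, text))

    def remember(self, key, value):
        self.long_term[key] = value

    def build_prompt(self, question):
        """Combine stored facts and recent turns into the text given to the model."""
        facts = [f"{key}: {value}" for key, value in sorted(self.long_term.items())]
        turns = [f"{role}: {text}" for role, text in self.short_term]
        return "\n".join(["Known facts:", *facts, "Recent turns:", *turns, f"user: {question}"])

memory = AgentMemory(window_size=2)
memory.remember("diet", "vegetarian")
memory.add_turn("user", "Hi")
memory.add_turn("assistant", "Hello! How can I help?")
memory.add_turn("user", "I am planning dinner.")
prompt = memory.build_prompt("What should I cook?")
print(prompt)
```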

Key Component 3: Tool Usage

  • Using external sources/systems for information not in model weights.
    • Web search, code executors, calculators
  • Needed for real-time information.
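
A minimal sketch of tool dispatch, assuming the model emits a request such as `{"tool": "calculator", "input": "12 * (3 + 4)"}`. The request format, the `TOOLS` registry, and the two tools are illustrative assumptions, not a specific framework's API.

```python
import ast
import operator
from datetime import date

_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression):
    """Evaluate basic arithmetic by walking the syntax tree instead of using eval."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

def today(_argument=""):
    """Real-time information that cannot be stored in model weights."""
    return date.today().isoformat()

# Hypothetical registry mapping tool names to Python callables.
TOOLS = {"calculator": calculator, "today": today}

def call_tool(request):
    """Route a model-issued request to the matching tool."""
    name = request["tool"]
    if name not in TOOLS:
        return f"unknown tool: {name}"
    return TOOLS[name](request.get("input", ""))

print(call_tool({"tool": "calculator", "input": "12 * (3 + 4)"}))
print(call_tool({"tool": "today"}))
```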

Large Language Model Agents Map

  • Planning + Memory + Tool Usage = More Power

Planning In Detail

  • Solving a task sequentially (one step at a time).

  • Task decomposition is fundamental.

  • Chain of Thought: Question, Reasoning, Multi-step Reasoning, Answer

  • Tree of Thought: Generate many intermediate steps and select a good path.

  • LLM+P (Large Language Model + Planning)

    • Planning Domain Definition Language (PDDL):
      • Formal language to define planning problems and domains.
      • Domain File: Defines available objects, actions, preconditions, and effects.
      • Problem File: Specifies objects, initial state, and target goal state.
    • Planner computes a valid plan (sequence of actions).
    • Neuro-Symbolic Architecture: Neural (backbone language models) + Symbolic (reasoning).
  • Traditional Approach: Problem + Domain = Plan

    • In-context learning; Example problems and solutions.
  • PDDL Approach: Two-turn reasoning process

    • LLM + PDDL.
    • Example problems + example PDDL + current problem = new problem PDDL.
  • Formal PDDL plan interpreted by LLM into natural language.

  • LLM+P: LLM + PDDL; planner fed to LLMs in a few-shot way.
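
The split between a domain (actions with preconditions and effects) and a problem (initial state and goal) can be sketched with a tiny planner. This is not a planning-language parser or a production planner: the facts are plain strings, the robot-and-box domain is made up for illustration, and breadth-first search stands in for a real symbolic planner.

```python
from collections import deque
from typing import NamedTuple

class Action(NamedTuple):
    name: str
    preconditions: frozenset   # facts that must hold before the action
    add_effects: frozenset     # facts that become true afterwards
    delete_effects: frozenset  # facts that stop being true afterwards

def plan(initial, goal, actions):
    """Breadth-first search for a shortest action sequence from initial state to goal."""
    start, goal = frozenset(initial), frozenset(goal)
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for action in actions:
            if action.preconditions <= state:
                next_state = (state - action.delete_effects) | action.add_effects
                if next_state not in visited:
                    visited.add(next_state)
                    queue.append((next_state, steps + [action.name]))
    return None

# "Domain": what the robot can do (hypothetical example).
DOMAIN = [
    Action("move-a-b", frozenset({"at-a"}), frozenset({"at-b"}), frozenset({"at-a"})),
    Action("move-b-a", frozenset({"at-b"}), frozenset({"at-a"}), frozenset({"at-b"})),
    Action("pick-up-box", frozenset({"at-b", "box-at-b"}), frozenset({"holding-box"}), frozenset({"box-at-b"})),
    Action("drop-box-at-a", frozenset({"at-a", "holding-box"}), frozenset({"box-at-a"}), frozenset({"holding-box"})),
]

# "Problem": where things start and what should be true at the end.
solution = plan(initial={"at-a", "box-at-b"}, goal={"box-at-a"}, actions=DOMAIN)
print(solution)
```

In the approach described in the notes, the language model would only be responsible for producing the problem description and for translating the returned plan back into natural language; the search itself is left to the symbolic planner.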

Self-Reflection:

  • ReAct combines tool usage with a self-reflection process.
  • Thought, observation, and action.
  • Uses observations from tools to refine future reasoning.
  • Chain of Hindsight: Supervised fine-tuning.
  • Training data includes past outputs and human feedback.
  • Sequentially evolves from bad to good answers.
  • Data formatted as a sequence: answer, feedback, answer, feedback, etc.
  • Learning incrementally how answers evolve from bad to good.
  • Either the model generates all answers at once, from bad to good, or a multi-turn dialog gradually refines the answers step by step.
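
The bad-to-good training sequence described above might be assembled roughly as follows. The numeric `rating` field and the exact text layout are assumptions made for this sketch; the notes only state that past answers and feedback are arranged so that answers improve along the sequence.

```python
def build_hindsight_sequence(question, attempts):
    """Order past attempts from worst to best and interleave answers with feedback.

    attempts: list of (answer, rating, feedback) tuples; a higher rating is better.
    """
    ordered = sorted(attempts, key=lambda attempt: attempt[1])
    lines = [f"Question: {question}"]
    for answer, _rating, feedback in ordered:
        lines.append(f"Answer: {answer}")
        lines.append(f"Feedback: {feedback}")
    return "\n".join(lines)

# Hypothetical past outputs with human feedback.
attempts = [
    ("Paris is the capital of France.", 3, "Correct and complete."),
    ("It is a city.", 1, "Too vague, name the city."),
    ("Paris.", 2, "Correct, but answer in a full sentence."),
]
sequence = build_hindsight_sequence("What is the capital of France?", attempts)
print(sequence)
```

A model fine-tuned on sequences like this sees how each answer relates to the feedback on the previous one, which is the incremental learning the notes refer to.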

Memory Components

  • Sensory memory: Input signals from the environment (visual, auditory, tactile).
  • Short-term (working) memory: Contextual information (in-context examples).
  • Long-term memory: Explicit (life events, facts, concepts) or Implicit (external stored data).

Memory Types in Detail

  • Sensory memories: Embeddings in vector space of raw inputs (images, text).
  • Short-term memory: Context within a window (in-context learning).
  • Long-term memory: External vector stores (database, chat history).
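
Retrieval from an external vector store can be sketched as below. The word-count `embed` function is a deliberately crude stand-in for a learned embedding model, and `VectorStore` is a hypothetical in-memory class rather than a real database client.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a sparse word-count vector (a real system would use a model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class VectorStore:
    """In-memory stand-in for an external long-term memory store."""

    def __init__(self):
        self.items = []

    def add(self, text):
        self.items.append((text, embed(text)))

    def search(self, query, k=1):
        """Return the k stored texts most similar to the query."""
        query_vector = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(query_vector, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

store = VectorStore()
store.add("user prefers vegetarian food")
store.add("user lives in berlin")
store.add("meeting scheduled for friday")
top = store.search("what food does the user like")
print(top)
```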

Sensory Input:

  • Visual
  • Auditory
  • Tactile

Tool Usage

  • Program-Aided Language Models (PAL):
    • Solving math problems by writing and running Python programs.
  • ChatGPT plugins: Automatically decide which tools to use.
  • VAT: Decomposes tasks into modest dabs, each with a program.
  • ReAct (tool usage + self-reflection).
  • Hugging Face Models:
    • Provides a list of model options, and the model learns which one to use.
    • Task Planning: Analyze user request, break down into tasks, select a model.
    • Execution: Distribute tasks to expert ML models.
    • Unification: Combine results and return to users. (Hugging GPT)
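
The program-writing approach to math problems listed above can be illustrated with a short sketch. The function `fake_model_generate_program` is a hypothetical stand-in that returns a fixed program where a real system would call a language model, and the word problem is made up for the example.

```python
def fake_model_generate_program(question):
    """Stand-in for the LLM: returns Python source that computes the answer."""
    return (
        "apples_start = 23\n"
        "apples_used = 20\n"
        "apples_bought = 6\n"
        "answer = apples_start - apples_used + apples_bought\n"
    )

def run_program(source):
    """Execute the generated program and read back its `answer` variable.

    Removing builtins limits what the code can reach, but this is NOT a
    real sandbox; untrusted code needs proper isolation.
    """
    namespace = {"__builtins__": {}}
    exec(source, namespace)
    return namespace["answer"]

def solve(question):
    # The model writes the program; the Python interpreter does the arithmetic.
    return run_program(fake_model_generate_program(question))

question = "The cafeteria had 23 apples, used 20, and bought 6 more. How many are left?"
result = solve(question)
print(result)
```

The design point is the division of labor: the model only has to translate the problem into code, and the calculation itself is delegated to the interpreter.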

Learning-Based Tool Usage

  • Normal to M: Fine-tune with handcrafted examples.
  • Generate more data based on fine-tuned models.
  • Recursively perform multi-turn self-play iterations.
  • Single round of fine-tuning based on generated data (tool format).

Applications of Large Language Model Agents

  • Types: Single-agent, Multi-agent systems, Agent-human cooperation.
  • Single-agent: AI assistant accomplishing everything alone (chatbots).
  • Multi-agent systems: Agents collaborate, compete, or debate.
  • Human-agent cooperation: AI agents play different roles (program managers, engineers), with human beings supervising.

Multi-Agent Collaborative Frameworks

  • Humans more involved: supervise, provide feedback, refine the plan.
  • Examples:
    • AutoGPT framework: The large language model decides what to do repeatedly while feeding the results of its actions back into the prompt.
    • Large language model agent playing Minecraft, pursuing the goal of building skill trees: it receives input from the game environment, writes programs to define the behaviors of the agent, and develops skills based on feedback.
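
The loop of deciding an action and feeding its result back into the prompt can be sketched as follows. Everything here is a stand-in: `scripted_model` replaces the language model with fixed rules, and the two actions and their canned results are hypothetical.

```python
def scripted_model(prompt):
    """Stand-in for the LLM: choose the next action from what the prompt already contains."""
    if "Result of search" not in prompt:
        return ("search", "capital of France")
    if "Result of write_note" not in prompt:
        return ("write_note", "Paris is the capital of France")
    return ("finish", "")

def execute(action, argument, notes):
    """Carry out an action and return a textual result."""
    if action == "search":
        return {"capital of France": "Paris"}.get(argument, "no result")
    if action == "write_note":
        notes.append(argument)
        return "saved"
    raise ValueError(f"unknown action: {action}")

def run_loop(goal, max_iterations=5):
    """Repeatedly ask the model for an action and append the result to the prompt."""
    prompt = f"Goal: {goal}"
    notes = []
    for _ in range(max_iterations):
        action, argument = scripted_model(prompt)
        if action == "finish":
            break
        result = execute(action, argument, notes)
        # Feeding the result back is what lets the next decision build on it.
        prompt += f"\nResult of {action}({argument!r}): {result}"
    return prompt, notes

final_prompt, notes = run_loop("Write down the capital of France")
print(final_prompt)
print(notes)
```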

Model-Driven Robots / Autonomous Vehicles

  • Users input instructions; the underlying large language models process the information and take embodied actions.
  • Track traffic lights, crosswalks, four-way intersections.
  • Take commands from human beings, process them, and generate code to control the vehicle.
  • Analyze images and understand traffic lines.

Envisioned Society of Large Language Models

  • All participants are large language model agents playing different roles, dealing with different tasks, and working together.
  • Users can participate in any of those stages for social activities by selecting and using tools.

Additional Notes

  • AlphaStar: AI that plays StarCraft II.
  • Measures actions per minute (APM); up to 23,000 or 24,000.