LLM Agents

Changes to traditional next token prediction pre-training.
Standard approach: learning the likelihood of the next token.
New approach: learning the whole reasoning process (chain of thought).
Predicting the next token based on reasoning, not just likelihood.
Reward for correct prediction via reinforcement learning.
For every token, the model generates several candidate reasoning paths.
Paths that lead to the correct prediction get a higher reward; incorrect ones get a lower reward.
The model is then retrained on these rewards so that it predicts the correct next token.
Method shown to be more powerful than baselines.
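
The reward step can be illustrated with a small sketch. It assumes the reward for a path is the log-probability that the path assigns to the correct next token, and that the mean reward is subtracted as a baseline (a common REINFORCE-style choice). Both are assumptions for illustration, not details given in these notes, and the probabilities are made up.

```python
import math

def path_advantages(correct_token_probs):
    """Score sampled reasoning paths relative to each other.

    correct_token_probs[i] is the probability that path i assigns to the
    correct next token. Reward = log-probability (an assumption); the mean
    reward is subtracted so better-than-average paths come out positive.
    """
    rewards = [math.log(p) for p in correct_token_probs]
    baseline = sum(rewards) / len(rewards)
    return [r - baseline for r in rewards]

# Hypothetical probabilities for three sampled paths.
advantages = path_advantages([0.60, 0.25, 0.05])
best_path = max(range(len(advantages)), key=lambda i: advantages[i])
```

Paths with a positive value would be reinforced during retraining, and paths with a negative value would be discouraged.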

An AI agent acts on behalf of human beings.
It is a system that perceives its environment, makes decisions, and takes actions accordingly.
Decisions based on the underlying AI backbone model (e.g., large language models).
Tasks are sent to the AI agent, which interacts with tools and environments to complete them.
Components:
- Environment
- Perception
- Action-taking
- Planning
- User and tool interaction
- Memory
- Cooperation with other agents (multi-agent system).
Example: Agents in a sandbox village perceiving and acting toward goals.
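
A minimal sketch of the perceive, decide, act loop described above. The thermostat environment and the rule-based decide method are hypothetical stand-ins: in a real agent the decision would come from the backbone model.

```python
from dataclasses import dataclass, field

@dataclass
class ThermostatEnv:
    """Toy environment: a room whose temperature the agent can change."""
    temperature: int = 17

    def observe(self):
        return {"temperature": self.temperature}

    def apply(self, action):
        if action == "heat":
            self.temperature += 1
        elif action == "cool":
            self.temperature -= 1

@dataclass
class Agent:
    target: int
    memory: list = field(default_factory=list)

    def decide(self, observation):
        # Stand-in for the backbone model's decision.
        current = observation["temperature"]
        if current < self.target:
            return "heat"
        if current > self.target:
            return "cool"
        return "stop"

def run(agent, env, max_steps=10):
    for _ in range(max_steps):
        observation = env.observe()                 # perception
        action = agent.decide(observation)          # decision / planning
        agent.memory.append((observation, action))  # memory
        if action == "stop":
            break
        env.apply(action)                           # action-taking
    return env.temperature

env = ThermostatEnv(temperature=17)
agent = Agent(target=20)
final_temperature = run(agent, env)
```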

Task Decomposition: Breaking down complex tasks into subgoals.
- Chain of thought, Tree of thought
Reflection and Refinement: Self-criticism and learning from mistakes.

Short-term memory: Contextual information within the current time window.
Long-term memory: Data stored externally in a database (user preferences, chat history).
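
The two kinds of memory can be sketched as follows. The class name, the window size, and the use of a plain dict in place of a real database are all assumptions made for illustration.

```python
from collections import deque

class AgentMemory:
    def __init__(self, window=3):
        # Short-term: only the most recent turns fit in the context window.
        self.short_term = deque(maxlen=window)
        # Long-term: stand-in for an external database.
        self.long_term = {}

    def add_turn(self, text):
        self.short_term.append(text)

    def remember(self, key, value):
        self.long_term[key] = value

    def build_context(self, keys):
        """Combine retrieved long-term facts with the recent turns."""
        facts = [f"{k}: {self.long_term[k]}" for k in keys if k in self.long_term]
        return facts + list(self.short_term)

memory = AgentMemory(window=2)
memory.remember("preferred_language", "Python")
for turn in ["hi", "show me a loop", "make it shorter"]:
    memory.add_turn(turn)
context = memory.build_context(["preferred_language"])
```

With a window of two, the oldest turn ("hi") falls out of short-term memory, while the stored preference remains retrievable.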

Using external sources/systems for information not in model weights.
- Web search, code executors, calculators
Needed for real-time information.
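
A sketch of how a tool call might be dispatched. The request format (a dict with "tool" and "input" keys) is hypothetical. The calculator uses Python's ast module to evaluate basic arithmetic rather than calling eval on model output.

```python
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}

def calculator(expression):
    """Evaluate +, -, *, / arithmetic on numbers without using eval()."""
    def evaluate(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](evaluate(node.left), evaluate(node.right))
        raise ValueError("unsupported expression")
    return evaluate(ast.parse(expression, mode="eval").body)

TOOLS = {"calculator": calculator}

def call_tool(request):
    """request is a hypothetical {'tool': name, 'input': text} dict from the model."""
    tool = TOOLS.get(request["tool"])
    if tool is None:
        return f"unknown tool: {request['tool']}"
    return tool(request["input"])

result = call_tool({"tool": "calculator", "input": "12 * (3 + 4)"})
```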

Solving a task sequentially (one step at a time).
Task decomposition is fundamental.
Chain of Thought: Question, Reasoning, Multi-step Reasoning, Answer
Tree of Thought: Generate many intermediate steps and select a good path.
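
A toy sketch of the tree-of-thought idea: expand several candidate steps from each partial solution, score them, and keep only the most promising ones. The number puzzle, the two available steps, and the distance heuristic are made up for illustration. In practice the candidates and their scores would come from the language model.

```python
def tree_of_thought(start, target, beam_width=2, max_depth=6):
    """Expand several candidate steps per state and keep the best few."""
    frontier = [(start, [])]  # (current value, steps taken so far)
    for _ in range(max_depth):
        candidates = []
        for value, steps in frontier:
            for name, nxt in (("+3", value + 3), ("*2", value * 2)):
                candidates.append((nxt, steps + [name]))
        for value, steps in candidates:
            if value == target:
                return steps
        # Heuristic evaluation: closer to the target is better.
        candidates.sort(key=lambda c: abs(target - c[0]))
        frontier = candidates[:beam_width]
    return None

path = tree_of_thought(start=1, target=14)
# path == ["+3", "+3", "*2"], since (1 + 3 + 3) * 2 = 14
```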
LLM+P (Large Language Model + Planning)
- Planning Domain Definition Language (PDDL):
  - Formal language to define problems and domains of planning.
  - Domain File: Defines available objects, actions, preconditions, and effects.
  - Problem File: Specifies objects, initial state, and target goal state.
- Planner computes a valid plan (sequence of actions).
- Neuro-symbolic architecture: Neural (backbone language models) + Symbolic (reasoning).
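
The roles of the domain, the problem, and the planner can be sketched in plain Python. This is not PDDL syntax and not a real planner: it is a breadth-first search over sets of facts, with a made-up door-and-key domain, shown only to make the precondition/effect idea concrete.

```python
from collections import deque

# Domain: each action has preconditions, facts it adds, and facts it deletes.
DOMAIN = {
    "pick_up_key": {"pre": {"at_door", "key_on_floor"},
                    "add": {"has_key"}, "del": {"key_on_floor"}},
    "unlock_door": {"pre": {"at_door", "has_key"},
                    "add": {"door_unlocked"}, "del": set()},
    "walk_through": {"pre": {"at_door", "door_unlocked"},
                     "add": {"inside"}, "del": {"at_door"}},
}

# Problem: initial state and target goal state.
INITIAL = frozenset({"at_door", "key_on_floor"})
GOAL = {"inside"}

def plan(domain, initial, goal):
    """Return a shortest sequence of action names that reaches the goal."""
    queue = deque([(initial, [])])
    seen = {initial}
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        for name, action in domain.items():
            if action["pre"] <= state:  # preconditions hold in this state
                nxt = frozenset((state - action["del"]) | action["add"])
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [name]))
    return None

steps = plan(DOMAIN, INITIAL, GOAL)
```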
Traditional Approach: Problem + Domain = Plan
- In-context learning; Example problems and solutions.
PDDL Approach: Two-turn reasoning process
- LLM + PDDL.
- Example problems + example PDDL + current problem = new problem PDDL.
Formal PDDL plan interpreted by LLM into natural language.
LLM+P: LLM + PDDL; PDDL examples are fed to the LLM in a few-shot way.
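
A sketch of how the two prompts might be assembled. The prompt wording, the function names, and the blocksworld example are assumptions for illustration. The language-model call and the planner call are left out entirely.

```python
EXAMPLE_PROBLEM = "Blocks a and b are on the table. Stack a on top of b."
EXAMPLE_PDDL = """(define (problem stack-two)
  (:domain blocksworld)
  (:objects a b)
  (:init (ontable a) (ontable b) (clear a) (clear b) (handempty))
  (:goal (on a b)))"""

def build_translation_prompt(example_problem, example_pddl, new_problem):
    """Turn 1: ask the LLM to write the problem PDDL for new_problem."""
    return (
        "Translate the planning problem into a PDDL problem file.\n\n"
        f"Problem: {example_problem}\nPDDL:\n{example_pddl}\n\n"
        f"Problem: {new_problem}\nPDDL:\n"
    )

def build_explanation_prompt(pddl_plan):
    """Turn 2: ask the LLM to restate the planner's output in plain language."""
    steps = "\n".join(pddl_plan)
    return f"Rewrite this plan as natural-language instructions:\n{steps}\n"

prompt = build_translation_prompt(
    EXAMPLE_PROBLEM,
    EXAMPLE_PDDL,
    "Blocks a, b and c are on the table. Stack c on top of a.",
)
```

Between the two turns, a symbolic planner would take the generated problem file together with the domain file and compute the plan.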

ReAct combines tool usage with a self-reflection process.
Thought, observation, and action.
Uses observations from tools to refine future reasoning.
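
The thought, action, observation cycle can be sketched with a scripted stand-in for the language model. The lookup tool, its numbers, and the transcript format are all made up for illustration.

```python
def lookup_population(city):
    """Hypothetical tool with made-up numbers, for illustration only."""
    return {"Springfield": 1000, "Shelbyville": 800}.get(city, "unknown")

def scripted_model(transcript):
    """Stand-in for an LLM: picks the next step from the observations so far."""
    observations = [line for line in transcript if line.startswith("Observation:")]
    if len(observations) == 0:
        return "Thought: I need Springfield's population.", ("lookup", "Springfield")
    if len(observations) == 1:
        return "Thought: Now I need Shelbyville's population.", ("lookup", "Shelbyville")
    total = sum(int(o.split(": ")[1]) for o in observations)
    return "Thought: I can add the two numbers.", ("finish", str(total))

def react(question, model, max_steps=5):
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, (action, argument) = model(transcript)
        transcript.append(thought)
        transcript.append(f"Action: {action}[{argument}]")
        if action == "finish":
            return argument, transcript
        # The tool result is appended so later reasoning can use it.
        transcript.append(f"Observation: {lookup_population(argument)}")
    return None, transcript

answer, transcript = react("Combined population of both towns?", scripted_model)
```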
Chain of Hindsight: Supervised fine-tuning.
Training data includes past outputs and human feedback.
Sequentially evolves from bad to good answers.
Data is formatted as a sequence: answer, feedback, answer, feedback, etc.
Learning incrementally how answers evolve from bad to good.
Either the model generates all answers at once, from bad to good, or a multi-turn dialog gradually refines the answers step by step.
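
A sketch of how one training example might be laid out. The field names, the numeric rating used for ordering, and the text template are assumptions. These notes only state that answers and feedback alternate and progress from bad to good.

```python
def format_hindsight_example(prompt, attempts):
    """Lay out (answer, feedback) pairs from worst to best as one text sequence."""
    ordered = sorted(attempts, key=lambda a: a["rating"])  # worst first
    parts = [f"Prompt: {prompt}"]
    for attempt in ordered:
        parts.append(f"Answer: {attempt['answer']}")
        parts.append(f"Feedback: {attempt['feedback']}")
    return "\n".join(parts)

attempts = [
    {"answer": "Paris is the capital of France.",
     "feedback": "Correct and complete.", "rating": 2},
    {"answer": "France.",
     "feedback": "Does not name the city.", "rating": 0},
]
text = format_hindsight_example("What is the capital of France?", attempts)
```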

Sensory memory: Input signals from the environment (visual, auditory, tactile).
Short-term (working) memory: Contextual information (in-context examples).
Long-term memory: Explicit (life events, facts, concepts) or Implicit (external stored data).

Program-Aided Language Models (PAL):
- Solving math problems by writing and running Python programs.
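
A sketch of the idea: the model writes a short program instead of computing the answer itself, and the interpreter does the arithmetic. The program below is hand-written to stand in for model output, and the convention of storing the result in a variable named answer is an assumption.

```python
# Stand-in for a program the model would generate for the question:
# "There were 23 apples. 20 were used and 6 more were bought. How many now?"
GENERATED_PROGRAM = """
apples_start = 23
apples_used = 20
apples_bought = 6
answer = apples_start - apples_used + apples_bought
"""

def run_program(source):
    """Execute the generated program and read back its 'answer' variable.

    Emptying __builtins__ is NOT a real sandbox; untrusted code should
    run in an isolated process or container.
    """
    namespace = {}
    exec(source, {"__builtins__": {}}, namespace)
    return namespace["answer"]

result = run_program(GENERATED_PROGRAM)
```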
ChatGPT plugins: Automatically decide which tools to use.
VAT: Decomposes tasks into smaller steps, each with a program.
ReAct (tool usage + self-reflection).
HuggingGPT (Hugging Face models):
- Provides a list of model options, and the model learns which one to use.
- Task Planning: Analyze user request, break down into tasks, select a model.
- Execution: Distribute tasks to expert ML models.
- Unification: Combine results and return to users. (HuggingGPT)
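
The three stages above can be sketched as a small pipeline. The expert functions and the keyword-based planner are hypothetical placeholders. In the real system an LLM would do the planning and the experts would be hosted ML models.

```python
# Hypothetical expert models, represented by simple functions.
EXPERTS = {
    "translate": lambda text: f"[translated] {text}",
    "summarize": lambda text: f"[summary] {text[:20]}",
}

def plan_tasks(request):
    """Stand-in for the LLM planner: keyword matching instead of a model call."""
    return [name for name in EXPERTS if name in request.lower()]

def handle(request, payload):
    tasks = plan_tasks(request)                         # task planning + model selection
    results = {t: EXPERTS[t](payload) for t in tasks}   # execution by expert models
    return "\n".join(f"{t}: {r}" for t, r in results.items())  # unification

response = handle("Please summarize and translate this", "Agents use tools.")
```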

Types: Single-agent, Multi-agent systems, Human-agent cooperation.
Single-agent: AI assistant accomplishing everything alone (chatbots).
Multi-agent systems: Agents collaborate, compete, or debate.
Human-agent cooperation: AI agents play different roles (program managers, engineers), with human beings supervising.

Humans more involved: supervise, provide feedback, refine the plan.
Examples:
- GPT framework: the large language model repeatedly decides what to do, while the results of its actions are fed back into the prompt.
- Large language model agent playing Minecraft, pursuing the goal of building skill trees: it receives input from the game environment, writes programs to define the agent's behaviors, and develops skills based on feedback.

Users input instructions; the underlying large language models process the information and take embodied actions.
Track traffic lights, crosswalks, four-way intersections.
Take commands from human beings, process them, and generate code to control the vehicle.
Analyze images and understand traffic lines.

All participants are large language model agents playing different roles, dealing with different tasks, and working together.
Users can participate in any of those stages for social activities by selecting and using tools.