Autonomous Agents & the Next Evolution of AI

Background & Initial Skepticism

  • Speaker reflects on the completion of his master’s degree in AI, approx. 6 years ago.

    • Had worked on machine learning, genetic algorithms, and early generative AI.

    • Felt that creating “true” intelligence still seemed distant.

  • Observation from practice:

    • AI/ML excelled as narrow specialists (diagnosing illnesses, fraud detection, traffic optimisation, etc.).

    • Lacked human-like generalisation across tasks.

    • Therefore, the idea of full work automation felt "far-fetched" at the time—an assumption later proven wrong.

Breakthrough of GPT-3 & Emergent Capabilities

  • A mere 2 years later, OpenAI released GPT-3—the forerunner of ChatGPT.

    • Massive experiment: train on “all the data we can find” (books, articles, research papers) using the most powerful computers.

  • Observable emergent signs of intelligence without task-specific programming:

    • Natural writing & conversation across vast domains.

    • Code reading/writing.

    • Multi-format creativity: articles, songs, poems.

    • Demonstrates reasoning & pattern recognition resembling humans.

  • Key milestone: AI shifts from specialist to a more generalist tool.

Practical “Tip-of-the-Iceberg” vs. Full Potential

  • Common current uses: brainstorming, drafting content, light editing, Q&A.

  • Claim: these uses barely scratch the surface of generative AI’s capability.

Current Limitations of LLMs

  • Hallucination: can fabricate facts.

  • Knowledge staleness: training cut-off means info isn’t always up-to-date.

  • Struggles with basic maths & multi-tasking.

  • Mirrors human imperfection—humans also err, yet still accomplish goals.

What Constitutes Human-Level Problem Solving?

  • Intelligence extends beyond knowledge:

    • Planning ahead & decomposing problems.

    • Reflecting on outcomes (feedback loops).

    • Leveraging external tools.

Emergence of Autonomous Agents

  • Paradigm shift: view LLMs not as chatbots needing continuous human prompts but as autonomous agents.

  • Definition: systems that automate entire workflows end-to-end with minimal/no human intervention.

    • Plan tasks, reflect on progress, and use tools—analogous to human workflow.

  • Human analogy:

    • We employ phones/computers + diverse software (web browsers, Excel, IDEs) as tools.

    • Agents invert the control: we specify goal in natural language; the AI selects & operates tools autonomously.

Illustrative Use-Cases of Agents

  • Web-site creation: entrepreneur describes business & design → AI writes code, deploys site in seconds (replacing need for a human web developer).

  • Business analytics: instead of hiring an analyst, AI ingests data, runs analyses, answers queries.

  • Travel planning: AI searches flights, books hotels, schedules activities.

  • Conceptual label: "digital labour" capable of browsing web, navigating files, using apps, even controlling devices.

Under-the-Hood: Why Agents Are Feasible

  • GUI elements are merely visual layers; underneath everything is executable code.

    • Example programmatic calls:

    • One-line code to query ChatGPT API.

    • One-line code to execute a Google search.

    • Code snippet to create a Word document programmatically.

  • Because functions are code, we can combine disparate APIs into new composite programs.
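    A minimal sketch of this idea in Python, using stand-in functions in place of real APIs (the names `ask_llm`, `web_search`, and `write_document` are illustrative, not actual library calls). Because each capability is just a function, disparate services compose into a new program:

    ```python
    # Stand-ins for real API calls (hypothetical; a real agent would call an
    # LLM API, a search API, and a document library here instead).
    def ask_llm(prompt):
        """Stand-in for a chat-completion API call."""
        return f"LLM answer to: {prompt}"

    def web_search(query):
        """Stand-in for a search API call."""
        return [f"result for {query}"]

    def write_document(path, text):
        """Stand-in for a document-writer call (e.g., creating a Word file)."""
        with open(path, "w") as f:
            f.write(text)

    # Because each capability is code, they compose into a new composite program:
    def research_and_report(topic, path):
        hits = web_search(topic)                      # gather raw material
        summary = ask_llm(f"Summarise: {hits}")       # have the model process it
        write_document(path, summary)                 # persist the output
        return summary
    ```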

  • Generic agent framework:

    1. Action library – coded functions (search, file I/O, email sending, etc.).

    2. Language model – reasoning/planning brain.

    3. Loop – repeatedly:

    • Prompt LLM: “Given user request & current state, what action next?”

    • LLM outputs structured command (e.g., function name + parameters).

    • System executes action → returns result to LLM.

    • Continue until termination criteria met (goal achieved).

      • Example walk-through: “Book me the cheapest flight to London.”

    • LLM decides first action: search flights.

    • Generates correctly formatted API call.

    • Receives price list → decides next action (e.g., select + purchase) → iterates.
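    The framework and walk-through above can be sketched as a short loop. The "LLM" here is a hard-coded stand-in that returns JSON commands; a real agent would prompt an actual model at that step (all function and action names are illustrative):

    ```python
    import json

    ACTIONS = {}  # the "action library": name -> coded function

    def action(fn):
        """Register a function in the action library."""
        ACTIONS[fn.__name__] = fn
        return fn

    @action
    def search_flights(destination):
        # Stand-in for a real flight-search API.
        return [{"flight": "BA123", "price": 120}, {"flight": "LH456", "price": 95}]

    @action
    def book_flight(flight):
        # Stand-in for a real booking API.
        return f"booked {flight}"

    def fake_llm(request, history):
        """Stand-in for the LLM planner: returns a structured JSON command.
        A real agent would prompt a model with the request and current state."""
        if not history:
            return json.dumps({"action": "search_flights",
                               "args": {"destination": "London"}})
        flights = history[-1]["result"]
        cheapest = min(flights, key=lambda f: f["price"])
        return json.dumps({"action": "book_flight",
                           "args": {"flight": cheapest["flight"]},
                           "final": True})

    def run_agent(request):
        history = []
        while True:  # the loop: ask planner -> execute action -> feed back result
            cmd = json.loads(fake_llm(request, history))
            result = ACTIONS[cmd["action"]](**cmd["args"])
            history.append({"action": cmd["action"], "result": result})
            if cmd.get("final"):  # termination criterion: goal achieved
                return result

    # run_agent("Book me the cheapest flight to London")  # -> "booked LH456"
    ```

    The structured JSON output is what makes the loop mechanical: the model plans, but the system is what actually executes each action and feeds the result back.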

Existing Commercial Implementations

  • Microsoft Copilot (Excel): natural-language data analysis without mastering formulas.

  • Shopify Sidekick: conversational website builder.

  • HyperWrite: personal assistant that books flights, orders food, manages email.

  • ChatGPT “GPTs” marketplace: catalog of specialised autonomous agents.

Trend Drivers & Accessibility

  • Cost of LLM inference decreasing annually; approaching “virtually free.”

  • High accessibility: anyone with basic programming skills can build an agent.

  • Prediction: proliferation of internal business agents & agent-powered products/services.

Paradigm Shift in Human-Computer Interaction

  • Historical analogy:

    • CLI → GUI revolutionised usability.

    • Next step: AI-assisted interface (Jarvis-like) integrating conversation + autonomous execution.

  • Anticipated societal change:

    • Technical skills once unique to humans now outsourced to AI.

    • Even data scientists fear replacement, yet see empowerment.

Ethical & Philosophical Implications

  • Democratisation of capability: lowers barriers to innovation; broadens participation beyond large corporations & specialists.

  • Collaboration, not replacement:

    • Jarvis doesn’t replace Tony Stark; AI augments human creativity, ingenuity & experience.

    • Humans focus on “bigger picture” while AI handles execution.

Key Takeaways & Significance

  • LLMs have evolved from narrow specialists to versatile reasoning engines.

  • Autonomous agents extend this power, executing complex multi-step tasks via tool use.

  • Technical feasibility rests on code-based actions + LLM planning loops.

  • Real-world deployments already exist and are expanding as costs drop.

  • Future computing may revolve around conversational delegation rather than direct manipulation, reshaping job roles and innovation dynamics.

Background & Initial Skepticism

  • Speaker reflects on the conclusion of his master’s degree in AI, approximately 6 years ago, a period when the field was rapidly evolving but still far from its current state.

    • During this time, he was actively involved in machine learning (statistical models and algorithms), genetic algorithms (which mimic natural selection for optimization), and early forms of generative AI, often rule-based or using simpler neural networks for content creation.

    • Despite these advancements, the creation of “true”, general-purpose intelligence, capable of complex reasoning and adaptability like humans, still felt quite distant and theoretical.

  • Observation from practical application and research:

    • AI and machine learning models showed exceptional prowess as narrow specialists. Examples include highly accurate diagnosis of specific illnesses from medical images, real-time fraud detection in financial transactions, or complex traffic flow optimization in urban planning, where they outperformed humans in specific, well-defined tasks.

    • However, these systems critically lacked human-like generalization. They struggled to transfer knowledge or skills learned in one domain to an entirely different, unrelated task without extensive re-training or explicit programming for that new task. This contrast highlighted the limitations of even advanced AI at the time.

    • Consequently, the widespread idea of full work automation—where AI systems could autonomously handle diverse, complex job roles—seemed