Autonomous Agents & the Next Evolution of AI
Background & Initial Skepticism
Speaker reflects on the end of his master’s degree in AI, approx. years ago.
Had worked on machine learning, genetic algorithms, and early generative AI.
Felt that creating “true” intelligence still seemed distant.
Observation from practice:
AI/ML excelled as narrow specialists (diagnosing illnesses, fraud detection, traffic optimisation, etc.).
Lacked human-like generalisation across tasks.
Therefore, the idea of full work automation felt “far-fetched” at the time—an assumption later proven wrong.
Breakthrough of GPT-3 & Emergent Capabilities
Mere years later, OpenAI released GPT-3—the forerunner of ChatGPT.
Massive experiment: train on “all the data we can find” (books, articles, research papers) using the most powerful computers.
Observable emergent signs of intelligence without task-specific programming:
Natural writing & conversation across vast domains.
Code reading/writing.
Multi-format creativity: articles, songs, poems.
Demonstrates reasoning & pattern recognition resembling humans.
Key milestone: AI shifts from specialist to a more generalist tool.
Practical “Tip-of-the-Iceberg” vs. Full Potential
Common current uses: brainstorming, drafting content, light editing, Q&A.
Claim: these uses barely scratch the surface of generative AI’s capability.
Current Limitations of LLMs
Hallucination: can fabricate facts.
Knowledge staleness: training cut-off means info isn’t always up-to-date.
Struggles with basic maths & multi-tasking.
Mirrors human imperfection—humans also err, yet still accomplish goals.
What Constitutes Human-Level Problem Solving?
Intelligence extends beyond knowledge:
Planning ahead & decomposing problems.
Reflecting on outcomes (feedback loops).
Leveraging external tools.
Emergence of Autonomous Agents
Paradigm shift: view LLMs not as chatbots needing continuous human prompts but as autonomous agents.
Definition: systems that automate entire workflows end-to-end with minimal/no human intervention.
They plan tasks, reflect on progress, and use tools—analogous to a human workflow.
Human analogy:
We employ phones/computers + diverse software (web browsers, Excel, IDEs) as tools.
Agents invert the control: we specify goal in natural language; the AI selects & operates tools autonomously.
Illustrative Use-Cases of Agents
Web-site creation: entrepreneur describes business & design → AI writes code, deploys site in seconds (replacing need for a human web developer).
Business analytics: instead of hiring an analyst, AI ingests data, runs analyses, answers queries.
Travel planning: AI searches flights, books hotels, schedules activities.
Conceptual label: “digital labour” capable of browsing the web, navigating files, using apps, even controlling devices.
Under-the-Hood: Why Agents Are Feasible
GUI elements are merely visual layers; underneath everything is executable code.
Example code-level calls:
A one-line call to query the ChatGPT API.
A one-line call to run a Google search.
A short snippet to create a Word document programmatically.
Because functions are code, we can combine disparate APIs into new composite programs.
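The "everything is code underneath" point can be sketched in plain Python: each tool is just a function, so composing tools is ordinary function composition. All names here are illustrative stand-ins (the `"gpt-4"` model string and the functions themselves are assumptions for the sketch, not real library calls), and nothing is actually sent over the network.

```python
import json
import urllib.parse

# Hypothetical stand-ins for real tools -- each GUI action maps to a plain function.

def build_chat_request(prompt: str) -> str:
    """Build the JSON body a chat-completion API typically expects (nothing is sent)."""
    return json.dumps({"model": "gpt-4",
                       "messages": [{"role": "user", "content": prompt}]})

def build_search_url(query: str) -> str:
    """Turn a query string into a web-search URL."""
    return "https://www.google.com/search?q=" + urllib.parse.quote_plus(query)

def save_text_document(path: str, text: str) -> str:
    """Write text to a plain file -- the code-level equivalent of 'create a document'."""
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return path

# Because every action is ordinary code, composing them into a new program is trivial:
url = build_search_url("cheapest flight to London")
saved = save_text_document("research_notes.txt", url)
```

The same pattern scales to any API: once an action has a function signature, it can be chained with any other action.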
Generic agent framework:
Action library – coded functions (search, file I/O, email sending, etc.).
Language model – reasoning/planning brain.
Loop – repeatedly:
Prompt LLM: “Given user request & current state, what action next?”
LLM outputs a structured command (action name plus parameters).
System executes action → returns result to LLM.
Continue until termination criteria met (goal achieved).
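One step of that loop can be made concrete: the model is prompted to reply with a machine-readable command, which the system parses and dispatches against the action library. The `web_search` tool and the JSON command format below are assumptions for the sketch, not any particular framework's API.

```python
import json

def web_search(query: str) -> str:
    """Hypothetical tool; returns a canned result for this sketch."""
    return f"top result for {query!r}"

# The action library maps action names to callable functions.
ACTIONS = {"web_search": web_search}

# The LLM is prompted to reply with a structured command rather than prose, e.g.:
llm_output = '{"action": "web_search", "args": {"query": "weather in London"}}'

command = json.loads(llm_output)                        # parse the structured command
result = ACTIONS[command["action"]](**command["args"])  # dispatch to the matching tool
```

The result is fed back to the model as context for the next iteration, which is what closes the loop.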
Example walk-through: “Book me the cheapest flight to London.”
LLM decides first action: search flights.
Generates correctly formatted API call.
Receives price list → decides next action (e.g., select + purchase) → iterates.
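Putting the pieces together, the walk-through can be simulated end to end. The model call is replaced by a scripted `mock_llm` and the tools return canned data; flight IDs, prices, and function names are all invented for illustration, not a real booking API.

```python
def mock_llm(request, observations):
    """Scripted stand-in for a real model call; decides the next action from state."""
    if not observations:
        return {"action": "search_flights", "args": {"dest": "London"}}
    if len(observations) == 1:
        # Pick the cheapest option from the search results seen so far.
        cheapest = min(observations[0], key=lambda f: f["price"])
        return {"action": "book_flight", "args": {"flight_id": cheapest["id"]}}
    return {"action": "finish", "args": {}}

def search_flights(dest):
    """Canned data standing in for a flight-search API."""
    return [{"id": "BA123", "price": 180}, {"id": "LH456", "price": 120}]

def book_flight(flight_id):
    """Canned booking action."""
    return f"booked {flight_id}"

ACTIONS = {"search_flights": search_flights, "book_flight": book_flight}

def run_agent(request):
    observations = []
    while True:
        command = mock_llm(request, observations)   # 1. ask the "model" for next action
        if command["action"] == "finish":           # 3. stop when the goal is met
            return observations
        observations.append(                        # 2. execute, record the result
            ACTIONS[command["action"]](**command["args"]))

history = run_agent("Book me the cheapest flight to London")
```

Swapping `mock_llm` for a real LLM call (and the canned tools for real APIs) is, structurally, all that separates this sketch from a working agent.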
Existing Commercial Implementations
Microsoft Copilot (Excel): natural-language data analysis without mastering formulas.
Shopify Sidekick: conversational website builder.
HyperWrite: personal assistant that books flights, orders food, manages email.
ChatGPT “GPTs” marketplace: catalog of specialised autonomous agents.
Trend Drivers & Accessibility
Cost of LLM inference decreasing annually; approaching “virtually free.”
High accessibility: anyone with basic programming skills can build an agent.
Prediction: proliferation of internal business agents & agent-powered products/services.
Paradigm Shift in Human-Computer Interaction
Historical analogy:
CLI → GUI revolutionised usability.
Next step: AI-assisted interface (Jarvis-like) integrating conversation + autonomous execution.
Anticipated societal change:
Technical skills once unique to humans now outsourced to AI.
Even data scientists fear replacement, yet many also see empowerment.
Ethical & Philosophical Implications
Democratisation of capability: lowers barriers to innovation; broadens participation beyond large corporations & specialists.
Collaboration, not replacement:
Jarvis doesn’t replace Tony Stark; AI augments human creativity, ingenuity & experience.
Humans focus on “bigger picture” while AI handles execution.
Key Takeaways & Significance
LLMs have evolved from narrow specialists to versatile reasoning engines.
Autonomous agents extend this power, executing complex multi-step tasks via tool use.
Technical feasibility rests on code-based actions + LLM planning loops.
Real-world deployments already exist and are expanding as costs drop.
Future computing may revolve around conversational delegation rather than direct manipulation, reshaping job roles and innovation dynamics.
Background & Initial Skepticism
Speaker reflects on the conclusion of his master’s degree in AI, approximately years ago, a period when the field was rapidly evolving but still far from its current state.
During this time, he was actively involved in machine learning (statistical models and algorithms), genetic algorithms (which mimic natural selection for optimization), and early forms of generative AI, often rule-based or using simpler neural networks for content creation.
Despite these advancements, the creation of “true”, general-purpose intelligence, capable of complex reasoning and adaptability like humans, still felt quite distant and theoretical.
Observation from practical application and research:
AI and machine learning models showed exceptional prowess as narrow specialists. Examples include highly accurate diagnosis of specific illnesses from medical images, real-time fraud detection in financial transactions, or complex traffic flow optimization in urban planning, where they outperformed humans in specific, well-defined tasks.
However, these systems critically lacked human-like generalization. They struggled to transfer knowledge or skills learned in one domain to an entirely different, unrelated task without extensive re-training or explicit programming for that new task. This contrast highlighted the limitations of even advanced AI at the time.
Consequently, the widespread idea of full work automation—where AI systems could autonomously handle diverse, complex job roles—seemed