AI in Software Engineering

Al in Software Engineering - Comprehensive Study Notes

Page 1

Introduction

Presenters:
- Cigdem Sengul
- Rumyana Neykova
Affiliation:
- Computer Science Department - Brunel University London

Page 2

Part 1: The Current Landscape

Theme: How AI is reshaping the way software is built

Page 3

AI Coding Tools — Adoption in 2025

Statistics on AI in Developer Workflows:
- 76% of professional developers use AI tools or plan to do so.
- 41% of all code is now generated by AI.
- 82% of developers utilize AI on a weekly or daily basis in their working processes.
- 3.5× average return on investment (ROI) from enterprise AI investments.
Sources:
- Stack Overflow Dev Survey 2024
- EliteBrains 2025
- Microsoft Market Study 2025
Areas of AI Utilization by Developers:
- Code writing: 82%
- Debugging: 68%
- Documentation: 57%
- Code review: 45%

Page 4

The AI Paradox

Observation: AI tools enhance developers' speed but do not correlate with increased software delivery.
Quote: "AI coding assistants are making developers more productive at writing code. But why aren't most enterprises actually delivering more software?" — Bill Staples, CEO of GitLab (GitLab, Feb 2026)
Challenges in Software Delivery:
- Time Allocation:
- Code accounts for only 10–20% of a developer's day.
- Developers spend 80–90% on reviews, security scans, pipeline waits, and compliance checks.
- Tool Sprawl:
- 60% of teams use 5 or more development tools.
- 49% use 5 or more AI tools, leading to fragmented toolchains that diminish time savings from AI.
- Quality Concerns:
- 48% of AI-generated code may harbor security vulnerabilities.
- Review backlogs can increase as output speed rises.
Sources:
- GitLab Global DevSecOps Report 2025
- Faros AI Productivity Paradox Report 2025

Page 5

Signals from the Frontier

Reporting from Leading Companies as of Early 2026

Anthropic:
- 90–95% of Claude Code’s codebase is written by Claude itself.
Spotify:
- Shipped 50+ features in 2025 through AI-driven workflows. Notably, top developers haven’t written code since December.
Microsoft:
- Approximately 30% of code generated by AI, with CEO Satya Nadella attributing major output gains to AI tools.
Google:
- 21% of code classified as AI-assisted, documented as one of the largest enterprise-level AI coding deployments.
Barriers to Adoption:
- Regulatory: Compliance with GDPR and financial/health services regulations.
- Technical: Legacy systems and low test coverage create integration challenges.
- Cultural: Resistance from teams and leadership alongside issues of trust.
- Economic: High costs of enterprise tools prevent many SMEs from accessibility.

Page 6

Part 2: AI Across the Software Development Life Cycle (SDLC)

Phases of SDLC:
- Planning
- Requirements
- Design
- Coding
- Testing
- Deployment
- Maintenance

Page 7

LLM Papers in SDLC

Research State:
- Most mature phase is Development at 56.65% of studies.
- Requirements and Design phases are the least explored at <5%.
- Quality Assurance is emerging at 15.14%.
- Maintenance gaining traction at 22.71%.

Page 8

What LLMs Can and Cannot Do - Evidence-Based Summary

Development Phase:
- Successful Areas:
- Code generation (72% success rate)
- Documentation
- Refactoring
- Code review
- Limitations:
- Struggles with business logic and system-level understanding.
Testing Phase:
- Successful Areas:
- Unit test generation and oracle generation
- Regression tests with a 48% bug detection rate (Tian et al., 2023)
- Limitations:
- Limited effectiveness in domain edge cases and integration testing.
Requirements Phase:
- Successful Areas:
- Classification, generation, translation to templates.
- Effective augmentation tool for story inspiration.
- Limitations:
- Little industrial validation (3.9% of studies).
- Needs human analysts for context clarification.
Design Phase:
- Successful Areas:
- UML generation and specification synthesis yielding 21% improvement (SpecSyn, EASE 2023).
- Limitations:
- Weak understanding of design patterns and architectural reasoning; least explored (0.92%).
Maintenance Phase:
- Successful Areas:
- Automated problem resolution (APR) leads to effective bug reporting (162/337 bugs at $0.42/bug).
- Limitations:
- Risky for system-wide changes and context-sensitive legacy systems.

Page 9

Spotlight: Requirements Engineering with AI

Objective: Automate the understanding of software requirements.
Processes Involved:
- Elicitation: Utilizing conversational agents to interview stakeholders, summarize needs, and detect conflicts.
- Specification: LLMs converting meeting notes/user stories into formal requirements formats (e.g., IEEE 830/EARS).
- Validation: Models flagging ambiguous, incomplete, or contradictory requirements pre-development.
- Traceability: AI linking requirements to code, tests, and documentation for impact analysis and coverage reporting.

Page 10

Which AI Coding Tool?

Claude Code:
- Acts as an agentic terminal, autonomously writing, running, testing, and committing full-codebase features.
Cursor:
- AI-native IDE designed for deep, context-aware edits in a tailored environment.
GitHub Copilot:
- Best for real-time suggestions, chat assistance, and pull request summaries within established enterprises.
Amazon CodeWhisperer:
- Specialized for deep integration with AWS tools and security scanning for cloud-based development.
Tabnine:
- Optimal choice for on-premise requirements in regulated industries ensuring data privacy.

Page 11

Part 3: Prompting & RAG

Purpose: Foundations underpinning functionality when using LLMs.

Page 12

Prompting Strategies

Impact of Model Communication on Output Quality:
- Zero-shot Prompting:
- Example Prompt: "Generate a JUnit test for a function that matches players by skill level."
- Result: Generic test without edge case considerations.
- Few-shot Prompting:
- Example Prompt: "Here are 2 test examples. Generate tests for match_players() in the same style."
- Result: Includes fixture setup, edge case assertions, and consistent naming - higher completeness.
- Chain-of-thought Prompting:
- Example Prompt: "First identify valid inputs and edge cases, then generate JUnit tests for match_players()."
- Result: Systematic reasoning covers varied scenarios, providing the best coverage without examples.
- Role/System Prompt:
- Example Prompt: "You are a senior security engineer reviewing code for vulnerabilities."
- Sets model persona and limitations crucial for agentic contexts.
- Structured Output:
- Prompt: "Respond only in JSON with keys: issue, severity, recommendation."
- This format facilitates downstream parsing, essential for integration with tools.

Page 13

Beyond Prompting: Importance of Prompting

Exploration of whether prompting's quality truly influences outcomes.

Page 14

The Evolution of Technology Stack

Previous State: Prompting through chat interfaces and one-shot generation with human involvement for every task.
Current State: Introduction of Retrieval-Augmented Generation (RAG) and Tool Use.
- RAG allows AI to fetch context before responding, enhancing the accuracy of outputs.
Future Outlook: Agents capable of independent actions, utilizing a Model Context Protocol (MCP) to connect multiple tools seamlessly.

Page 15

Retrieval-Augmented Generation (RAG)

Process: How LLMs Address Questions Using Real-world Knowledge:
1. User Query - Initiates the process.
2. Retrieve Documents - Pulls relevant documents from a database according to the query.
3. Augment Prompt - The retrieved context is modified for the LLM response.
4. Generate Response - The LLM synthesizes context plus query for an answer.
Advantages:
- Reduced hallucination risks, permit real-time knowledge updates, and ensure traceable sources through semantic search.

Page 16

RAG Architecture

Summary of Workflow Steps:
1. Encode Documents into a Vector Database
2. Execute Similarity Searches
3. Generate Queries for LLM
4. Return Responses

Page 17

RAG-Bingo

A participatory element involving various concepts within RAG, prompting, and AI tools.

Page 18

Part 4: Agents & the Future

Focus: Transitioning to autonomous action with multi-step execution.

Page 19

What Does "Agentic AI" Mean?

Definition: Moving from single-turn interactions to autonomous, multi-step task execution.
Traditional LLM Interaction Workflow:
1. User writes prompt.
2. Model generates text.
3. User copies results to run and evaluates outcome.
4. Process repeats manually.
Agentic AI Workflow:
- User specification leads to autonomous planning, tool usage, iteration, and presentation of results.
- Key Insight: The agent dictates the subsequent actions rather than the user.

Page 20

Anatomy of an LLM Agent

Explores the structural components and functionalities of an AI agent's architecture.

Page 21

AI Test Generation Agent Workflow

Input: Function Code
Workflow Steps:
1. Agent Configuration (Including Memory Settings, LLM Parameters, Tool Selections)
2. ReAct Generation Loop: Analyzing function, generating tests, validating results, refining strategies.

Page 22

Model Context Protocol (MCP)

Purpose: Standardizing AI agent interactions with tools and data sources.
Overview:
- MCP serves as an open standard allowing LLMs an interface to various resources including APIs and databases.
Benefits:
- Decouples context provision from model logic, enhances integration flexibility, and promotes controlled actions across various applications.

Page 23

5 Trends to Watch in AI Software Engineering

Full SDLC Automation:
- GitLab Duo aims to automate the entire software lifecycle from issue identification to deployment.
Specification-Driven IDEs:
- Example: Kiro (Amazon AWS) focuses on generating requirements and implementation directly from natural language specifications.
AI as Sub-Agent Teams:
- Claude Cowork orchestrates sub-agents for parallel workflows in a shorter timeframe.
VS Code as Agent Command Center:
- Evolving IDEs coordinating multiple specialized AI agents.
Recursive AI Development:
- AI tools developing additional AI, creating a rapid iteration environment.

Page 24

Part 5: Risks

Exploring the Risks of AI Technologies in Software Engineering.

Page 25

When AI Gets it Wrong

Case Study: A CTO reflects on the challenges of balancing innovative AI development with practical software creation.
Insights: The dilemma of potentially using non-reliable AI approaches versus foundational practices could lead to a trade-off between speed and maintainability.

Page 26

Risks Associated with AI in Software Engineering

Hallucination & Accuracy:
- LLM outputs may contain plausible but incorrect code, highlighting the need for human oversight.
Security & Data Privacy Risks:
- Usage of cloud-based AI may expose proprietary code, necessitating a clear understanding of data retention policies.
Licensing & Copyright Issues:
- Potential legal ambiguities surrounding reproduction of GPL-licensed code by AI tools.
Cost, Dependency, & Environmental Concerns:
- Financial lessons from API pricing and the ecological footprint of AI inference processes should inform decision-making.

Page 27

The Changing Developer Skillset

Skills in Decline:
- Memorizing syntax, writing boilerplate code, and basic unit tests rapidly automated.
Skills on the Rise:
- Crafting precise specifications, critical evaluation of AI outputs, system design, and AI literacy for orchestration.

Page 28

Key Takeaways

AI is widespread yet not transformational at an enterprise level.
Coding remains a small percentage of the entire SDLC, with bottlenecks occurring downstream.
Understanding prompting and Retrieval-Augmented Generation (RAG) as foundational; mastery of agents and MCP is essential for future adaptability.
Shift towards higher abstraction levels, moving from direct coding to conceptual specifications and intents.
Engineers need to cultivate AI literacy encompassing orchestration, evaluative skills, and informed judgment.

Page 29

Try This at Home

Input your project brief into LLMs to identify ambiguous requirements.
Install and interact with Cursor for AI-assisted coding tasks, reflecting on outcomes.
Experiment with function test generation prompts across zeros, few-shot, and chain-of-thought models.
Initiate a security vulnerability review on a piece of code and verify the findings.
Engage in discussion regarding AI productivity metrics and skills maintenance in the context of evolving technology.