How We Engineer AI-Assisted Development
Four phases. Every ticket. The discipline that holds when AI is in the loop and the codebase is not yours.
Find your symptom
Pick the row that sounds like your team. The section it points to is the one most directly useful for the symptom, though the surrounding sections explain why.
"My team ships fast in the wrong direction."
The issue is upstream of the code. The work is comprehension, of both the codebase and the requirement.
"Tests pass, but production still breaks."
The verification bar is too narrow. One lens of five gets all the attention.
"We rolled out an AI workflow and it evaporated in week two."
The discipline lived only in people's heads. Sprint pressure displaced it.
"Six months in, nobody can explain how a change was made."
The team has artefacts but no thread connecting them.
See yourself in any of these? Talk to one of our engineering leads.
No diagnosis pitch. A working conversation about your team's AI-assisted delivery practice.
Two failure modes AI makes worse
AI coding agents do not just accelerate the right direction. They accelerate every direction. That makes two failure modes more expensive than they used to be.
Fast wrong direction
A team strong on implementation and weak on comprehension ships at three times the pace into the wrong feature. The faster the engine, the longer the wrong road before someone notices.
Confidently broken
AI's surface confidence is its most dangerous trait. A well-formatted bug looks identical to a well-formatted fix. Continuous integration cannot tell them apart.
The services-team asymmetry. The team writing the code is not the team paying for the bug. Six months from now, a client engineer reads the PR and the audit trail, and either the chain holds or it does not. Process is the contract that makes that asymmetry safe.
Software engineering did not change. The consequence of skipping a phase did.
Understand, Plan, Implement, Verify
Four phases. Every task an engineer picks up runs through them, at the depth the task warrants.
The phases iterate. A real feature runs the cycle two or three times, spike, build, harden.
- A 30-minute bug fix gets two minutes of understand. A two-week migration gets two days. The ratio scales to the task. The four phases stay.
- Each loop is its own complete cycle. Spike, build, harden are not three increments of one cycle; they are three full cycles.
The four-phase shape is consistent with how leading practitioners describe disciplined AI-assisted development, including Anthropic's published guidance on Claude Code. Where our extension applies is the conditions services teams operate in, which the rest of this page describes.
Understanding before changing
The phase most often skipped, and the most expensive to skip with AI in the loop.
Most engineering bugs are misunderstanding bugs in disguise. The fix that was right for the wrong file. The refactor that broke an invariant nobody documented. The feature that worked for the use case the engineer imagined. The patch that solved the symptom, not the cause. When AI is in the loop, these mistakes get faster, not rarer.
Codebase comprehension
Reading the architecture, conventions, invariants, and call graph of the path you are about to change. On brownfield systems this work is heavier than published AI-assisted development advice tends to assume, because the codebase carries decisions the original engineers made and did not document.
Requirement interrogation
Examining ambiguities, hidden assumptions, and missing constraints in what was asked. A requirement is a hypothesis about what someone wants. Until you interrogate it, you do not know which hypothesis you are building.
Asking the AI to write the change before asking it to read the system. What that skips: the existing implementation three folders away that you would have used. What it produces: a parallel implementation that looks fine in isolation and rots the codebase in aggregate. The discipline is comprehension first, change second. Always.
Practice observation. In the first weeks of a typical engagement, our teams routinely find existing implementations of logic the original team believed was centralized. Comprehension before change is the discipline that prevents adding to that drift.
For depth on codebase archaeology and brownfield comprehension patterns, the brownfield blog cluster covers individual anti-patterns and recovery strategies (publishing in stages). For how this connects to modernization engagements where AI capabilities are part of the build, see AI-Driven Application Modernization.
One identity from requirement to merge
A delivery team cannot hold the requirement, the plan, the change, and the verification in one head. The artefact thread does that work.
A solo developer can hold every step of a feature in their own working memory. A delivery team cannot. Engineers rotate across engagements. Two people may touch the same area in the same week. The client carries the long-term consequence of every decision.
The artefact thread. Every task carries one identity from requirement to merge. A single Trace ID appears in the branch name, every commit, the PR title, and a dedicated folder containing the comprehension memo, the plan, the implementation log, and the verification report.
├─ TRACE-PROJ-1842-requirement.md
├─ TRACE-PROJ-1842-memo.md (Understand)
├─ TRACE-PROJ-1842-plan.md (Plan)
├─ TRACE-PROJ-1842-risks.md (Plan)
├─ TRACE-PROJ-1842-iteration-log.md (Implement)
└─ TRACE-PROJ-1842-verification.md (Verify)
Branch: feat/TRACE-PROJ-1842-add-idempotency
PR title: TRACE-PROJ-1842: idempotency keys for POST /api/orders
Why this is not bureaucracy. The trace folder is a commitment device. It makes the consequence of skipping legible at the moment an engineer is tempted to skip. Without it, the consequence is invisible until week six when a regression surfaces and someone has to reconstruct what was understood, planned, built, and verified from PR comments and git archaeology.
Calibration by risk. Not every task gets the full thread. The Trace ID is always present. Depth scales to risk.
| Task type | Trace depth |
|---|---|
| Production incident, auth, payments, data integrity | Full thread, all five artefacts |
| New feature touching multiple files | Full thread |
| Routine refactor with passing tests | Trace ID plus one-paragraph memo |
| Single-file bug fix | Trace ID plus verification, often the test diff |
| Typo fix, dependency bump | Trace ID in commit only |
Tests passing is one lens of five
Most teams treat verification as a checkbox at the end of implementation. Run the tests, check CI, merge. That is one lens. There are at least four others, and on client systems all five carry weight.
− 67 lines
Why AI changes the verification calculus. Each lens is a different role. AI plays roles cheaply. Each lens needs to read the same diff with different attention, and AI scales attention. Adding one more lens means one more chat. The bar for "verified" should move up accordingly.
The independence problem. Five lenses run in the same loaded context produce five reads with the same blind spots. The discipline is splitting the lenses across contexts that do not share memory. Same diff, different attention, different reading positions.
For depth on each lens and the independent-context pattern, the verification blog cluster covers each lens individually (publishing in stages). For how this connects to our agentic AI delivery work, where verification load is highest, see Agentic AI Development.
The discipline lives in the repository, not in the people
Discipline that lives only in people's heads expires under sprint pressure. The version that survives is the one that has been moved out of memory and into the repository.
Sprint pressure produces predictable rationalizations: "quick fix, skip the tests", "skip the memo, it's small", "plan is overkill for this one." Each is a real ask in a real moment. The version of the discipline that holds is the one already encoded in the repository before the pressure arrives.
None of these in isolation is novel. The closed-loop combination is what makes the discipline survive engineer rotation, sprint pressure, and the engineers' own honest rationalizations.
Engagement leads at our long-standing clients have remarked on a shift in the way our team works since we adopted this discipline. Comprehension memos arrive before the first PR. PRs come with audit trails their internal teams can follow without a handover meeting. Regressions, when they do surface, surface as known-and-anticipated rather than as mystery investigations. The pattern is most visible in engagements where the consequence of an unexplained change is highest, on systems carrying audit, compliance, or data-integrity requirements. See examples in our case studies.
How this adapts on engagement. The shape adjusts to the client's existing process. Some teams already have most of the infrastructure and we extend it. Some adopt it from scratch over a few weeks. The four phases and the trace thread are constant.
Ways to engage
A structured diagnosis with three depth formats, plus a forthcoming field guide for self-paced reading.
Practice diagnosis
A scoped review of your team's AI-assisted delivery practice by our engineering team. Three formats, pick the depth that matches your urgency. Each one produces a written recommendation your team can act on independently, regardless of whether you engage further with us.
Async assessment
Fill in a structured form covering your codebase, team size, sprint cadence, and the symptoms you are seeing. Our engineering team reviews and sends back a one-page diagnosis.
Live working session
Live session with your engineering lead and one or two engineers. We walk through a recent feature end-to-end and identify where the cycle is breaking.
Embedded diagnosis
Our team works alongside yours, reviews several active tickets across the four phases, and produces a written recommendation covering traceability, verification, and enforcement.
Tell us which format suits your team in the form. We respond within one business day.
Every AnAr engagement runs on this process
The four-phase cycle is not a standalone service. It is the delivery discipline built into every engagement model AnAr runs. Each service type puts different phases of the cycle under the most pressure.
AI-Driven Application Modernization
Modernization where shipping AI features is part of the goal. Heavy comprehension on the legacy system, AI integration scoped alongside the architectural work.
Phase emphasis: Understand Explore Offering 02Agentic AI Development
Production agents fail in ways unit tests cannot reach. Five lenses run on every change before merge.
Phase emphasis: Verify Explore Offering 03Product Engineering
Long-running builds across multiple engineers and releases. The trace thread holds the engagement together across rotation.
Phase emphasis: Traceability Explore Offering 04Global Team Solutions
Embedded teams across time zones. The enforcement loop carries context that handover meetings cannot.
Phase emphasis: Enforcement loop ExploreFrequently asked questions
Questions we get regularly from engineering leads and CTOs before a first conversation.
Does this process require switching AI coding tools?
No. The process is tool-agnostic. The four phases and the trace thread work regardless of which AI coding assistant the team uses. The disciplines are behavioral and structural, not product-specific.
We do not name or recommend a specific tool in this process. What matters is the habit of running comprehension before writing code, and verification across five lenses before merge.
Does this work on legacy or brownfield codebases?
It was designed for them. Most published AI-assisted development advice assumes a greenfield codebase. This process puts the Understand phase first specifically because brownfield systems carry decisions that are not documented and invariants that are not tested.
The Understand phase is heavier on older systems. That is by design, not a limitation. Codebase comprehension before any change is the discipline that prevents compounding the drift already present in a legacy system.
How long does it take a team to adopt this discipline?
The structural pieces, agent context files, PR templates, and the CI gate, can be set up in a single sprint. Most teams have the enforcement loop in place within two to three weeks.
The behavioral shift takes longer. Engineers internalizing the four phases as their default, rather than a checklist they comply with, typically takes four to eight weeks in our experience. Teams that start with the enforcement loop already in place get there faster, because the structure makes the discipline the path of least resistance.
What does a practice diagnosis actually deliver?
Each format produces a written recommendation. The async assessment produces a one-page document covering where the team is across each of the four phases and the two or three highest-priority changes we would make. The 90-minute working session walks through a recent feature end-to-end and produces a more detailed written recommendation. The one-week embedded diagnosis produces a comprehensive recommendation covering traceability, verification, and the enforcement loop.
All three are written for an engineering team to act on independently. Whether or not the team engages further with AnAr, the recommendation is actionable.
Can we adopt just one part of the process?
Yes. Some engagements start with verification, adding the five-lens review to an existing team's PR process. Others start with traceability. The full cycle works as a coherent whole, but each discipline also stands on its own.
In practice, teams that start with one piece tend to pull in the others once they see the value. The enforcement loop in particular tends to surface gaps in adjacent phases quickly.
How does this fit alongside an existing sprint process?
The four phases map onto most existing sprint workflows without adding ceremonies. Understand and Plan happen before the first commit. Implement is the existing development work. Verify happens before the PR is submitted for review.
The enforcement loop, the PR template, the CI gate, and the trace folder, wraps your existing workflow rather than replacing it. The artefacts it produces become the audit trail your team already needs for client-facing work.
What does working with an AnAr team look like in practice?
An AnAr delivery team integrates with your existing process. We embed rather than parachute in. The trace discipline, agent context files, and reviewer templates we build are designed to stay in your repository after the engagement ends, owned by your team.
The goal is that the discipline becomes yours, not a dependency on us. The enforcement loop is explicit about this: it lives in the repository, not in the people.
This page is the most common starting point for those conversations. Browse our case studies for examples of the engineering work where this process operates.
