Our Process

How We Engineer AI-Assisted Development

Four phases. Every ticket. The discipline that holds when AI is in the loop and the codebase is not yours.

Brownfield-first Built for delivery teams Verification across five lenses

Diagnostic

Find your symptom

Pick the row that sounds like your team. The section it points to is the one most directly useful for the symptom, though the surrounding sections explain why.

"My team ships fast in the wrong direction."

The issue is upstream of the code. The work is comprehension, of both the codebase and the requirement.

"Tests pass, but production still breaks."

The verification bar is too narrow. One lens of five gets all the attention.

"We rolled out an AI workflow and it evaporated in week two."

The discipline lived only in people's heads. Sprint pressure displaced it.

"Six months in, nobody can explain how a change was made."

The team has artefacts but no thread connecting them.

See yourself in any of these? Talk to one of our engineering leads.

No diagnosis pitch. A working conversation about your team's AI-assisted delivery practice.

Why process matters

Two failure modes AI makes worse

AI coding agents do not just accelerate the right direction. They accelerate every direction. That makes two failure modes more expensive than they used to be.

Failure mode 01

Fast wrong direction

A team strong on implementation and weak on comprehension ships at three times the pace into the wrong feature. The faster the engine, the longer the wrong road before someone notices.

Failure mode 02

Confidently broken

AI's surface confidence is its most dangerous trait. A well-formatted bug looks identical to a well-formatted fix. Continuous integration cannot tell them apart.

The services-team asymmetry. The team writing the code is not the team paying for the bug. Six months from now, a client engineer reads the PR and the audit trail, and either the chain holds or it does not. Process is the contract that makes that asymmetry safe.

Software engineering did not change. The consequence of skipping a phase did.

The cycle

Understand, Plan, Implement, Verify

Four phases. Every task an engineer picks up runs through them, at the depth the task warrants.

Understand

What is being asked, and what the existing system already does.

Plan

The simplest path to a working solution, and what could go wrong.

Implement

Getting the agreed change in with the right craftsmanship. The only phase where the failure mode is using AI too little.

Verify

Whether the change did what was wanted, and broke nothing else.

The phases iterate. A real feature runs the cycle two or three times, spike, build, harden.

A 30-minute bug fix gets two minutes of understand. A two-week migration gets two days. The ratio scales to the task. The four phases stay.
Each loop is its own complete cycle. Spike, build, harden are not three increments of one cycle; they are three full cycles.

The four-phase shape is consistent with how leading practitioners describe disciplined AI-assisted development, including Anthropic's published guidance on Claude Code. Where our extension applies is the conditions services teams operate in, which the rest of this page describes.

Phase 1 deep dive

Understanding before changing

The phase most often skipped, and the most expensive to skip with AI in the loop.

Most engineering bugs are misunderstanding bugs in disguise. The fix that was right for the wrong file. The refactor that broke an invariant nobody documented. The feature that worked for the use case the engineer imagined. The patch that solved the symptom, not the cause. When AI is in the loop, these mistakes get faster, not rarer.

Codebase comprehension

Reading the architecture, conventions, invariants, and call graph of the path you are about to change. On brownfield systems this work is heavier than published AI-assisted development advice tends to assume, because the codebase carries decisions the original engineers made and did not document.

Requirement interrogation

Examining ambiguities, hidden assumptions, and missing constraints in what was asked. A requirement is a hypothesis about what someone wants. Until you interrogate it, you do not know which hypothesis you are building.

The anti-pattern

Asking the AI to write the change before asking it to read the system. What that skips: the existing implementation three folders away that you would have used. What it produces: a parallel implementation that looks fine in isolation and rots the codebase in aggregate. The discipline is comprehension first, change second. Always.

Practice observation. In the first weeks of a typical engagement, our teams routinely find existing implementations of logic the original team believed was centralized. Comprehension before change is the discipline that prevents adding to that drift.

For depth on codebase archaeology and brownfield comprehension patterns, the brownfield blog cluster covers individual anti-patterns and recovery strategies (publishing in stages). For how this connects to modernization engagements where AI capabilities are part of the build, see AI-Driven Application Modernization.

Traceability

One identity from requirement to merge

A delivery team cannot hold the requirement, the plan, the change, and the verification in one head. The artefact thread does that work.

A solo developer can hold every step of a feature in their own working memory. A delivery team cannot. Engineers rotate across engagements. Two people may touch the same area in the same week. The client carries the long-term consequence of every decision.

The artefact thread. Every task carries one identity from requirement to merge. A single Trace ID appears in the branch name, every commit, the PR title, and a dedicated folder containing the comprehension memo, the plan, the implementation log, and the verification report.

Example trace folder for one task docs/trace/TRACE-PROJ-1842/
  ├─ TRACE-PROJ-1842-requirement.md
  ├─ TRACE-PROJ-1842-memo.md    (Understand)
  ├─ TRACE-PROJ-1842-plan.md    (Plan)
  ├─ TRACE-PROJ-1842-risks.md   (Plan)
  ├─ TRACE-PROJ-1842-iteration-log.md  (Implement)
  └─ TRACE-PROJ-1842-verification.md  (Verify)

Branch: feat/TRACE-PROJ-1842-add-idempotency
PR title: TRACE-PROJ-1842: idempotency keys for POST /api/orders

Why this is not bureaucracy. The trace folder is a commitment device. It makes the consequence of skipping legible at the moment an engineer is tempted to skip. Without it, the consequence is invisible until week six when a regression surfaces and someone has to reconstruct what was understood, planned, built, and verified from PR comments and git archaeology.

Calibration by risk. Not every task gets the full thread. The Trace ID is always present. Depth scales to risk.

Task type	Trace depth
Production incident, auth, payments, data integrity	Full thread, all five artefacts
New feature touching multiple files	Full thread
Routine refactor with passing tests	Trace ID plus one-paragraph memo
Single-file bug fix	Trace ID plus verification, often the test diff
Typo fix, dependency bump	Trace ID in commit only

A dedicated treatment of traceability, including a sample trace folder skeleton, the artefacts in detail, and the orphan-plan failure mode in full, is forthcoming as a focused subpage.

Verify

Tests passing is one lens of five

Most teams treat verification as a checkbox at the end of implementation. Run the tests, check CI, merge. That is one lens. There are at least four others, and on client systems all five carry weight.

Functional

Regression

Same PR diff

+ 142 lines
− 67 lines

Security

Performance

Maintainability

Functional

Does it do what was asked?

Regression

Did it break what worked?

Security

Did it open new attack surface?

Performance

Does it stay within budget?

Maintainability

Can a future engineer change it safely?

Why AI changes the verification calculus. Each lens is a different role. AI plays roles cheaply. Each lens needs to read the same diff with different attention, and AI scales attention. Adding one more lens means one more chat. The bar for "verified" should move up accordingly.

The independence problem. Five lenses run in the same loaded context produce five reads with the same blind spots. The discipline is splitting the lenses across contexts that do not share memory. Same diff, different attention, different reading positions.

For depth on each lens and the independent-context pattern, the verification blog cluster covers each lens individually (publishing in stages). For how this connects to our agentic AI delivery work, where verification load is highest, see Agentic AI Development.

How we run this

The discipline lives in the repository, not in the people

Discipline that lives only in people's heads expires under sprint pressure. The version that survives is the one that has been moved out of memory and into the repository.

Sprint pressure produces predictable rationalizations: "quick fix, skip the tests", "skip the memo, it's small", "plan is overkill for this one." Each is a real ask in a real moment. The version of the discipline that holds is the one already encoded in the repository before the pressure arrives.

The enforcement loop

Agent context files state the team's bar with teeth. Specific conventions, deletion-tested, no vague adjectives.

PR templates require a populated trace folder. The folder either exists or the PR cannot be reviewed.

A CI gate verifies the chain. The branch name, every commit, the PR title, and the trace folder all reference the same Trace ID.

Reviewer roles for each verification lens are templated and repeatable, available to any engineer across any engagement.

None of these in isolation is novel. The closed-loop combination is what makes the discipline survive engineer rotation, sprint pressure, and the engineers' own honest rationalizations.

What our clients observe

Engagement leads at our long-standing clients have remarked on a shift in the way our team works since we adopted this discipline. Comprehension memos arrive before the first PR. PRs come with audit trails their internal teams can follow without a handover meeting. Regressions, when they do surface, surface as known-and-anticipated rather than as mystery investigations. The pattern is most visible in engagements where the consequence of an unexplained change is highest, on systems carrying audit, compliance, or data-integrity requirements. See examples in our case studies.

How this adapts on engagement. The shape adjusts to the client's existing process. Some teams already have most of the infrastructure and we extend it. Some adopt it from scratch over a few weeks. The four phases and the trace thread are constant.

Engage

Ways to engage

A structured diagnosis with three depth formats, plus a forthcoming field guide for self-paced reading.

Option 01

Practice diagnosis

A scoped review of your team's AI-assisted delivery practice by our engineering team. Three formats, pick the depth that matches your urgency. Each one produces a written recommendation your team can act on independently, regardless of whether you engage further with us.

Async

Async assessment

Fill in a structured form covering your codebase, team size, sprint cadence, and the symptoms you are seeing. Our engineering team reviews and sends back a one-page diagnosis.

One-page diagnosis within a week. No call required.

90 minutes

Live working session

Live session with your engineering lead and one or two engineers. We walk through a recent feature end-to-end and identify where the cycle is breaking.

90 minutes live. Written recommendation within a week.

One week

Embedded diagnosis

Our team works alongside yours, reviews several active tickets across the four phases, and produces a written recommendation covering traceability, verification, and enforcement.

Five working days. Deep written recommendation.

Tell us which format suits your team in the form. We respond within one business day.

AnAr Offerings

Every AnAr engagement runs on this process

The four-phase cycle is not a standalone service. It is the delivery discipline built into every engagement model AnAr runs. Each service type puts different phases of the cycle under the most pressure.

Offering 01

AI-Driven Application Modernization

Modernization where shipping AI features is part of the goal. Heavy comprehension on the legacy system, AI integration scoped alongside the architectural work.

Phase emphasis: Understand Explore Offering 02

Agentic AI Development

Production agents fail in ways unit tests cannot reach. Five lenses run on every change before merge.

Phase emphasis: Verify Explore Offering 03

Product Engineering

Long-running builds across multiple engineers and releases. The trace thread holds the engagement together across rotation.

Phase emphasis: Traceability Explore Offering 04

Global Team Solutions

Embedded teams across time zones. The enforcement loop carries context that handover meetings cannot.

Phase emphasis: Enforcement loop Explore

Common questions

Frequently asked questions

Questions we get regularly from engineering leads and CTOs before a first conversation.

Does this process require switching AI coding tools?

No. The process is tool-agnostic. The four phases and the trace thread work regardless of which AI coding assistant the team uses. The disciplines are behavioral and structural, not product-specific.

We do not name or recommend a specific tool in this process. What matters is the habit of running comprehension before writing code, and verification across five lenses before merge.

Does this work on legacy or brownfield codebases?

It was designed for them. Most published AI-assisted development advice assumes a greenfield codebase. This process puts the Understand phase first specifically because brownfield systems carry decisions that are not documented and invariants that are not tested.

The Understand phase is heavier on older systems. That is by design, not a limitation. Codebase comprehension before any change is the discipline that prevents compounding the drift already present in a legacy system.

How long does it take a team to adopt this discipline?

The structural pieces, agent context files, PR templates, and the CI gate, can be set up in a single sprint. Most teams have the enforcement loop in place within two to three weeks.

The behavioral shift takes longer. Engineers internalizing the four phases as their default, rather than a checklist they comply with, typically takes four to eight weeks in our experience. Teams that start with the enforcement loop already in place get there faster, because the structure makes the discipline the path of least resistance.

What does a practice diagnosis actually deliver?

Each format produces a written recommendation. The async assessment produces a one-page document covering where the team is across each of the four phases and the two or three highest-priority changes we would make. The 90-minute working session walks through a recent feature end-to-end and produces a more detailed written recommendation. The one-week embedded diagnosis produces a comprehensive recommendation covering traceability, verification, and the enforcement loop.

All three are written for an engineering team to act on independently. Whether or not the team engages further with AnAr, the recommendation is actionable.

Can we adopt just one part of the process?

Yes. Some engagements start with verification, adding the five-lens review to an existing team's PR process. Others start with traceability. The full cycle works as a coherent whole, but each discipline also stands on its own.

In practice, teams that start with one piece tend to pull in the others once they see the value. The enforcement loop in particular tends to surface gaps in adjacent phases quickly.

How does this fit alongside an existing sprint process?

The four phases map onto most existing sprint workflows without adding ceremonies. Understand and Plan happen before the first commit. Implement is the existing development work. Verify happens before the PR is submitted for review.

The enforcement loop, the PR template, the CI gate, and the trace folder, wraps your existing workflow rather than replacing it. The artefacts it produces become the audit trail your team already needs for client-facing work.

What does working with an AnAr team look like in practice?

An AnAr delivery team integrates with your existing process. We embed rather than parachute in. The trace discipline, agent context files, and reviewer templates we build are designed to stay in your repository after the engagement ends, owned by your team.

The goal is that the discipline becomes yours, not a dependency on us. The enforcement loop is explicit about this: it lives in the repository, not in the people.

This page is the most common starting point for those conversations. Browse our case studies for examples of the engineering work where this process operates.