AI agents that can write code are easy to find. AI agents that write code I’d actually ship are rare. Nwave is one attempt at the second problem, and understanding its shape helps me think more clearly about every other AI coding tool I use.

What is Nwave?

Nwave is an agentic AI software delivery methodology that runs inside Claude Code. It slices the work of shipping a feature into six ordered waves, assigns a specialized agent to each wave, and stops for a human review between waves.

The tagline on the Nwave site is “Human-Centric Agentic AI Software Factory.” That reads like marketing, but the “human-centric” part is load-bearing. The whole architecture exists to keep a person in the loop, not as a polite gesture but as an enforced gate.

By the end of this article, you’ll have a mental model for how Nwave differs from typing “build me a login page” into a general-purpose agent, and why the difference matters if you care about shipping software that still works a year from now.

Why Nwave exists

Most coding agents default to a single generalist persona that tries to do everything: understand the requirement, guess the architecture, write the code, and maybe write a test at the end. When the result is wrong, it’s wrong in a way that’s hard to unwind. The test fits the code instead of the behavior. The architecture is whatever fell out of the first prompt. No artifact between “idea” and “merged PR” passed under a human’s eyes.

Nwave’s answer borrows the practices that already work for human teams: acceptance-test-driven development, Outside-In Test-Driven Development (TDD), peer review, and architecture decision records. It enforces them through the agent runtime itself. The methodology is the guardrail. The agents are the workers.

How Nwave works

The six waves

Nwave splits feature delivery into six waves. Each wave has a specialized agent, a command, and an artifact that lands on disk for a human to read before the next wave starts.

DISCOVER (product-discoverer: market validation)
  → DISCUSS (product-owner: requirements)
  → DESIGN (solution-architect: architecture and ADRs)
  → DEVOPS (platform-architect: infrastructure readiness)
  → DISTILL (acceptance-designer: Given-When-Then tests)
  → DELIVER (software-crafter: working implementation)

DISCOVER validates there’s a real problem worth solving. DISCUSS turns that into acceptance criteria a product owner would sign. DESIGN produces an architecture with decision records you can argue about. DEVOPS sketches the infrastructure and deployment path. DISTILL writes Given-When-Then acceptance tests before any production code exists. DELIVER implements the feature through Outside-In TDD, turning each acceptance test green one layer at a time.
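To make the DISTILL-then-DELIVER handoff concrete, here is a sketch of what a Given-When-Then acceptance test can look like once expressed as code. The feature, names, and in-memory store are invented for illustration; they are not Nwave artifacts. The point is the shape: each phase is explicit, and a product owner can read the intent before any production code exists.

```python
class FakeUserStore:
    """In-memory stand-in for the real persistence layer (illustrative only)."""
    def __init__(self):
        self.users = {}

    def register(self, email, password):
        if email in self.users:
            return "duplicate"
        self.users[email] = password
        return "registered"


def test_registering_a_new_user_succeeds():
    # Given: no account exists for this email
    store = FakeUserStore()
    # When: the user registers
    result = store.register("ada@example.com", "s3cret")
    # Then: the account is created
    assert result == "registered"
    assert "ada@example.com" in store.users
```

In a DELIVER run, a test like this starts out failing and is turned green layer by layer.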

For greenfield work, I run all six. For a brownfield change where the problem and architecture are already known, I skip straight to /nw:deliver.

The machine-human rhythm

The shape Nwave enforces is this cycle:

Agent generates artifact → Human reviews → Human decides → Next agent runs → back to the start

The machine never runs the full pipeline unsupervised. If I skip a review, I’m skipping it deliberately, not because the tool let the agent barrel through.

The enforcement layer

Nwave includes DES (Delivery Enforcement System), a guardrail that checks what the agent is doing during DELIVER. DES prevents the agent from skipping phases of the active TDD loop, and it does not let the agent claim the work is finished until RED, GREEN, and REFACTOR have all happened. DES also scans prompts sent to Claude Code for “jump ahead” command patterns, for example “run step-3” or “do deliver-04 now.” The tricky part is that normal writing can contain similar numbers. A date like “2026-04-20,” a version like “v1.2.3,” or a status code like “404” includes digits and separators, but those are not workflow commands, so DES ignores them instead of treating them as roadmap-step requests.
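The date-versus-command distinction comes down to requiring an explicit command verb and step keyword, not just digits. Here is a minimal sketch of that kind of filter; the pattern and function names are my own invention, not DES’s actual implementation.

```python
import re

# Flag explicit workflow-step commands ("run step-3", "do deliver-04")
# while ignoring look-alike digit runs in dates, versions, and status codes.
STEP_COMMAND = re.compile(
    r"\b(?:run|do|execute)\s+(?:step|deliver)-\d+\b",
    re.IGNORECASE,
)

def looks_like_jump_ahead(prompt: str) -> bool:
    """True only when a command verb is paired with a step keyword and number."""
    return bool(STEP_COMMAND.search(prompt))

assert looks_like_jump_ahead("please run step-3 now")
assert looks_like_jump_ahead("do deliver-04 now")
assert not looks_like_jump_ahead("released on 2026-04-20")
assert not looks_like_jump_ahead("bumped to v1.2.3")
assert not looks_like_jump_ahead("the server returned 404")
```

A bare digit scan would false-positive on every date and version string; anchoring on the verb-plus-keyword pair is what keeps ordinary prose out of the net.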

Here is the difference in one line: prompt frameworks describe a process, Nwave enforces it at runtime. DES blocks skipped TDD phases and rejects unsupported “done” claims. If you have used Spec Kit, OpenSpec, or BMAD Method, think of Nwave as adding runtime gates, not just workflow text.

Rigor profiles

Not every task deserves exhaustive mutation testing. Nwave exposes profiles that scale quality depth to stakes:

  • lean: Haiku-class agent, no reviewer, RED-then-GREEN only. For spikes and config changes.
  • standard: Sonnet agent, Haiku reviewer, full five-phase TDD. For most features.
  • thorough: Opus agent, Sonnet reviewer, full TDD. For critical features.
  • exhaustive: Opus agent and reviewer, full TDD, mutation testing with a kill rate at or above 80 percent. For production-core changes.
  • custom: Pick each setting yourself.

The profile persists across sessions, so I set it once per task type instead of arguing with the agent every run.
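The profile table above can be expressed as data. This is a sketch of the settings as I understand them; Nwave’s real storage format and key names are not shown here.

```python
# Rigor profiles as plain data (illustrative shape, not Nwave's actual schema).
RIGOR_PROFILES = {
    "lean":       {"agent": "haiku",  "reviewer": None,     "tdd": "red-green"},
    "standard":   {"agent": "sonnet", "reviewer": "haiku",  "tdd": "five-phase"},
    "thorough":   {"agent": "opus",   "reviewer": "sonnet", "tdd": "five-phase"},
    "exhaustive": {"agent": "opus",   "reviewer": "opus",   "tdd": "five-phase",
                   "mutation_testing": True, "min_kill_rate": 0.80},
}

def profile_for(name: str) -> dict:
    """Look up a profile by name; fall back to standard for unknown names."""
    return RIGOR_PROFILES.get(name, RIGOR_PROFILES["standard"])
```

Persisting something like this per task type is what lets a later session resume with the same rigor settings instead of renegotiating them every run.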

Key relationships

Nwave stands on three ideas that predate it.

Acceptance Test-Driven Development. The DISTILL wave exists because ATDD has been telling us for twenty years that tests written after the code are mostly coverage theater. Nwave forces the tests to exist first, in a form a product owner can read.

Outside-In TDD. The DELIVER wave works from the outside in. The acceptance test drives the first failing unit test at the boundary. That unit test drives the next. The implementation grows inward toward the domain core. Outside-In TDD predates Nwave by decades. The novelty is that an agent executes it without shortcuts.
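Here is the outside-in motion compressed into one file. All names are invented for illustration: the acceptance test at the boundary is written first, it forces the boundary object into existence, and the boundary object in turn forces the domain collaborator inward.

```python
class PriceCatalog:
    """Domain core, written last: the tests above it pulled it into existence."""
    def price_of(self, sku: str) -> int:
        return {"book": 1200}.get(sku, 0)


class CheckoutService:
    """Boundary object, written first, driven by the acceptance test below."""
    def __init__(self, catalog: PriceCatalog):
        self.catalog = catalog

    def total(self, skus: list) -> int:
        return sum(self.catalog.price_of(s) for s in skus)


def test_checkout_totals_cart():
    # The outermost check that started the whole chain of failing tests.
    service = CheckoutService(PriceCatalog())
    assert service.total(["book", "book"]) == 2400
```

Read bottom-up and you get the order the code was actually written in: acceptance test, then boundary, then core.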

Claude Code as the substrate. Nwave runs as a plugin (and a CLI installer) inside Claude Code, not as a standalone product. If Claude Code is new to you, read What Is Claude Code? first. The specialized agents, the skills, and the hooks all ride Claude Code’s primitives.

Trade-offs and limitations

Nwave costs tokens. The methodology rests on running multiple specialized agents with peer reviewers in between. Exhaustive mode with mutation testing can cost an order of magnitude more than a single generalist pass. The rigor profiles exist precisely because the default is expensive.

Nwave is slower by the clock. A single prompt to a generalist agent lands a PR in minutes. A greenfield Nwave run with all six waves and real human reviews takes a day for a small feature, a week for a medium one. That trade is usually worth it, but it is a trade.

Nwave depends on you reading the artifacts. If I rubber-stamp the requirements and the architecture without engaging, the enforcement layer can’t save me. The human checkpoints are load-bearing only when the human bears load.

Nwave runs best in Claude Code, and it also supports OpenCode with extra environment setup. If your team standardizes on another runtime, you can still study the methodology even when you cannot carry over the full enforcement layer as-is.

Common misconceptions

“Nwave is just a prompt pack.” Nwave ships specialized agent definitions, reviewer agents, skills, hooks, and the DES enforcement layer. The hooks and DES make the methodology binding instead of optional. Prompts alone fail open when an agent skips RED. Hooks fail closed.
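The fail-open versus fail-closed distinction can be shown in a few lines. This is a sketch of the idea, not DES’s actual hook API: a prompt-only rule relies on the agent noticing and obeying it, while a hook sits in the execution path and raises when a phase is missing.

```python
REQUIRED_PHASES = ("RED", "GREEN", "REFACTOR")

def assert_done_allowed(observed_phases: set) -> None:
    """Fail-closed gate: a 'done' claim is rejected unless every phase ran."""
    missing = [p for p in REQUIRED_PHASES if p not in observed_phases]
    if missing:
        raise RuntimeError(f"cannot claim done; missing phases: {missing}")

# An agent that skipped RED is blocked...
try:
    assert_done_allowed({"GREEN", "REFACTOR"})
except RuntimeError:
    pass  # blocked, as intended
# ...while a complete cycle passes through.
assert_done_allowed({"RED", "GREEN", "REFACTOR"})
```

The asymmetry is the point: forgetting to enforce a prompt rule silently lets the claim through, while a broken or missing phase here stops the pipeline.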

“Agents run end-to-end and I come back to a finished PR.” Each wave pauses for a human. An agent that pipelines all six waves without review has abandoned the methodology.

“You need to run all six waves every time.” For brownfield work where the problem and architecture already exist, skipping to /nw:deliver is the intended path. The six-wave stack is for greenfield features.

“More rigor is always better.” Exhaustive mode on a CSS tweak burns tokens and time for no signal. The rigor profiles exist because the right depth depends on the stakes. Picking lean for low-risk work is part of using Nwave correctly, not a concession.

Conclusion

Nwave assumes AI coding is most reliable when the workflow is predefined and enforced. Instead of letting the agent decide how to work, Nwave runs a six-wave process with specialist agents, human checkpoints, and DES guardrails that enforce TDD discipline.

The shape is worth understanding even if you never install the plugin. Any AI coding tool you use sits somewhere on the spectrum between “one generalist, no gates” and “specialists plus enforced checkpoints.” Nwave sits at the far end of that spectrum. Knowing where the end is makes it easier to judge where your current tool lives.
