Most AI coding assistants available today, including Copilot, Cursor, and Windsurf, route each prompt to a single model and return the first answer it produces. This is fast, cheap, and works well for the majority of autocomplete tasks. But it has a structural flaw that becomes visible as soon as you ask something non-trivial.

Single-model tools inherit all the blind spots of their underlying model. Different frontier models miss different edge cases: one may struggle with concurrent code, another over-sanitises outputs in security-sensitive contexts, a third has its own systematic tendencies. No two models are wrong in quite the same way, and that difference matters.

The Case for Disagreement

In software engineering, code review exists precisely because the author of a function is the worst person to find its bugs. They already know what they intended — their brain autocorrects mistakes before they notice them. A second pair of eyes, especially one that does not share the author's mental model, catches what the author missed.

Multi-agent consensus applies this principle to AI: we run the same task through multiple models and force them to critique and verify each other's output before we synthesise a final answer. The value is not blind agreement, but structured disagreement under a shared checkpoint protocol.
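As a rough illustration of the idea, the sketch below has every model answer the task and then critique every other model's answer. It is not the S.A.M.I. implementation: the `Model` type, the `cross_review` function, and the critique prompt wording are all assumptions made for the example, and a real system would call production LLM APIs instead of plain callables.

```python
from typing import Callable, Dict, Tuple

# Assumption: a "model" is any callable from prompt text to answer text.
# In practice this would wrap a real LLM API client.
Model = Callable[[str], str]


def cross_review(
    task: str, models: Dict[str, Model]
) -> Tuple[Dict[str, str], Dict[str, Dict[str, str]]]:
    """Collect one answer per model, then one critique per (author, reviewer) pair."""
    answers = {name: ask(task) for name, ask in models.items()}

    # critiques[author][reviewer] holds the reviewer's critique of the author's answer.
    critiques: Dict[str, Dict[str, str]] = {author: {} for author in answers}
    for reviewer, ask in models.items():
        for author, answer in answers.items():
            if reviewer == author:
                continue  # a model never reviews its own answer
            prompt = f"Task: {task}\nProposed answer: {answer}\nList concrete flaws."
            critiques[author][reviewer] = ask(prompt)
    return answers, critiques
```

With three models this produces three answers and six critiques; a synthesis step (not shown) would then weigh the critiques before producing the final answer.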

This is not an original idea in academia. Ensemble methods in machine learning have existed for decades, and multi-agent debate has been studied as a technique to improve reasoning (see Du et al., "Improving Factuality and Reasoning in Language Models through Multiagent Debate", 2023). What we built is a practical implementation of that idea on top of real production LLM APIs.

How the Orchestrator Works

The S.A.M.I. orchestrator runs server-side, parallelising work across all council members. When a task arrives it goes through seven stages:

01 Clarify. An interactive clarification loop — if the task is ambiguous, the council asks targeted questions before proceeding. All context is locked before later stages begin.
02 Plan. All council members pressure-test the plan and contracts and vote on weak spots; the coordinating member synthesises the path forward.
03 Architect. System design and interface contracts are fixed. All members vote; the coordinating member records the binding architecture.
04 Draft. Only one council member writes the binding code. Every other member critiques direction, sketches alternatives in prose, and files change requests — they do not write code at this stage.
05 Refine. All council members challenge the draft, propose variants, and measure tradeoffs. The coordinating member chooses the revision path.
06 Safety Review. A full red-team and security audit. All models check the output against sources, repo context, and prior stage artifacts.
07 Debugger. Final validation before the result is delivered. A blocking vote pauses delivery and sends the result back to the council until the objection is resolved into a genuine yes.
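The blocking vote in the final stage can be sketched as a small loop. This is a minimal sketch under stated assumptions, not the orchestrator's code: `STAGES`, `deliver`, the `voters` mapping, the `resolve` callback, and the `max_rounds` cap are hypothetical names introduced here. The text above only says that delivery pauses until the objection is resolved; the round limit is added so the example always terminates.

```python
from typing import Callable, Dict, List

# The seven stages, in the order listed above.
STAGES: List[str] = [
    "Clarify", "Plan", "Architect", "Draft", "Refine", "Safety Review", "Debugger",
]

# Assumption: a voter inspects the artifact and returns True for "yes".
Voter = Callable[[str], bool]
# Assumption: resolve() sends the artifact back to the council with the
# names of the objecting members and returns a revised artifact.
Resolver = Callable[[str, List[str]], str]


def deliver(artifact: str, voters: Dict[str, Voter], resolve: Resolver,
            max_rounds: int = 5) -> str:
    """Release the artifact only once no council member objects."""
    for _ in range(max_rounds):
        objectors = [name for name, vote in voters.items() if not vote(artifact)]
        if not objectors:
            return artifact  # unanimous yes: safe to deliver
        artifact = resolve(artifact, objectors)
    raise RuntimeError(f"blocking vote unresolved after {max_rounds} rounds")
```

A single objecting member is enough to hold the result back, which is the property that makes the vote blocking instead of advisory.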

What We Got Wrong First

Our initial design ran all agents sequentially: the task went to Model A, the result went to Model B for review, then to Model C for a security pass, and so on. This was conceptually clean but painfully slow: a moderately complex feature could take four to five minutes before the user saw any output. Users noticed and complained.
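The latency difference between a sequential chain and concurrent calls can be sketched with `asyncio`. Everything here is illustrative: `call_model` is a hypothetical stand-in that sleeps instead of calling a real API, and the delays are made-up numbers. The sketch only shows the timing shape (sum of delays versus slowest single delay); it ignores the fact that a true review chain feeds each result into the next model.

```python
import asyncio
import time
from typing import Dict, List, Tuple


async def call_model(name: str, task: str, delay: float) -> str:
    """Hypothetical model call; the sleep stands in for network and inference time."""
    await asyncio.sleep(delay)
    return f"{name} handled {task!r}"


async def run_sequential(task: str, delays: Dict[str, float]) -> List[str]:
    # Each call waits for the previous one: total time is roughly the sum of delays.
    return [await call_model(name, task, d) for name, d in delays.items()]


async def run_parallel(task: str, delays: Dict[str, float]) -> List[str]:
    # All calls are in flight at once: total time is roughly the slowest delay.
    return list(await asyncio.gather(
        *(call_model(name, task, d) for name, d in delays.items())
    ))


def timed(coro) -> Tuple[List[str], float]:
    """Run a coroutine to completion and report wall-clock seconds."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start
```

Both functions return results in the same order, because `asyncio.gather` preserves the order of its arguments regardless of which call finishes first.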

We moved to full parallelism in the next iteration and immediately hit a different problem: models producing structurally incompatible outputs. If two models solve the same problem with different class hierarchies, merging their outputs is not trivial. We introduced a schema contract at the decomposition stage — each subtask now includes a structured output specification that all agents must conform to. Divergences from the schema are flagged before the review stage, not after.
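A schema contract of this kind can be as simple as a mapping from required field names to types, checked before any review begins. The sketch below is an assumption about what such a check might look like, not the format S.A.M.I. uses: `OutputContract`, `find_divergences`, and the example field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass(frozen=True)
class OutputContract:
    """Hypothetical output specification attached to a subtask."""
    required_fields: Dict[str, type]


def find_divergences(contract: OutputContract, output: Dict[str, Any]) -> List[str]:
    """Return a human-readable list of ways the output departs from the contract."""
    problems: List[str] = []
    for field, expected in contract.required_fields.items():
        if field not in output:
            problems.append(f"missing field: {field}")
        elif not isinstance(output[field], expected):
            actual = type(output[field]).__name__
            problems.append(f"{field}: expected {expected.__name__}, got {actual}")
    for field in output:
        if field not in contract.required_fields:
            problems.append(f"unexpected field: {field}")
    return problems


# Example contract: every agent must return a class name and a list of methods.
contract = OutputContract({"class_name": str, "methods": list})
```

An empty list means the output conforms; anything else is flagged before the review stage, so reviewers never spend time on outputs that cannot be merged.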

We also discovered that some models perform badly when told explicitly that other models will review their work — they become overly verbose and add defensive disclaimers rather than writing tighter code. We removed any mention of the multi-agent context from individual agent system prompts. Each agent believes it is the only one working on the task. The review happens after the fact, invisibly.

Consistent Review, Not Selective Shortcuts

A common design temptation is to skip the council on "small" tasks to save latency and cost. S.A.M.I. deliberately does not: every task runs through the full council. Blind spots do not announce themselves in advance — a one-line change can still touch a security boundary or an API contract — so the review that catches them has to be consistent, not conditional.

Cost stays sane not by shortening the council but through the economics of the pipeline: the high-value work is the plan, and a cheaper model writes from that agreed plan. What you control is the autonomy level and your plan's token budget — not whether the work gets reviewed.

Where We Are Going

The current orchestrator is a server-side, highly concurrent runtime. It works well and is the production path. A lower-level systems rewrite for the performance-critical hot path is a future workstream — the current runtime is not going anywhere soon.

If you want to try the consensus system, download the S.A.M.I. desktop app and create an account. The orchestrator is the engine behind every agent run — from the Free mini-council up.

Why We Built S.A.M.I. Around Multi-Agent Consensus
