There is a crisis hiding inside the multi-agent AI boom. You just cannot see it on SWE-bench.
Every major benchmark measures what a single agent can do in isolation. SWE-bench tests one agent resolving GitHub issues. GAIA tests one agent completing tasks. AgentBench evaluates one LLM across environments. You can compare any model on any of these, and the numbers look great.
But put two agents in a room and ask them to hand off work to each other, and the numbers collapse. Put ten agents in there with real dependencies, and the collapse becomes a fire.
The industry response has been predictable: build more infrastructure. LangGraph state machines. CrewAI role systems. AutoGen conversational protocols. xAI compiled Grok 4.20’s multi-agent debate directly into model weights, at an estimated 1.5 to 2.5 times the inference cost of a single pass.
And then there is Hermes Agent’s Kanban Swarm v1, which shipped two days ago. It adds 279 lines of Python to the existing codebase. It does not introduce a second scheduler. Its “blackboard protocol” is a JSON blob, the Python equivalent of a JSON.stringify call, appended to a comment column in a SQLite table.
It might be the most honest approach to agent coordination on the market.
The Problem Everyone Is Ignoring
The multi-agent AI market is projected to hit $52 billion by 2030. Seventy-two percent of enterprise AI projects now use multi-agent architectures, up from 23 percent in 2024. Those numbers come from Zylos Research, which found that organizations using multi-agent systems report 45 percent faster resolution and 60 percent more accurate outcomes.
Those are the headline numbers. The ones nobody talks about are the failure rates.
Research published this year on swarm-based dialogue systems found that OpenAI’s Swarm framework, which uses stateless routing, experiences progressive accuracy collapse from 84 percent down to 0 percent as task complexity increases. The AgentMarketCap analysis covering this research identifies the root cause: without global state tracking, agents just acknowledge subtasks and terminate threads rather than maintaining coherent multi-step chains.
This is not an edge case. This is what happens when you optimize for speed without benchmarking for what you actually need: reliable task completion under realistic conditions.
The same analysis found that naive routing hits about 65 percent accuracy with a switching instability rate of 0.44, while confidence-gated routing reaches 77 percent with an instability rate of 0.11. That 12-point gap might not sound dramatic. But in a ten-agent pipeline running thousands of tasks per day, it means failed workflows, runaway retries, and incidents that nobody can diagnose because there is no audit trail to follow.
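To make that gap concrete, here is a back-of-the-envelope sketch. The two accuracy figures are the ones quoted above; the daily task volume and the idea that each of ten hops is an independent routing decision are assumptions made purely for illustration, not something the analysis states.

```python
# Illustrative arithmetic only. The accuracy figures come from the analysis
# quoted above; the task volume and the independence of hops are assumptions.
TASKS_PER_DAY = 1000  # hypothetical volume
HOPS = 10             # assumed: one routing decision per agent in the pipeline


def misrouted_per_day(accuracy, tasks=TASKS_PER_DAY):
    """Expected number of single routing decisions that go wrong."""
    return round(tasks * (1 - accuracy))


def clean_run_probability(accuracy, hops=HOPS):
    """Chance that every hop routes correctly, assuming independent hops."""
    return accuracy ** hops


naive_misses = misrouted_per_day(0.65)
gated_misses = misrouted_per_day(0.77)
naive_clean = clean_run_probability(0.65)
gated_clean = clean_run_probability(0.77)

print(naive_misses, gated_misses)                 # 350 230
print(f"{naive_clean:.3f} {gated_clean:.3f}")     # 0.013 0.073
```

Under those assumptions, the gated router finishes a clean ten-hop run roughly five times as often as the naive one, which is a very different picture from "12 points".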
Coordination failures account for 37 percent of multi-agent system breakdowns, according to a broader survey of 17 multi-agent topologies. Verification gaps add another 21 percent. These are not bugs in the models. They are architectural failures in the orchestration layer.
The Reflex to Build Upward
Every framework that has emerged to address this problem shares the same instinct: add a new layer.
LangGraph gives you a state machine for agent workflows, with nodes and edges you define explicitly. CrewAI introduces role-based teams with a manager agent that delegates to specialized workers. AutoGen builds conversational protocols for agents to negotiate task completion. Each of these is valuable for specific use cases, but each also introduces a new runtime, a new abstraction, and a new surface area for things to go wrong.
xAI took a different approach entirely with Grok 4.20, which ships four specialized agents that debate each other’s reasoning before responding. The coordination logic is compiled into the model weights and inference graph, eliminating round-trip latency between agents. The results are impressive (a 78 percent non-hallucination rate on Artificial Analysis Omniscience tests), but the architecture is not something you can replicate. If you want to route work between multiple agents, you do not have the option of compiling it into your model weights.
The common thread across every approach is the assumption that agent coordination requires a new infrastructure layer. Something that sits above the agents and manages them. Something that does not exist in the system already.
The Kanban Swarm#
Enter the Hermes Kanban Swarm, which takes the opposite bet.
Hermes Agent v0.15.0 shipped on May 28, 2026, with a feature called Kanban Swarm v1. The entire module is 279 lines of Python, and its opening docstring reads:
“This module intentionally does not introduce a second scheduler. It writes a small task graph into the existing Kanban kernel.”
That sentence is doing a lot of work. “Intentionally” means they considered the alternative and rejected it. “Existing Kanban kernel” means they looked at what was already running in production and asked whether it could absorb the new capability without a new service.
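The topology it produces is a gated pipeline: a root task, a set of parallel worker tasks, a verifier that depends on every worker, and a synthesizer that depends on the verifier.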
Each worker runs as a full Hermes agent process with its own identity, its own tools, and its own profile. The verifier reviews every worker’s output and holds the gate: it must complete with metadata {"gate": "pass"} before the synthesizer can proceed. If the evidence is insufficient, the verifier blocks with an exact description of what is missing.
The shared blackboard where workers communicate is deliberately low-tech. Structured JSON comments on the root task, prefixed with a marker string. post_blackboard_update writes a JSON blob into a comment. latest_blackboard reads all comments back and merges them by key. Later values replace earlier ones, with an _authors map tracking who wrote what. All of that state lives in rows the existing dashboard, notifier, CLI, and dispatcher already know how to read.
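A minimal sketch of that merge behavior, assuming a simple comments table. The function names post_blackboard_update and latest_blackboard come from the module as described above; the signatures, the table schema, and the marker string are hypothetical, chosen only to show the last-write-wins merge and the _authors map.

```python
import json
import sqlite3

# Hypothetical prefix; the real marker string is not given in this article.
MARKER = "[blackboard] "


def post_blackboard_update(conn, task_id, author, update):
    """Append one structured JSON comment to the root task."""
    body = MARKER + json.dumps(update, sort_keys=True)
    conn.execute(
        "INSERT INTO comments (task_id, author, body) VALUES (?, ?, ?)",
        (task_id, author, body),
    )
    conn.commit()


def latest_blackboard(conn, task_id):
    """Merge all blackboard comments in insertion order; later keys win."""
    merged = {}
    authors = {}
    rows = conn.execute(
        "SELECT author, body FROM comments WHERE task_id = ? ORDER BY id",
        (task_id,),
    )
    for author, body in rows:
        if not body.startswith(MARKER):
            continue  # an ordinary comment, not blackboard state
        for key, value in json.loads(body[len(MARKER):]).items():
            merged[key] = value
            authors[key] = author  # remember who wrote the surviving value
    merged["_authors"] = authors
    return merged


# Assumed schema: one row per comment, ordered by an autoincrementing id.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE comments ("
    "id INTEGER PRIMARY KEY, task_id TEXT, author TEXT, body TEXT)"
)
post_blackboard_update(conn, "root", "researcher", {"endpoints": 12, "status": "scanning"})
conn.execute(
    "INSERT INTO comments (task_id, author, body) VALUES ('root', 'human', 'looks fine so far')"
)
post_blackboard_update(conn, "root", "coder", {"status": "auth checked"})

board = latest_blackboard(conn, "root")
print(board["status"], board["_authors"]["status"])  # auth checked coder
```

Because the blackboard is just comments, a human note on the same task sits alongside the agent updates and is skipped by the merge rather than breaking it.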
What It Looks Like for the Person Running It
If you use Hermes Agent, you already have at least one Kanban board (I wrote a complete guide to how the system works here). It comes with every install. If you have multiple Hermes profiles (a researcher, a coder, a reviewer), you already have the workers. The swarm just wires them together.
The command to create one looks like this:
hermes kanban swarm "Audit our API surface for security regressions" \
--worker researcher:"Scan endpoints and dependencies":web \
--worker coder:"Check auth middleware implementation" \
--worker coder:"Review rate limiting and input validation" \
--verifier reviewer \
--synthesizer writer
You get back task IDs for four roles: the root, each of the workers, the verifier, and the synthesizer. Then you walk away.
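For intuition, the task graph such a command writes might be modeled like this. It is a sketch under stated assumptions, not the module's code: the Task fields, the build_swarm helper, the generated IDs, and the initial statuses are all hypothetical. Only the shape (workers, then a verifier that depends on all of them, then a synthesizer that depends on the verifier) comes from the description above.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Task:
    id: str
    title: str
    assignee: str
    status: str = "todo"
    depends_on: list = field(default_factory=list)


def build_swarm(goal, workers, verifier, synthesizer):
    """Return root, worker, verifier, and synthesizer tasks (hypothetical layout)."""
    ids = (f"t{n}" for n in count(1))
    root = Task(next(ids), goal, assignee="")
    # Workers have no dependencies, so they can be picked up right away.
    worker_tasks = [
        Task(next(ids), title, profile, status="ready") for profile, title in workers
    ]
    # The verifier waits on every worker; the synthesizer waits on the verifier.
    verify = Task(
        next(ids), "Verify swarm outputs", verifier,
        depends_on=[t.id for t in worker_tasks],
    )
    synth = Task(
        next(ids), "Synthesize swarm outputs", synthesizer,
        depends_on=[verify.id],
    )
    return [root, *worker_tasks, verify, synth]


tasks = build_swarm(
    "Audit our API surface for security regressions",
    workers=[
        ("researcher", "Scan endpoints and dependencies"),
        ("coder", "Check auth middleware implementation"),
        ("coder", "Review rate limiting and input validation"),
    ],
    verifier="reviewer",
    synthesizer="writer",
)
for task in tasks:
    print(task.id, task.status, task.assignee, task.depends_on)
```

Nothing in this structure is a scheduler. It is only rows and edges, which is what lets an existing dispatcher drive it.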
This is still an explicit CLI action. There is no /swarm slash command inside a Hermes session, and the dispatcher won’t spontaneously create a swarm topology on your behalf. Auto-decomposition (where triage breaks a single task into sub-tasks) is a separate mechanism and produces a flat dependency tree, not a gated pipeline with a verifier and synthesizer. Right now, invoking a swarm is a deliberate choice you make at the terminal.
Behind the scenes, the existing Kanban dispatcher that’s already running in your gateway picks up the worker tasks on its next tick, sixty seconds later at most. For each one, it spawns a full Hermes process under the assigned profile. The researcher profile gets its web research tools. The coder profile gets its terminal and file access. Each worker starts in a fresh scratch workspace that is destroyed when the task completes: no cross-contamination between workers, no cleanup left for you.
To check progress, you run:
hermes kanban list --mine
Task │ Status │ Assignee │ Title
───────┼─────────┼───────────┼──────────────────────────────
abc12 │ running │ coder │ Check auth middleware
def34 │ running │ coder │ Review rate limiting
ghi56 │ running │ researcher│ Scan endpoints
jkl78 │ todo │ reviewer │ Verify swarm outputs
mno90 │ todo │ writer │ Synthesize swarm outputs
The verifier and synthesizer are still todo. They stay todo until every worker reaches done. When the last worker completes, the dispatcher promotes the verifier to ready on its next tick. The verifier profile spawns, reads every worker’s output from the blackboard, and either passes the gate (metadata {"gate": "pass"}) or blocks with a list of what needs fixing.
If it blocks, you see it in the dashboard: a blocked task with a comment describing exactly what is missing. You can unblock it by fixing the gap manually or reassigning the work. If it passes, the synthesizer profile spawns and writes the final deliverable.
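The promotion rule described above can be sketched as a single dispatcher tick. This is an illustration under assumptions, not the Hermes dispatcher: the promote_ready function, the dict layout, and the role field are hypothetical. The task IDs are borrowed from the listing above, and the {"gate": "pass"} metadata is the one the verifier is described as writing.

```python
def promote_ready(tasks):
    """One tick: promote todo tasks whose dependencies are all satisfied."""
    by_id = {task["id"]: task for task in tasks}

    def satisfied(dep):
        if dep["status"] != "done":
            return False
        # A finished verifier only opens the gate if it recorded a pass.
        if dep.get("role") == "verifier":
            return dep.get("metadata", {}).get("gate") == "pass"
        return True

    promoted = []
    for task in tasks:
        deps = [by_id[dep_id] for dep_id in task["depends_on"]]
        if task["status"] == "todo" and all(satisfied(dep) for dep in deps):
            task["status"] = "ready"
            promoted.append(task["id"])
    return promoted


tasks = [
    {"id": "abc12", "role": "worker", "status": "done", "depends_on": []},
    {"id": "def34", "role": "worker", "status": "running", "depends_on": []},
    {"id": "jkl78", "role": "verifier", "status": "todo", "depends_on": ["abc12", "def34"]},
    {"id": "mno90", "role": "synthesizer", "status": "todo", "depends_on": ["jkl78"]},
]

first = promote_ready(tasks)   # one worker is still running, nothing moves
tasks[1]["status"] = "done"
second = promote_ready(tasks)  # every worker is done, the verifier becomes ready
tasks[2].update(status="done", metadata={"gate": "pass"})
third = promote_ready(tasks)   # the gate passed, the synthesizer becomes ready

print(first, second, third)    # [] ['jkl78'] ['mno90']
```

A verifier that ends up blocked, or done without a passing gate, never satisfies the synthesizer's dependency, so the final step simply stays in todo until someone resolves it.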
No polling on your part. No separate dashboard service. No new infrastructure to learn. The same hermes kanban list command you already use for your task board shows you the swarm. The same dispatcher that moves your personal tasks between columns promotes the gates. The same dashboard that shows your backlog shows the swarm’s progress.
That is the whole point: running a multi-agent swarm should not feel different from managing any other kanban workflow. You create tasks. You wait. You review. You ship. The fact that some of the workers are AI agents and some might be human is invisible at the board level.
Why This Matters
The Kanban Swarm is built to avoid three failure modes that plague the approaches described above.
First, it eliminates the progressive accuracy collapse. OpenAI’s Swarm loses accuracy because it is stateless. The Kanban Swarm is stateful by design. Every handoff is a SQLite row. Every status transition is a row update. The verifier gating pattern ensures nothing proceeds until the gatekeeper explicitly approves. There is no path for agents to silently acknowledge subtasks and wander off, because the state machine enforces that completion only happens through the gate.
Second, it survives restarts. A LangGraph state machine that runs without a persistent checkpointer loses its in-memory state if it crashes mid-workflow. A CrewAI delegation that times out leaves the parent agent waiting. The Kanban Swarm writes everything to durable SQLite. If the dispatcher restarts, if the machine reboots, if a worker’s process is killed and respawned, the state is still there. The task is still on the board. A respawned worker picks the task back up from the board rather than from lost memory.
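The durability claim is easy to demonstrate with plain SQLite. The sketch below is not Hermes code: the file location and the tasks table are hypothetical, and closing the connection merely stands in for a process dying after its last commit.

```python
import os
import sqlite3
import tempfile

# Hypothetical database location, created fresh for this demonstration.
db_path = os.path.join(tempfile.mkdtemp(), "kanban.db")

# "Before the crash": record a status transition and commit it.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE tasks (id TEXT PRIMARY KEY, status TEXT)")
conn.execute("INSERT INTO tasks VALUES ('abc12', 'running')")
conn.commit()
conn.close()  # stands in for the process going away after the commit

# "After the restart": a brand-new connection still sees the committed row.
conn = sqlite3.connect(db_path)
status = conn.execute(
    "SELECT status FROM tasks WHERE id = 'abc12'"
).fetchone()[0]
conn.close()

print(status)  # running
```

The important detail is the commit. Anything a worker had only in memory is gone after a crash, which is why it matters that every handoff is written as a row.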
Third, it includes human review as a first-class pattern. The verifier gate is not just for automated agents. A human can unblock a blocked verifier task, add a comment to the blackboard, or promote a task from todo to ready. The board is the same board whether the reader is a person or a model. The critical insight: Kanban was designed for human task management decades before AI agents existed. The abstraction was already human-compatible.
The Deeper Architectural Lesson
The broader lesson here is about how to think about infrastructure.
A kanban board is a state machine where rows are work items, columns are statuses, and dependencies are edges between rows. That is all. The column structure enforces progress, the visibility ensures accountability, and the durability guarantees survivability. Those properties were not designed for AI agents. They were designed for Toyota’s manufacturing lines in the mid-twentieth century. They generalize because coordination problems generalize.
The multi-agent industry is building up: adding layers, adding runtimes, adding abstractions. Every new orchestration framework ships with a new scheduler, a new state representation, a new way to express workflows. Hermes’s Kanban Swarm went sideways. It looked at what was already running in production and asked whether the existing abstraction could absorb the new requirement. The answer was yes, because a task board is already a coordination substrate. It just needed the right topology written into it.
When you need agents to cooperate, you do not need a new coordination protocol. You need a task management system with gating, persistence, and human visibility. Those systems already exist. They have existed for decades. The problem is not that we lack the infrastructure for agent coordination. The problem is that we keep building new infrastructure instead of recognizing the infrastructure we already have.
The Kanban Swarm is a proof of that principle: 279 lines of Python, no second scheduler, and a blackboard made of serialized JSON comments. It works because the hard problem was already solved. We just forgot to look.
