
I built a coordination protocol for multi-agent LLM systems

By Andrew Mackay · 10 June 2026

The honest origin

This started as a simple observation, not a grand plan.

I was building multi-agent workflows and noticed that every time one agent needed to instruct another, it wrote a natural language message. Something like:

"Please retrieve the employee salary records for the period ending March 2026. I need all active employees, their departments, cost centres, base salary, any changes made this month, and pension contribution rates. Return as JSON array."

The wording of a message like that varies every time the workflow runs. It contains pleasantries the receiving agent does not need. And when something goes wrong, there is no clean audit trail, just a log file full of slightly different English sentences.

I started wondering whether this was actually the right approach, or just the default one.

The first question

The obvious question was whether you could compress those instructions by replacing verbose English with something more structured. I ran some benchmarks using the token counts reported in live API usage_metadata and found consistent results: across a four-hop payroll workflow, coordination token usage dropped by 22.9% on Claude Sonnet 4.5 and 23.7% on GPT-4o versus equivalent English instructions. The full benchmark table lives on the case page.

That felt like a result worth building on. So I defined a packet format:

FETCH|HR|return:HR-Agent|p:1|aacp:1.4|res:emp_salary|period:2026-03|filter:status=active|fmt:json

Typed. Pipe-delimited. Self-describing. Produced identically on every run by a rule-based encoder at zero LLM cost. I called it AACP (Agent Action Compression Protocol) and published a Python SDK with encoders for common business workflows: payroll, IT provisioning, invoice processing, contract review.
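To make the packet shape concrete, here is a minimal parsing sketch. It is not the SDK's parser. It assumes the first two pipe-separated fields are positional (an action and a target) and that every later field is a key:value pair split on the first colon; the Packet class and both function names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    action: str   # first positional field, e.g. "FETCH" (assumed meaning)
    target: str   # second positional field, e.g. "HR" (assumed meaning)
    fields: dict = field(default_factory=dict)  # remaining key:value pairs


def parse_packet(raw: str) -> Packet:
    """Split a pipe-delimited packet into positional and keyed fields."""
    parts = raw.strip().split("|")
    if len(parts) < 2:
        raise ValueError("expected at least two positional fields")
    action, target, *rest = parts
    parsed = {}
    for item in rest:
        # Split on the first colon only, so values may contain ':' or '='.
        key, sep, value = item.partition(":")
        if not sep or not key:
            raise ValueError(f"malformed field: {item!r}")
        parsed[key] = value
    return Packet(action, target, parsed)


def format_packet(packet: Packet) -> str:
    """Serialise a Packet back to its pipe-delimited form, preserving field order."""
    pairs = [f"{k}:{v}" for k, v in packet.fields.items()]
    return "|".join([packet.action, packet.target, *pairs])


RAW = "FETCH|HR|return:HR-Agent|p:1|aacp:1.4|res:emp_salary|period:2026-03|filter:status=active|fmt:json"
packet = parse_packet(RAW)
print(packet.action, packet.fields["period"])
```

Because the keyed fields keep their order, parsing and re-formatting the example packet gives back the original string, which is the property the rest of this post leans on.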

For known workflow types, the encoder produces the packet deterministically. For novel instructions, a four-tier fallback routes to LLM encoding once, logs the result, and serves it from cache on every subsequent call. An amortisation benchmark across 240 encoding operations showed a 91.6% cost saving versus per-call LLM encoding, with 6 LLM calls required across the full run.
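The split between rule-based encoding and cached LLM encoding can be sketched roughly as follows. This is a simplification with two paths only; it does not reproduce the four tiers, and the RULES table, the Encoder class, and the llm_encode callable are all hypothetical stand-ins rather than SDK names.

```python
from typing import Callable, Dict

# Hypothetical rule table: workflow name -> deterministic packet template.
RULES: Dict[str, str] = {
    "payroll.fetch_salaries": (
        "FETCH|HR|return:HR-Agent|p:1|aacp:1.4|res:emp_salary"
        "|period:{period}|filter:status=active|fmt:json"
    ),
}


class Encoder:
    def __init__(self, llm_encode: Callable[[str], str]):
        self._llm_encode = llm_encode      # stand-in for a real model call
        self._cache: Dict[str, str] = {}   # normalised instruction -> packet
        self.llm_calls = 0                 # counts how often the model was used

    def encode_known(self, workflow: str, **params: str) -> str:
        """Known workflow: fill a template, no model call involved."""
        return RULES[workflow].format(**params)

    def encode_novel(self, instruction: str) -> str:
        """Novel instruction: call the model once, then serve from cache."""
        key = " ".join(instruction.lower().split())  # case and whitespace normalised
        if key not in self._cache:
            self.llm_calls += 1
            self._cache[key] = self._llm_encode(instruction)
        return self._cache[key]


# Usage with a fake model call that always returns the same packet.
encoder = Encoder(lambda text: "FETCH|IT|aacp:1.4|res:laptop_stock|fmt:json")
first = encoder.encode_novel("List laptops in stock")
second = encoder.encode_novel("list   laptops in stock")
print(encoder.llm_calls)
```

The cache key here is only a normalised copy of the instruction text. A real fallback would need a more careful notion of when two instructions are the same.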

I filed an IETF Internet-Draft, stood up a site at aacp.dev, and started getting feedback.

What I got wrong about the value

The token reduction is real and measurable. But when I started running the protocol through more complex workflows and getting feedback from engineers, it became clear that token reduction was not the most interesting thing about it.

The more interesting thing was determinism.

When an agent coordinates using AACP packets rather than natural language, the coordination layer becomes predictable. The same workflow produces the same packets on every run. You can validate a packet against the schema before sending it. You can log every coordination hop as a structured record without post-processing. You can replay a workflow and know the coordination instructions were identical to the original run.
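Two of those properties, validation before sending and replay comparison, are easy to sketch. The required-key set below is an assumption made for illustration, not the published schema, and both function names are hypothetical.

```python
import hashlib

# Hypothetical schema: keys every packet must carry (not taken from the spec).
REQUIRED_KEYS = {"return", "aacp", "res", "fmt"}


def validate(raw: str) -> list:
    """Return a list of problems; an empty list means the packet passed."""
    parts = raw.split("|")
    if len(parts) < 2:
        return ["missing positional fields"]
    problems = []
    keys = set()
    for item in parts[2:]:
        key, sep, _ = item.partition(":")
        if not sep:
            problems.append(f"field without ':' separator: {item!r}")
        else:
            keys.add(key)
    for missing in sorted(REQUIRED_KEYS - keys):
        problems.append(f"missing required field: {missing}")
    return problems


def run_fingerprint(packets: list) -> str:
    """Hash the ordered packets of one run so two runs can be compared."""
    digest = hashlib.sha256()
    for raw in packets:
        digest.update(raw.encode("utf-8"))
        digest.update(b"\n")  # separator so packet boundaries affect the hash
    return digest.hexdigest()
```

If two runs of the same workflow produce the same fingerprint, their coordination packets were byte-for-byte identical and in the same order. Free-text instructions would almost never pass that check.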

Natural language coordination cannot offer any of those things reliably.

This matters more in some environments than others. For a simple two-agent prototype, the difference is minimal. But for workflows that run repeatedly at scale, or where auditability matters, or where multiple frameworks and models need to interoperate, the coordination content layer starts to look like a genuine gap in the current stack.

Where it actually fits

To be direct about the architecture: AACP operates between the routing layer and the task execution layer.

MCP (Anthropic) handles agent access to external tools. A2A (Google) handles agent discovery and task routing. Both are excellent at what they do. Neither specifies what agents say to each other inside coordination messages. That is the layer AACP addresses.

This means AACP is complementary to both, not competitive with either. A packet can travel inside an MCP payload or be routed by A2A. The protocol is transport-agnostic and model-agnostic by design. The why page has the four-quadrant architecture map.

Honest about where it helps most

AACP is most useful in specific conditions. It is worth being direct about this.

Repetitive structured workflows benefit most. Payroll runs monthly. IT provisioning follows the same steps for every new hire. Invoice processing applies the same logic every time. For these workflows, a rule-based encoder produces coordination packets at zero LLM cost on every run, indefinitely.

Compliance-sensitive pipelines have a different but equally concrete benefit. When you need to demonstrate what instruction was sent to which agent, when, and in what form, typed packets are a significantly cleaner audit record than parsed natural language. The audit agent in the lab runs without LLM calls. It logs structured records from the packets themselves at zero cost.
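As a rough illustration of why no model call is needed, an audit record can be built directly from the packet text. This is a sketch, not the lab's audit agent: the record layout, the audit_record function, and the sender name in the usage line are assumptions, and it expects a well-formed packet.

```python
import json
from datetime import datetime, timezone


def audit_record(raw: str, sender: str, hop: int, now=None) -> str:
    """Build one JSON audit line from a packet without any model call."""
    parts = raw.split("|")
    # Assumes every field after the first two is key:value (raises otherwise).
    fields = dict(item.split(":", 1) for item in parts[2:])
    record = {
        "ts": (now or datetime.now(timezone.utc)).isoformat(),
        "hop": hop,
        "sender": sender,
        "action": parts[0],
        "target": parts[1],
        "fields": fields,
        "raw": raw,  # keep the exact packet for later replay or comparison
    }
    return json.dumps(record, sort_keys=True)


# Usage: "Payroll-Agent" is a made-up sender name for illustration.
line = audit_record(
    "FETCH|HR|return:HR-Agent|p:1|aacp:1.4|res:emp_salary|period:2026-03|filter:status=active|fmt:json",
    sender="Payroll-Agent",
    hop=1,
)
print(line)
```

Each line answers the compliance questions directly: what was sent, to which agent, when, and in what form.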

Multi-model and multi-framework systems face an interoperability challenge that AACP directly addresses. In lab tests, the same packet format was validated identically across Claude Sonnet 4.5, GPT-4o, GPT-4.1, and GPT-4.1-mini without model-specific prompt tuning. When agents run on different models or were built with different frameworks, a shared coordination vocabulary reduces translation overhead.

For enterprises specifically, the token reduction argument is secondary. What matters is auditability, determinism, and the ability to adopt a vendor-neutral coordination format that works regardless of which model or framework is running underneath. The coordination layer does not change when you swap from one LLM provider to another.

For simpler systems, the case is weaker. Two agents doing open-ended research have small coordination overhead relative to their task work. AACP adds the most value where workflows are repetitive, structured, and run at volume.

The framework integration results

After publishing the core SDK, I built integrations for LangChain and CrewAI to test the protocol in real framework contexts.

The methodology was straightforward: run the same five-workflow department day scenario (JML onboarding, payroll, sales qualification, customer service resolution, and month-end close) with and without AACP, using identical agents, identical data, and identical models. 59 coordination hops per run.

In a standard LangChain implementation, every agent-to-agent coordination hop involves an LLM generating the instruction. Across 59 hops that means 59 LLM calls before any task work begins. With AACP, that drops to zero for known workflows. Total workflow cost reduction: 18%.

CrewAI showed a 30% reduction across the same 59-hop scenario. The larger saving reflects that CrewAI's default natural language task descriptions are more verbose, so the per-hop saving is higher. In both frameworks, coordination messages became deterministic, schema-validated, and machine-readable audit records.

The QBR lab

The most recent addition is a Quarterly Business Review workflow that connects to real services. Five agents coordinate via AACP packets to fetch sprint metrics from Jira, extract decisions and action items from Notion, retrieve budget actuals from Google Sheets, consolidate everything into a single view, and draft an executive summary for human review.

The important design decision: the human review gate is explicit. The protocol coordinates the data gathering and consolidation. The recommendation in the output is clearly labelled as AI-assisted, not AI-decided. Deciding whether to continue, pause, or scale a project requires reading the room in ways the system cannot do. That boundary is intentional.

The same workflow runs at daily, weekly, monthly, and quarterly cadence by changing a single field in the coordination packet.
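A single-field change of that kind might look like the sketch below. The packet contents and the field name cadence are invented for illustration and are not taken from the QBR lab; with_field is a hypothetical helper.

```python
def with_field(raw: str, key: str, value: str) -> str:
    """Return a copy of the packet with one key:value field replaced."""
    parts = raw.split("|")
    prefix = key + ":"
    replaced = False
    for i, item in enumerate(parts):
        # Skip the two positional fields; only keyed fields can be replaced.
        if i >= 2 and item.startswith(prefix):
            parts[i] = prefix + value
            replaced = True
    if not replaced:
        raise KeyError(key)
    return "|".join(parts)


# Hypothetical QBR packet; the field names here are illustrative only.
QUARTERLY = "FETCH|PMO|return:QBR-Agent|aacp:1.4|res:sprint_metrics|cadence:quarterly|fmt:json"
WEEKLY = with_field(QUARTERLY, "cadence", "weekly")
print(WEEKLY)
```

Everything else in the packet stays byte-identical, so the weekly and quarterly runs differ in exactly one logged field.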

What I built

The full stack as of today (a fuller list is on the about page):

Python SDK (pip install aacp): 8 workflow encoders across 6 domains
TypeScript SDK (npm install aacp-ts)
LangChain integration (pip install aacp-langchain): 18% saving, 59 hops
CrewAI integration (pip install aacp-crewai): 30% saving, 59 hops
241 pre-validated community rules across 7 domains at registry.aacp.dev
VS Code extension (Dispatch) for packet building and validation
QBR lab with live Jira, Notion and Google Sheets connections
IETF Internet-Draft: draft-mackay-aacp-03

Everything is MIT-licensed and on GitHub at github.com/MackayAndrew.

What I think this could become

The most interesting long-term possibility is a shared coordination vocabulary that agent frameworks can adopt independently of each other. A packet format that any agent understands, regardless of whether it is running on Claude, GPT, or an open model, and regardless of whether it was built with LangChain, CrewAI, AutoGen, or a custom framework.

That is a reasonable description of what AACP already is, for the domains and task types it covers today. The question is whether it scales to a broader set of workflow types, and whether the community of people building multi-agent systems finds it useful enough to adopt.

That is what I am trying to find out.

If you are building multi-agent systems and the determinism or auditability properties are relevant to your use case, the SDK is published and the spec is on this site. I would genuinely like to hear whether it is useful and where it falls short.
