AACP v1.4

The Case · v1.4

The Case for Coordination Efficiency

Why token consumption is now a cost, sustainability, and governance issue — and what protocol-level design can do about it.

17%: data centre electricity growth, 2025
50%: AI-focused data centre growth, 2025
30×: more tokens per task, multi-agent vs single call
~25%: share of enterprise AI teams that misestimate costs by 50% or more

Sources: IEA, April 2026 · iternal.ai, March 2026 · CIO.com survey, 2025.

Tokens are the root. Everything else follows.

A token is roughly four characters of text and is the unit of work for every LLM API call. Tokens represent real compute: each one requires matrix multiplications across billions of model parameters. At scale, electricity consumed rises roughly in proportion to token count.
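
The four-characters-per-token rule of thumb can be turned into a rough estimator. This is a minimal sketch that assumes the heuristic holds for the text in question; real tokenizers differ by model and by language, and `estimate_tokens` is a hypothetical helper, not part of any SDK.

```python
import math

# Rule of thumb from the text above; an approximation, not a tokenizer.
CHARS_PER_TOKEN = 4

def estimate_tokens(text: str) -> int:
    """Return a rough token estimate based only on character count."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)

if __name__ == "__main__":
    sample = "Please retrieve the employee salary records."
    print(estimate_tokens(sample))
```

For billing or benchmarking, counts reported by the model provider's API are the better source; the estimate is only useful for quick sizing.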

The efficiency gains in AI hardware are real, but they are being consumed by growth rather than banked as savings. The IEA is clear: more people are using AI, and energy-intensive uses such as AI agents are on the rise.

"Data centre electricity consumption is set to roughly double from 485 TWh in 2025 to 950 TWh in 2030 — around 3% of global electricity demand."

— IEA, Key Questions on Energy and AI, April 2026

950 TWh is approximately Japan's total annual electricity consumption.

Complexity multiplies tokens

Standard queries are cheap. Agentic workflows are not. Multi-agent systems multiply token usage with every hop, every tool call, and every coordination message.

Agent complexity | Tokens per task | vs single call
Simple (1–2 tool calls) | 5,000 – 15,000 | 2–3×
Moderate (3–5 tool calls) | 15,000 – 50,000 | 5–10×
Complex multi-step | 50,000 – 200,000 | 10–30×
Multi-agent orchestration | 200,000 – 1,000,000+ | 20–50×
Agentic coding workflows | 1M – 3.5M per task | 100–500×

Source: iternal.ai LLM Token Usage Projection Guide, March 2026. Token usage also exhibits large variance across runs — some runs use up to 10× more tokens than others for identical tasks.

Cheaper tokens do not mean cheaper AI

"Chief Product Officers should not confuse the deflation of commodity tokens with the democratisation of frontier reasoning."

— Gartner Senior Director Analyst, May 2026

Token prices are falling roughly 80% year-over-year. Enterprise AI bills are rising.

Gartner found that cheaper tokens will not translate to cheaper enterprise AI because agentic models require far more tokens per task than standard models, and increased consumption outpaces falling unit costs.
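
The interaction between falling unit prices and rising consumption is simple arithmetic. The sketch below assumes a bill is just unit price times tokens consumed, ignoring input/output price differences, caching, and volume discounts. `relative_bill` is a hypothetical helper. The 80% price drop comes from the text above, and the token multipliers are illustrative values taken from the per-task ranges quoted earlier.

```python
def relative_bill(price_drop: float, token_multiplier: float) -> float:
    """Bill relative to a baseline of 1.0 after a unit-price drop
    (0.80 means 80% cheaper) and a change in tokens consumed."""
    return (1.0 - price_drop) * token_multiplier

if __name__ == "__main__":
    # Illustrative multipliers: more tokens per task than the baseline.
    for multiplier in (2, 5, 10, 30):
        bill = relative_bill(0.80, multiplier)
        print(f"{multiplier:>2}x tokens -> {bill:.1f}x the original bill")
```

Under these assumptions, an 80% price drop is cancelled out once token consumption grows fivefold; any larger increase raises the bill despite cheaper tokens.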

Nearly a quarter of enterprise AI teams underestimate costs by 50% or more. These overruns rarely originate from model costs alone — they emerge from operational overhead that becomes visible only after systems move into production.

Not all tokens are equal

Task tokens

The work an agent performs — reading documents, generating reports, analysing data. These carry direct business value. Reducing them means reducing the quality of the work.

Coordination tokens

The instructions agents send each other — routing, status updates, action requests, handoffs. These carry no direct business value. They exist purely to move work between agents. In virtually every current implementation, they are written in verbose natural language by default.

In a four-hop payroll workflow, coordination tokens are approximately 20% of total input tokens. In IT provisioning workflows, they exceed 40%. As agent counts grow, coordination overhead compounds with every hop.

Background: how this protocol started as a token-saving experiment.

MIT Lincoln Laboratory: processing one million tokens emits carbon equivalent to driving a petrol vehicle five to twenty miles. (MIT Sloan, February 2026)
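
The MIT Lincoln Laboratory figure can be scaled to any token volume. A minimal sketch, assuming the five-to-twenty-mile range scales linearly with token count (the source quotes it only for one million tokens); `driving_equivalent_miles` is a hypothetical helper.

```python
# Range quoted above: miles of petrol-vehicle driving per one million tokens.
MILES_PER_MILLION_TOKENS = (5.0, 20.0)

def driving_equivalent_miles(tokens: int) -> tuple[float, float]:
    """Return the (low, high) driving-equivalent range in miles,
    assuming emissions scale linearly with token count."""
    low, high = MILES_PER_MILLION_TOKENS
    scale = tokens / 1_000_000
    return low * scale, high * scale

if __name__ == "__main__":
    low, high = driving_equivalent_miles(3_500_000)
    print(f"3.5M tokens: roughly {low:.1f} to {high:.1f} miles")
```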

The coordination content layer is unsolved

MCP standardised how agents access external tools. A2A standardised how agents route tasks between each other. Neither addresses what goes inside the messages.

The coordination content layer — the semantic compression of agent-to-agent instructions — is the next unsolved layer in multi-agent infrastructure. It is where the verbosity lives, where the waste accumulates, and where a protocol-level solution has the broadest leverage.

English
"Please retrieve the employee salary
records for the period ending 31 August
2024. I need all active employees, their
departments, cost centres, base salary,
any changes made this month, and pension
contribution rates. Return as JSON array."
AACP v1.4
FETCH|HR|return:HR-Agent|p:1|aacp:1.4
|res:emp_salary|period:2024-08
|filter:status=active|fmt:json
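
To illustrate how compact such a message is to consume, here is a minimal parsing sketch. It is inferred only from the single example above and is not the AACP grammar or the SDK's API: it assumes fields are pipe-delimited, that wrapped lines are joined before parsing, that segments without a colon are positional, and that the rest are `key:value` pairs split on the first colon. Escaping and validation are not handled, and `parse_message` is a hypothetical name.

```python
def parse_message(raw: str) -> dict:
    """Split a pipe-delimited message into positional and key:value parts.

    Assumption: the layout is inferred from one example, not from the spec.
    """
    # Join wrapped lines, then split on the pipe delimiter.
    joined = "".join(line.strip() for line in raw.splitlines())
    parts = [part for part in joined.split("|") if part]
    positional = [part for part in parts if ":" not in part]
    # Split on the first colon only, so values may themselves contain colons.
    fields = dict(part.split(":", 1) for part in parts if ":" in part)
    return {"positional": positional, "fields": fields}

EXAMPLE = """FETCH|HR|return:HR-Agent|p:1|aacp:1.4
|res:emp_salary|period:2024-08
|filter:status=active|fmt:json"""

if __name__ == "__main__":
    parsed = parse_message(EXAMPLE)
    print(parsed["positional"])
    print(parsed["fields"])
```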

Measured from live API — not estimated

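Hop | English tokens | AACP tokens | Change (Claude) | Change (GPT-4o)
fetch employees | 56 | 52 | -7.1% | -12.7%
fetch budgets | 57 | 47 | -17.5% | -16.0%
merge & calculate | 65 | 43 | -33.8% | -31.6%
generate report | 62 | 43 | -30.6% | -33.3%
TOTAL | 240 | 185 | -22.9% | -23.7%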
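
The percentages in the table can be rechecked from its token counts. In this short sketch the English and AACP counts reproduce the Claude column; the GPT-4o column cannot be derived from them, presumably because it rests on GPT-4o's own token counts, which the table does not list (an assumption).

```python
# (English tokens, AACP tokens) per hop, copied from the table above.
HOPS = {
    "fetch employees": (56, 52),
    "fetch budgets": (57, 47),
    "merge & calculate": (65, 43),
    "generate report": (62, 43),
}

def pct_change(before: int, after: int) -> float:
    """Percentage change from `before` to `after` (negative = reduction)."""
    return (after - before) / before * 100

TOTAL_ENGLISH = sum(english for english, _ in HOPS.values())
TOTAL_AACP = sum(aacp for _, aacp in HOPS.values())

if __name__ == "__main__":
    for hop, (english, aacp) in HOPS.items():
        print(f"{hop:<18} {pct_change(english, aacp):6.1f}%")
    print(f"{'TOTAL':<18} {pct_change(TOTAL_ENGLISH, TOTAL_AACP):6.1f}%")
```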

Honest framing

AACP reduces coordination tokens (the instructions agents send each other) by ~23% on both Claude and GPT-4o. It does not reduce task tokens (the actual model work: reading documents, generating reports, analysing data). Total workflow cost depends on your coordination-to-task ratio. Coordination-heavy workflows (IT provisioning, structured data pipelines) benefit most. Calculator figures below are projections; the table above shows measured live-API runs.
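
The dependence on the coordination-to-task ratio can be made concrete. A minimal sketch, assuming the ~23% reduction applies uniformly to all coordination tokens and that task tokens are unchanged; it covers input tokens only, and `workflow_input_savings` is a hypothetical helper. The 20% and 40% shares are the payroll and IT provisioning figures quoted earlier, with 40% being a lower bound for the latter.

```python
def workflow_input_savings(coordination_share: float,
                           coordination_reduction: float = 0.23) -> float:
    """Fraction of total input tokens saved when only the coordination
    share of the workflow is reduced."""
    return coordination_share * coordination_reduction

if __name__ == "__main__":
    for label, share in (("payroll (about 20%)", 0.20),
                         ("IT provisioning (at least 40%)", 0.40)):
        saved = workflow_input_savings(share)
        print(f"{label}: {saved:.1%} of input tokens saved")
```

Under these assumptions, a 23% cut in coordination tokens translates to roughly 4.6% of input tokens in the payroll case and at least 9.2% in the IT provisioning case.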

What engineering teams should do today

01 · Audit

Audit your coordination layer

Measure coordination vs task tokens in your multi-agent workflows. If you do not know this number, you cannot manage it. Most teams discover coordination overhead is higher than expected.

02 · Design

Design for efficiency from the start

Agent cost optimisation is a first-class architectural concern in 2026, similar to how cloud cost optimisation became essential in the microservices era. Retrofitting efficiency costs significantly more than building it in.

03 · Report

Include tokens in sustainability reporting

Token efficiency belongs alongside compute location and renewable energy sourcing in AI sustainability reporting. Architecture decisions made today contribute to the IEA's 2030 projections.
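
The audit in step 01 can be sketched as a tally over a message log. This assumes each logged message has already been tagged as coordination or task and carries a token count from the provider's usage data. The record layout, the `kind` labels, and the numbers are hypothetical placeholders, not measurements.

```python
from collections import Counter

def coordination_share(records: list[dict]) -> float:
    """Return coordination tokens as a fraction of all logged tokens."""
    totals = Counter()
    for record in records:
        totals[record["kind"]] += record["tokens"]
    grand_total = sum(totals.values())
    return totals["coordination"] / grand_total if grand_total else 0.0

# Placeholder log: values are illustrative only.
SAMPLE_LOG = [
    {"hop": 1, "kind": "coordination", "tokens": 50},
    {"hop": 1, "kind": "task", "tokens": 200},
    {"hop": 2, "kind": "coordination", "tokens": 50},
    {"hop": 2, "kind": "task", "tokens": 200},
]

if __name__ == "__main__":
    print(f"Coordination share: {coordination_share(SAMPLE_LOG):.0%}")
```

Tracking this ratio per workflow, rather than per organisation, shows which pipelines are coordination-heavy enough to be worth optimising first.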

AACP is an open draft. Collaboration welcome.

AACP v1.4 is a measured, open specification for coordination layer standardisation in multi-agent LLM systems. IETF Internet-Draft on file. Working Python SDK. Measured benchmarks on four models.