The Case · v1.4
The Case for Coordination Efficiency
Why token consumption is now a cost, sustainability, and governance issue — and what protocol-level design can do about it.
Sources: IEA, April 2026 · iternal.ai, March 2026 · CIO.com survey, 2025.
Tokens are the root. Everything else follows.
A token is roughly four characters of text and is the unit of work for every LLM API call. Tokens represent real compute: each one requires matrix multiplications across billions of model parameters. For a given model and hardware, electricity consumed scales roughly in proportion to token count.
The efficiency gains in AI hardware are real, but they are being consumed by growth rather than banked as savings. The IEA is clear: more people are using AI, and energy-intensive uses such as AI agents are on the rise.
"Data centre electricity consumption is set to roughly double from 485 TWh in 2025 to 950 TWh in 2030 — around 3% of global electricity demand."
— IEA, Key Questions on Energy and AI, April 2026
950 TWh is approximately Japan's total annual electricity consumption.
Complexity multiplies tokens by orders of magnitude
Standard queries are cheap. Agentic workflows are not. Multi-agent systems multiply token consumption with every hop, every tool call, and every coordination message.
| Agent complexity | Tokens per task | vs single call |
|---|---|---|
| Simple (1–2 tool calls) | 5,000 – 15,000 | 2–3× |
| Moderate (3–5 tool calls) | 15,000 – 50,000 | 5–10× |
| Complex multi-step | 50,000 – 200,000 | 10–30× |
| Multi-agent orchestration | 200,000 – 1,000,000+ | 20–50× |
| Agentic coding workflows | 1M – 3.5M per task | 100–500× |
Source: iternal.ai LLM Token Usage Projection Guide, March 2026. Token usage also exhibits large variance across runs — some runs use up to 10× more tokens than others for identical tasks.
Cheaper tokens do not mean cheaper AI
"Chief Product Officers should not confuse the deflation of commodity tokens with the democratisation of frontier reasoning."
— Gartner Senior Director Analyst, May 2026
Token prices are falling roughly 80% year-over-year. Enterprise AI bills are rising.
Gartner found that cheaper tokens will not translate to cheaper enterprise AI because agentic models require far more tokens per task than standard models, and increased consumption outpaces falling unit costs.
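The arithmetic behind this can be sketched in a few lines. This is an illustration, not Gartner's model: the function names are hypothetical, the 80% figure is the year-over-year price drop quoted above, and the 10× multiplier is borrowed from the upper end of the "Moderate" row in the token table.

```python
def relative_bill(price_drop: float, token_multiplier: float) -> float:
    """Bill after the change, as a multiple of the original bill."""
    return (1.0 - price_drop) * token_multiplier


def break_even_multiplier(price_drop: float) -> float:
    """Token growth at which the bill stays flat despite cheaper tokens."""
    return 1.0 / (1.0 - price_drop)


# Assumed scenario: unit price falls 80%, tokens per task grow 10x.
print(f"bill: {relative_bill(0.80, 10):.1f}x the original")           # 2.0x
print(f"break-even token growth: {break_even_multiplier(0.80):.1f}x")  # 5.0x
```

Under these assumptions, any workload whose token use grows more than about 5× ends up costing more, even though each token is 80% cheaper.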
Nearly a quarter of enterprise AI teams underestimate costs by 50% or more. These overruns rarely originate from model costs alone — they emerge from operational overhead that becomes visible only after systems move into production.
Not all tokens are equal
Task tokens
The work an agent performs: reading documents, generating reports, analysing data. These carry direct business value. Cutting them usually means cutting the quality of the work.
Coordination tokens
The instructions agents send each other — routing, status updates, action requests, handoffs. These carry no direct business value. They exist purely to move work between agents. In virtually every current implementation, they are written in verbose natural language by default.
In a four-hop payroll workflow, coordination tokens are approximately 20% of total input tokens. In IT provisioning workflows, they exceed 40%. As agent counts grow, coordination overhead compounds with every hop.
Background: how this protocol started as a token-saving experiment.
MIT Lincoln Laboratory: processing one million tokens emits carbon equivalent to driving a petrol vehicle five to twenty miles. (MIT Sloan, February 2026)
The coordination content layer is unsolved
MCP standardised how agents access external tools. A2A standardised how agents route tasks between each other. Neither addresses what goes inside the messages.
The coordination content layer — the semantic compression of agent-to-agent instructions — is the next unsolved layer in multi-agent infrastructure. It is where the verbosity lives, where the waste accumulates, and where a protocol-level solution has the broadest leverage.
"Please retrieve the employee salary records for the period ending 31 August 2024. I need all active employees, their departments, cost centres, base salary, any changes made this month, and pension contribution rates. Return as JSON array."
FETCH|HR|return:HR-Agent|p:1|aacp:1.4|res:emp_salary|period:2024-08|filter:status=active|fmt:json
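To make the compact form concrete, here is a minimal parsing sketch. The field layout is inferred only from the single example above (two positional fields followed by `key:value` pairs) and is not taken from the AACP specification. `parse_message` and the keys `verb` and `target` are hypothetical names chosen for illustration.

```python
def parse_message(raw: str) -> dict:
    """Split a pipe-delimited message into positional and key:value parts.

    Assumption: the first two fields are positional and every later field
    is a key:value pair, as in the example message above.
    """
    parts = [p.strip() for p in raw.split("|") if p.strip()]
    verb, target, *rest = parts
    fields = {}
    for item in rest:
        # Split on the first colon only, so values may contain "=" or ":".
        key, _, value = item.partition(":")
        fields[key] = value
    return {"verb": verb, "target": target, "fields": fields}


msg = parse_message(
    "FETCH|HR|return:HR-Agent|p:1|aacp:1.4|res:emp_salary"
    "|period:2024-08|filter:status=active|fmt:json"
)
print(msg["verb"], msg["fields"]["period"], msg["fields"]["filter"])
```

The point of the sketch is that a fixed, delimiter-based layout can be read with a few string operations, whereas the English request needs a model to interpret it.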
Measured from live API — not estimated
| Hop | English tokens | AACP tokens | Change (Claude) | Change (GPT-4o) |
|---|---|---|---|---|
| fetch employees | 56 | 52 | -7.1% | -12.7% |
| fetch budgets | 57 | 47 | -17.5% | -16.0% |
| merge & calculate | 65 | 43 | -33.8% | -31.6% |
| generate report | 62 | 43 | -30.6% | -33.3% |
| TOTAL | 240 | 185 | -22.9% | -23.7% |
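The percentages in the Claude column can be reproduced directly from the token counts in the table. The short check below does that. The dictionary and function names are illustrative, and the GPT-4o column cannot be recomputed this way because its token counts are not listed.

```python
# (English tokens, AACP tokens) per hop, copied from the table above.
HOPS = {
    "fetch employees": (56, 52),
    "fetch budgets": (57, 47),
    "merge & calculate": (65, 43),
    "generate report": (62, 43),
}


def reduction_pct(english: int, aacp: int) -> float:
    """Percentage reduction in tokens, rounded to one decimal place."""
    return round(100.0 * (english - aacp) / english, 1)


for hop, (english, aacp) in HOPS.items():
    print(f"{hop}: -{reduction_pct(english, aacp)}%")

total_english = sum(e for e, _ in HOPS.values())  # 240
total_aacp = sum(a for _, a in HOPS.values())     # 185
print(f"TOTAL: -{reduction_pct(total_english, total_aacp)}%")  # -22.9%
```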
Honest framing
AACP reduces coordination tokens (the instructions agents send each other) by about 23% on both Claude and GPT-4o. It does not reduce task tokens (the actual model work: reading documents, generating reports, analysing data). The saving across a whole workflow therefore depends on its coordination-to-task ratio, so coordination-heavy workflows (IT provisioning, structured data pipelines) benefit most. The table above shows measured live-API runs; calculator figures are projections.
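As a rough worked example of that dependence, the sketch below multiplies the coordination share of a workflow by the measured coordination reduction. It assumes the 20% and 40% shares quoted earlier for payroll and IT provisioning, applies the 22.9% Claude total from the table uniformly, and considers input tokens only. `workflow_saving` is a hypothetical helper, not part of any SDK.

```python
def workflow_saving(coordination_share: float,
                    coordination_reduction: float = 0.229) -> float:
    """Fraction of total input tokens saved when only coordination shrinks.

    Task tokens are assumed unchanged, so the saving is simply
    share x reduction.
    """
    return coordination_share * coordination_reduction


print(f"payroll (20% coordination): {workflow_saving(0.20):.1%}")          # 4.6%
print(f"IT provisioning (40% coordination): {workflow_saving(0.40):.1%}")  # 9.2%
```

On these assumptions a payroll workflow saves about 4.6% of its input tokens. An IT provisioning workflow saves at least about 9.2%, since its coordination share is stated as exceeding 40%.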
What engineering teams should do today
01 · Audit
Audit your coordination layer
Measure coordination vs task tokens in your multi-agent workflows. If you do not know this number, you cannot manage it. Most teams discover coordination overhead is higher than expected.
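A first-pass audit can be as simple as labelling each logged message and comparing token totals. The sketch below is one way to start, under stated assumptions: the `Message` type, its `kind` labels, and both function names are hypothetical, and token counts are estimated with the four-characters-per-token rule of thumb mentioned earlier. A real audit would use the usage counts reported by the model provider instead.

```python
import math
from dataclasses import dataclass


@dataclass
class Message:
    kind: str  # "coordination" or "task" (labels assumed for this sketch)
    text: str


def estimate_tokens(text: str) -> int:
    """Rough estimate only: about four characters per token."""
    return math.ceil(len(text) / 4)


def coordination_share(messages: list[Message]) -> float:
    """Fraction of estimated tokens spent on coordination messages."""
    coordination = sum(
        estimate_tokens(m.text) for m in messages if m.kind == "coordination"
    )
    total = sum(estimate_tokens(m.text) for m in messages)
    return coordination / total if total else 0.0


log = [
    Message("coordination", "Please fetch the active employee salary records."),
    Message("task", "x" * 400),  # stand-in for a document the agent reads
]
print(f"coordination share: {coordination_share(log):.1%}")
```

Tracking this ratio per workflow over time shows whether coordination overhead is growing as agents and hops are added.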
02 · Design
Design for efficiency from the start
Agent cost optimisation is a first-class architectural concern in 2026, much as cloud cost optimisation became essential in the microservices era. Retrofitting efficiency later costs significantly more than building it in.
03 · Report
Include tokens in sustainability reporting
Token efficiency belongs alongside compute location and renewable energy sourcing in AI sustainability reporting. Architecture decisions made today feed into the demand growth the IEA projects for 2030.
Sources and further reading
- IEA — Key Questions on Energy and AI (April 2026)
- IEA — Energy and AI (April 2025)
- Brookings Institution — Global Energy Demands Within the AI Regulatory Landscape (April 2026)
- MIT Sloan — AI Has High Data Centre Energy Costs (February 2026)
- Fortune / Gartner — Microsoft Reports Expose AI's Real Cost Problem (May 2026)
- iternal.ai — LLM Token Usage Projection Guide (March 2026)
- CIO.com — How to Get AI Agent Budgets Right in 2026
AACP is an open draft. Collaboration welcome.
AACP v1.4 is a measured, open specification for coordination layer standardisation in multi-agent LLM systems. IETF Internet-Draft on file. Working Python SDK. Measured benchmarks on four models.