[ The Lab · AACP v1.4 ]

The Lab

Working multi-agent workflows validated across four LLMs. Real data. Measured results. Open source.

5

workflow types validated

76

AACP packets per model run

$0.0190

full department run, gpt-4.1-mini

$0.00

audit agent cost (deterministic)

What the lab does

The lab runs five real business workflows using AACP v1.4 packets as the coordination layer between specialist agents. All data is read from CSV files. All output is written to a formatted Excel workbook. All agent communication is via AACP packets. None of the coordination is natural language.

Workflows: Payroll · Month-End Close · Sales Qualification · CS Resolution · JML Onboarding
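A minimal sketch of a deterministic audit step over structured packets, illustrating why such a check needs no model call. The field names (version, sender, recipient, workflow, hop, payload), the Packet class and the audit() function are hypothetical; the real AACP v1.4 schema is defined in the lab's IETF draft and SDK, not here.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical envelope fields for illustration only; NOT the AACP v1.4 schema.
REQUIRED_FIELDS = ("version", "sender", "recipient", "workflow", "hop", "payload")


@dataclass
class Packet:
    sender: str
    recipient: str
    workflow: str
    hop: int
    payload: dict = field(default_factory=dict)
    version: str = "1.4"

    def to_json(self) -> str:
        # Serialise with stable key order so identical packets give identical JSON.
        return json.dumps(asdict(self), sort_keys=True)


def audit(raw: str) -> list[str]:
    """Deterministic packet check: plain code, no LLM call, so no token cost."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["packet must be a JSON object"]
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in data]
    if data.get("version") != "1.4":
        errors.append("unsupported version")
    hop = data.get("hop")
    if not isinstance(hop, int) or hop < 1:
        errors.append("hop must be a positive integer")
    return errors


pkt = Packet("HR-Agent", "Finance-Agent", "payroll", 1, {"employees": 8})
print(audit(pkt.to_json()))  # → []
```

Because a check like this is ordinary code rather than a model call, it is repeatable and has no per-packet token cost, which fits the $0.00 deterministic audit agent figure.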

Validated comparison: Q1 FY2026 close

All four models ran all five workflows, and all 76 AACP packets validated for each model. Models are listed cheapest first.

Model	Cost	Input tokens	Latency	Pass
gpt-4.1-mini	$0.0190	40,585	160s	✓ 5/5
gpt-4.1	$0.0984	41,005	124s	✓ 5/5
claude-sonnet-4-5	$0.2237	53,127	423s	✓ 5/5
gpt-4o	$0.2408	40,459	101s	✓ 5/5

Q1 FY2026 close. 5 workflows. 76 hops per model. All packets validated against the AACP v1.4 schema. June 2026.
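As a quick arithmetic check using only the figures in the comparison table, the snippet below expresses each model's full run as a multiple of the cheapest one and as an average cost per workflow. The names RUNS and cost_breakdown are illustrative.

```python
# Total cost per full five-workflow run, copied from the Q1 FY2026 table.
RUNS = {
    "gpt-4.1-mini": 0.0190,
    "gpt-4.1": 0.0984,
    "claude-sonnet-4-5": 0.2237,
    "gpt-4o": 0.2408,
}
WORKFLOWS = 5


def cost_breakdown(runs: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (model, multiple of cheapest run, average cost per workflow), cheapest first."""
    cheapest = min(runs.values())
    ordered = sorted(runs.items(), key=lambda kv: kv[1])
    return [(model, round(cost / cheapest, 1), round(cost / WORKFLOWS, 4)) for model, cost in ordered]


for model, multiple, per_workflow in cost_breakdown(RUNS):
    print(f"{model:<18} {multiple:>5}x  ${per_workflow:.4f} per workflow")
```

On these figures gpt-4.1 costs about 5.2x gpt-4.1-mini per run, and gpt-4o about 12.7x.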

Which model for which workflow

Based on measured output quality, not just cost.

Workflow	Best model	Reason
JML Onboarding	gpt-4.1-mini	Deterministic tasks. Any model works. Lowest cost.
Sales Qualification	gpt-4.1	Best scoring judgment on borderline leads.
CS Resolution	gpt-4.1	Best goodwill decisions by LTV. Conservative where appropriate.
Month-End Close	gpt-4.1	Caught 3 material variances. gpt-4o caught only 1.
Payroll	gpt-4.1	Reliable anomaly detection. 5/5 hops every run.

In these runs, gpt-4.1 was the best all-round model for structured enterprise agent workflows. A full five-workflow department run cost $0.0984.
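The recommendations reduce to a small routing table. A sketch, assuming a hypothetical model_for() helper and a fallback to gpt-4.1 for any workflow the comparison did not cover:

```python
# Workflow-to-model routing, taken from the "Which model for which workflow" table.
BEST_MODEL = {
    "JML Onboarding": "gpt-4.1-mini",
    "Sales Qualification": "gpt-4.1",
    "CS Resolution": "gpt-4.1",
    "Month-End Close": "gpt-4.1",
    "Payroll": "gpt-4.1",
}

# Assumption: untested workflows fall back to the best all-round model.
DEFAULT_MODEL = "gpt-4.1"


def model_for(workflow: str) -> str:
    """Return the recommended model for a workflow name."""
    return BEST_MODEL.get(workflow, DEFAULT_MODEL)


print(model_for("JML Onboarding"))  # → gpt-4.1-mini
```

Keeping the mapping as data rather than logic means a re-run of the comparison only changes one table.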

Workflows

HR + FIN · hops: 5

Payroll

data:: 8 employees, 4 cost centres
agents:: HR-Agent, Finance-Agent, Audit-Agent
basis:: HMRC PAYE, UK payroll practice

Key finding

Engineering cost centre CC-10, running at 90%+ utilisation, was flagged by all models.
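A sketch of the kind of utilisation check behind this finding, assuming a hypothetical CSV layout (cost_centre, budget, actual) with made-up figures. The 90% warning level comes from the finding itself; treating spend above 100% of budget as a breach is an assumption. The three labels line up with the workbook's red, amber and green colour-coding.

```python
import csv
import io

# Made-up figures in a hypothetical CSV layout; the lab's column names may differ.
SAMPLE = """cost_centre,budget,actual
CC-01,50000,31000
CC-02,80000,74000
CC-03,40000,41500
"""

WARN_AT = 0.90    # the 90% utilisation level from the payroll finding
BREACH_AT = 1.00  # assumption: spend above budget is a breach


def classify(budget: float, actual: float) -> str:
    utilisation = actual / budget
    if utilisation > BREACH_AT:
        return "breach"
    if utilisation >= WARN_AT:
        return "warning"
    return "pass"


def flag_cost_centres(text: str) -> dict[str, str]:
    """Map each cost centre in the CSV text to breach, warning or pass."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["cost_centre"]: classify(float(row["budget"]), float(row["actual"])) for row in reader}


print(flag_cost_centres(SAMPLE))  # → {'CC-01': 'pass', 'CC-02': 'warning', 'CC-03': 'breach'}
```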

FIN · hops: 6

Month-End Close

data:: GL trial balance, bank data
agents:: Finance-Agent, Audit-Agent
basis:: NetSuite Autonomous Close 2026

Key finding

gpt-4.1 and Claude caught 3 material variances; gpt-4o caught only 1.
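A minimal reconciliation sketch: compare each account's GL balance with its bank balance and keep only variances above a materiality threshold. The Line class, account names, balances and the fixed 1,000 threshold are all hypothetical; real close processes set materiality by policy.

```python
from dataclasses import dataclass


@dataclass
class Line:
    account: str
    gl_balance: float    # balance per the GL trial balance
    bank_balance: float  # balance per the bank data


# Assumption: one absolute materiality threshold for every account.
MATERIALITY = 1_000.0


def material_variances(lines: list[Line], threshold: float = MATERIALITY) -> list[tuple[str, float]]:
    """Return (account, GL minus bank) for every variance larger than the threshold."""
    flagged = []
    for line in lines:
        variance = round(line.gl_balance - line.bank_balance, 2)
        if abs(variance) > threshold:
            flagged.append((line.account, variance))
    return flagged


# Made-up balances for illustration only.
sample = [
    Line("Operating", 120_000.00, 118_500.00),
    Line("Payroll", 45_000.00, 45_000.00),
    Line("Savings", 30_000.00, 32_500.00),
]
print(material_variances(sample))  # → [('Operating', 1500.0), ('Savings', -2500.0)]
```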

SALES · hops: 5 per lead (25 total for 5 leads)

Sales Qualification

data:: 5 leads with BANT scoring data
agents:: Sales-Agent, Audit-Agent
basis:: Salesforce Agentforce 2026

Key finding

CoreTech (CEO contact, £120k budget) scored 97.5 on every model. Delta (not engaged) was correctly rejected on every model.
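BANT scoring can be sketched as a weighted sum of Budget, Authority, Need and Timeline, with engagement as a hard gate. The weights, the 0 to 100 scale, the 70-point threshold and the route() labels are hypothetical; this does not reproduce the lab's rubric or the 97.5 score reported for CoreTech.

```python
from dataclasses import dataclass

# Hypothetical weights (they sum to 1.0); the lab's real rubric may differ.
WEIGHTS = {"budget": 0.30, "authority": 0.30, "need": 0.25, "timeline": 0.15}
QUALIFY_AT = 70.0  # assumed routing threshold


@dataclass
class Lead:
    name: str
    budget: float  # each criterion scored 0 to 100
    authority: float
    need: float
    timeline: float
    engaged: bool = True


def score(lead: Lead) -> float:
    """Weighted BANT score on a 0 to 100 scale."""
    return round(sum(getattr(lead, k) * w for k, w in WEIGHTS.items()), 1)


def route(lead: Lead) -> str:
    if not lead.engaged:
        return "reject"  # hard gate: no engagement, no qualification
    return "qualified" if score(lead) >= QUALIFY_AT else "nurture"


lead = Lead("Example Ltd", budget=100, authority=100, need=90, timeline=80)
print(score(lead), route(lead))  # → 94.5 qualified
```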

CS · hops: 5 per ticket (25 total for 5 tickets)

CS Resolution

data:: 5 tickets with LTV and sentiment data
agents:: CS-Agent, Audit-Agent
basis:: Zendesk Resolution Platform 2026

Key finding

gpt-4.1 made the most commercially sensible goodwill decisions: generous with high LTV, conservative with low LTV.
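The behaviour described here, generous with high-LTV customers and conservative with low-LTV ones, can be written as a tiered policy. A sketch with hypothetical tiers, caps and a sentiment uplift; in the lab, this judgment is made by the CS-Agent's model rather than a fixed table.

```python
# Hypothetical tiers, highest LTV first: (minimum LTV, share of LTV offered, cap).
TIERS = [
    (10_000.0, 0.05, 500.0),
    (2_000.0, 0.03, 100.0),
    (0.0, 0.0, 0.0),
]
NEGATIVE_UPLIFT = 1.5  # assumption: negative sentiment raises the offer, still within the cap


def goodwill(ltv: float, sentiment: str) -> float:
    """Return a goodwill amount that scales with LTV and never exceeds the tier cap."""
    rate, cap = next(((r, c) for floor, r, c in TIERS if ltv >= floor), (0.0, 0.0))
    amount = min(ltv * rate, cap)
    if sentiment == "negative":
        amount = min(amount * NEGATIVE_UPLIFT, cap)
    return round(amount, 2)


for ltv, mood in [(20_000, "neutral"), (3_000, "neutral"), (3_000, "negative"), (500, "negative")]:
    print(ltv, mood, goodwill(ltv, mood))
```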

HR + IT · hops: 6 per hire (18 total for 3 hires)

JML Onboarding

data:: 3 new hires with roles and system requirements
agents:: HR-Agent, IT-Agent, Audit-Agent
basis:: ConductorOne, Lumos, CloudEagle 2025-2026

Key finding

All 3 hires were provisioned on all 4 models, a perfect pass rate. gpt-4.1-mini is sufficient.
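Provisioning is largely a lookup from role to systems, which is consistent with the cheapest model being sufficient. A sketch with a hypothetical baseline and role templates; the lab's real requirements come from its new-hire data.

```python
# Hypothetical entitlements; real requirements come from the new-hire CSV.
BASELINE = {"email", "sso", "hr-portal"}
ROLE_SYSTEMS = {
    "engineer": {"github", "ci", "cloud-console"},
    "sales": {"crm", "cpq"},
    "finance": {"erp", "expenses"},
}


def systems_for(role: str) -> list[str]:
    """Return the sorted list of systems to provision for a role."""
    extra = ROLE_SYSTEMS.get(role.strip().lower())
    if extra is None:
        # Unknown roles are escalated rather than guessed.
        raise ValueError(f"no provisioning template for role {role!r}")
    return sorted(BASELINE | extra)


print(systems_for("Sales"))  # → ['cpq', 'crm', 'email', 'hr-portal', 'sso']
```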

Honest framing

These results measure agent task execution, not protocol overhead. Every coordination message was an AACP v1.4 packet. Token counts cover the full agent conversation, including system prompts and data payloads, not just coordination tokens.

The lab is open source and the results are reproducible. The comparison JSON and full workflow output are available on GitHub.

What the lab produces

Each run generates a formatted Excel workbook with six sheets:

Sheet	Contents
Summary	All workflows, costs, success rate, model
Payroll	Employee pay breakdown, anomalies, CC status
Sales Pipeline	Lead scores, BANT breakdown, routing decisions
CS Resolution	Ticket outcomes, goodwill offers, amounts
JML Onboarding	New hire provisioning status, systems granted
Month-End Close	Reconciliation results, material variances

The workbook is colour-coded: red for breaches, amber for warnings, green for pass. It is produced entirely by agent coordination, with no manual formatting.
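A minimal openpyxl sketch of the six-sheet layout with red, amber and green status fills. The fill colours (Excel's standard light red, amber and green), the build_workbook() helper and the Payroll columns shown are assumptions, not the lab's actual formatting code.

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

SHEETS = ["Summary", "Payroll", "Sales Pipeline", "CS Resolution",
          "JML Onboarding", "Month-End Close"]

# Assumed colour values for each status.
FILLS = {
    "breach": PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid"),
    "warning": PatternFill(start_color="FFEB9C", end_color="FFEB9C", fill_type="solid"),
    "pass": PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid"),
}


def build_workbook(payroll_rows: list[tuple[str, str]]) -> Workbook:
    """Create the six sheets and write colour-coded (cost centre, status) rows to Payroll."""
    wb = Workbook()
    wb.active.title = SHEETS[0]  # rename the default sheet
    for name in SHEETS[1:]:
        wb.create_sheet(title=name)
    ws = wb["Payroll"]
    ws.append(["Cost centre", "Status"])
    for cost_centre, status in payroll_rows:
        ws.append([cost_centre, status])
        ws.cell(row=ws.max_row, column=2).fill = FILLS[status]
    return wb


wb = build_workbook([("CC-01", "pass"), ("CC-03", "breach")])
print(wb.sheetnames)
```

Calling wb.save("lab_output.xlsx") writes the file; the file name here is arbitrary.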

Run it yourself

GitHub: aacp-lab · Python SDK · IETF Draft

Requires: pip install anthropic openai openpyxl

Estimated cost per full run: $0.02 to $0.25 depending on model.