The Lab

Working multi-agent workflows validated across four LLMs. Real data. Measured results. Open source.

5 workflow types validated
76 AACP packets per model run
$0.0190 per full department run (gpt-4.1-mini)
$0.00 audit agent cost (deterministic)

What the lab does

The lab runs five real business workflows using AACP v1.4 packets as the coordination layer between specialist agents. All data is read from CSV files. All output is written to a formatted Excel workbook. All agent communication is via AACP packets — none of the coordination is natural language.
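
The AACP v1.4 schema itself is not reproduced on this page, so the sketch below uses placeholder field names (`sender`, `receiver`, `intent`, `payload`) purely to illustrate structured, machine-checkable coordination in place of free text. Treat every name in it as hypothetical, not as the protocol's actual layout.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical packet shape. These fields are NOT taken from the AACP v1.4 spec.
@dataclass(frozen=True)
class Packet:
    sender: str
    receiver: str
    intent: str
    payload: dict

REQUIRED_TEXT_FIELDS = ("sender", "receiver", "intent")

def validate(packet: Packet) -> list[str]:
    """Return a list of problems; an empty list means the packet passes."""
    problems = [f"{name} is empty" for name in REQUIRED_TEXT_FIELDS
                if not getattr(packet, name).strip()]
    if not isinstance(packet.payload, dict):
        problems.append("payload must be a dict")
    return problems

def encode(packet: Packet) -> str:
    """Serialise to JSON so the receiving agent parses fields instead of prose."""
    return json.dumps(asdict(packet), sort_keys=True)

if __name__ == "__main__":
    hop = Packet("HR-Agent", "Finance-Agent", "payroll.submit", {"employees": 8})
    print(validate(hop), encode(hop))
```

The point of the shape is that a receiving agent, or a deterministic audit step, can check each hop field by field without interpreting prose.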

Workflows: Payroll · Month-End Close · Sales Qualification · CS Resolution · JML Onboarding

Validated comparison — Q1 FY2026 close

All four models ran all five workflows. All 76 AACP packets validated per model. Cheapest model first.

Model               Cost      Tokens in   Latency   Pass
gpt-4.1-mini        $0.0190   40,585      160s      ✓ 5/5
gpt-4.1             $0.0984   41,005      124s      ✓ 5/5
claude-sonnet-4-5   $0.2237   53,127      423s      ✓ 5/5
gpt-4o              $0.2408   40,459      101s      ✓ 5/5

Q1 FY2026 close. 5 workflows. 76 hops per model. All packets validated against AACP v1.4 schema. June 2026.
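
To make the cost spread concrete, the published figures can be turned into a per-packet cost and a cost multiple relative to the cheapest model. The sketch below is only arithmetic on the numbers in the comparison above; the `RUNS` name and its tuple layout are ours, not part of the lab code.

```python
# Figures copied from the comparison above: (cost in USD, latency in seconds).
RUNS = {
    "gpt-4.1-mini": (0.0190, 160),
    "gpt-4.1": (0.0984, 124),
    "claude-sonnet-4-5": (0.2237, 423),
    "gpt-4o": (0.2408, 101),
}
PACKETS_PER_RUN = 76  # AACP packets per model run, as reported

def cost_per_packet(model: str) -> float:
    """Average cost of one packet hop for the given model."""
    cost, _ = RUNS[model]
    return cost / PACKETS_PER_RUN

def cost_multiple(model: str, baseline: str = "gpt-4.1-mini") -> float:
    """How many times more expensive a model's run is than the baseline's."""
    return RUNS[model][0] / RUNS[baseline][0]

if __name__ == "__main__":
    # Print models cheapest first, matching the ordering of the comparison.
    for model in sorted(RUNS, key=lambda m: RUNS[m][0]):
        print(f"{model:<18} ${cost_per_packet(model):.5f}/packet  "
              f"{cost_multiple(model):.1f}x baseline")
```

On these numbers the gpt-4.1 run costs roughly five times the gpt-4.1-mini run while finishing faster (124s against 160s).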

Which model for which workflow

Based on measured output quality, not just cost.

Workflow              Best model     Reason
JML Onboarding        gpt-4.1-mini   Deterministic tasks. Any model works. Lowest cost.
Sales Qualification   gpt-4.1        Best scoring judgment on borderline leads.
CS Resolution         gpt-4.1        Best goodwill decisions by LTV. Conservative where appropriate.
Month-End Close       gpt-4.1        Caught 3 material variances. gpt-4o caught only 1.
Payroll               gpt-4.1        Reliable anomaly detection. 5/5 hops every run.

gpt-4.1 is the best all-round model for structured enterprise agent workflows. Full 5-workflow department day: $0.0984.
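
One way to act on these recommendations is a plain lookup from workflow to model. The sketch below is a hypothetical configuration, not the lab's actual code: `MODEL_FOR_WORKFLOW` and `pick_model` are names invented for illustration, and the fallback choice is an assumption.

```python
# Workflow-to-model choices taken from the recommendations above.
MODEL_FOR_WORKFLOW = {
    "JML Onboarding": "gpt-4.1-mini",
    "Sales Qualification": "gpt-4.1",
    "CS Resolution": "gpt-4.1",
    "Month-End Close": "gpt-4.1",
    "Payroll": "gpt-4.1",
}

# Assumption: unknown workflows fall back to the best all-round model.
DEFAULT_MODEL = "gpt-4.1"

def pick_model(workflow: str) -> str:
    """Return the recommended model for a workflow, or the default."""
    return MODEL_FOR_WORKFLOW.get(workflow, DEFAULT_MODEL)
```

Keeping the mapping as data means the choice can be revisited per workflow when new comparison runs come in.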

Workflows

HR + FIN · hops: 5

Payroll

data: 8 employees, 4 cost centres
agents: HR-Agent, Finance-Agent, Audit-Agent
basis: HMRC PAYE, UK payroll practice

Key finding: Engineering CC-10 at 90%+ utilisation flagged on all models.
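
A check of this kind can be done deterministically. The sketch below assumes that "utilisation" means spend divided by budget for a cost centre, and it uses made-up sample rows; the lab's real CSV layout, column names and figures are not shown on this page.

```python
import csv
import io

UTILISATION_THRESHOLD = 0.90  # the "90%+" level from the key finding

# Made-up sample rows with hypothetical column names and amounts.
SAMPLE_CSV = """cost_centre,name,budget,spend
CC-10,Engineering,100000,93000
CC-20,Sales,80000,52000
"""

def flag_high_utilisation(csv_text: str,
                          threshold: float = UTILISATION_THRESHOLD) -> list[str]:
    """Return the cost centres whose spend/budget ratio meets the threshold."""
    flagged = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        utilisation = float(row["spend"]) / float(row["budget"])
        if utilisation >= threshold:
            flagged.append(row["cost_centre"])
    return flagged

if __name__ == "__main__":
    print(flag_high_utilisation(SAMPLE_CSV))
```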

FIN · hops: 6

Month-End Close

data: GL trial balance, bank data
agents: Finance-Agent, Audit-Agent
basis: NetSuite Autonomous Close 2026

Key finding: gpt-4.1 and Claude caught 3 material variances; gpt-4o caught only 1.

SALES · hops: 5 per lead (25 total for 5 leads)

Sales Qualification

data: 5 leads with BANT scoring data
agents: Sales-Agent, Audit-Agent
basis: Salesforce Agentforce 2026

Key finding: CoreTech (CEO, £120k budget) scored 97.5 on all models. Delta (not engaged) correctly rejected on all models.

CS · hops: 5 per ticket (25 total for 5 tickets)

CS Resolution

data: 5 tickets with LTV and sentiment data
agents: CS-Agent, Audit-Agent
basis: Zendesk Resolution Platform 2026

Key finding: gpt-4.1 made the most commercially sensible goodwill decisions: generous with high LTV, conservative with low LTV.

HR + IT · hops: 6 per hire (18 total for 3 hires)

JML Onboarding

data: 3 new hires with roles and system requirements
agents: HR-Agent, IT-Agent, Audit-Agent
basis: ConductorOne, Lumos, CloudEagle 2025-2026

Key finding: All 3 hires provisioned on all 4 models. Perfect pass rate. gpt-4.1-mini sufficient.

Honest framing

These results measure agent task execution, not protocol overhead. Every coordination message was an AACP v1.4 packet. Token counts cover the full agent conversation, including system prompts and data payloads, not just coordination tokens.

The lab is open source. Results are reproducible. Comparison JSON and full workflow output available on GitHub.

What the lab produces

Each run generates a formatted Excel workbook with six sheets:

Sheet             Contents
Summary           All workflows, costs, success rate, model
Payroll           Employee pay breakdown, anomalies, CC status
Sales Pipeline    Lead scores, BANT breakdown, routing decisions
CS Resolution     Ticket outcomes, goodwill offers, amounts
JML Onboarding    New hire provisioning status, systems granted
Month-End Close   Reconciliation results, material variances

The workbook is colour-coded: red for breaches, amber for warnings, green for pass. Produced by agent coordination. No manual formatting required.
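
For readers who want to see how such colour-coding can be applied with openpyxl, here is a minimal sketch. It is not the lab's implementation: the hex values, the status labels and the `write_summary` helper are assumptions made for illustration.

```python
# Status-to-colour mapping following the red/amber/green scheme described above.
# The specific hex values are our choice, not taken from the lab.
STATUS_COLOURS = {
    "breach": "FFC7CE",   # red
    "warning": "FFEB9C",  # amber
    "pass": "C6EFCE",     # green
}

def colour_for(status: str) -> str:
    """Look up the fill colour for a status label (case-insensitive)."""
    return STATUS_COLOURS[status.lower()]

def write_summary(rows: list[tuple[str, str]], path: str) -> None:
    """Write (workflow, status) rows to a workbook and shade each status cell."""
    # Imported here so colour_for() stays usable without openpyxl installed.
    from openpyxl import Workbook
    from openpyxl.styles import PatternFill

    wb = Workbook()
    ws = wb.active
    ws.title = "Summary"
    ws.append(["Workflow", "Status"])
    for workflow, status in rows:
        ws.append([workflow, status])
        hex_colour = colour_for(status)
        ws.cell(row=ws.max_row, column=2).fill = PatternFill(
            start_color=hex_colour, end_color=hex_colour, fill_type="solid")
    wb.save(path)
```

A call such as `write_summary([("Payroll", "warning"), ("JML Onboarding", "pass")], "summary.xlsx")` would produce a two-row sheet with shaded status cells.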

Run it yourself

Requires: pip install anthropic openai openpyxl

Estimated cost per full run: $0.02 to $0.25 depending on model.