Agent Manifests: Bringing API Design Discipline to Multi-Agent Systems
Formal agent manifests, modeled on OpenAPI specs, prevent unauditable chaos in multi-agent orchestration by defining capabilities, token budgets, and interaction contracts.
Before OpenAPI became the standard contract for REST APIs, every integration was an exercise in archaeology. You'd read outdated wiki pages, reverse-engineer payloads from production logs, and pray that the team owning /api/v2/orders hadn't changed the response shape since last Tuesday. Multi-agent AI systems are in that exact era right now. Agents call other agents with unstructured prompts, receive unpredictable outputs, and nobody can tell you which agent is allowed to talk to which, or how much any conversation costs.
An agent manifest is a formal, machine-readable contract that defines what an AI agent can do, what inputs it accepts, what outputs it produces, how many tokens it can consume, and which other agents it's permitted to call. Think of it as OpenAPI for autonomous AI. If you're building or planning a multi-agent system, manifests are the single most important design artifact you're probably not writing yet.
The urgency is real. Gartner predicts that 33% of enterprise software applications will include agentic AI by 2028 [1]. That means agents won't be a side project for your platform team. They'll be infrastructure. And infrastructure without contracts becomes unauditable chaos.
Your Multi-Agent System Has an API Problem
The parallel is almost too clean. In 2014, your company had dozens of internal REST endpoints. Some had Swagger docs. Most didn't. Payload schemas lived in Slack threads. Error handling was "return 500 and hope the caller retries." You fixed this with OpenAPI specs, generated clients, and contract tests. It took years, but it worked.
Now look at your multi-agent setup. A planning agent calls a research agent by stuffing instructions into a prompt. The research agent returns a blob of text that the planning agent parses with another LLM call. Nobody has written down what "research results" should look like. There's no schema. There's no budget. There's no access control list. The planning agent could, in theory, ask the research agent to call the payment service. Nothing prevents it.
This is the core tension: emergent coordination (agents negotiating and adapting dynamically) versus predictable systems (compliance, auditability, cost control). You need some of both. Manifests don't kill emergence. They put guardrails around it, the same way API contracts didn't prevent creative API usage but made it safe and observable.
What OpenAPI Solved (and Why Agents Need the Same Treatment)
OpenAPI (originally Swagger) transformed REST development by introducing a single source of truth for every endpoint. Machine-readable contracts meant you could auto-generate client libraries, validate request/response payloads at the gateway, and run contract tests in CI. Before that, integration bugs were discovered in production. After, they were caught in pull requests.
Agent interactions are actually harder than API calls for three reasons. First, outputs are non-deterministic: the same input prompt can produce different outputs across runs. Second, context window limits create implicit constraints that nobody documents. Third, agents exhibit emergent behavior, calling other tools or agents in ways the developer didn't anticipate.
The "let agents figure it out" philosophy works for demos. It fails for anything compliance-adjacent, customer-facing, or financially sensitive. When an auditor asks "which agent authorized this refund?" you need a traceable chain of contracts, not a prompt log.
| Pre-OpenAPI REST World | Today's Multi-Agent World | What a Manifest Fixes |
|---|---|---|
| Undocumented endpoints | Undocumented agent capabilities | Explicit capability declarations |
| Schema drift across versions | Prompt drift across deployments | Versioned I/O schemas |
| No rate limits on internal APIs | No token budgets on agent conversations | Per-turn and per-session token caps |
| Tribal knowledge about auth | Tribal knowledge about agent permissions | Declared interaction protocols |
| Integration bugs found in production | Agent misbehavior found in production | Pre-deployment manifest validation |
Anatomy of an Agent Manifest
A well-structured manifest has six core sections: identity, capabilities, I/O schemas, token budget, interaction protocols, and guardrails. Here's a concrete example for a customer-support triage agent:
```yaml
# agent-manifest: support-triage-agent v2.1.0
identity:
  name: support-triage-agent
  version: "2.1.0"
  owner: customer-experience-team
  description: "Classifies inbound support tickets by intent and urgency"
capabilities:
  - classify_ticket_intent
  - extract_customer_sentiment
  - route_to_specialist_queue
io_schemas:
  input:
    type: object
    required: [ticket_id, ticket_body, customer_tier]
    properties:
      ticket_id: { type: string, format: uuid }
      ticket_body: { type: string, maxLength: 8000 }
      customer_tier: { type: string, enum: [free, pro, enterprise] }
  output:
    type: object
    required: [intent, urgency, recommended_queue]
    properties:
      intent: { type: string, enum: [refund, technical, billing, general] }
      urgency: { type: string, enum: [low, medium, high, critical] }
      recommended_queue: { type: string }
      confidence: { type: number, minimum: 0, maximum: 1 }
token_budget:
  max_tokens_per_turn: 2000
  max_tokens_per_session: 6000
  escalation_on_exhaustion: "return_partial_with_flag"
interaction_protocols:
  allowed_callees: [research-agent, escalation-agent]
  denied_callees: [payment-agent, refund-processor]
  handoff_conditions:
    - condition: "urgency == critical AND customer_tier == enterprise"
      target: escalation-agent
  retry_policy: { max_retries: 2, backoff_ms: 500 }
guardrails:
  prohibited_actions: [modify_account, process_payment, access_pii_beyond_ticket]
  required_disclaimers: ["This is an automated classification. A human will review."]
  max_pii_fields_in_output: 0
```
Each field exists for a specific reason. The max_tokens_per_turn cap prevents a single classification from consuming your entire token budget when the model enters a verbose reasoning loop. The denied_callees list enforces least-privilege access, exactly like a network firewall rule. The io_schemas section means any downstream agent can validate what it receives before acting on it.
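To make the io_schemas point concrete, here is a minimal sketch of how a downstream agent or orchestrator might check a payload before acting on it. It covers only the keywords used in the manifest above (required, type, enum, maxLength, minimum, maximum). The function name validate_payload and the dict layout are assumptions made for illustration, and a production system would more likely rely on a full JSON Schema validator.

```python
# Hypothetical sketch: validate a payload against the subset of JSON Schema
# keywords used in the support-triage-agent manifest.

# Output schema transcribed from the manifest's io_schemas.output section.
OUTPUT_SCHEMA = {
    "type": "object",
    "required": ["intent", "urgency", "recommended_queue"],
    "properties": {
        "intent": {"type": "string", "enum": ["refund", "technical", "billing", "general"]},
        "urgency": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
        "recommended_queue": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}

# Map schema type names to Python types; booleans are rejected explicitly below
# because bool is a subclass of int in Python.
_TYPES = {"string": str, "number": (int, float), "integer": int, "object": dict}


def validate_payload(payload, schema):
    """Return a list of human-readable violations; an empty list means valid."""
    if not isinstance(payload, dict):
        return ["payload is not an object"]
    errors = []
    for field in schema.get("required", []):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field not in payload:
            continue
        value = payload[field]
        expected = _TYPES.get(rules.get("type"))
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            errors.append(f"{field}: expected {rules['type']}")
            continue  # skip further checks on a value of the wrong type
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: {value!r} not in {rules['enum']}")
        if "maxLength" in rules and len(value) > rules["maxLength"]:
            errors.append(f"{field}: longer than {rules['maxLength']} characters")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{field}: below minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{field}: above maximum {rules['maximum']}")
    return errors
```

Returning a list of violations rather than raising on the first one lets the orchestrator log every contract breach in a single pass, which is usually what you want in an audit trail.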
The parallel to OpenAPI is direct: capabilities maps to API paths, io_schemas maps to request/response schemas, interaction_protocols maps to security definitions and server configurations.
Token Budgets: The Rate Limiting of Agent Systems
Unbounded agent-to-agent conversations are the number one cost and reliability risk in production multi-agent deployments. I've watched a research agent enter a recursive summarization loop, calling itself four times to "improve" a summary, burning through 47,000 tokens on a task that should have cost 3,000. Multiply that by 200 daily requests and you're looking at a surprise $4,000 monthly bill from a single agent.
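The arithmetic behind that bill is worth spelling out. The sketch below uses only the figures quoted above plus one assumption, a 30-day month, and backs out the blended token price that the $4,000 figure implies. The variable names are illustrative.

```python
# Back-of-envelope check of the runaway-loop bill described above.
TOKENS_PER_RUNAWAY_REQUEST = 47_000   # observed in the looping case
TOKENS_EXPECTED_PER_REQUEST = 3_000   # what the task should have cost
REQUESTS_PER_DAY = 200
DAYS_PER_MONTH = 30                   # assumption, not stated in the text
MONTHLY_BILL_USD = 4_000

monthly_tokens = TOKENS_PER_RUNAWAY_REQUEST * REQUESTS_PER_DAY * DAYS_PER_MONTH
expected_tokens = TOKENS_EXPECTED_PER_REQUEST * REQUESTS_PER_DAY * DAYS_PER_MONTH

# Blended price per million tokens implied by the quoted bill.
implied_usd_per_million = MONTHLY_BILL_USD / (monthly_tokens / 1_000_000)
overrun_factor = monthly_tokens / expected_tokens

print(f"{monthly_tokens:,} tokens/month")
print(f"implied price: ${implied_usd_per_million:.2f} per million tokens")
print(f"overrun: {overrun_factor:.1f}x the expected volume")
```

Under these numbers the bill implies a blended rate of about $14 per million tokens, and the same traffic at the expected 3,000 tokens per request would cost roughly a sixteenth as much.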
Token budget enforcement in manifests works like API rate limiting and circuit breakers. When an agent hits its per-turn limit, the manifest defines what happens next: return a partial result with a flag, escalate to a supervisor agent, or terminate the session with a structured error.
```yaml
token_budget:
  max_tokens_per_turn: 3000
  max_tokens_per_session: 15000
  warning_threshold_percent: 80
  escalation_on_exhaustion: "handoff_to_supervisor"
  supervisor_agent: "budget-supervisor-agent"
  budget_exceeded_output:
    type: object
    properties:
      partial_result: { type: string }
      tokens_consumed: { type: integer }
      reason: { type: string, enum: [budget_exhausted, turn_limit, session_limit] }
```
Here's the scenario that makes this real. A research agent is tasked with summarizing a customer's account history. The account has 14 months of activity. The agent's first pass produces a 2,800-token summary, decides it's too long, re-summarizes it, decides the shorter version is missing context, pulls more history, and repeats. Without a token budget, this loop runs until the context window fills. With a manifest-enforced budget, the agent stops at the 15,000-token session cap, returns its best partial result, and hands the session to the supervisor agent for review.
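Here is a minimal sketch of how an orchestration layer might enforce those caps. The TokenBudget class and its method names are hypothetical; the field names and the reason values come from the manifest fragment above. It assumes the orchestrator learns each turn's token count after the model call returns, so it detects an overrun rather than preventing it.

```python
# Hypothetical sketch of manifest-driven token budget enforcement.
from dataclasses import dataclass


@dataclass
class TokenBudget:
    max_tokens_per_turn: int
    max_tokens_per_session: int
    warning_threshold_percent: int = 80
    session_tokens: int = 0  # running total across turns

    def record_turn(self, tokens_used: int) -> dict:
        """Record one turn's usage and return a status dict for the orchestrator."""
        self.session_tokens += tokens_used
        if tokens_used > self.max_tokens_per_turn:
            return self._exceeded("turn_limit")
        if self.session_tokens >= self.max_tokens_per_session:
            return self._exceeded("session_limit")
        warn_at = self.max_tokens_per_session * self.warning_threshold_percent / 100
        status = "warning" if self.session_tokens >= warn_at else "ok"
        return {"status": status, "tokens_consumed": self.session_tokens}

    def _exceeded(self, reason: str) -> dict:
        # Mirrors budget_exceeded_output; the caller attaches partial_result.
        return {"status": "exceeded", "reason": reason, "tokens_consumed": self.session_tokens}


# Replay the summarization loop: six passes of 2,800 tokens each.
budget = TokenBudget(max_tokens_per_turn=3000, max_tokens_per_session=15000)
for turn in range(1, 7):
    result = budget.record_turn(2800)
    print(turn, result["status"], result["tokens_consumed"])
```

In this replay the fifth pass crosses the 80% warning threshold and the sixth trips the session limit. A stricter enforcer could also pass the remaining budget to the model as its maximum output length so that the cap is never overshot.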
Interaction Protocols: From Chaos to Choreography
Multi-agent systems typically follow one of two coordination patterns. Orchestration uses a central controller that dispatches tasks and collects results. Choreography lets agents follow shared protocols to coordinate among themselves. Manifests support both: an orchestrator reads manifests to know what it can dispatch, and choreographed agents read each other's manifests to know who they can call and under what conditions.
The interaction protocol section of a manifest defines four critical fields:
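- allowed_callees: Agents this agent is permitted to invoke. Anything not on the list is blocked. The companion denied_callees list in the example restates the highest-risk exclusions explicitly, so widening the allow list by mistake can't silently grant access to them.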
- handoff_conditions: Specific conditions under which this agent must transfer control to another agent.
- retry_policy: How many times to retry a failed call and with what backoff.
- fallback_agent: Who handles the request if this agent fails entirely.
```yaml
interaction_protocols:
  allowed_callees:
    - research-agent
    - knowledge-base-agent
  denied_callees:
    - payment-agent
    - admin-agent
  handoff_conditions:
    - condition: "task_requires_external_api AND api_name == 'stripe'"
      target: payment-agent-via-orchestrator
      requires_approval: true
  retry_policy:
    max_retries: 3
    backoff_ms: 1000
    backoff_multiplier: 2
  fallback_agent: generic-support-agent
```
This maps directly to zero-trust networking concepts. In a zero-trust network, no device is trusted by default, even inside the perimeter. In a zero-trust agent topology, no agent can call another agent without a manifest-declared permission. The manifest becomes the policy engine.
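A minimal sketch of that policy engine, assuming the manifest has already been parsed into a plain dict. The function names is_call_allowed and backoff_schedule_ms are hypothetical, and treating the deny list as overriding the allow list is a design assumption rather than something the manifest format dictates.

```python
# Hypothetical sketch: default-deny call authorization and retry backoff
# driven by the interaction_protocols section shown above.

PROTOCOLS = {
    "allowed_callees": ["research-agent", "knowledge-base-agent"],
    "denied_callees": ["payment-agent", "admin-agent"],
    "retry_policy": {"max_retries": 3, "backoff_ms": 1000, "backoff_multiplier": 2},
    "fallback_agent": "generic-support-agent",
}


def is_call_allowed(protocols: dict, callee: str) -> bool:
    """Default deny: a callee must be allow-listed and must not be deny-listed."""
    if callee in protocols.get("denied_callees", []):
        return False  # assumption: an explicit deny always wins
    return callee in protocols.get("allowed_callees", [])


def backoff_schedule_ms(retry_policy: dict) -> list:
    """Delay before each retry, growing geometrically by backoff_multiplier."""
    base = retry_policy["backoff_ms"]
    multiplier = retry_policy.get("backoff_multiplier", 1)
    return [base * multiplier ** attempt for attempt in range(retry_policy["max_retries"])]


print(is_call_allowed(PROTOCOLS, "research-agent"))   # allow-listed
print(is_call_allowed(PROTOCOLS, "payment-agent"))    # deny-listed
print(is_call_allowed(PROTOCOLS, "unknown-agent"))    # on neither list
print(backoff_schedule_ms(PROTOCOLS["retry_policy"]))
```

With the values above, the three retries wait 1,000, 2,000, and 4,000 ms. Once they are exhausted, the orchestrator would route the request to the declared fallback_agent.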
Manifests in Practice: A Customer-Facing Workflow
Consider a four-agent workflow handling customer refund requests. The intake agent receives the request and extracts structured data. The research agent pulls account history and policy information. The draft agent composes a response. The review agent checks tone, compliance, and accuracy before sending.
Each agent's manifest constrains its behavior:
| Agent | Capabilities | Allowed Callees | Token Budget (Session) | Key Guardrail |
|---|---|---|---|---|
| Intake Agent | Parse request, extract intent | Research Agent | 4,000 | Cannot access payment systems |
| Research Agent | Query account history, retrieve policies | None (returns to orchestrator) | 12,000 | Read-only data access, no PII in output |
| Draft Agent | Compose customer-facing response | None (returns to orchestrator) | 8,000 | Must include required disclaimers |
| Review Agent | Validate tone, check policy compliance | Escalation Agent (if rejection) | 6,000 | Cannot modify draft, only approve/reject/flag |
The manifests prevent specific failure modes. The draft agent cannot call the payment API directly, even if a prompt injection attempts it, because payment-agent isn't in its allowed_callees. The review agent enforces tone guidelines from its guardrails section, not from a prompt that could be overridden.
For compliance, the manifest chain creates an audit trail. When a SOC 2 auditor asks "how do you ensure the draft agent can't authorize payments?" you point to the manifest, not a code review. This is the same principle that makes Connectory's engineering intelligence dashboard valuable for tracking agent behavior patterns against defined contracts.
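The auditor's question can be answered mechanically. The sketch below transcribes the Allowed Callees column from the table into a call graph and computes everything an agent can reach, directly or through other agents. The kebab-case agent identifiers and the empty callee list for the escalation agent are assumptions, since the article does not show that agent's manifest.

```python
# Hypothetical sketch: show from manifests alone that an agent has no call
# path to a sensitive agent, including indirect paths through other agents.
from collections import deque

ALLOWED_CALLEES = {
    "intake-agent": ["research-agent"],
    "research-agent": [],
    "draft-agent": [],
    "review-agent": ["escalation-agent"],
    "escalation-agent": [],  # assumption: manifest not shown in the article
}


def reachable_agents(graph: dict, start: str) -> set:
    """Return every agent reachable from start via allowed_callees edges."""
    seen = set()
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for callee in graph.get(current, []):
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


for agent in ALLOWED_CALLEES:
    can_reach_payment = "payment-agent" in reachable_agents(ALLOWED_CALLEES, agent)
    print(f"{agent}: can reach payment-agent = {can_reach_payment}")
```

Checking transitive reachability matters because a direct allow-list check misses the case where agent A may call B and B may call the payment agent.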
Versioning, Testing, and the Manifest Lifecycle
Manifests need the same lifecycle discipline as API contracts. Use semantic versioning: bump the patch version for documentation changes, the minor version for backward-compatible capability additions, and the major version for breaking changes to I/O schemas or interaction protocols.
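One way to keep those versioning rules honest is to compute the required bump from a diff of two manifests. This is a deliberately coarse sketch: it treats any edit to io_schemas or interaction_protocols as breaking and treats a removed capability as breaking too, both of which are simplifying assumptions. The function names are hypothetical.

```python
# Hypothetical sketch: derive the semantic-version bump from a manifest diff.

def required_bump(old: dict, new: dict) -> str:
    """Return 'major', 'minor', or 'patch' for a manifest change."""
    # Simplification: any edit to these sections is treated as breaking.
    for section in ("io_schemas", "interaction_protocols"):
        if old.get(section) != new.get(section):
            return "major"
    old_caps = set(old.get("capabilities", []))
    new_caps = set(new.get("capabilities", []))
    if old_caps - new_caps:
        return "major"  # assumption: removing a capability breaks callers
    if new_caps - old_caps:
        return "minor"  # additive and backward compatible
    return "patch"      # descriptions, ownership, other documentation


def bump(version: str, level: str) -> str:
    """Apply a bump level to a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

Run in CI, a check like this can fail a pull request whose declared version does not match the bump its changes require.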
Pre-deployment validation should verify I/O schema compatibility between connected agents. If the intake agent's output schema says urgency is an enum of [low, medium, high, critical] but the research agent's input schema expects [1, 2, 3, 4], that's a contract mismatch you can catch before anything runs.
This is directly analogous to API contract testing tools like Pact or Dredd. An equivalent pattern for agent manifests:
1. Extract the output schema from Agent A's manifest
2. Compare it against the input schema of Agent B's manifest
3. Flag any type mismatches, missing required fields, or enum incompatibilities
4. Block deployment if validation fails
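The four steps above can be sketched in a few lines. This version checks only required fields, types, and enums, which matches the mismatches listed in step 3. The function name check_compatibility and the sample research-agent input schema are hypothetical.

```python
# Hypothetical sketch of a cross-agent schema compatibility check.

def check_compatibility(producer_output: dict, consumer_input: dict) -> list:
    """Compare Agent A's output schema with Agent B's input schema."""
    problems = []
    produced = producer_output.get("properties", {})
    expected = consumer_input.get("properties", {})
    guaranteed = set(producer_output.get("required", []))
    # Every field the consumer requires must be one the producer always emits.
    for field in consumer_input.get("required", []):
        if field not in guaranteed:
            problems.append(f"{field}: required by consumer but not guaranteed by producer")
    # For shared fields, types must match and producer enums must be a subset.
    for field in sorted(produced.keys() & expected.keys()):
        out_rule, in_rule = produced[field], expected[field]
        if out_rule.get("type") != in_rule.get("type"):
            problems.append(f"{field}: type {out_rule.get('type')} vs {in_rule.get('type')}")
        elif "enum" in in_rule:
            extra = set(out_rule.get("enum", [])) - set(in_rule["enum"])
            if extra or "enum" not in out_rule:
                problems.append(f"{field}: producer may emit values the consumer rejects")
    return problems


# The urgency mismatch described earlier, expressed as schemas.
intake_output = {
    "required": ["ticket_id", "urgency"],
    "properties": {
        "ticket_id": {"type": "string"},
        "urgency": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
    },
}
research_input = {
    "required": ["ticket_id", "urgency"],
    "properties": {
        "ticket_id": {"type": "string"},
        "urgency": {"type": "integer", "enum": [1, 2, 3, 4]},
    },
}

problems = check_compatibility(intake_output, research_input)
deployment_blocked = bool(problems)  # step 4: any problem blocks the deploy
print(problems, deployment_blocked)
```

The check is directional on purpose: the producer may emit a narrower set of values than the consumer accepts, but never a wider one.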
When you need to update a manifest in production, use the same patterns you'd use for API deployments. Run the new manifest version alongside the old one (canary interactions), route a percentage of traffic to the updated agent, and monitor for schema errors or budget overruns before cutting over fully.
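A minimal sketch of the traffic split, assuming sessions carry a stable identifier. Hashing the session ID keeps every turn of a session pinned to the same manifest version, which avoids switching contracts mid-conversation. The function name and the version strings are illustrative.

```python
# Hypothetical sketch: deterministic canary routing between manifest versions.
import hashlib


def pick_manifest_version(session_id: str, stable: str, canary: str, canary_percent: int) -> str:
    """Route a fixed share of sessions to the canary manifest version."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # stable bucket in 0..99
    return canary if bucket < canary_percent else stable


# The same session always lands on the same version.
first = pick_manifest_version("session-42", "2.1.0", "2.2.0", canary_percent=10)
second = pick_manifest_version("session-42", "2.1.0", "2.2.0", canary_percent=10)
print(first == second)
```

Raising canary_percent in steps moves more sessions to the new manifest while the monitoring described above watches for schema errors and budget overruns.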
| Lifecycle Stage | OpenAPI Equivalent | Agent Manifest Equivalent |
|---|---|---|
| Design | Write spec before code | Write manifest before agent prompt |
| Validation | Schema linting (Spectral) | Manifest linting + cross-agent schema checks |
| Testing | Contract tests (Pact, Dredd) | Interaction tests validating manifest compliance |
| Deployment | Versioned API gateway routing | Canary agent routing with manifest version pinning |
| Monitoring | Request/response logging | Token usage, guardrail violations, handoff tracking |
| Deprecation | Sunset headers, version retirement | Manifest deprecation notices, callee migration |
Teams using Tactical Edge's agentic AI orchestration patterns have found that treating manifests as first-class artifacts in the CI/CD pipeline catches 80% of agent integration issues before they reach staging.
Frequently Asked Questions
Do manifests make agents less flexible?
No. Manifests constrain the interface, not the reasoning. An agent can still use creative problem-solving within its declared capabilities. It just can't silently expand its scope, call unauthorized agents, or burn unlimited tokens.
How do manifests handle non-deterministic outputs?
The I/O schema section defines the structure of outputs (required fields, types, enums), not the exact content. A classification agent must return one of [refund, technical, billing, general], but which one it picks for a given input is still the model's decision.
Can I adopt manifests incrementally?
Yes. Start with your highest-risk agent (the one touching customer data or financial systems) and expand from there. You don't need full coverage on day one.
How are manifests different from system prompts?
System prompts are instructions to the model. Manifests are contracts enforced by the orchestration layer around the model. A system prompt says "don't call the payment API." An orchestrator enforcing the manifest rejects that call no matter what the model outputs.
Start With One Manifest This Week
Here's your 30-minute exercise. Pick the most critical agent in your system, the one closest to customer data or financial transactions. Open a new YAML file. Write its manifest using the six sections from this article: identity, capabilities, I/O schemas, token budget, interaction protocols, and guardrails. Then review it with your team. You'll discover assumptions about that agent's behavior that nobody had documented.
The one metric to start tracking: percentage of agent-to-agent interactions covered by a manifest. Anything below 100% means you have undocumented interfaces in production, and below 50% most of them are. If it's at 0%, you're running the multi-agent equivalent of undocumented internal APIs circa 2012.
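If your orchestrator already logs caller and callee pairs, the metric takes a few lines to compute. The sketch below counts an interaction as covered only when both sides have a manifest, which is one reasonable reading of "covered" and is an assumption. The agent names and log entries are made up for illustration.

```python
# Hypothetical sketch: manifest coverage over a log of agent-to-agent calls.

def manifest_coverage(interactions, agents_with_manifests) -> float:
    """Percent of (caller, callee) interactions where both sides have a manifest."""
    if not interactions:
        return 0.0
    covered = sum(
        1 for caller, callee in interactions
        if caller in agents_with_manifests and callee in agents_with_manifests
    )
    return 100.0 * covered / len(interactions)


# Illustrative call log and manifest inventory.
call_log = [
    ("intake-agent", "research-agent"),
    ("intake-agent", "research-agent"),
    ("planning-agent", "research-agent"),
    ("review-agent", "escalation-agent"),
]
has_manifest = {"intake-agent", "research-agent", "review-agent"}

print(f"{manifest_coverage(call_log, has_manifest):.0f}% of interactions covered")
```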
Just as OpenAPI didn't eliminate API complexity but made it manageable, manifests won't eliminate agent unpredictability. They will make it auditable, budgetable, and testable. And if Gartner's prediction holds and a third of enterprise software applications include agentic AI by 2028 [1], you'll want those contracts already in place.
For teams already tracking engineering metrics through tools like Connectory's engineering intelligence capabilities, agent manifest compliance becomes another signal in your quality dashboard: which agents are within budget, which are violating interaction protocols, and which need a manifest update before the next deployment.
Write the first manifest today. You'll wonder why you didn't start sooner.
References
[1] Gartner, "Gartner Predicts 33% of Enterprise Software Applications Will Include Agentic AI by 2028," October 2024. https://www.gartner.com/en/newsroom/press-releases/2024-10-21-gartner-predicts-33-percent-of-enterprise-software-applications-will-include-agentic-ai-by-2028
[2] Martian, "The Hidden Costs of AI Agents: Token Optimization Strategies," 2025. https://withmartian.com/blog/token-optimization
[3] OpenAI, "Pricing," 2025. https://openai.com/pricing
[4] Deloitte, "The State of Generative AI in the Enterprise: Now Decides Next," Q1 2025. https://www2.deloitte.com/us/en/pages/consulting/articles/state-of-generative-ai-in-enterprise.html
[5] SmartBear/OpenAPI Initiative, "The OpenAPI Specification," 2024. https://www.openapis.org/
[6] NIST, "Artificial Intelligence Risk Management Framework (AI RMF 1.0)," January 2023. https://www.nist.gov/artificial-intelligence/executive-order-safe-secure-and-trustworthy-artificial-intelligence
[7] Anthropic, "Building Effective Agents," 2025. https://www.anthropic.com/engineering/building-effective-agents