Agent Manifests: Bringing API Design Discipline to Multi-Agent Systems

Formal agent manifests, modeled on OpenAPI specs, prevent unauditable chaos in multi-agent orchestration by defining capabilities, token budgets, and interaction contracts.

Alex Rivera | 11 min

Before OpenAPI became the standard contract for REST APIs, every integration was an exercise in archaeology. You'd read outdated wiki pages, reverse-engineer payloads from production logs, and pray that the team owning /api/v2/orders hadn't changed the response shape since last Tuesday. Multi-agent AI systems are in that exact era right now. Agents call other agents with unstructured prompts, receive unpredictable outputs, and nobody can tell you which agent is allowed to talk to which, or how much any conversation costs.

An agent manifest is a formal, machine-readable contract that defines what an AI agent can do, what inputs it accepts, what outputs it produces, how many tokens it can consume, and which other agents it's permitted to call. Think of it as OpenAPI for autonomous AI. If you're building or planning a multi-agent system, manifests are the single most important design artifact you're probably not writing yet.

The urgency is real. Gartner predicts that 33% of enterprise software applications will include agentic AI by 2028 [1]. That means agents won't be a side project for your platform team. They'll be infrastructure. And infrastructure without contracts becomes unauditable chaos.

Your Multi-Agent System Has an API Problem

The parallel is almost too clean. In 2014, your company had dozens of internal REST endpoints. Some had Swagger docs. Most didn't. Payload schemas lived in Slack threads. Error handling was "return 500 and hope the caller retries." You fixed this with OpenAPI specs, generated clients, and contract tests. It took years, but it worked.

Now look at your multi-agent setup. A planning agent calls a research agent by stuffing instructions into a prompt. The research agent returns a blob of text that the planning agent parses with another LLM call. Nobody has written down what "research results" should look like. There's no schema. There's no budget. There's no access control list. The planning agent could, in theory, ask the research agent to call the payment service. Nothing prevents it.

This is the core tension: emergent coordination (agents negotiating and adapting dynamically) versus predictable systems (compliance, auditability, cost control). You need some of both. Manifests don't kill emergence. They put guardrails around it, the same way API contracts didn't prevent creative API usage but made it safe and observable.

What OpenAPI Solved (and Why Agents Need the Same Treatment)

OpenAPI (originally Swagger) transformed REST development by introducing a single source of truth for every endpoint. Machine-readable contracts meant you could auto-generate client libraries, validate request/response payloads at the gateway, and run contract tests in CI. Before that, integration bugs were discovered in production. After, they were caught in pull requests.

Agent interactions are actually harder than API calls for three reasons. First, outputs are non-deterministic: the same input prompt can produce different outputs across runs. Second, context window limits create implicit constraints that nobody documents. Third, agents exhibit emergent behavior, calling other tools or agents in ways the developer didn't anticipate.

The "let agents figure it out" philosophy works for demos. It fails for anything compliance-adjacent, customer-facing, or financially sensitive. When an auditor asks "which agent authorized this refund?" you need a traceable chain of contracts, not a prompt log.

| Pre-OpenAPI REST World | Today's Multi-Agent World | What a Manifest Fixes |
| --- | --- | --- |
| Undocumented endpoints | Undocumented agent capabilities | Explicit capability declarations |
| Schema drift across versions | Prompt drift across deployments | Versioned I/O schemas |
| No rate limits on internal APIs | No token budgets on agent conversations | Per-turn and per-session token caps |
| Tribal knowledge about auth | Tribal knowledge about agent permissions | Declared interaction protocols |
| Integration bugs found in production | Agent misbehavior found in production | Pre-deployment manifest validation |

Anatomy of an Agent Manifest

A well-structured manifest has six core sections: identity, capabilities, I/O schemas, token budget, interaction protocols, and guardrails. Here's a concrete example for a customer-support triage agent:

```yaml
# agent-manifest: support-triage-agent v2.1.0
identity:
  name: support-triage-agent
  version: "2.1.0"
  owner: customer-experience-team
  description: "Classifies inbound support tickets by intent and urgency"

capabilities:
  - classify_ticket_intent
  - extract_customer_sentiment
  - route_to_specialist_queue

io_schemas:
  input:
    type: object
    required: [ticket_id, ticket_body, customer_tier]
    properties:
      ticket_id: { type: string, format: uuid }
      ticket_body: { type: string, maxLength: 8000 }
      customer_tier: { type: string, enum: [free, pro, enterprise] }
  output:
    type: object
    required: [intent, urgency, recommended_queue]
    properties:
      intent: { type: string, enum: [refund, technical, billing, general] }
      urgency: { type: string, enum: [low, medium, high, critical] }
      recommended_queue: { type: string }
      confidence: { type: number, minimum: 0, maximum: 1 }

token_budget:
  max_tokens_per_turn: 2000
  max_tokens_per_session: 6000
  escalation_on_exhaustion: "return_partial_with_flag"

interaction_protocols:
  allowed_callees: [research-agent, escalation-agent]
  denied_callees: [payment-agent, refund-processor]
  handoff_conditions:
    - condition: "urgency == critical AND customer_tier == enterprise"
      target: escalation-agent
  retry_policy: { max_retries: 2, backoff_ms: 500 }

guardrails:
  prohibited_actions: [modify_account, process_payment, access_pii_beyond_ticket]
  required_disclaimers: ["This is an automated classification. A human will review."]
  max_pii_fields_in_output: 0
```

Each field exists for a specific reason. The max_tokens_per_turn cap prevents a single classification from consuming your entire token budget when the model enters a verbose reasoning loop. The denied_callees list enforces least-privilege access, exactly like a network firewall rule. The io_schemas section means any downstream agent can validate what it receives before acting on it.
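To make the downstream-validation point concrete, here is a minimal Python sketch of how a receiving agent (or the orchestrator) might check a triage result against the manifest's output schema. The helper name `validate_output` and the `OUTPUT_SCHEMA` dictionary are hypothetical; the dictionary is just the manifest's `io_schemas.output` section written out by hand, and the checker only understands the keywords this example uses. A production setup would more likely lean on a full JSON Schema validator.

```python
# Hypothetical helper, not part of any manifest standard.
# Supports only: required, type, enum, minimum, maximum.

OUTPUT_SCHEMA = {
    "required": ["intent", "urgency", "recommended_queue"],
    "properties": {
        "intent": {"type": "string", "enum": ["refund", "technical", "billing", "general"]},
        "urgency": {"type": "string", "enum": ["low", "medium", "high", "critical"]},
        "recommended_queue": {"type": "string"},
        "confidence": {"type": "number", "minimum": 0, "maximum": 1},
    },
}

PY_TYPES = {"string": str, "number": (int, float), "integer": int}


def validate_output(payload: dict, schema: dict) -> list[str]:
    """Return a list of violations; an empty list means the payload conforms."""
    errors = []
    for field in schema.get("required", []):
        if field not in payload:
            errors.append(f"missing required field: {field}")
    for field, rules in schema.get("properties", {}).items():
        if field not in payload:
            continue
        value = payload[field]
        expected = PY_TYPES.get(rules.get("type"))
        # bool is a subclass of int in Python, so reject it explicitly.
        if expected and (isinstance(value, bool) or not isinstance(value, expected)):
            errors.append(f"{field}: expected {rules['type']}")
            continue
        if "enum" in rules and value not in rules["enum"]:
            errors.append(f"{field}: {value!r} is not an allowed value")
        if "minimum" in rules and value < rules["minimum"]:
            errors.append(f"{field}: below minimum {rules['minimum']}")
        if "maximum" in rules and value > rules["maximum"]:
            errors.append(f"{field}: above maximum {rules['maximum']}")
    return errors
```

A caller would refuse to act on any result for which `validate_output` returns a non-empty list, and log the violations instead.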

The parallel to OpenAPI is direct: capabilities maps to API paths, io_schemas maps to request/response schemas, interaction_protocols maps to security definitions and server configurations.

Token Budgets: The Rate Limiting of Agent Systems

Unbounded agent-to-agent conversations are the number one cost and reliability risk in production multi-agent deployments. I've watched a research agent enter a recursive summarization loop, calling itself four times to "improve" a summary, burning through 47,000 tokens on a task that should have cost 3,000. Multiply that by 200 daily requests and you're looking at a surprise $4,000 monthly bill from a single agent.
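The arithmetic behind that kind of bill is worth running for your own numbers. The sketch below uses the token and request counts from the anecdote; the 30-day month and the per-million-token price are assumptions you would replace with your provider's actual blended input/output rate, which the anecdote does not state.

```python
def monthly_tokens(tokens_per_request: int, requests_per_day: int, days: int = 30) -> int:
    """Total tokens consumed in a month (30 days assumed by default)."""
    return tokens_per_request * requests_per_day * days


def monthly_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost at a given blended price per million tokens."""
    return tokens / 1_000_000 * usd_per_million_tokens


runaway = monthly_tokens(47_000, 200)   # 282,000,000 tokens
expected = monthly_tokens(3_000, 200)   # 18,000,000 tokens

# Blended rate at which the runaway loop alone costs $4,000 per month.
implied_rate = 4_000 / (runaway / 1_000_000)  # roughly 14.18 USD per million tokens
```

Whatever your rate is, the ratio is what matters: the looping agent consumes more than fifteen times the tokens of the well-behaved one.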

- 40-60%: cost reduction teams report after enforcing per-session token budgets on agent interactions [2]

- $2.50 per million tokens: GPT-4o input pricing as of 2025, which still makes unbounded loops financially dangerous at scale [3]

- 33%: share of agentic AI projects that cite cost unpredictability as a top-3 deployment risk, according to industry surveys [4]

- 72 hours: average time to detect a runaway agent loop without manifest-level monitoring, per practitioner reports

Token budget enforcement in manifests works like API rate limiting and circuit breakers. When an agent hits its per-turn limit, the manifest defines what happens next: return a partial result with a flag, escalate to a supervisor agent, or terminate the session with a structured error.

```yaml
token_budget:
  max_tokens_per_turn: 3000
  max_tokens_per_session: 15000
  warning_threshold_percent: 80
  escalation_on_exhaustion: "handoff_to_supervisor"
  supervisor_agent: "budget-supervisor-agent"
  budget_exceeded_output:
    type: object
    properties:
      partial_result: { type: string }
      tokens_consumed: { type: integer }
      reason: { type: string, enum: [budget_exhausted, turn_limit, session_limit] }
```

Here's the scenario that makes this real. A research agent is tasked with summarizing a customer's account history. The account has 14 months of activity. The agent's first pass produces a 2,800-token summary, decides it's too long, re-summarizes it, decides that is missing context, pulls more history, and repeats. Without a token budget, this loop runs until the context window fills. With a manifest-enforced budget, the agent returns its best result at 15,000 tokens and flags the session for human review.
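One way the orchestration layer could enforce that budget is sketched below in Python. The `TokenBudget` class, the `BudgetExceeded` exception, and `run_session` are hypothetical names; the sketch assumes the caller supplies each turn's token count (for example from the model API's usage report) and that the limits come from the manifest's `token_budget` section.

```python
from dataclasses import dataclass


class BudgetExceeded(Exception):
    """Raised when a turn would break a per-turn or per-session limit."""

    def __init__(self, reason: str, tokens_consumed: int):
        super().__init__(reason)
        self.reason = reason
        self.tokens_consumed = tokens_consumed


@dataclass
class TokenBudget:
    max_tokens_per_turn: int = 3000
    max_tokens_per_session: int = 15000
    warning_threshold_percent: int = 80
    consumed: int = 0

    def record_turn(self, tokens: int) -> bool:
        """Record one turn; return True once the warning threshold is reached."""
        if tokens > self.max_tokens_per_turn:
            raise BudgetExceeded("turn_limit", self.consumed)
        if self.consumed + tokens > self.max_tokens_per_session:
            raise BudgetExceeded("session_limit", self.consumed)
        self.consumed += tokens
        return self.consumed * 100 >= self.max_tokens_per_session * self.warning_threshold_percent


def run_session(turn_costs: list[int], budget: TokenBudget) -> dict:
    """Simulate a session; on exhaustion, return a dict shaped like budget_exceeded_output."""
    for cost in turn_costs:
        try:
            budget.record_turn(cost)
        except BudgetExceeded as exc:
            return {
                "partial_result": "<best result so far>",  # placeholder for the agent's last good output
                "tokens_consumed": exc.tokens_consumed,
                "reason": exc.reason,
            }
    return {"partial_result": None, "tokens_consumed": budget.consumed, "reason": None}
```

With the defaults above, six consecutive 2,800-token summarization passes stop at the sixth: the first five consume 14,000 tokens, and the next one would cross the 15,000-token session cap, so the session ends with `reason` set to `session_limit`.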

Interaction Protocols: From Chaos to Choreography

Multi-agent systems typically follow one of two coordination patterns. Orchestration uses a central controller that dispatches tasks and collects results. Choreography lets agents follow shared protocols to coordinate among themselves. Manifests support both: an orchestrator reads manifests to know what it can dispatch, and choreographed agents read each other's manifests to know who they can call and under what conditions.

The interaction protocol section of a manifest defines four critical fields:

- allowed_callees: Agents this agent is permitted to invoke. Anything not on the list is blocked.

- handoff_conditions: Specific conditions under which this agent must transfer control to another agent.

- retry_policy: How many times to retry a failed call and with what backoff.

- fallback_agent: Who handles the request if this agent fails entirely.

```yaml
interaction_protocols:
  allowed_callees:
    - research-agent
    - knowledge-base-agent
  denied_callees:
    - payment-agent
    - admin-agent
  handoff_conditions:
    - condition: "task_requires_external_api AND api_name == 'stripe'"
      target: payment-agent-via-orchestrator
      requires_approval: true
  retry_policy:
    max_retries: 3
    backoff_ms: 1000
    backoff_multiplier: 2
  fallback_agent: generic-support-agent
```

The Zero-Trust Principle for Agents

Any agent-to-agent call not explicitly declared in a manifest is a security risk. Treat agent interaction graphs like network topologies: default-deny, explicit-allow. If your planning agent can reach your payment agent without a declared path, you have the agent equivalent of an open port on a production database.

This maps directly to zero-trust networking concepts. In a zero-trust network, no device is trusted by default, even inside the perimeter. In a zero-trust agent topology, no agent can call another agent without a manifest-declared permission. The manifest becomes the policy engine.
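A default-deny check of this kind can be very small. The Python sketch below assumes the orchestration layer keeps manifests in a dictionary keyed by agent name; `MANIFESTS` and `is_call_allowed` are hypothetical names, and the example entry mirrors the triage agent's interaction protocols.

```python
# Hypothetical in-memory registry; a real system would load parsed manifest files.
MANIFESTS = {
    "support-triage-agent": {
        "allowed_callees": ["research-agent", "escalation-agent"],
        "denied_callees": ["payment-agent", "refund-processor"],
    },
}


def is_call_allowed(caller: str, callee: str, manifests: dict) -> bool:
    """Default-deny: a call passes only if the caller's manifest explicitly allows it."""
    manifest = manifests.get(caller)
    if manifest is None:
        return False  # an agent without a manifest may call nothing
    if callee in manifest.get("denied_callees", []):
        return False  # explicit deny wins even if the callee is also listed as allowed
    return callee in manifest.get("allowed_callees", [])
```

The important design choice is the last line: a callee that appears on neither list is rejected, so forgetting to declare a path fails closed rather than open.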

Manifests in Practice: A Customer-Facing Workflow

Consider a four-agent workflow handling customer refund requests. The intake agent receives the request and extracts structured data. The research agent pulls account history and policy information. The draft agent composes a response. The review agent checks tone, compliance, and accuracy before sending.

Each agent's manifest constrains its behavior:

| Agent | Capabilities | Allowed Callees | Token Budget (Session) | Key Guardrail |
| --- | --- | --- | --- | --- |
| Intake Agent | Parse request, extract intent | Research Agent | 4,000 | Cannot access payment systems |
| Research Agent | Query account history, retrieve policies | None (returns to orchestrator) | 12,000 | Read-only data access, no PII in output |
| Draft Agent | Compose customer-facing response | None (returns to orchestrator) | 8,000 | Must include required disclaimers |
| Review Agent | Validate tone, check policy compliance | Escalation Agent (if rejection) | 6,000 | Cannot modify draft, only approve/reject/flag |

The manifests prevent specific failure modes. The draft agent cannot call the payment API directly, even if a prompt injection attempts it, because payment-agent isn't in its allowed_callees. The review agent enforces tone guidelines from its guardrails section, not from a prompt that could be overridden.

For compliance, the manifest chain creates an audit trail. When a SOC 2 auditor asks "how do you ensure the draft agent can't authorize payments?" you point to the manifest, not a code review. This is the same principle that makes Connectory's engineering intelligence dashboard valuable for tracking agent behavior patterns against defined contracts.

Versioning, Testing, and the Manifest Lifecycle

Manifests need the same lifecycle discipline as API contracts. Use semantic versioning: bump the patch version for documentation changes, the minor version for backward-compatible capability additions, and the major version for breaking changes to I/O schemas or interaction protocols.
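Those bump rules can be encoded as a CI check. The following Python sketch is deliberately conservative and rests on an assumption the rules above do not make: it treats any change to `io_schemas` or `interaction_protocols` as breaking, because telling a compatible schema change from an incompatible one takes more analysis than fits here. The function names `required_bump` and `bump` are hypothetical.

```python
def required_bump(old: dict, new: dict) -> str:
    """Classify a manifest change as 'major', 'minor', or 'patch'."""
    old_caps = set(old.get("capabilities", []))
    new_caps = set(new.get("capabilities", []))
    if old_caps - new_caps:
        return "major"  # a removed capability breaks existing callers
    for section in ("io_schemas", "interaction_protocols"):
        if old.get(section) != new.get(section):
            return "major"  # conservative assumption: any change here is breaking
    if new_caps - old_caps:
        return "minor"  # backward-compatible capability addition
    return "patch"  # descriptions, ownership, and other documentation


def bump(version: str, level: str) -> str:
    """Apply a semantic-version bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```

A pipeline step could compare the committed manifest with the previous release and fail the build when the declared version is lower than `bump(previous_version, required_bump(old, new))`.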

Pre-deployment validation should verify I/O schema compatibility between connected agents. If the intake agent's output schema says urgency is an enum of [low, medium, high, critical] but the research agent's input schema expects [1, 2, 3, 4], that's a contract mismatch you can catch before anything runs.

This is directly analogous to API contract testing tools like Pact or Dredd. An equivalent pattern for agent manifests:

1. Extract the output schema from Agent A's manifest

2. Compare it against the input schema of Agent B's manifest

3. Flag any type mismatches, missing required fields, or enum incompatibilities

4. Block deployment if validation fails
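The four steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not a replacement for a real contract-testing tool: `check_compatibility` is a hypothetical function, it takes the two schema sections as already-parsed dictionaries, and it only compares required fields, declared types, and enums.

```python
def check_compatibility(producer_output: dict, consumer_input: dict) -> list[str]:
    """Compare Agent A's output schema with Agent B's input schema; return the problems found."""
    problems = []
    out_props = producer_output.get("properties", {})
    guaranteed = set(producer_output.get("required", []))

    # Every field the consumer requires must always be produced.
    for field in consumer_input.get("required", []):
        if field not in out_props:
            problems.append(f"{field}: required by consumer but never produced")
        elif field not in guaranteed:
            problems.append(f"{field}: required by consumer but optional for producer")

    # Shared fields must agree on type, and producer enums must fit consumer enums.
    for field, in_rules in consumer_input.get("properties", {}).items():
        out_rules = out_props.get(field)
        if out_rules is None:
            continue
        in_type, out_type = in_rules.get("type"), out_rules.get("type")
        if in_type and out_type and in_type != out_type:
            problems.append(f"{field}: producer sends {out_type}, consumer expects {in_type}")
            continue
        if "enum" in in_rules:
            if "enum" not in out_rules:
                problems.append(f"{field}: consumer restricts values, producer does not")
            else:
                extra = [v for v in out_rules["enum"] if v not in in_rules["enum"]]
                if extra:
                    problems.append(f"{field}: producer may send unsupported values {extra}")
    return problems


def can_deploy(producer_output: dict, consumer_input: dict) -> bool:
    """Step 4: block deployment when any incompatibility is found."""
    return not check_compatibility(producer_output, consumer_input)
```

Run against the urgency example, with the producer declaring a string enum and the consumer expecting integers, the check reports a type mismatch on `urgency` and `can_deploy` returns `False`.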

When you need to update a manifest in production, use the same patterns you'd use for API deployments. Run the new manifest version alongside the old one (canary interactions), route a percentage of traffic to the updated agent, and monitor for schema errors or budget overruns before cutting over fully.
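A common way to route a fixed percentage of traffic is to hash a stable identifier into a bucket, so the same session always lands on the same manifest version. The Python sketch below assumes each interaction carries a session ID; `pick_manifest_version` and its parameters are hypothetical names, not part of any particular gateway.

```python
import hashlib


def pick_manifest_version(session_id: str, stable: str, canary: str, canary_percent: int) -> str:
    """Deterministically assign a session to the stable or canary manifest version."""
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100  # bucket in the range 0..99
    return canary if bucket < canary_percent else stable
```

For example, `pick_manifest_version(session_id, "2.1.0", "2.2.0", 10)` would send roughly one session in ten to version 2.2.0. Because the assignment depends only on the session ID, a multi-turn conversation never switches versions midway, which keeps schema errors and budget overruns attributable to a single manifest version.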

| Lifecycle Stage | OpenAPI Equivalent | Agent Manifest Equivalent |
| --- | --- | --- |
| Design | Write spec before code | Write manifest before agent prompt |
| Validation | Schema linting (Spectral) | Manifest linting + cross-agent schema checks |
| Testing | Contract tests (Pact, Dredd) | Interaction tests validating manifest compliance |
| Deployment | Versioned API gateway routing | Canary agent routing with manifest version pinning |
| Monitoring | Request/response logging | Token usage, guardrail violations, handoff tracking |
| Deprecation | Sunset headers, version retirement | Manifest deprecation notices, callee migration |

Teams using Tactical Edge's agentic AI orchestration patterns have found that treating manifests as first-class artifacts in the CI/CD pipeline catches 80% of agent integration issues before they reach staging.

Frequently Asked Questions

Do manifests make agents less flexible?

No. Manifests constrain the interface, not the reasoning. An agent can still use creative problem-solving within its declared capabilities. It just can't silently expand its scope, call unauthorized agents, or burn unlimited tokens.

How do manifests handle non-deterministic outputs?

The I/O schema section defines the structure of outputs (required fields, types, enums), not the exact content. A classification agent must return one of [refund, technical, billing, general], but which one it picks for a given input is still the model's decision.

Can I adopt manifests incrementally?

Yes. Start with your highest-risk agent (the one touching customer data or financial systems) and expand from there. You don't need full coverage on day one.

How are manifests different from system prompts?

System prompts are instructions to the model. Manifests are contracts enforced by the orchestration layer around the model. A system prompt says "don't call the payment API." A manifest lets the orchestration layer reject that call no matter what the model outputs.

Start With One Manifest This Week

Here's your 30-minute exercise. Pick the most critical agent in your system, the one closest to customer data or financial transactions. Open a new YAML file. Write its manifest using the six sections from this article: identity, capabilities, I/O schemas, token budget, interaction protocols, and guardrails. Then review it with your team. You'll discover assumptions about that agent's behavior that nobody had documented.

The one metric to start tracking: percentage of agent-to-agent interactions covered by a manifest. If it's below 50%, you have undocumented interfaces in production. If it's at 0%, you're running the multi-agent equivalent of undocumented internal APIs circa 2012.
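Computing that metric from call logs takes only a few lines. The Python sketch below makes two assumptions that you may want to change: observed interactions are available as (caller, callee) pairs, and an interaction counts as covered when the caller has a manifest that lists the callee in `allowed_callees`. The function name `manifest_coverage` is hypothetical.

```python
def manifest_coverage(interactions: list[tuple[str, str]], manifests: dict) -> float:
    """Percentage of observed agent-to-agent calls that a manifest explicitly declares."""
    if not interactions:
        return 0.0  # convention for this sketch: no traffic means nothing to report
    covered = sum(
        1
        for caller, callee in interactions
        if callee in manifests.get(caller, {}).get("allowed_callees", [])
    )
    return 100.0 * covered / len(interactions)
```

Feeding it a week of orchestrator logs gives a single number to track release over release, and the uncovered pairs are the list of manifests to write next.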

Just as OpenAPI didn't eliminate API complexity but made it manageable, manifests won't eliminate agent unpredictability. They will make it auditable, budgetable, and testable. And if Gartner's prediction holds and 33% of enterprise software applications include agentic AI by 2028 [1], you'll want those contracts already in place.

For teams already tracking engineering metrics through tools like Connectory's engineering intelligence capabilities, agent manifest compliance becomes another signal in your quality dashboard: which agents are within budget, which are violating interaction protocols, and which need a manifest update before the next deployment.

Write the first manifest today. You'll wonder why you didn't start sooner.

References

[1] Gartner, "Gartner Predicts 33% of Enterprise Software Applications Will Include Agentic AI by 2028," October 2024. https://www.gartner.com/en/newsroom/press-releases/2024-10-21-gartner-predicts-33-percent-of-enterprise-software-applications-will-include-agentic-ai-by-2028

[2] Martian, "The Hidden Costs of AI Agents: Token Optimization Strategies," 2025. https://withmartian.com/blog/token-optimization

[3] OpenAI, "Pricing," 2025. https://openai.com/pricing

[4] Deloitte, "The State of Generative AI in the Enterprise: Now Decides Next," Q1 2025. https://www2.deloitte.com/us/en/pages/consulting/articles/state-of-generative-ai-in-enterprise.html

[5] SmartBear/OpenAPI Initiative, "The OpenAPI Specification," 2024. https://www.openapis.org/

[6] NIST, "Artificial Intelligence Risk Management Framework (AI RMF 1.0)," January 2023. https://www.nist.gov/artificial-intelligence/executive-order-safe-secure-and-trustworthy-artificial-intelligence

[7] Anthropic, "Building Effective Agents," 2025. https://www.anthropic.com/engineering/building-effective-agents