Fabric Setup¶
The multi-agent governance and configuration patterns referenced throughout Fabric › Setup draw on current multi-agent configuration research. This page catalogs the canonical archetypes, their measured outcomes, and the selection criteria the platform uses when configuring a fabric for a given workload.
Governance Principles¶
Governance of an agent fabric means making five concerns explicit at configuration time: identity (who the agent is and who delegated it), authorization (which tools and data, under whose policy), audit (a per-step trace with replayable provenance), escalation (which actions need human approval), and budget (token, wall-clock, and hop ceilings). AWS frames this as four pillars (Boundaries, Identity, Visibility, Evaluation), implemented via Bedrock AgentCore Identity (OAuth-scoped agent principals) and AgentCore Policy, which is built on Cedar. Google Vertex AI Agent Builder treats each agent as a first-class IAM principal with a dedicated service account and routes all tool calls through an Agent Gateway policy enforcement point. The OpenAI Agents SDK exposes input, output, and tool guardrails that run in either blocking or parallel mode.
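The five concerns can be captured in one configuration object that is checked before a fabric starts. The following is a minimal sketch, not a platform API: `GovernanceConfig`, `Budget`, `validate`, and every field name are hypothetical and chosen only to mirror the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Budget:
    """Hard ceilings; all three field names are illustrative."""
    max_tokens: int
    max_wall_clock_s: float
    max_hops: int


@dataclass(frozen=True)
class GovernanceConfig:
    agent_id: str                 # identity: who the agent is
    delegated_by: str             # identity: who delegated it
    allowed_tools: frozenset      # authorization: tools the policy permits
    audit_trace: bool             # audit: emit a per-step, replayable trace
    approval_required: frozenset  # escalation: tools that need human sign-off
    budget: Budget                # budget: token, wall-clock, hop ceilings

    def validate(self) -> list:
        """Return a list of problems; an empty list means nothing was left implicit."""
        problems = []
        if not self.agent_id or not self.delegated_by:
            problems.append("identity: agent_id and delegated_by are both required")
        if not self.allowed_tools:
            problems.append("authorization: no tools are allowed")
        if not self.audit_trace:
            problems.append("audit: per-step tracing is disabled")
        unknown = self.approval_required - self.allowed_tools
        if unknown:
            problems.append(f"escalation: approval set names unknown tools {sorted(unknown)}")
        b = self.budget
        if min(b.max_tokens, b.max_wall_clock_s, b.max_hops) <= 0:
            problems.append("budget: every ceiling must be positive")
        return problems


config = GovernanceConfig(
    agent_id="researcher-1",
    delegated_by="user:alice",
    allowed_tools=frozenset({"web_search", "send_email"}),
    audit_trace=True,
    approval_required=frozenset({"send_email"}),
    budget=Budget(max_tokens=200_000, max_wall_clock_s=600.0, max_hops=6),
)
print(config.validate())  # prints []
```

Returning a list of problems rather than raising on the first one lets a configuration review surface every missing concern in a single pass.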
Configuration Archetypes And Measured Outcomes¶
- Supervisor / orchestrator-worker: best for research, planning, and decomposable retrieval. The Anthropic multi-agent research system (Opus lead + Sonnet workers) outperformed single-agent Opus by 90.2% on internal research evaluations.
- Hierarchical team-of-teams: a tree topology (Google ADK style) that scales but adds translation overhead. LangChain reports that supervisor routing accuracy degrades after 8-12 round trips.
- Peer handoff / swarm: LangGraph-Swarm reports roughly 40% lower end-to-end latency than a supervisor for conversational routing, at the cost of weaker governance.
- Debate / consensus: useful for math, code review, and issue resolution. M3MAD-Bench / Free-MAD (2025) report that consensus pressure hurts accuracy and inflates token cost; SWE-Debate uses competitive (not conformity) debate.
- MetaGPT-style SOP: encodes assembly-line SOPs into prompt sequences. MetaGPT reports state-of-the-art 85.9% / 87.7% Pass@1 on HumanEval / MBPP-class code-generation benchmarks.
Role Design¶
A role spec needs five fields: objective, output format, tool set and sources, memory scope, and task boundaries. Anthropic's published subagent guidance names vague specs as the number-one cause of duplicated work. CrewAI encodes role / backstory / goal plus Pydantic-typed outputs; LangGraph models roles as graph nodes with reducer-merged state; MetaGPT pins roles to SOP stages. Claude Code subagents are Markdown files with YAML frontmatter in .claude/agents/, each with a scoped tool allowlist; granular tool access is the primary blast-radius control.
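One way to keep role specs from going vague is to refuse to render a prompt until all five fields are filled in. The sketch below assumes a hypothetical `RoleSpec` class; its field names simply mirror the five fields listed above and do not correspond to CrewAI, LangGraph, MetaGPT, or Claude Code formats.

```python
from dataclasses import dataclass, fields


@dataclass
class RoleSpec:
    objective: str
    output_format: str
    tools_and_sources: list
    memory_scope: str
    task_boundaries: str

    def missing(self) -> list:
        """Names of fields left empty; a vague spec shows up here."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def to_prompt(self) -> str:
        """Render the spec as a prompt preamble, refusing incomplete specs."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"role spec is incomplete: {gaps}")
        return "\n".join([
            f"Objective: {self.objective}",
            f"Output format: {self.output_format}",
            f"Tools and sources: {', '.join(self.tools_and_sources)}",
            f"Memory scope: {self.memory_scope}",
            f"Task boundaries: {self.task_boundaries}",
        ])


spec = RoleSpec(
    objective="Summarize recent pricing changes for three vendors",
    output_format="Markdown table, one row per vendor",
    tools_and_sources=["web_search", "vendor_docs"],
    memory_scope="this task only",
    task_boundaries="Do not contact vendors; do not compare features",
)
print(spec.to_prompt())
```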
Model / Tier Mix¶
The Anthropic BrowseComp analysis reports that token budget alone explains roughly 80% of performance variance, with tool-call count and model choice accounting for a further roughly 15% (95% combined). Upgrading to Sonnet 4 gave a larger gain than doubling the Sonnet 3.7 token budget. Heterogeneous mixing (a strong orchestrator with cheaper workers) outperforms homogeneous high-capability fleets. A counterweight: Liu et al. (arXiv 2604.02460) find that single-agent LLMs beat multi-agent systems on multi-hop reasoning at an equal thinking-token budget; multi-agent wins only when problems are genuinely parallel.
Authorization And Policy Layers¶
Policy must live outside the agent: the agent does not decide what is allowed; the engine does. OPA / Rego at the tool-calling layer is the established pattern. AWS AgentCore Policy uses Cedar with formal verification, and Vertex's Agent Gateway is the equivalent central policy enforcement point. Two further layers round this out: scoped credential vaults (Anthropic workspace-scoped keys, short-lived narrowly-scoped credentials per Remote Control session) and per-step gates (OpenAI guardrails with run_in_parallel=False, so that no tokens are spent and no tool side-effects occur when a tripwire fires).
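The core idea, that the decision is made by a layer the agent cannot bypass, can be sketched in a few lines. This is an illustration only: a plain dictionary stands in for a real engine such as OPA or Cedar, and `make_gate`, `PolicyDenied`, `POLICY`, and `TOOLS` are hypothetical names.

```python
class PolicyDenied(Exception):
    """Raised by the gate, never by the agent."""


def make_gate(policy, tools):
    """Wrap the tool registry so every call is checked before it runs.

    policy maps an agent id to the set of tool names it may call.
    Anything not listed is denied (deny by default).
    """
    def call_tool(agent_id, tool_name, **kwargs):
        if tool_name not in policy.get(agent_id, frozenset()):
            raise PolicyDenied(f"{agent_id} may not call {tool_name}")
        return tools[tool_name](**kwargs)
    return call_tool


# Stand-in policy and tools; a real deployment would query an external engine.
POLICY = {"researcher": frozenset({"search"})}
TOOLS = {
    "search": lambda query: f"results for {query}",
    "delete_file": lambda path: f"deleted {path}",
}

call_tool = make_gate(POLICY, TOOLS)
print(call_tool("researcher", "search", query="cedar"))  # prints: results for cedar

try:
    call_tool("researcher", "delete_file", path="/tmp/x")
except PolicyDenied as err:
    print(err)  # prints: researcher may not call delete_file
```

Because the agent only ever receives `call_tool`, it has no handle on the unguarded registry, which is what keeps the policy decision outside the agent.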
Identity And Isolation¶
Treat every agent as a non-human identity with cryptographic provenance: what code, model, and environment produced it (the CSA "identity explosion" framing). The November 2025 MCP specification added tool-scoped authorization (SEP-835) and namespace isolation. Bind tokens to clients via DPoP (RFC 9449) to defeat replay, and prefer short-lived tokens. Map controls to the OWASP Agentic Top 10 / ASI (ASI03 Identity & Privilege Abuse, ASI04 Supply Chain, ASI07 Insecure Inter-Agent Comms, ASI10 Rogue Agents) and the proposed NIST AI RMF Agentic Profile.
Budget And Rate-Limit Configuration¶
Published Anthropic heuristics:
- Simple fact-finding: 1 agent, 3-10 tool calls.
- Direct comparison: 2-4 subagents, 10-15 calls each.
- Complex research: more than 10 subagents with non-overlapping responsibilities.
Single agents use roughly 4× the tokens of a chat interaction; multi-agent systems use roughly 15×. Set hop limits below the supervisor's 8-12-turn routing-accuracy cliff. AgentDropout (ACL 2025) and SupervisorAgent show that runtime adaptive supervision cuts token use by 29.45% on GAIA with no loss in success rate.
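The heuristics and multipliers above translate directly into a small lookup table plus some arithmetic. The sketch below is illustrative: the table and function names are hypothetical, "more than 10 subagents" is encoded as a lower bound of 11 with no upper bound, and the one-hop safety margin is an assumption.

```python
# (min, max) ranges taken from the heuristics above; None means "not stated".
HEURISTICS = {
    "simple_fact_finding": {"agents": (1, 1), "tool_calls_per_agent": (3, 10)},
    "direct_comparison": {"agents": (2, 4), "tool_calls_per_agent": (10, 15)},
    "complex_research": {"agents": (11, None), "tool_calls_per_agent": None},
}

# Approximate token multipliers relative to a plain chat interaction.
TOKEN_MULTIPLIER = {"chat": 1, "single_agent": 4, "multi_agent": 15}

# Lower edge of the reported 8-12 round-trip routing-accuracy cliff.
ROUTING_CLIFF_START = 8


def estimate_tokens(chat_baseline_tokens, mode):
    """Scale a chat-sized token estimate to an agent or multi-agent run."""
    return chat_baseline_tokens * TOKEN_MULTIPLIER[mode]


def max_tool_calls(task_class):
    """Upper bound on total tool calls, or None when the heuristic gives no bound."""
    h = HEURISTICS[task_class]
    if h["tool_calls_per_agent"] is None or h["agents"][1] is None:
        return None
    return h["agents"][1] * h["tool_calls_per_agent"][1]


def hop_limit(margin=1):
    """Hop ceiling kept below the start of the routing cliff (margin is assumed)."""
    return ROUTING_CLIFF_START - margin


print(estimate_tokens(2_000, "multi_agent"))  # prints 30000
print(max_tool_calls("direct_comparison"))    # prints 60
print(hop_limit())                            # prints 7
```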
Observability Hooks¶
Standardize on the OpenTelemetry GenAI semantic conventions (GenAI SIG, experimental since April 2024). They define three span operations (chat, invoke_agent, execute_tool) plus standard attributes for prompts, tokens, cost, and tool I/O. Backends that consume the conventions include Arize Phoenix (native OpenInference instrumentors), LangSmith, Langfuse, Helicone, and Traceloop. Standardizing on OTel avoids vendor lock-in.
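To show what the three span operations might look like in practice, the sketch below builds span records as plain dictionaries rather than calling the OpenTelemetry SDK. The attribute keys (`gen_ai.operation.name`, `gen_ai.request.model`, `gen_ai.agent.name`, `gen_ai.tool.name`, `gen_ai.usage.input_tokens`) and the "operation target" span-name pattern reflect one reading of the experimental conventions and should be checked against the current specification; `span_record` itself is a hypothetical helper.

```python
# Which attribute carries the "target" of each operation (assumed mapping).
TARGET_ATTRIBUTE = {
    "chat": "gen_ai.request.model",
    "invoke_agent": "gen_ai.agent.name",
    "execute_tool": "gen_ai.tool.name",
}


def span_record(operation, target, extra=None):
    """Build a span name and attribute dict for one GenAI operation."""
    if operation not in TARGET_ATTRIBUTE:
        raise ValueError(f"unknown operation: {operation}")
    attributes = {
        "gen_ai.operation.name": operation,
        TARGET_ATTRIBUTE[operation]: target,
    }
    attributes.update(extra or {})
    return {"name": f"{operation} {target}", "attributes": attributes}


# One agent invocation that makes a model call and a tool call.
trace = [
    span_record("invoke_agent", "researcher"),
    span_record("chat", "example-model", {"gen_ai.usage.input_tokens": 812}),
    span_record("execute_tool", "web_search"),
]
for span in trace:
    print(span["name"])
```

In a real deployment these names and attributes would be set on spans created through an OpenTelemetry tracer, so that any backend consuming the conventions can read them.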
Outcome-Optimal Configuration Research¶
- Anthropic engineering: token budget, tool-call count, and model choice, in that order, together explain 95% of performance variance. Mixed-tier centralised topologies beat homogeneous fleets.
- "Towards a Science of Scaling Agent Systems" (arXiv 2512.08296) and AI Sweden's Practical Approach to Optimize Multi-Agent Systems (Dec 2025) give scaling laws for agent count vs accuracy plateau.
- REALM-Bench and MultiAgentBench measure collaboration / competition over 11 real-world planning scenarios.
- LangChain's supervisor-architecture benchmark reports roughly 50% improvement from supervisor implementation fixes alone (prompt + handoff translation).
- SWE-Debate: competitive debate beats consensus debate on software issue resolution.
Selection Criteria¶
| Archetype | Best Problem Class | Cost (Relative) | Governance Ease | Parallelism | Debuggability | Outcome Evidence |
|---|---|---|---|---|---|---|
| Single agent | Serial multi-hop reasoning, tight budgets | 1× | High | None | Highest | arXiv 2604.02460; beats MAS at equal thinking-token budget |
| Supervisor / orchestrator | Open-ended research, decomposable retrieval | ~15× chat | High (single PEP at supervisor) | Medium (with Send-style primitives) | High (one trace root) | Anthropic: +90.2% over single Opus; 80% variance ≈ token budget |
| Hierarchical (team-of-teams) | Large planning, multi-domain workflows | High | High | Medium-High | Medium (multi-root traces) | Google ADK; supervisor accuracy cliff at 8-12 hops |
| Peer / swarm handoff | Conversational routing, customer support | Low-Medium | Low (distributed policy) | High | Low | LangGraph-Swarm: ~40% latency reduction |
| Debate / consensus | Math, code review, fact verification | High (round-multiplied) | Medium | Medium | Medium | SWE-Debate (competitive) beats consensus MAD; conformity hurts |
| MetaGPT-style SOP | Software dev with known stages | Medium | High (SOP-gated) | Medium | High (stage gates) | 85.9% / 87.7% Pass@1 on code-gen benchmarks |
Picking Heuristic¶
Pick single agent until there is a measurable bottleneck. Move to supervisor when sub-tasks are independent and the budget can absorb roughly 15×. Pick hierarchical only when one supervisor's context starts overflowing past 12 hops. Pick swarm only when latency matters more than auditability. Pick SOP when stages are stable and stage-typed artifacts exist. Pick debate only with a competitive (not conformity) protocol and a verifier that can break ties.
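The heuristic reads naturally as a decision function. The sketch below is one possible encoding, not the platform's selection logic: the `Workload` fields are hypothetical, and the order in which the specialized archetypes are checked (SOP, then debate, then swarm, then supervisor or hierarchical) is an assumption, since the text does not rank them.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    measurable_bottleneck: bool = False      # is single-agent demonstrably too slow or weak?
    independent_subtasks: bool = False       # can sub-tasks run without each other's output?
    budget_multiplier: float = 1.0           # affordable token spend relative to chat
    expected_hops: int = 0                   # round trips one supervisor would handle
    latency_over_auditability: bool = False  # is latency the dominant requirement?
    stable_stages: bool = False              # are workflow stages fixed and known?
    stage_typed_artifacts: bool = False      # does each stage emit a typed artifact?
    competitive_protocol: bool = False       # competitive (not conformity) debate available?
    tie_breaking_verifier: bool = False      # is there a verifier that can break ties?


def pick_archetype(w: Workload) -> str:
    if not w.measurable_bottleneck:
        return "single"
    if w.stable_stages and w.stage_typed_artifacts:
        return "sop"
    if w.competitive_protocol and w.tie_breaking_verifier:
        return "debate"
    if w.latency_over_auditability:
        return "swarm"
    if w.independent_subtasks and w.budget_multiplier >= 15:
        # Split into a hierarchy only once one supervisor would exceed 12 hops.
        return "hierarchical" if w.expected_hops > 12 else "supervisor"
    return "single"


print(pick_archetype(Workload()))  # prints single
print(pick_archetype(Workload(measurable_bottleneck=True,
                              independent_subtasks=True,
                              budget_multiplier=15)))  # prints supervisor
```

Falling back to "single" when no condition holds keeps the default aligned with the first rule: stay with one agent unless there is evidence for something else.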
Cross-References¶
- Fabric › Setup: the developer-facing platform layer that consumes these patterns.
- Reference › Research › Subagents: multi-agent compositions Setup wires.
- Reference › Research › Fabric Flow: orchestration patterns Setup configures.