Subagents
The subagent and multi-agent orchestration patterns referenced throughout Emergence › Subunits draw on the 2024-2026 multi-agent research lineage. This page catalogs the canonical systems with their key innovations, ideal workloads, and failure modes, followed by the selection criteria the platform uses when configuring a Subunit composition for a given Unitt.
Reference Systems
Claude Code Subagents
Claude Code subagents are Markdown files in .claude/agents/*.md with YAML frontmatter (name, description, tools, optional model). Each runs in an isolated context window with its own system prompt and a declarative tool allowlist; parent-to-child communication is a single prompt string, child-to-parent is a summary. Innovation: file-based, version-controlled subagent definitions with least-privilege tool scoping. Best for: noisy, self-contained side tasks (repo exploration, log triage, doc review). Failure mode: loss of nuance across the prompt / summary boundary. Reference: Claude API subagents docs.
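As an illustration of the file layout described above, the following minimal sketch renders a subagent definition as Markdown with YAML frontmatter. `SubagentSpec`, `render_subagent`, and the `log-triage` agent are hypothetical names introduced here; only the frontmatter keys (name, description, tools, optional model) and the `.claude/agents/` location come from the text. The comma-separated tool list is assumed to match the format Claude Code accepts.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SubagentSpec:
    """Hypothetical container for the frontmatter fields listed above."""
    name: str
    description: str
    tools: List[str] = field(default_factory=list)
    model: Optional[str] = None  # optional; omitted from the file when None


def render_subagent(spec: SubagentSpec, system_prompt: str) -> str:
    """Render an agent file: YAML frontmatter followed by the system prompt."""
    lines = ["---", f"name: {spec.name}", f"description: {spec.description}"]
    if spec.tools:
        # Assumption: the allowlist is written as a comma-separated list.
        lines.append("tools: " + ", ".join(spec.tools))
    if spec.model:
        lines.append(f"model: {spec.model}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + system_prompt.strip() + "\n"


log_triage = SubagentSpec(
    name="log-triage",
    description="Summarize error clusters in application logs.",
    tools=["Read", "Grep", "Glob"],  # read-only tools, in the spirit of least privilege
)
text = render_subagent(
    log_triage,
    "You triage logs. Return a short summary of distinct error clusters.",
)
print(text)  # would be saved as .claude/agents/log-triage.md
```

Keeping the allowlist to read-only tools is one way to apply the least-privilege scoping mentioned above; a side task that only explores does not need write or shell access.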
Anthropic Multi-Agent Research System
The Anthropic multi-agent research system implements an orchestrator-worker pattern. A lead Opus agent decomposes a query and spawns 3-5 Sonnet subagents in parallel, each of which runs 3+ tool calls in parallel. The system outperformed single-agent Opus by 90.2% on Anthropic's internal evaluation while consuming roughly 15× more tokens than a chat interaction; roughly 80% of performance variance was explained by token usage. Innovation: explicit scaling rules embedded in prompts so the lead allocates effort in proportion to task complexity. Failure mode: over-spawning, duplicate work from vague subtasks. See also: Building Effective Agents.
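The orchestrator-worker flow can be sketched with `asyncio`, under heavy assumptions: `decompose`, `pick_width`, and `worker` are hypothetical stand-ins for model calls, and the complexity-to-width mapping is invented for illustration rather than taken from Anthropic's prompts. The sketch only shows the shape of the pattern: plan, fan out in parallel, gather.

```python
import asyncio
from typing import List


def pick_width(complexity: str) -> int:
    """Hypothetical scaling rule: more subagents for harder queries, capped at 5."""
    return {"simple": 1, "moderate": 3, "complex": 5}.get(complexity, 3)


def decompose(query: str, width: int) -> List[str]:
    """Stand-in for the lead agent's planning step (an LLM call in practice)."""
    return [f"{query} :: facet {i + 1}" for i in range(width)]


async def worker(subtask: str) -> str:
    """Stand-in for one subagent; a real worker would call a model and tools."""
    await asyncio.sleep(0)  # yield control, as a network call would
    return f"findings for [{subtask}]"


async def research(query: str, complexity: str) -> List[str]:
    subtasks = decompose(query, pick_width(complexity))
    # Drop exact duplicates so a vague plan does not spawn duplicate work.
    unique = list(dict.fromkeys(subtasks))
    # Fan out: all workers run concurrently; results keep subtask order.
    return list(await asyncio.gather(*(worker(s) for s in unique)))


results = asyncio.run(research("battery recycling market", "moderate"))
for line in results:
    print(line)
```

Capping the width in one place gives a single point at which to guard against the over-spawning failure mode noted above.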
OpenAI Swarm / Agents SDK
OpenAI Swarm introduced two primitives: Agents (instructions + tools) and handoffs (tools that return another Agent, transferring control). The OpenAI Agents SDK is the production successor, adding guardrails, tracing, sessions, MCP integration, and TypeScript support. Innovation: handoff-as-tool. Control transfer is just a function the model can call, so no central router is required. Best for: customer-service-style flows with one agent triaging and routing to specialists. Failure mode: with no supervisor, handoff loops and oscillation between peers can occur. See Orchestrating Agents.
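The handoff-as-tool idea can be illustrated without either library. In this minimal sketch, the `Agent` dataclass and `run` loop are hypothetical and are not the Swarm or Agents SDK API. An agent's turn returns either a final answer or another agent, and a hop budget (an assumption added here) guards against the loop failure mode.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Agent:
    name: str
    # A turn returns either a final answer (str) or another Agent (a handoff).
    act: Callable[[str], Union[str, "Agent"]]


def run(start: Agent, message: str, max_handoffs: int = 4) -> str:
    """Follow handoffs until an agent answers or the hop budget runs out."""
    agent = start
    for _ in range(max_handoffs + 1):
        result = agent.act(message)
        if isinstance(result, Agent):
            agent = result  # control transfers; no central router involved
            continue
        return f"{agent.name}: {result}"
    raise RuntimeError("handoff budget exceeded (possible loop)")


billing = Agent("billing", lambda msg: "Refund issued.")
support = Agent("support", lambda msg: "Try restarting the device.")
# The triage agent routes by returning the specialist, mimicking a handoff tool.
triage = Agent("triage", lambda msg: billing if "refund" in msg.lower() else support)

answer = run(triage, "I want a refund")
print(answer)
```

Two peers that keep returning each other would exhaust `max_handoffs` and raise, which surfaces oscillation instead of letting it run silently.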
LangGraph
LangGraph provides graph-based orchestration in which nodes are agents or functions and conditional edges route based on state. The supervisor pattern and hierarchical teams (subgraphs of supervisors) are well-documented. Routing logic lives in code (conditional edges), not in LLM prompts. Innovation: explicit state graph with checkpointing, deterministic routing, and human-in-the-loop interrupts. Best for: production agents needing observable state machines, replay, and complex branching. Failure mode: hierarchical layers add latency and cost; avoid hierarchy until there are more than roughly six concurrent workers. Reference: Choosing the Right Multi-Agent Architecture.
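To make "routing lives in code" concrete, here is a plain-Python sketch of a state graph with a conditional edge and a checkpoint trace. It deliberately does not use the LangGraph API. `NODES`, `EDGES`, `run_graph`, and the draft/review nodes are hypothetical, and the reviewer is a deterministic stand-in that approves on the second attempt.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]


def draft(state: State) -> State:
    attempts = state.get("attempts", 0) + 1
    return {**state, "draft": f"answer to {state['question']}", "attempts": attempts}


def review(state: State) -> State:
    # Deterministic stand-in for a reviewer: approve on the second attempt.
    return {**state, "approved": state["attempts"] >= 2}


def route_after_review(state: State) -> str:
    """Conditional edge: plain code, not a prompt, chooses the next node."""
    return "END" if state["approved"] else "draft"


NODES: Dict[str, Callable[[State], State]] = {"draft": draft, "review": review}
EDGES: Dict[str, Callable[[State], str]] = {
    "draft": lambda s: "review",
    "review": route_after_review,
}


def run_graph(state: State, start: str = "draft", max_steps: int = 10) -> Tuple[State, List[dict]]:
    checkpoints: List[dict] = []
    node = start
    for _ in range(max_steps):
        if node == "END":
            break
        state = NODES[node](state)
        # Snapshot after every node so a run can be inspected or replayed.
        checkpoints.append({"node": node, "state": dict(state)})
        node = EDGES[node](state)
    return state, checkpoints


final, trace = run_graph({"question": "What is checkpointing?"})
print([c["node"] for c in trace])
```

Because the route is an ordinary function of state, the same input state always yields the same path, which is what makes replay and unit testing straightforward.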
CrewAI
CrewAI declares agents with role / goal / backstory plus tools, grouped into a Crew. Two process types are available: Process.sequential (linear pipeline) and Process.hierarchical (manager LLM delegates and validates). CrewAI's documentation also lists a consensual process, which would add voting, as planned. Innovation: persona-first ergonomics; agents resemble job descriptions, lowering authoring friction. Best for: linear content / research pipelines and team-simulation use cases. Failure mode: backstory / role prompts encourage roleplay drift; harder to test deterministically than graph-based systems.
AutoGen / AG2
AG2 (formerly AutoGen) provides ConversableAgent, GroupChat with GroupChatManager for more than two agents over a shared transcript with speaker-selection logic, and nested chats that package a sub-conversation behind a single agent. AG2 v0.9 unified these into one Group Chat architecture. Innovation: conversation as a first-class abstraction; emergent collaboration through message-passing rather than fixed graphs. Best for: coding / reasoning tasks with iterative critic-and-executor loops. Failure mode: speaker-selection drift and runaway transcripts; bound cost with explicit max_round caps.
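The effect of a round cap on a shared-transcript chat can be shown in a few lines. This sketch is not the AG2 API: `group_chat`, the round-robin speaker selection, and the `coder` / `critic` functions are hypothetical stand-ins, with `max_round` named after the cap mentioned above.

```python
from typing import Callable, Dict, List

Message = Dict[str, str]
Speaker = Callable[[List[Message]], str]


def coder(transcript: List[Message]) -> str:
    return "def add(a, b): return a + b"


def critic(transcript: List[Message]) -> str:
    # Approves once the latest message contains a function definition.
    return "APPROVED" if "def " in transcript[-1]["content"] else "Please provide code."


def group_chat(task: str, speakers: Dict[str, Speaker],
               order: List[str], max_round: int) -> List[Message]:
    """Round-robin speaker selection over a shared transcript, capped at max_round turns."""
    transcript: List[Message] = [{"name": "user", "content": task}]
    for turn in range(max_round):
        name = order[turn % len(order)]
        reply = speakers[name](transcript)  # every speaker sees the full transcript
        transcript.append({"name": name, "content": reply})
        if reply == "APPROVED":
            break  # early termination on consensus
    return transcript


chat = group_chat("Write add(a, b).", {"coder": coder, "critic": critic},
                  ["coder", "critic"], max_round=6)
for msg in chat:
    print(f"{msg['name']}: {msg['content']}")
```

Without the early exit the loop still stops after `max_round` turns, so a chat that never converges costs a bounded number of calls.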
MetaGPT
MetaGPT (ICLR 2024) encodes a software-org SOP: Product Manager → Architect → Project Manager → Engineer → QA Engineer. Agents exchange structured artifacts (PRDs, file lists, interface definitions) rather than free text, which sharply raises code-gen success on benchmarks. Innovation: structured intermediate outputs as the inter-agent contract, which removes much of the hallucination drift. Best for: greenfield software generation from a one-line spec; any pipeline that benefits from a rigid SOP. Failure mode: rigidity; non-software domains and ambiguous specs map poorly to the fixed role chain. Source code: FoundationAgents/MetaGPT.
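The idea of structured artifacts as the inter-agent contract can be sketched with typed records passed between role functions. Nothing here is MetaGPT's actual code: the `PRD` and `Design` dataclasses, the three role functions, and the shortened role chain are hypothetical, and each role is a stub where a real system would call a model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PRD:
    goal: str
    requirements: List[str]


@dataclass(frozen=True)
class Design:
    files: List[str]
    interfaces: List[str]


def product_manager(spec: str) -> PRD:
    """Turns a one-line spec into a structured requirements document."""
    return PRD(goal=spec, requirements=[f"Must support: {spec}"])


def architect(prd: PRD) -> Design:
    # The typed contract lets a downstream role reject an incomplete artifact.
    if not prd.requirements:
        raise ValueError("PRD has no requirements; refusing to design")
    return Design(files=["main.py"], interfaces=["def main() -> None"])


def engineer(design: Design) -> Dict[str, str]:
    """Emits one placeholder source file per entry in the design's file list."""
    return {path: "# TODO: implement\n" for path in design.files}


code = engineer(architect(product_manager("a CLI todo list")))
print(sorted(code))
```

Each hand-off is a typed value instead of free text, so a missing field fails loudly at the boundary rather than propagating downstream.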
AgentVerse / ChatDev
AgentVerse supports task-solving and social simulation with dynamic role recruitment; ChatDev (ACL 2024) uses a chat chain dividing software development into phased dialogues with "communicative dehallucination." Innovation: role-playing simulation as a research instrument for studying emergent multi-agent behavior. Best for: academic exploration, scenario simulation, design templates for phase-decomposed pipelines. Failure mode: emergent behavior is hard to constrain; production reliability lags supervisor / graph systems.
Coordination Patterns
Across the systems above, three coordination patterns recur. Each has a distinct cost, observability, and failure-mode profile.
| Pattern | Description | Strengths | Weaknesses |
|---|---|---|---|
| Supervisor / Orchestrator-Worker | Central LLM decides who runs next. | Clean traces; easy to govern; parallelizes naturally. | Supervisor bottleneck and token sink. |
| Hierarchical Decomposition | Supervisors of supervisors with sub-teams. | Scales to many specialists across sub-domains. | Token cost and latency multiply per layer. |
| Peer Handoffs / Debate | Agents transfer control or critique each other. | Independent critique improves reasoning. | Handoff loops; majority conformity in debate. |
Failure Modes (MAST Taxonomy)
The MAST study analyzed 1,600+ traces across multi-agent systems, and its taxonomy identifies 14 failure modes that cluster into three groups:
- Specification Problems (41.8%): under-specified briefs, ambiguous task contracts.
- Coordination Failures (36.9%): context loss across boundaries, loops, oscillation.
- Verification Gaps (21.3%): silent retries, missing validation of subagent output.
Recent research (arXiv:2511.07784, arXiv:2509.11035) further shows that agents in debate compositions tend to conform to the majority rather than reason independently; gains depend on minority agents willing to push back. See also: Multi-Agent Collaboration Mechanisms: A Survey.
Selection Criteria
The platform selects a subagent composition per Unitt by reading the workload profile (coordination style, isolation needs, parallelism, cost budget, debug-ability, customization required) and matching against the table below.
| System / Pattern | Coordination | Isolation | Parallelism | Cost | Debug-ability | Customization |
|---|---|---|---|---|---|---|
| Claude Code subagents | Supervisor (parent delegates) | Strong (separate context) | Yes (parent spawns) | Low (can run on Haiku) | Moderate (summaries only) | High (Markdown + tool allowlist) |
| Anthropic research system | Orchestrator-worker | Strong | High (3-5 wide) | Very High (~15×) | Hard (parallel traces) | Medium (prompt-tuned) |
| OpenAI Agents SDK | Peer handoffs | Per-agent | Limited | Medium | Good (built-in tracing) | High (Python / TS) |
| LangGraph | Supervisor / graph | Per-node state | Yes (parallel nodes) | Tunable | Strong (state, replay) | Very High (code-defined) |
| CrewAI | Sequential / hierarchical | Per-agent | Limited | Medium | Moderate | High (role personas) |
| AutoGen / AG2 | Group chat | Shared transcript | Limited | High (chatty) | Hard (free-form msgs) | Very High |
| MetaGPT | Fixed SOP pipeline | Per-role, structured artifacts | Sequential | Medium | Good (structured outputs) | Low (role chain rigid) |
| ChatDev / AgentVerse | Chat-chain / simulation | Per-role | Phase-parallel | High | Moderate | High (research-grade) |
| Debate / consensus | Peer | Shared | Round-parallel | High | Hard | Medium |
Picking Heuristic
- Single Unitt for short, single-domain workloads.
- Claude Code subagents for context isolation on noisy side-tasks.
- Supervisor / orchestrator as the default for most enterprise workloads.
- LangGraph when explicit state machines, replay, and human-in-the-loop are required.
- CrewAI for linear content pipelines where persona ergonomics dominate.
- AG2 for conversational, critique-heavy reasoning tasks with strict round caps.
- MetaGPT for greenfield software generation with rigid SOPs.
- Hierarchical when there are more than roughly six concurrent specialists.
- Peer debate only when independent critique demonstrably improves outcomes against the workload's success oracle.
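The heuristic above can be written as a small rule function. This is a sketch under stated assumptions, not the platform's selection logic: the `Workload` fields and `pick_composition` are hypothetical names, and the order in which the rules are checked is an assumption, since the list does not specify precedence when several conditions hold.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical workload profile; field names are illustrative only."""
    short: bool = False
    single_domain: bool = False
    noisy_side_tasks: bool = False
    needs_replay_or_hitl: bool = False
    greenfield_software: bool = False
    linear_content_pipeline: bool = False
    critique_heavy: bool = False
    debate_improves_outcomes: bool = False  # measured against the success oracle
    concurrent_specialists: int = 1


def pick_composition(w: Workload) -> str:
    # Rule order is an assumption; the first matching rule wins.
    if w.short and w.single_domain:
        return "single Unitt"
    if w.noisy_side_tasks:
        return "Claude Code subagents"
    if w.needs_replay_or_hitl:
        return "LangGraph"
    if w.greenfield_software:
        return "MetaGPT"
    if w.linear_content_pipeline:
        return "CrewAI"
    if w.debate_improves_outcomes:
        return "peer debate"
    if w.critique_heavy:
        return "AG2 with round caps"
    if w.concurrent_specialists > 6:
        return "hierarchical"
    return "supervisor / orchestrator"  # the stated default


print(pick_composition(Workload(concurrent_specialists=8)))
print(pick_composition(Workload()))
```

Encoding the rules as code makes the precedence explicit and testable, which matters when a workload matches more than one bullet.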
Cross-References
- Emergence › Subunits: the developer-facing platform layer that consumes these patterns.
- Emergence › System: runtime patterns each subunit can run.
- Emergence › Memory: memory scoping across subunit boundaries.
- Fabric › Setup: extending subunit composition into a coordinated multi-agent fabric.