Subagents

The subagent and multi-agent orchestration patterns referenced throughout Emergence › Subunits draw on the 2024-2026 multi-agent research lineage. This page catalogs the canonical systems, their key innovations, ideal workloads, failure modes, and selection criteria the platform uses when configuring a Subunit composition for a given Unitt.

Reference Systems

Claude Code Subagents

Claude Code subagents are Markdown files in .claude/agents/*.md with YAML frontmatter (name, description, tools, optional model). Each runs in an isolated context window with its own system prompt and a declarative tool allowlist; parent-to-child communication is a single prompt string, child-to-parent is a summary. Innovation: file-based, version-controlled subagent definitions with least-privilege tool scoping. Best for: noisy, self-contained side tasks (repo exploration, log triage, doc review). Failure mode: loss of nuance across the prompt / summary boundary. Reference: Claude API subagents docs.
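The file layout and allowlist described above can be illustrated with a small standard-library sketch. The agent name, description, tool names, and model value are hypothetical, and the parser handles only flat `key: value` frontmatter; it is an illustration of least-privilege scoping, not how Claude Code itself loads these files.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical subagent definition. The keys follow the frontmatter fields
# named above; every value is invented for illustration.
EXAMPLE_AGENT_MD = """\
---
name: log-triage
description: Scans build logs and reports the first failing step.
tools: Read, Grep
model: haiku
---
You triage CI logs. Report only the first failing step and its cause.
"""


@dataclass(frozen=True)
class SubagentSpec:
    name: str
    description: str
    tools: Tuple[str, ...]
    model: Optional[str]
    system_prompt: str


def parse_agent_file(text: str) -> SubagentSpec:
    """Parse flat `key: value` frontmatter (a simplification of real YAML)."""
    _, header, body = text.split("---\n", 2)
    fields = {}
    for line in header.splitlines():
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    tools = tuple(t.strip() for t in fields.get("tools", "").split(",") if t.strip())
    return SubagentSpec(
        name=fields["name"],
        description=fields["description"],
        tools=tools,
        model=fields.get("model"),
        system_prompt=body.strip(),
    )


def check_tool_call(spec: SubagentSpec, tool: str) -> None:
    """Least-privilege gate: reject any tool outside the declared allowlist."""
    if tool not in spec.tools:
        raise PermissionError(f"{spec.name} may not call {tool}")


spec = parse_agent_file(EXAMPLE_AGENT_MD)
check_tool_call(spec, "Grep")  # allowed; a tool outside the list would raise
```

Keeping the allowlist in the definition file means a change in privileges shows up as a reviewable diff.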

Anthropic Multi-Agent Research System

The Anthropic multi-agent research system implements an orchestrator-worker pattern. A lead Opus agent decomposes a query and spawns 3-5 Sonnet subagents in parallel, each of which issues three or more tool calls in parallel. On internal evaluation the system outperformed single-agent Opus by 90.2%. It consumed roughly 15× more tokens than chat, and token budget explained roughly 80% of the performance variance. Innovation: explicit scaling rules embedded in prompts so the lead allocates effort in proportion to task complexity. Failure mode: over-spawning, and duplicate work caused by vague subtask briefs. See also: Building Effective Agents.
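A minimal sketch of the orchestrator-worker fan-out follows, using only the standard library. The complexity tiers, the subagent counts per tier, and the `plan_subtasks` / `run_subagent` helpers are hypothetical stand-ins; a real lead agent would decompose the query with a model call, and real workers would call tools.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

# Hypothetical scaling rule: tiers and counts are illustrative only,
# not the values used by the system described above.
SUBAGENTS_BY_COMPLEXITY = {"simple": 1, "moderate": 3, "complex": 5}


def plan_subtasks(query: str, complexity: str) -> List[str]:
    """Stand-in for the lead agent's decomposition step."""
    n = SUBAGENTS_BY_COMPLEXITY[complexity]
    # Distinct, explicit briefs reduce duplicate work between workers.
    return [f"{query} :: facet {i + 1} of {n}" for i in range(n)]


def run_subagent(brief: str) -> str:
    """Stand-in for a worker; a real worker would call a model and tools."""
    return f"summary({brief})"


def orchestrate(query: str, complexity: str) -> List[str]:
    briefs = plan_subtasks(query, complexity)
    # Fan out in parallel, then gather summaries in the order of the briefs.
    with ThreadPoolExecutor(max_workers=len(briefs)) as pool:
        return list(pool.map(run_subagent, briefs))


results = orchestrate("compare vector databases", "moderate")
```

Tying the worker count to a complexity tier is one way to guard against over-spawning on simple queries.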

OpenAI Swarm / Agents SDK

OpenAI Swarm introduced two primitives: Agents (instructions + tools) and handoffs (tools that return another Agent, transferring control). The OpenAI Agents SDK is the production successor, adding guardrails, tracing, sessions, MCP integration, and TypeScript support. Innovation: handoff-as-tool. Control transfer is just a function the model can call, so no central router is required. Best for: customer-service-style flows in which one agent triages and routes to specialists. Failure mode: with no supervisor, handoff loops and oscillation between peers can occur. See Orchestrating Agents.
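The handoff-as-tool idea can be sketched without either library. The `Agent` class, the `run` loop, and the `max_handoffs` cap below are assumptions made for illustration and do not reproduce the Swarm or Agents SDK APIs; `choose_tool` stands in for the model deciding which tool to call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Agent:
    name: str
    instructions: str
    # Tools are plain callables; one that returns an Agent acts as a handoff.
    tools: Dict[str, Callable[[], Union[str, "Agent"]]] = field(default_factory=dict)


def run(
    start: Agent,
    choose_tool: Callable[[Agent], str],
    max_handoffs: int = 4,
) -> Tuple[str, List[str]]:
    """Follow handoffs until a tool returns text or the handoff cap is hit."""
    agent, path = start, [start.name]
    for _ in range(max_handoffs + 1):
        result = agent.tools[choose_tool(agent)]()
        if isinstance(result, Agent):  # handoff: control moves to the new agent
            agent = result
            path.append(agent.name)
            continue
        return result, path
    raise RuntimeError(f"handoff cap exceeded: {' -> '.join(path)}")


def pick_first(agent: Agent) -> str:
    """Scripted tool choice for the demo: call the agent's first tool."""
    return next(iter(agent.tools))


billing = Agent("billing", "Resolve invoices.", {"answer": lambda: "refund issued"})
triage = Agent("triage", "Route the user.", {"to_billing": lambda: billing})

answer, path = run(triage, pick_first)
```

The cap is the sketch's answer to the loop failure mode: two peers that keep handing control to each other raise an error instead of oscillating indefinitely.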

LangGraph

LangGraph is a graph-based orchestration framework in which nodes are agents or functions and conditional edges route based on state. The supervisor pattern and hierarchical teams (subgraphs of supervisors) are well-documented. Routing logic lives in code (conditional edges), not in LLM prompts. Innovation: explicit state graph with checkpointing, deterministic routing, and human-in-the-loop interrupts. Best for: production agents needing observable state machines, replay, and complex branching. Failure mode: hierarchical layers add latency and cost, so avoid hierarchy until there are more than roughly six concurrent workers. Reference: Choosing the Right Multi-Agent Architecture.
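To show what "routing logic lives in code" means, here is a standard-library sketch of a state graph with one conditional edge and per-step checkpoints. It deliberately does not use the LangGraph API; the node names, the approval rule, and `run_graph` are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
END = "END"


def research(state: State) -> State:
    """Node: produce one more revision."""
    return {**state, "revisions": int(state.get("revisions", 0)) + 1}


def review(state: State) -> State:
    """Node: hypothetical acceptance rule, approve on the second revision."""
    return {**state, "approved": int(state["revisions"]) >= 2}


def route_after_review(state: State) -> str:
    """Conditional edge: plain code decides the next node, not a prompt."""
    return END if state["approved"] else "research"


NODES: Dict[str, Callable[[State], State]] = {"research": research, "review": review}
EDGES: Dict[str, Callable[[State], str]] = {
    "research": lambda state: "review",
    "review": route_after_review,
}


def run_graph(entry: str, state: State, max_steps: int = 20):
    node = entry
    checkpoints: List[Tuple[str, State]] = []
    for _ in range(max_steps):
        if node == END:
            return state, checkpoints
        state = NODES[node](state)
        checkpoints.append((node, dict(state)))  # snapshot for replay/inspection
        node = EDGES[node](state)
    raise RuntimeError("step budget exhausted before reaching END")


final_state, checkpoints = run_graph("research", {"topic": "pricing"})
```

Because each step leaves a snapshot, a run can be inspected or resumed from any recorded state, which is the property the checkpointing feature above provides in a fuller form.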

CrewAI

CrewAI declares agents with role / goal / backstory plus tools, grouped into a Crew. Two process types are available: Process.sequential (linear pipeline) and Process.hierarchical (manager LLM delegates and validates). A consensual process that would add vote-style collective decisions is described in the CrewAI documentation as planned rather than implemented. Innovation: persona-first ergonomics. Agents resemble job descriptions, which lowers authoring friction. Best for: linear content / research pipelines and team-simulation use cases. Failure mode: backstory / role prompts encourage roleplay drift, and crews are harder to test deterministically than graph-based systems.

AutoGen / AG2

AG2 (formerly AutoGen) provides ConversableAgent, GroupChat with GroupChatManager for more than two agents sharing a transcript with speaker-selection logic, and nested chats that package a sub-conversation behind a single agent. AG2 v0.9 unified these into one Group Chat architecture. Innovation: conversation as a first-class abstraction, with emergent collaboration through message-passing rather than fixed graphs. Best for: coding / reasoning tasks with iterative critic-and-executor loops. Failure mode: speaker-selection drift and runaway transcripts. Bound cost with explicit max_round caps.
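The effect of a round cap on a shared-transcript chat can be sketched in a few lines. This is not AG2 code: `group_chat`, the round-robin speaker selection, the scripted `coder` and `critic` replies, and the `TERMINATE` convention are assumptions for illustration, with only the `max_round` name borrowed from the text above.

```python
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (speaker, text)
Reply = Callable[[List[Message]], str]


def group_chat(agents: List[Tuple[str, Reply]], task: str, max_round: int) -> List[Message]:
    """Round-robin chat over one shared transcript, hard-capped at max_round turns."""
    transcript: List[Message] = [("user", task)]
    for round_index in range(max_round):
        name, reply = agents[round_index % len(agents)]  # trivial speaker selection
        message = reply(transcript)
        transcript.append((name, message))
        if message.endswith("TERMINATE"):
            break
    return transcript


def coder(transcript: List[Message]) -> str:
    version = sum(1 for speaker, _ in transcript if speaker == "coder") + 1
    return f"patch v{version}"


def critic(transcript: List[Message]) -> str:
    # Scripted critic: accepts the second patch, rejects anything earlier.
    return "approved TERMINATE" if transcript[-1][1] == "patch v2" else "needs tests"


chat = group_chat([("coder", coder), ("critic", critic)], "fix the parser", max_round=6)
```

With a cap of 6 the conversation here ends on its own after four turns; with a cap of 2 it is cut off before approval, which makes the cost ceiling explicit.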

MetaGPT

MetaGPT (ICLR 2024) encodes a software-org SOP: Product Manager → Architect → Project Manager → Engineer → QA Engineer. Agents exchange structured artifacts (PRDs, file lists, interface definitions) rather than free-text, which sharply raises code-gen success on benchmarks. Innovation: structured intermediate outputs as the inter-agent contract, eliminating much hallucination drift. Best for: greenfield software generation from a one-line spec; any pipeline that benefits from a rigid SOP. Failure mode: rigidity; non-software domains and ambiguous specs map poorly to the fixed role chain. Source code: FoundationAgents/MetaGPT.
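The idea of structured artifacts as the inter-agent contract can be sketched with typed records passed between role functions. The `PRD` and `Design` shapes, the role functions, and their stubbed outputs are hypothetical and far simpler than MetaGPT's own schemas; the point is only that each stage accepts a typed artifact and refuses an incomplete one.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PRD:
    goal: str
    user_stories: List[str]


@dataclass(frozen=True)
class Design:
    files: List[str]
    interfaces: List[str]


def product_manager(spec: str) -> PRD:
    """Stub role: a real agent would draft the PRD with a model call."""
    return PRD(goal=spec, user_stories=[f"As a user, I can {spec.lower()}"])


def architect(prd: PRD) -> Design:
    if not prd.user_stories:
        raise ValueError("PRD has no user stories; refusing to design")
    return Design(files=["app.py"], interfaces=["def main() -> None"])


def engineer(design: Design) -> Dict[str, str]:
    if not design.files:
        raise ValueError("design lists no files; nothing to implement")
    # Stubbed output: one placeholder source file per planned path.
    return {path: "# TODO: implement\n" for path in design.files}


code = engineer(architect(product_manager("Export reports")))
```

An empty or malformed artifact fails at the stage boundary instead of propagating downstream as free text.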

AgentVerse / ChatDev

AgentVerse supports task-solving and social simulation with dynamic role recruitment; ChatDev (ACL 2024) uses a chat chain dividing software development into phased dialogues with "communicative dehallucination." Innovation: role-playing simulation as a research instrument for studying emergent multi-agent behavior. Best for: academic exploration, scenario simulation, design templates for phase-decomposed pipelines. Failure mode: emergent behavior is hard to constrain; production reliability lags supervisor / graph systems.

Coordination Patterns

Across the systems above, three coordination patterns recur. Each has a distinct cost, observability, and failure-mode profile.

Pattern	Description	Strengths	Weaknesses
Supervisor / Orchestrator-Worker	Central LLM decides who runs next.	Clean traces; easy to govern; parallelizes naturally.	Supervisor bottleneck and token sink.
Hierarchical Decomposition	Supervisors of supervisors with sub-teams.	Scales to many specialists across sub-domains.	Token cost and latency multiply per layer.
Peer Handoffs / Debate	Agents transfer control or critique each other.	Independent critique improves reasoning.	Handoff loops; majority conformity in debate.

Failure Modes (MAST Taxonomy)

The MAST taxonomy analyzed 1,600+ traces across multi-agent systems and identified 14 failure modes clustering into three groups:

Specification Problems (41.8%): under-specified briefs, ambiguous task contracts.
Coordination Failures (36.9%): context loss across boundaries, loops, oscillation.
Verification Gaps (21.3%): silent retries, missing validation of subagent output.

Recent research (arXiv:2511.07784, arXiv:2509.11035) further shows that agents in debate compositions tend to conform to the majority rather than reason independently; gains depend on minority agents being willing to push back. See also: Multi-Agent Collaboration Mechanisms: A Survey.
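One way to close the verification gaps listed above is to validate every subagent result against an explicit output contract and to log each rejected attempt instead of retrying silently. The sketch below assumes a hypothetical JSON contract with `answer` and `sources` keys; `validate` and `run_verified` are illustrative names, not part of any framework discussed here.

```python
import json
from typing import Callable, Dict, List, Tuple

REQUIRED_KEYS = {"answer", "sources"}  # hypothetical output contract


def validate(raw: str) -> Dict[str, object]:
    """Reject output that is not a JSON object satisfying the contract."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("output must be a JSON object")
    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    if not data["sources"]:
        raise ValueError("no sources cited")
    return data


def run_verified(
    subagent: Callable[[str], str], brief: str, max_attempts: int = 2
) -> Tuple[Dict[str, object], List[str]]:
    """Retry a bounded number of times and record every rejection."""
    log: List[str] = []
    for attempt in range(1, max_attempts + 1):
        raw = subagent(brief)
        try:
            return validate(raw), log
        except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
            log.append(f"attempt {attempt} rejected: {err}")
    raise RuntimeError("; ".join(log))


# Scripted subagent: the first reply violates the contract, the second passes.
replies = iter(['{"answer": "42"}', '{"answer": "42", "sources": ["doc-1"]}'])
result, log = run_verified(lambda brief: next(replies), "What is the answer?")
```

Returning the log alongside the result keeps retries visible to whatever supervises the subagent.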

Selection Criteria

The platform selects a subagent composition per Unitt by reading the workload profile (coordination style, isolation needs, parallelism, cost budget, debug-ability, customization required) and matching against the table below.

System / Pattern	Coordination	Isolation	Parallelism	Cost	Debug-ability	Customization
Claude Code subagents	Supervisor (parent delegates)	Strong (separate ctx)	Yes (parent spawns)	Low (Haiku-able)	Moderate (summaries only)	High (MD + tool allowlist)
Anthropic research system	Orchestrator-worker	Strong	High (3-5 wide)	Very High (~15×)	Hard (parallel traces)	Medium (prompt-tuned)
OpenAI Agents SDK	Peer handoffs	Per-agent	Limited	Medium	Good (built-in tracing)	High (Python / TS)
LangGraph	Supervisor / graph	Per-node state	Yes (parallel nodes)	Tunable	Strong (state, replay)	Very High (code-defined)
CrewAI	Sequential / hierarchical	Per-agent	Limited	Medium	Moderate	High (role personas)
AutoGen / AG2	Group chat	Shared transcript	Limited	High (chatty)	Hard (free-form msgs)	Very High
MetaGPT	Fixed SOP pipeline	Per-role, structured artifacts	Sequential	Medium	Good (structured outputs)	Low (role chain rigid)
ChatDev / AgentVerse	Chat-chain / simulation	Per-role	Phase-parallel	High	Moderate	High (research-grade)
Debate / consensus	Peer	Shared	Round-parallel	High	Hard	Medium

Picking Heuristic

Single Unitt for short, single-domain workloads.
Claude Code subagents for context isolation on noisy side-tasks.
Supervisor / orchestrator as the default for most enterprise workloads.
LangGraph when explicit state machines, replay, and human-in-the-loop are required.
CrewAI for linear content pipelines where persona ergonomics dominate.
AG2 for conversational, critique-heavy reasoning tasks with strict round caps.
MetaGPT for greenfield software generation with rigid SOPs.
Hierarchical when there are more than roughly six concurrent specialists.
Peer debate only when independent critique demonstrably improves outcomes against the workload's success oracle.
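The heuristic above can be read as a decision function. The sketch below is one possible encoding: the `WorkloadProfile` fields are hypothetical names for the conditions in the list, and the order in which rules are checked is an assumption, since the list does not state a precedence. Only the threshold of six specialists comes from the text.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    # Hypothetical flags, one per condition in the heuristic list.
    short_single_domain: bool = False
    noisy_side_task: bool = False
    greenfield_software_with_sop: bool = False
    needs_state_machine_replay_hitl: bool = False
    linear_content_pipeline: bool = False
    critique_heavy_reasoning: bool = False
    concurrent_specialists: int = 1
    debate_beats_success_oracle: bool = False


def pick_composition(p: WorkloadProfile) -> str:
    """Return the first matching rule; the check order is an assumption."""
    if p.short_single_domain:
        return "single Unitt"
    if p.noisy_side_task:
        return "Claude Code subagents"
    if p.greenfield_software_with_sop:
        return "MetaGPT"
    if p.needs_state_machine_replay_hitl:
        return "LangGraph"
    if p.linear_content_pipeline:
        return "CrewAI"
    if p.critique_heavy_reasoning:
        return "AG2 with strict round caps"
    if p.concurrent_specialists > 6:
        return "hierarchical"
    if p.debate_beats_success_oracle:
        return "peer debate"
    return "supervisor / orchestrator"  # stated default for most workloads


choice = pick_composition(WorkloadProfile(concurrent_specialists=7))
```

A profile that matches no specific rule falls through to the supervisor / orchestrator default.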

Cross-References

Emergence › Subunits: the developer-facing platform layer that consumes these patterns.
Emergence › System: runtime patterns each subunit can run.
Emergence › Memory: memory scoping across subunit boundaries.
Fabric › Setup: extending subunit composition into a coordinated multi-agent fabric.