
Assembly Objectives

The objective specification and decomposition patterns referenced throughout Assembly › Objectives draw on the 2024-2026 agentic-planning research lineage. This page catalogs the canonical patterns, their measured outcomes, and the selection criteria the platform uses when configuring a Unitt's Objectives for a given workload.

Primary Objective Specification

The Anthropic multi-agent research system engineering post identifies vague task descriptions as a leading cause of duplicated work in multi-agent systems. When the lead agent issued a vague instruction ("research the semiconductor shortage"), one subagent investigated the 2021 automotive chip crisis while two others duplicated searches on 2025 supply chains. The fix is a per-agent contract containing an objective, an output format, tool / source guidance, and clear task boundaries. Modern guidance favors outcome-shaped objectives (a verifiable end state) over task-shaped objectives (an imperative procedure), because an outcome survives replanning while a procedure breaks under tool drift.
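The four contract fields can be carried as a small typed record. The following is a minimal sketch, not a platform API: the class name `SubagentContract`, its field names, and the `render` helper are all hypothetical, chosen only to mirror the fields listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentContract:
    """Hypothetical per-agent contract; names are illustrative only."""
    objective: str                     # verifiable end state, not a procedure
    output_format: str                 # what the subagent must hand back
    tool_guidance: str                 # which tools / sources to prefer
    boundaries: tuple[str, ...] = ()   # what is explicitly out of scope

    def render(self) -> str:
        """Render the contract as a plain-text prompt section."""
        lines = [
            f"Objective: {self.objective}",
            f"Output format: {self.output_format}",
            f"Tools and sources: {self.tool_guidance}",
            "Boundaries:",
        ]
        lines += [f"  - {b}" for b in self.boundaries]
        return "\n".join(lines)


contract = SubagentContract(
    objective="A dated list of 2025 semiconductor supply-chain disruptions",
    output_format="JSON array of {date, event, source_url}",
    tool_guidance="Prefer primary sources over aggregator summaries",
    boundaries=("Do not cover the 2021 automotive chip crisis",),
)
print(contract.render())
```

Stating the boundary explicitly is what keeps a second subagent from wandering into the first one's territory; the objective here is phrased as an end state so it stays valid if the tool set changes.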

Sub-Objective Decomposition

The dominant 2024-2026 patterns:

  • Plan-and-Solve / PS+ prompting (Wang et al., ACL 2023) replaces the zero-shot "think step by step" trigger with an instruction to first devise a plan and then carry it out, which reduces missing-step errors.
  • Plan-and-Execute (LangGraph) separates a Planner node from Executor nodes via a conditional edge; faster than ReAct because executors do not re-consult the large planner per step.
  • HTN / ChatHTN (Muñoz-Avila 2025) keeps a symbolic HTN planner and queries the LLM only when no decomposition method applies, retaining soundness.
  • MetaGPT encodes SOPs as assembly-line role prompts (Product Manager → Architect → Project Manager → Engineer) with structured handovers (PRDs, file lists, interface definitions) to eliminate idle chatter.
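The Plan-and-Execute split can be illustrated without any framework. This is a minimal sketch in plain Python, not LangGraph code: `plan_and_execute`, `toy_planner`, and `toy_executor` are hypothetical names, and the two toy functions stand in for model calls so the example runs offline.

```python
from typing import Callable

Planner = Callable[[str], list[str]]   # objective -> ordered steps
Executor = Callable[[str], str]        # step -> result


def plan_and_execute(
    objective: str, plan: Planner, execute: Executor
) -> list[tuple[str, str]]:
    """Call the planner once, then run every step without re-consulting it."""
    steps = plan(objective)
    return [(step, execute(step)) for step in steps]


# Stand-ins for model calls (assumptions, not real planner/executor models).
def toy_planner(objective: str) -> list[str]:
    return [f"gather inputs for: {objective}", f"produce result for: {objective}"]


def toy_executor(step: str) -> str:
    return f"done({step})"


trace = plan_and_execute("summarize Q3 incidents", toy_planner, toy_executor)
for step, result in trace:
    print(step, "->", result)
```

The planner is invoked exactly once per plan, which is the source of the speed advantage over a loop that consults the large model at every step. A production version would add a branch that returns to the planner when a step fails.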

AgentBoard (NeurIPS 2024 Oral) operationalizes evaluation of decomposition quality via a progress rate metric, validated at Pearson > 0.95 vs human raters.
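One simplified reading of a progress-rate metric is the highest fraction of sub-goals satisfied at any step of a trajectory. The sketch below assumes that reading; AgentBoard's own definition may differ in detail, and the function name and sub-goal labels are hypothetical.

```python
def progress_rate(subgoals: list[str], achieved_per_step: list[set[str]]) -> float:
    """Best fraction of sub-goals satisfied at any step (simplified assumption)."""
    if not subgoals:
        raise ValueError("at least one sub-goal is required")
    goal_set = set(subgoals)
    best = 0.0
    for achieved in achieved_per_step:
        # Only count sub-goals that belong to this task.
        best = max(best, len(goal_set & achieved) / len(goal_set))
    return best


subgoals = ["open_drawer", "take_key", "unlock_door", "exit"]
steps = [{"open_drawer"}, {"open_drawer", "take_key"}, {"open_drawer"}]
print(progress_rate(subgoals, steps))  # 0.5
```

Unlike a binary success rate, this distinguishes a run that completed half the decomposition from one that completed none of it.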

Outcome Definitions

Modern outcome oracles are tiered: deterministic scorers (regex, JSON-schema validation, state-hash, unit tests, exact match) where applicable, and LLM-as-Judge only for subjective dimensions. The CANDOR multi-agent JUnit framework (Hallucination-to-Consensus, 2025) shows specialized agents generating oracles in predicate logic for verifiable conditions. τ-bench (Sierra, ICLR 2025) added the pass^k metric (the probability that all k independent trials of the same task succeed); GPT-4o-class agents drop to pass^8 < 25% in the retail domain. Cost-per-success is now a release metric: τ-bench plots a cost-vs-accuracy Pareto frontier.
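pass^k can be estimated per task from n recorded trials with c successes. The sketch below assumes the combinatorial estimator C(c, k) / C(n, k), the counterpart of the familiar pass@k estimator; the function name `pass_hat_k` is hypothetical, and the per-task values would be averaged across tasks to report a benchmark-level figure.

```python
from math import comb


def pass_hat_k(successes: int, trials: int, k: int) -> float:
    """Estimate the chance that k trials drawn from the recorded runs all succeed."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    if not 1 <= k <= trials:
        raise ValueError("k must be between 1 and trials")
    # comb(successes, k) is 0 when k > successes, so the estimate is 0.0 there.
    return comb(successes, k) / comb(trials, k)


# A task that passed 6 of 8 recorded trials.
print(pass_hat_k(6, 8, 1))  # 0.75
print(pass_hat_k(6, 8, 8))  # 0.0
```

A single failure in the recorded trials forces the estimate at k = n to zero, which is why pass^k is a much stricter reliability signal than average success rate.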

Constraint Specification

The OWASP Top 10 for Agentic Applications (2026 release, December 2025) elevates Excessive Agency as a core threat. Mitigations: Least-Agency (extension of PoLP; minimum autonomy to complete the task), per-tool profiles restricting permissions / data / functionality, explicit human confirmation for sensitive tools, isolated execution environments with enforced network policies, identity isolation plus memory erasure between tasks, and a central policy engine that checks every sensitive action.

Constraints in an Objective spec should declare: never-do list, approval-required actions, allowed-tools / systems whitelist, token / dollar budget, wall-clock stop, and reversibility requirement.
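The six declared constraints map naturally onto a record with a single check function. This is a minimal sketch under stated assumptions: `Constraints`, `Decision`, and every field name are hypothetical, and treating an irreversible action as a denial (rather than an approval request) is one possible policy, not the platform's.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_APPROVAL = "needs_approval"
    DENY = "deny"


@dataclass(frozen=True)
class Constraints:
    """Hypothetical constraint block for an Objective spec."""
    never_do: frozenset[str]           # actions that are always refused
    approval_required: frozenset[str]  # actions that need a human sign-off
    allowed_tools: frozenset[str]      # tool allowlist
    max_cost_usd: float                # dollar budget
    max_wall_clock_s: float            # wall-clock stop
    require_reversible: bool = True    # reversibility requirement

    def check(self, tool: str, action: str, reversible: bool,
              spent_usd: float, elapsed_s: float) -> Decision:
        if action in self.never_do or tool not in self.allowed_tools:
            return Decision.DENY
        if spent_usd >= self.max_cost_usd or elapsed_s >= self.max_wall_clock_s:
            return Decision.DENY
        if self.require_reversible and not reversible:
            return Decision.DENY  # assumed policy: irreversible means refused
        if action in self.approval_required:
            return Decision.NEEDS_APPROVAL
        return Decision.ALLOW


limits = Constraints(
    never_do=frozenset({"drop_table"}),
    approval_required=frozenset({"send_email"}),
    allowed_tools=frozenset({"sql", "email"}),
    max_cost_usd=5.0,
    max_wall_clock_s=600.0,
)
print(limits.check("sql", "read_rows", True, 0.10, 5.0))     # Decision.ALLOW
print(limits.check("email", "send_email", True, 0.10, 5.0))  # Decision.NEEDS_APPROVAL
```

Denials are evaluated before approvals so that a never-do action can never be unlocked by a human confirmation step.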

Confidence Thresholds

Confidence-based escalation uses tiered thresholds: high confidence → auto-approve, medium → auto-approve with sampling, low → human queue, very low → alert. The pass^k reliability framing lets ops teams forecast escalation volume: if a single trial succeeds with probability p, then k independent trials all succeed with probability p^k, so reliability decays geometrically in k. 2025 calibration work (ACM Computing Surveys on UQ in LLMs) favors Bayesian posterior estimation with Dirichlet / Beta priors over raw token-logprobs.
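A Beta prior updated with observed outcomes gives a posterior mean that can feed the tier router. The sketch below is illustrative only: the function names and the threshold values (0.9, 0.7, 0.4) are assumptions, not figures from the source, and should be tuned per workload.

```python
def beta_posterior_mean(successes: int, failures: int,
                        alpha: float = 1.0, beta: float = 1.0) -> float:
    """Posterior mean of a Beta(alpha, beta) prior after observed outcomes."""
    return (alpha + successes) / (alpha + beta + successes + failures)


def route(confidence: float, high: float = 0.9,
          medium: float = 0.7, low: float = 0.4) -> str:
    """Map a confidence estimate to one of four escalation tiers."""
    if confidence >= high:
        return "auto_approve"
    if confidence >= medium:
        return "auto_approve_with_sampling"
    if confidence >= low:
        return "human_queue"
    return "alert"


# 18 verified successes and 2 failures under a uniform Beta(1, 1) prior.
confidence = beta_posterior_mean(18, 2)   # 19 / 22
print(route(confidence))                  # auto_approve_with_sampling
```

The prior keeps a sub-objective with very few observations from being routed to auto-approve on the strength of two or three lucky runs.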

Sub-Objective Dependency Graphs

The 2025 consensus is a DAG, not a tree: nodes are sub-objectives, and edges encode output-to-input data dependencies. The Coordinator-Implementor-Verifier pattern (Augment Code) runs same-level DAG nodes as parallel waves. Graph Harness makes three commitments: (1) a plan is immutable within its plan version, (2) planning, execution, and recovery are separate layers, and (3) recovery escalates retry → local patch → replan to prevent the replan-loop pathology. GraSP adds five local-repair primitives (Rebind, InsertPrereq, Substitute, Rewire, Bypass) that must be tried before a global replan is allowed. TDP (Task-Decoupled Planning, 2025) confines replanning to active sub-tasks, cutting token usage by up to 82%.
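Wave execution falls out of a topological sort: every node whose inputs are complete forms the next wave. The sketch below uses Python's standard `graphlib` module; the function name `execution_waves` and the node names are hypothetical, and it illustrates the general idea rather than any named system's scheduler.

```python
from graphlib import TopologicalSorter


def execution_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group DAG nodes into waves; `deps` maps a node to the nodes it consumes."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all nodes runnable right now
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave complete
    return waves


deps = {
    "fetch_sales": set(),
    "fetch_costs": set(),
    "compute_margin": {"fetch_sales", "fetch_costs"},
    "write_report": {"compute_margin"},
}
print(execution_waves(deps))
# [['fetch_costs', 'fetch_sales'], ['compute_margin'], ['write_report']]
```

Nodes within one wave share no data dependency, so they can be dispatched in parallel; the cycle check in `prepare()` doubles as a cheap validity test for the plan.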

Validation Checkpoints

Production patterns wrap every sub-objective edge in a gate. Deterministic gates run schema validation, JSON-parse, tool-call format, output length bounds, and policy checks; semantic gates use a separate validator agent with explicit fail-the-work authority. LangGraph 2.0's persistence / checkpoint layer is the reference for resumable multi-day work; Anthropic's harness guidance documents progress files, git-as-state, and structured session handoff artifacts.
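A deterministic gate can be a pure function that returns the reasons an output fails. This is a minimal sketch covering JSON parsing, required keys, and a length bound; the function name, the default `max_chars` value, and the key names in the example are assumptions for illustration.

```python
import json


def deterministic_gate(raw: str, required_keys: set[str],
                       max_chars: int = 4000) -> list[str]:
    """Return failure reasons for a sub-objective output; empty list means pass."""
    failures: list[str] = []
    if len(raw) > max_chars:
        failures.append(f"output exceeds {max_chars} characters")
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return failures + [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return failures + ["top-level value is not an object"]
    missing = sorted(required_keys - set(payload))
    if missing:
        failures.append(f"missing keys: {', '.join(missing)}")
    return failures


print(deterministic_gate('{"status": "ok", "rows": 3}', {"status", "rows"}))  # []
print(deterministic_gate('{"status": "ok"}', {"status", "rows"}))
# ['missing keys: rows']
```

Returning reasons rather than a bare boolean gives the recovery layer something to act on, and keeps the cheap checks ahead of any call to a validator agent.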

OKR-For-Agents

OKR-Agent is the canonical reference: hierarchical Objects decompose into sub-Objects, each assigned to a fresh agent; Key Results drive multi-level self-evaluation. Enterprise practice (ISG 2025 measurement framework) pairs function-specific OKRs with OODA-loop KPIs. KR design rules: Objectives are qualitative / timebound / inspiring; KRs are specific, measurable, ambitious, and machine-checkable.
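"Machine-checkable" can be read as: each KR carries a predicate over measured values. The sketch below assumes that reading; `KeyResult`, `Objective`, the `score` method, and the metric names are hypothetical and are not taken from OKR-Agent.

```python
from dataclasses import dataclass
from typing import Callable, Mapping


@dataclass(frozen=True)
class KeyResult:
    description: str
    check: Callable[[Mapping[str, float]], bool]  # predicate over metrics


@dataclass(frozen=True)
class Objective:
    statement: str                      # qualitative, timebound
    key_results: tuple[KeyResult, ...]  # specific, measurable

    def score(self, metrics: Mapping[str, float]) -> float:
        """Fraction of key results currently satisfied."""
        if not self.key_results:
            raise ValueError("an objective needs at least one key result")
        met = sum(1 for kr in self.key_results if kr.check(metrics))
        return met / len(self.key_results)


objective = Objective(
    statement="Make the Q3 incident digest trustworthy by end of quarter",
    key_results=(
        KeyResult("Citation coverage of at least 95%",
                  lambda m: m["citation_coverage"] >= 0.95),
        KeyResult("Fewer than 2 factual corrections per digest",
                  lambda m: m["corrections_per_digest"] < 2),
    ),
)
print(objective.score({"citation_coverage": 0.97, "corrections_per_digest": 3}))  # 0.5
```

Keeping the Objective as free text and the KRs as predicates preserves the split described above: the statement motivates, the checks decide.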

Validation Of Objectives

Before execution: (a) dry-run plan generation produces the DAG without side effects and lints for unreachable nodes, missing post-conditions, and tool-permission gaps; (b) cost-ceiling estimation sums expected tokens / tool calls per node; (c) simulation / replay against recorded traces or a stateful world model (WorldSim-style engines are 2025 reference implementations); (d) policy dry-evaluation against the OWASP policy engine to flag would-be excessive-agency steps. Reject the spec at submit time if any check fails.
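Checks (a) and (b) can be combined into one side-effect-free lint pass. The sketch below is an assumption-laden illustration: `PlanNode`, `lint_plan`, and the field names are hypothetical, and "unreachable" is interpreted as "can never become runnable" (a dependency is missing from the plan or sits in a cycle).

```python
from dataclasses import dataclass, field


@dataclass
class PlanNode:
    name: str
    tool: str
    depends_on: list[str] = field(default_factory=list)
    post_condition: str = ""   # empty string means none declared
    est_tokens: int = 0


def lint_plan(nodes: list[PlanNode], allowed_tools: set[str],
              token_ceiling: int) -> list[str]:
    """Return submit-time problems; an empty list means the spec may proceed."""
    # Fixed-point pass: a node is runnable once all its dependencies are.
    runnable: set[str] = set()
    changed = True
    while changed:
        changed = False
        for node in nodes:
            if node.name not in runnable and all(d in runnable for d in node.depends_on):
                runnable.add(node.name)
                changed = True

    problems: list[str] = []
    for node in nodes:
        if node.name not in runnable:
            problems.append(f"{node.name}: unreachable")
        if not node.post_condition:
            problems.append(f"{node.name}: missing post-condition")
        if node.tool not in allowed_tools:
            problems.append(f"{node.name}: tool '{node.tool}' not permitted")
    total = sum(node.est_tokens for node in nodes)
    if total > token_ceiling:
        problems.append(f"estimated {total} tokens exceeds ceiling {token_ceiling}")
    return problems


plan = [
    PlanNode("fetch", "http", post_condition="rows loaded", est_tokens=1000),
    PlanNode("summarize", "llm", depends_on=["fetch"], est_tokens=3000),
    PlanNode("publish", "shell", depends_on=["review"],
             post_condition="posted", est_tokens=500),
]
for problem in lint_plan(plan, allowed_tools={"http", "llm"}, token_ceiling=4000):
    print(problem)
```

In this example the lint reports four problems: a missing post-condition on `summarize`, an unreachable `publish` node (its `review` dependency is not in the plan), a tool-permission gap for `shell`, and a token estimate of 4500 against a ceiling of 4000.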

Selection Criteria

| Task Profile | Objective Shape | Decomposition Strategy | Validation Approach |
|---|---|---|---|
| Short, single-tool, deterministic | Single outcome-shaped | None | Deterministic oracle (schema / regex / hash) |
| Multi-step reasoning, one agent | Single outcome | Plan-and-Solve (PS+) prompt | Final-state oracle + self-check |
| Long-horizon, mixed tools | Outcome + KRs | Plan-and-Execute (LangGraph) | Per-step deterministic gate, replay |
| Domain SOP (e.g., SDLC) | Outcome per role | MetaGPT SOP staging | Structured handover artifact validation |
| Parallelizable, data-flow heavy | Outcome with KR DAG | DAG / Graph Harness, wave execution | Pre / post-condition gate per node |
| Symbolic / verifiable domain | Outcome + formal post-conditions | HTN / ChatHTN | Symbolic proof + LLM-as-Judge fallback |
| High-risk / non-reversible | Outcome + tight constraints | Decomposed with approval gates | Policy engine + HITL per sensitive node |

Selection Heuristic

Use a single objective for short-horizon, single-tool tasks that have a deterministic oracle. Use decomposed sub-objectives when the horizon exceeds a few tool calls, when multiple specialists are needed, when parallelism is available, or when intermediate artifacts must be inspected. Granularity guidance: 3-5 sub-objectives for coarse plans, 8-10 for detailed plans; each sub-objective should be a single actionable unit completable in roughly one tool-call cluster.
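The heuristic reduces to a disjunction of four conditions. The sketch below is one way to encode it; the function name, parameter names, and the default of 3 for "a few tool calls" are assumptions, since the source does not give a number.

```python
def should_decompose(expected_tool_calls: int, specialists_needed: int,
                     parallelizable: bool, inspect_intermediates: bool,
                     few_calls: int = 3) -> bool:
    """True when any trigger for sub-objective decomposition applies."""
    return (
        expected_tool_calls > few_calls   # horizon exceeds a few tool calls
        or specialists_needed > 1         # multiple specialists are needed
        or parallelizable                 # parallelism is available
        or inspect_intermediates          # intermediate artifacts need review
    )


print(should_decompose(2, 1, False, False))   # False
print(should_decompose(12, 1, False, False))  # True
```

Any single trigger is enough to justify decomposition; the granularity guidance then bounds how many sub-objectives the resulting plan should contain.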

Cross-References