Assembly Objectives¶
The objective specification and decomposition patterns referenced throughout Assembly › Objectives draw on the 2024-2026 agentic-planning research lineage. This page catalogs the canonical patterns, their measured outcomes, and the selection criteria the platform uses when configuring a Unitt's Objectives for a given workload.
Primary Objective Specification¶
The Anthropic multi-agent research system engineering post identifies objective ambiguity as the number-one driver of duplicated work in multi-agent systems. When the lead agent issued a vague instruction ("research the semiconductor shortage"), one subagent investigated the 2021 automotive chip crisis while two others duplicated 2025 supply-chain searches. The fix is a per-agent contract containing an objective, an output format, tool / source guidance, and clear task boundaries. Modern guidance favors outcome-shaped objectives (a verifiable end state) over task-shaped objectives (an imperative procedure), because outcomes survive replanning while procedures break under tool drift.
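The four-part contract above can be sketched as a small data structure. This is a minimal illustration, not a platform API: the class name `SubagentContract`, its field names, and the `missing_fields` helper are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubagentContract:
    """Hypothetical per-agent contract; names are illustrative only."""
    objective: str            # verifiable end state, not a procedure
    output_format: str        # what the subagent must hand back
    tool_guidance: list[str]  # tools / sources to prefer
    boundaries: list[str]     # what is explicitly out of scope

    def missing_fields(self) -> list[str]:
        """Return the names of contract fields that were left empty."""
        return [name for name, value in vars(self).items() if not value]


# The vague instruction from the example: only an objective, nothing else.
vague = SubagentContract(
    objective="research the semiconductor shortage",
    output_format="",
    tool_guidance=[],
    boundaries=[],
)

# A scoped version that fills in every part of the contract.
scoped = SubagentContract(
    objective="List the three largest causes of the 2021 automotive chip shortage",
    output_format="JSON list of {cause, evidence_url}",
    tool_guidance=["web_search", "prefer primary industry reports"],
    boundaries=["exclude events after 2022", "no supply-chain forecasts"],
)

print(vague.missing_fields())
print(scoped.missing_fields())
```

A lead agent could refuse to dispatch any contract whose `missing_fields()` result is non-empty, which turns the guidance into a cheap pre-dispatch check.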
Sub-Objective Decomposition¶
The dominant 2024-2026 patterns:
- Plan-and-Solve / PS+ prompting (Wang et al., ACL 2023) swaps the zero-shot chain-of-thought trigger for an instruction to "devise a plan, then carry it out", reducing missing-step errors.
- Plan-and-Execute (LangGraph) separates a Planner node from Executor nodes via a conditional edge; faster than ReAct because executors do not re-consult the large planner per step.
- HTN / ChatHTN (Muñoz-Avila 2025) keeps a symbolic HTN planner and queries the LLM only when no decomposition method applies, retaining soundness.
- MetaGPT encodes SOPs as assembly-line role prompts (Product Manager → Architect → Project Manager → Engineer) with structured handovers (PRDs, file lists, interface definitions) to eliminate idle chatter.
AgentBoard (NeurIPS 2024 Oral) operationalizes evaluation of decomposition quality via a progress rate metric, validated at Pearson > 0.95 vs human raters.
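To make the progress-rate idea concrete, the sketch below scores a trajectory by the largest fraction of sub-goals satisfied at any step. It is a simplified stand-in under stated assumptions, not AgentBoard's exact definition; the function name and the set-based state encoding are hypothetical.

```python
def progress_rate(trajectory: list[set[str]], subgoals: set[str]) -> float:
    """Highest fraction of sub-goals satisfied at any step of the trajectory."""
    if not subgoals:
        raise ValueError("at least one sub-goal is required")
    best = 0.0
    for achieved in trajectory:
        # Only count achievements that are actual sub-goals of this task.
        best = max(best, len(achieved & subgoals) / len(subgoals))
    return best


subgoals = {"find_key", "open_door", "exit_room"}
trajectory = [
    set(),                      # step 0: nothing achieved yet
    {"find_key"},               # step 1
    {"find_key", "open_door"},  # step 2: agent stalls here
]
rate = progress_rate(trajectory, subgoals)
print(round(rate, 3))
```

Unlike a binary success flag, this score distinguishes an agent that stalls after two of three sub-goals from one that never starts.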
Outcome Definitions¶
Modern outcome oracles are tiered: deterministic scorers (regex, JSON-schema validation, state-hash, unit tests, exact match) where applicable, and LLM-as-Judge only for subjective dimensions. The CANDOR multi-agent JUnit framework (Hallucination-to-Consensus, 2025) shows specialized agents generating oracles in predicate logic for verifiable conditions. τ-bench (Sierra, ICLR 2025) added the pass^k metric (the probability that all k independent trials of a task succeed); GPT-4o-class agents drop to pass^8 < 25% in retail. Cost-per-success is now a release metric: τ-bench plots a cost-vs-accuracy Pareto frontier.
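A pass^k value can be estimated from repeated trials. The sketch below assumes the combinatorial estimator C(c, k) / C(n, k) for a task with c successes in n trials, averaged over tasks; the function names are hypothetical and the numbers are made up for illustration.

```python
from math import comb


def pass_hat_k(num_trials: int, num_successes: int, k: int) -> float:
    """Estimate the chance that k trials drawn from the n recorded ones all pass."""
    if not 0 < k <= num_trials:
        raise ValueError("k must be between 1 and num_trials")
    if not 0 <= num_successes <= num_trials:
        raise ValueError("num_successes must be between 0 and num_trials")
    # comb(c, k) is 0 when k > c, so the estimate is 0 in that case.
    return comb(num_successes, k) / comb(num_trials, k)


def mean_pass_hat_k(results: list[tuple[int, int]], k: int) -> float:
    """Average the per-task estimate over (num_trials, num_successes) pairs."""
    return sum(pass_hat_k(n, c, k) for n, c in results) / len(results)


# Two illustrative tasks, 8 trials each: one always passes, one passes half the time.
results = [(8, 8), (8, 4)]
for k in (1, 2, 4, 8):
    print(k, round(mean_pass_hat_k(results, k), 4))
```

The flaky task contributes 0.5 at k = 1 but nothing at k = 8, which is why pass^k exposes unreliability that an average success rate hides.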
Constraint Specification¶
The OWASP Top 10 for Agentic Applications (2026 release, December 2025) elevates Excessive Agency as a core threat. Mitigations: Least-Agency (extension of PoLP; minimum autonomy to complete the task), per-tool profiles restricting permissions / data / functionality, explicit human confirmation for sensitive tools, isolated execution environments with enforced network policies, identity isolation plus memory erasure between tasks, and a central policy engine that checks every sensitive action.
Constraints in an Objective spec should declare: never-do list, approval-required actions, allowed-tools / systems whitelist, token / dollar budget, wall-clock stop, and reversibility requirement.
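The declared constraints can be sketched as a typed record plus a per-action decision function. All names here (`Constraints`, `decide`, the tool and action strings) are hypothetical; the budget and wall-clock fields are carried in the spec but would be enforced by a runtime that this sketch does not model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraints:
    """Hypothetical constraint block of an Objective spec."""
    never_do: frozenset[str]
    approval_required: frozenset[str]
    allowed_tools: frozenset[str]
    max_tokens: int
    max_dollars: float
    wall_clock_seconds: int
    require_reversible: bool


def decide(c: Constraints, action: str, tool: str, reversible: bool) -> str:
    """Return 'deny', 'needs_approval', or 'allow' for one proposed action."""
    if action in c.never_do:
        return "deny"
    if tool not in c.allowed_tools:
        return "deny"
    if c.require_reversible and not reversible:
        return "deny"
    if action in c.approval_required:
        return "needs_approval"
    return "allow"


spec = Constraints(
    never_do=frozenset({"delete_production_db"}),
    approval_required=frozenset({"send_refund"}),
    allowed_tools=frozenset({"crm", "payments"}),
    max_tokens=200_000,
    max_dollars=5.0,
    wall_clock_seconds=900,
    require_reversible=True,
)

print(decide(spec, "lookup_order", "crm", reversible=True))
print(decide(spec, "send_refund", "payments", reversible=True))
print(decide(spec, "delete_production_db", "crm", reversible=False))
```

Checking the never-do list first means a forbidden action is denied even if it also appears in the approval list, so a human approval can never override a hard prohibition.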
Confidence Thresholds¶
Confidence-based escalation uses tiered thresholds: high-confidence auto-approve, medium auto-approve-with-sampling, low → human queue, very-low → alert. The pass^k reliability framing lets ops teams forecast escalation volume from p^k decay. 2025 calibration work (ACM Computing Surveys on UQ in LLMs) favors Bayesian posterior estimation with Dirichlet / Beta priors over raw token-logprobs.
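The tiering, the p^k decay, and a Beta-prior estimate can be sketched together. The threshold values below are hypothetical placeholders rather than platform defaults, and the Beta posterior mean is only one simple instance of the Bayesian estimation mentioned above.

```python
def route(confidence: float, high: float = 0.95,
          medium: float = 0.80, low: float = 0.50) -> str:
    """Map a calibrated confidence score to an escalation tier."""
    if confidence >= high:
        return "auto_approve"
    if confidence >= medium:
        return "auto_approve_with_sampling"
    if confidence >= low:
        return "human_queue"
    return "alert"


def beta_posterior_mean(successes: int, failures: int,
                        prior_alpha: float = 1.0, prior_beta: float = 1.0) -> float:
    """Posterior mean success rate under a Beta(prior_alpha, prior_beta) prior."""
    return (prior_alpha + successes) / (prior_alpha + prior_beta + successes + failures)


def all_pass_probability(p: float, k: int) -> float:
    """Chance that k independent attempts all succeed at per-attempt rate p."""
    return p ** k


p = beta_posterior_mean(successes=8, failures=2)  # uniform prior, 10 observations
print(p, route(p))
print(round(all_pass_probability(0.9, 8), 4))
```

With a uniform prior, 8 successes in 10 attempts gives a posterior mean of 0.75, which lands in the human queue under these placeholder thresholds; a 90% per-attempt rate decays to roughly 43% over eight attempts.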
Sub-Objective Dependency Graphs¶
The 2025 consensus is DAG, not tree: nodes are sub-objectives, edges encode output-to-input data dependency. The Coordinator-Implementor-Verifier pattern (Augment Code) runs same-level DAG nodes as parallel waves. Graph Harness makes three commitments: (1) plan is immutable for the plan version, (2) planning / execution / recovery are separated layers, (3) recovery escalates retry → local patch → replan to prevent the replan-loop pathology. GraSP adds five local-repair primitives (Rebind, InsertPrereq, Substitute, Rewire, Bypass) before allowing global replan. TDP (Task-Decoupled Planning, 2025) confines replanning to active sub-tasks, cutting tokens up to 82%.
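Wave execution over a dependency DAG can be sketched with the standard library's `graphlib`. The node names are hypothetical, and this shows only the scheduling order, not the named frameworks' implementations.

```python
from graphlib import TopologicalSorter


def execution_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group sub-objectives into waves; each wave can run in parallel.

    `dependencies` maps a node to the set of nodes whose output it consumes.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the plan is not a DAG
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every node whose inputs are done
        waves.append(ready)
        sorter.done(*ready)                 # mark the whole wave complete
    return waves


plan = {
    "fetch_sales": set(),
    "fetch_inventory": set(),
    "join_tables": {"fetch_sales", "fetch_inventory"},
    "write_report": {"join_tables"},
}
waves = execution_waves(plan)
print(waves)
```

Because `prepare()` rejects cycles up front, a plan that is not a valid DAG fails before any sub-objective runs.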
Validation Checkpoints¶
Production patterns wrap every sub-objective edge in a gate. Deterministic gates run schema validation, JSON-parse, tool-call format, output length bounds, and policy checks; semantic gates use a separate validator agent with explicit fail-the-work authority. LangGraph 2.0's persistence / checkpoint layer is the reference for resumable multi-day work; Anthropic's harness guidance documents progress files, git-as-state, and structured session handoff artifacts.
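A deterministic gate of the kind described above might look like the following sketch. The function name, the required-key check standing in for full schema validation, and the default length bound are all assumptions made for illustration.

```python
import json


def deterministic_gate(raw: str, required_keys: set[str],
                       max_chars: int = 4000) -> list[str]:
    """Return a list of failures; an empty list means the output passes."""
    failures = []
    if len(raw) > max_chars:
        failures.append("output exceeds length bound")
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return failures + ["output is not valid JSON"]
    if not isinstance(payload, dict):
        return failures + ["top-level JSON value is not an object"]
    missing = sorted(required_keys - set(payload))
    if missing:
        failures.append("missing keys: " + ", ".join(missing))
    return failures


required = {"summary", "sources"}
print(deterministic_gate('{"summary": "ok", "sources": []}', required))
print(deterministic_gate('{"summary": "ok"}', required))
print(deterministic_gate("not json", required))
```

Returning a list of failures rather than a boolean lets the caller log every violated check at once, which is useful when the downstream semantic gate is expensive to invoke.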
OKR-For-Agents¶
OKR-Agent is the canonical reference: hierarchical Objects decompose into sub-Objects, each assigned to a fresh agent; Key Results drive multi-level self-evaluation. Enterprise practice (ISG 2025 measurement framework) pairs function-specific OKRs with OODA-loop KPIs. KR design rules: Objectives are qualitative / timebound / inspiring; KRs are specific, measurable, ambitious, and machine-checkable.
Validation Of Objectives¶
Before execution: (a) dry-run plan generation produces the DAG without side effects and lints for unreachable nodes, missing post-conditions, and tool-permission gaps; (b) cost-ceiling estimation sums expected tokens / tool calls per node; (c) simulation / replay against recorded traces or a stateful world model (WorldSim-style engines are 2025 reference implementations); (d) policy dry-evaluation against the OWASP policy engine to flag would-be excessive-agency steps. Reject the spec at submit time if any check fails.
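Checks (a) and (b) can be sketched as a side-effect-free lint over a plan. This is a rough illustration: the `Node` fields are hypothetical, "unreachable" is interpreted here as "not needed by the goal node", and token estimates are assumed to be supplied per node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Hypothetical plan node; field names are illustrative."""
    tool: str
    depends_on: tuple[str, ...] = ()
    post_condition: str = ""
    expected_tokens: int = 0


def lint_plan(nodes: dict[str, Node], goal: str,
              allowed_tools: set[str], token_ceiling: int) -> list[str]:
    """Return every problem found; an empty list means the spec may be submitted."""
    problems = []
    # Walk backwards from the goal to find the nodes it actually depends on.
    needed, stack = set(), [goal]
    while stack:
        name = stack.pop()
        if name in needed or name not in nodes:
            continue
        needed.add(name)
        stack.extend(nodes[name].depends_on)
    for name in sorted(nodes):
        node = nodes[name]
        if name not in needed:
            problems.append(f"{name}: unreachable from goal")
        if not node.post_condition:
            problems.append(f"{name}: missing post-condition")
        if node.tool not in allowed_tools:
            problems.append(f"{name}: tool '{node.tool}' not permitted")
    total = sum(node.expected_tokens for node in nodes.values())
    if total > token_ceiling:
        problems.append(f"plan: estimated {total} tokens exceeds ceiling {token_ceiling}")
    return problems


nodes = {
    "fetch": Node("http_get", post_condition="rows fetched", expected_tokens=500),
    "summarize": Node("llm", ("fetch",), "summary written", 1500),
    "email": Node("smtp_send", expected_tokens=200),
}
problems = lint_plan(nodes, goal="summarize",
                     allowed_tools={"http_get", "llm"}, token_ceiling=2000)
for problem in problems:
    print(problem)
```

Rejecting the spec whenever the returned list is non-empty matches the submit-time rule stated above.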
Selection Criteria¶
| Task Profile | Objective Shape | Decomposition Strategy | Validation Approach |
|---|---|---|---|
| Short, single-tool, deterministic | Single outcome-shaped | None | Deterministic oracle (schema / regex / hash) |
| Multi-step reasoning, one agent | Single outcome | Plan-and-Solve (PS+) prompt | Final-state oracle + self-check |
| Long-horizon, mixed tools | Outcome + KRs | Plan-and-Execute (LangGraph) | Per-step deterministic gate, replay |
| Domain SOP (e.g., SDLC) | Outcome per role | MetaGPT SOP staging | Structured handover artifact validation |
| Parallelizable, data-flow heavy | Outcome with KR DAG | DAG / Graph Harness, wave execution | Pre / post-condition gate per node |
| Symbolic / verifiable domain | Outcome + formal post-conditions | HTN / ChatHTN | Symbolic proof + LLM-as-Judge fallback |
| High-risk / non-reversible | Outcome + tight constraints | Decomposed with approval gates | Policy engine + HITL per sensitive node |
Selection Heuristic¶
Use a single objective for short-horizon, single-tool, deterministic-oracle tasks; use decomposed sub-objectives when the horizon exceeds a few tool calls, when multiple specialists are needed, when parallelism is available, or when intermediate artifacts must be inspected. Granularity guidance: 3-5 sub-objectives for coarse plans, 8-10 for detailed plans; each sub-objective should be a single actionable unit completable in roughly one tool-call cluster.
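The heuristic can be sketched as a small decision function. The `TaskProfile` fields are hypothetical, and the cut-off of three tool calls is an assumed reading of "a few", not a documented platform value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskProfile:
    """Hypothetical summary of a workload; field names are illustrative."""
    expected_tool_calls: int
    specialists_needed: int = 1
    parallelizable: bool = False
    needs_intermediate_review: bool = False


def should_decompose(profile: TaskProfile, short_horizon_limit: int = 3) -> bool:
    """Decompose if any one of the four triggers from the heuristic applies."""
    return (
        profile.expected_tool_calls > short_horizon_limit
        or profile.specialists_needed > 1
        or profile.parallelizable
        or profile.needs_intermediate_review
    )


def sub_objective_range(detailed: bool) -> tuple[int, int]:
    """Granularity guidance: 3-5 for coarse plans, 8-10 for detailed plans."""
    return (8, 10) if detailed else (3, 5)


lookup = TaskProfile(expected_tool_calls=1)
audit = TaskProfile(expected_tool_calls=12, specialists_needed=2, parallelizable=True)
print(should_decompose(lookup), should_decompose(audit))
print(sub_objective_range(detailed=False), sub_objective_range(detailed=True))
```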
Cross-References¶
- Assembly › Objectives; developer-facing platform layer.
- Reference › Research › Assembly Patterns; workflow graph the objective is realized in.
- Reference › Research › Fabric Test; release gate that consumes pass^k and cost-per-success.