Assembly Tools

The tool design patterns referenced throughout Assembly › Tools draw on the 2024-2026 agentic tool-design research lineage. This page catalogs the canonical declaration shapes, validation methods, security postures, and the selection criteria the platform uses when configuring tools for a given Unitt.

Tool Description Quality

Anthropic's Writing effective tools for AI agents (October 2025) frames tool descriptions as the highest-leverage knob: small description refinements (for example, one that stopped Claude from appending "2025" to web-search queries) produced larger accuracy gains than swapping model size. Its recommendations: write for the agent as reader, use distinct, namespaced tool names, return human-readable fields rather than raw IDs, cap responses near 25k tokens, and evaluate with agentic loops that measure accuracy, runtime, token use, and errors. In production, description quality dominates tool-selection accuracy.
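
The response cap can be sketched as a small wrapper around a tool's return value. This is a minimal illustration, not Anthropic's implementation: the function name cap_response, the 4-characters-per-token estimate, and the wording of the truncation notice are all assumptions made for the example.

```python
def cap_response(text: str, max_tokens: int = 25_000, chars_per_token: int = 4) -> str:
    """Truncate a tool response to a rough token budget.

    Assumption: tokens are estimated as len(text) / chars_per_token, which is
    only a heuristic. A real gateway would count with the model's tokenizer.
    """
    budget = max_tokens * chars_per_token
    if len(text) <= budget:
        return text
    # Tell the agent what happened and how to recover, instead of cutting silently.
    notice = "\n[truncated: narrow the query or request the next page]"
    return text[:budget] + notice


capped = cap_response("x" * 50, max_tokens=5, chars_per_token=4)
print(len(capped))
```

Appending a recovery hint keeps the truncation visible to the agent, which matches the guidance to write tool output for the agent as reader.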

Function-Calling And Strict Mode

OpenAI Structured Outputs (August 2024) introduced strict: true on tools[] and response_format: {type:"json_schema", json_schema:{name, schema, strict:true}}, using constrained decoding to guarantee schema conformance. Strict mode requires additionalProperties:false and every field listed in required; an optional field is expressed as a nullable type such as ["T","null"]. Anthropic shipped equivalent Structured Outputs in public beta on 2025-11-14, followed by GA across the 4.5 / 4.6 / 4.7 line, exposing tools[].strict for type-safe argument synthesis via grammar-compiled constrained decoding. The two vendors have converged on the same schema conventions.
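
The two strict-mode requirements can be checked mechanically before a tool is registered. The sketch below is a hypothetical lint, not part of any vendor SDK: strict_schema_problems and the billing_get_invoice declaration are illustrative names, and the declaration is only modeled on the Chat Completions tool shape.

```python
from typing import Any


def strict_schema_problems(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Return reasons an object schema would break the two strict-mode rules."""
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not listed in required: {missing}")
        for name, sub in props.items():
            problems.extend(strict_schema_problems(sub, f"{path}.{name}"))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems.extend(strict_schema_problems(schema["items"], f"{path}[]"))
    return problems


# Hypothetical tool declaration; the name and fields are illustrative only.
get_invoice_tool = {
    "type": "function",
    "function": {
        "name": "billing_get_invoice",
        "description": "Fetch one invoice by its human-readable number.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "invoice_number": {"type": "string"},
                # Optional field: still listed in required, but allowed to be null.
                "currency": {"type": ["string", "null"]},
            },
            "required": ["invoice_number", "currency"],
            "additionalProperties": False,
        },
    },
}

print(strict_schema_problems(get_invoice_tool["function"]["parameters"]))
```

The lint covers only the two rules named above; strict mode has further restrictions on supported keywords that this sketch does not check.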

CodeAct

CodeAct (Wang et al., ICML 2024) showed that using Python code as the action format consolidates the action space, yielding up to 20% higher absolute task success and roughly 30% fewer steps and tokens than JSON tool calls, because code natively supports loops, conditionals, and variable reuse. It is now the default in Manus, OpenDevin, and Open Interpreter; Anthropic's Code execution with MCP (November 2025) reinforces the pattern for token-efficient MCP usage.
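
To make the difference concrete, the sketch below runs a single model-written code cell that calls a tool three times and aggregates the results, where a JSON tool-calling loop would need three separate calls plus a final step. Everything here is illustrative: get_price, run_code_action, and the catalog are hypothetical, and exec() is used only for brevity. It is not a sandbox.

```python
import contextlib
import io


def get_price(sku: str) -> float:
    """Hypothetical tool: look up a price in a fixed in-memory catalog."""
    return {"A1": 4.0, "B2": 6.5, "C3": 10.0}[sku]


def run_code_action(code: str, tools: dict) -> str:
    """Execute one code cell with tools injected and return what it printed.

    exec() is NOT an isolation boundary. A real executor would run the cell
    in a separate process or ephemeral container.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        exec(code, {"__builtins__": {"print": print, "sum": sum}, **tools})
    return buffer.getvalue().strip()


# The action the model would emit: a loop and an aggregate in one step.
action = """
total = sum(get_price(sku) for sku in ["A1", "B2", "C3"])
print(total)
"""

observation = run_code_action(action, {"get_price": get_price})
print(observation)  # 20.5
```

The captured stdout plays the role of the observation returned to the model on the next turn.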

MCP Tools Primitive

Tools are the executable primitive of MCP alongside Resources and Prompts; clients enumerate via tools/list and invoke via tools/call, with runtime capability negotiation. The 2025-11-25 specification adds parallel tool calls, deprecates includeContext in favor of explicit capability declarations, and ties discovery to an OAuth 2.1 authorization framework with Protected Resource Metadata + OIDC discovery. Tool lists can be dynamic: a server that declares the listChanged capability notifies clients when its tool set changes.
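
The two methods are plain JSON-RPC 2.0 exchanges. The sketch below mimics the message shapes with an in-memory handler; it is not an MCP SDK, and the weather_get_forecast tool and the handle function are hypothetical. Transport, initialization, and authorization are left out.

```python
import json
from typing import Any

# Hypothetical in-memory tool table; a real server would register handlers via an SDK.
TOOLS = {
    "weather_get_forecast": {
        "description": "Return a short forecast for a city name.",
        "inputSchema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
        "handler": lambda args: f"Forecast for {args['city']}: sunny",
    }
}


def handle(request: dict[str, Any]) -> dict[str, Any]:
    """Answer a tools/list or tools/call JSON-RPC request."""
    if request["method"] == "tools/list":
        result = {"tools": [
            {"name": name, "description": t["description"], "inputSchema": t["inputSchema"]}
            for name, t in TOOLS.items()
        ]}
    elif request["method"] == "tools/call":
        params = request["params"]
        text = TOOLS[params["name"]]["handler"](params["arguments"])
        result = {"content": [{"type": "text", "text": text}], "isError": False}
    else:
        return {"jsonrpc": "2.0", "id": request["id"],
                "error": {"code": -32601, "message": "Method not found"}}
    return {"jsonrpc": "2.0", "id": request["id"], "result": result}


listing = handle({"jsonrpc": "2.0", "id": 1, "method": "tools/list"})
call = handle({
    "jsonrpc": "2.0", "id": 2, "method": "tools/call",
    "params": {"name": "weather_get_forecast", "arguments": {"city": "Oslo"}},
})
print(json.dumps(call["result"]))
```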

Tool Registries

Enterprise pattern (mcp-gateway-registry, Red Hat MCP Gateway): a central registry sits behind an MCP gateway that accepts IdP-issued JWTs from Keycloak / Entra / Okta / Cognito / Auth0, session cookies, and service tokens. Group-restricted tool visibility (allowedGroups) is layered on top of IAM scopes to give per-agent allowlists. The OWASP MCP Security Cheat Sheet mandates four controls: treat each server as an independent trust domain; enforce server allowlisting (only signed / registered servers reachable from prod); require OAuth 2.1 + PKCE for remote auth; and log every tool call with user / agent / server / policy for audit.
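
Group-restricted visibility combined with server allowlisting reduces to a filter applied before the tool list reaches the agent. The sketch below assumes the JWT has already been validated upstream and that its groups claim is trustworthy; RegisteredTool, visible_tools, and the registry entries are hypothetical and do not reflect the data model of any specific gateway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegisteredTool:
    name: str
    server: str
    allowed_groups: frozenset


def visible_tools(registry, caller_groups, server_allowlist):
    """Names of tools on allowlisted servers that share a group with the caller."""
    return sorted(
        tool.name
        for tool in registry
        if tool.server in server_allowlist and tool.allowed_groups & caller_groups
    )


REGISTRY = [
    RegisteredTool("crm_search_accounts", "crm", frozenset({"sales", "support"})),
    RegisteredTool("crm_delete_account", "crm", frozenset({"crm-admins"})),
    # Caller's group matches, but the server is not on the allowlist.
    RegisteredTool("scratch_eval", "unregistered-dev", frozenset({"sales"})),
]

# Assumption: these claims come from a token the gateway has already verified.
claims = {"sub": "agent-42", "groups": ["sales"]}
tools_for_agent = visible_tools(REGISTRY, set(claims["groups"]), {"crm"})
print(tools_for_agent)  # ['crm_search_accounts']
```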

Tool Validation Before Runtime

The 2025 consensus (Datadog, LangChain, Braintrust, Atlan six-layer) is that the agent harness, not the model, is the binding reliability constraint. Validate tools against the exact model that will call them before promotion. A tool-evaluation harness combines deterministic checks (selection correctness, JSON-schema argument validation, format compliance, permission scope) with an LLM-as-judge pass for response quality, plus self-verification hooks that, on schema failure, re-prompt the model with the validator's error. Argument-synthesis defects are almost always tool-description or system-prompt problems, not model defects.
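
The deterministic half of such a harness, together with the re-prompt hook, might look like the sketch below. It assumes flat object schemas and uses a hand-written validator to stay dependency-free; validate_args, call_with_retry, the orders_refund tool, and stub_model are all hypothetical, and stub_model stands in for a real model call.

```python
from typing import Any, Callable

PY_TYPES = {"string": str, "integer": int}


def validate_args(args: dict[str, Any], schema: dict[str, Any]) -> list[str]:
    """Tiny validator for flat object schemas: required fields, types, no extras."""
    errors = [f"missing required field '{k}'" for k in schema["required"] if k not in args]
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected field '{key}'")
        elif type(value) is not PY_TYPES[spec["type"]]:  # exact match, so True is not an integer
            errors.append(f"field '{key}' must be {spec['type']}")
    return errors


def call_with_retry(model: Callable[[str], dict], prompt: str, expected_tool: str,
                    schema: dict[str, Any], max_attempts: int = 2) -> dict[str, Any]:
    """Check tool selection and arguments; on failure, re-prompt with the errors."""
    feedback, errors = "", []
    for attempt in range(1, max_attempts + 1):
        call = model(prompt + feedback)
        if call["name"] != expected_tool:
            errors = [f"selected '{call['name']}', expected '{expected_tool}'"]
        else:
            errors = validate_args(call["arguments"], schema)
        if not errors:
            return {"ok": True, "attempts": attempt, "call": call}
        feedback = "\nValidator errors: " + "; ".join(errors)
    return {"ok": False, "attempts": max_attempts, "errors": errors}


REFUND_SCHEMA = {
    "properties": {"order_id": {"type": "string"}, "amount_cents": {"type": "integer"}},
    "required": ["order_id", "amount_cents"],
}


def stub_model(prompt: str) -> dict:
    """Stand-in for a model: wrong argument type first, corrected after feedback."""
    amount = 500 if "Validator errors" in prompt else "5.00"
    return {"name": "orders_refund", "arguments": {"order_id": "A-17", "amount_cents": amount}}


result = call_with_retry(stub_model, "Refund order A-17 by $5.", "orders_refund", REFUND_SCHEMA)
print(result["ok"], result["attempts"])  # True 2
```

Recording the attempt count per case gives the harness a cheap signal: a tool that routinely needs the second attempt usually has a description problem.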

Tool Poisoning Defense

CVE-2025-54136, MCPoison (disclosed July 2025, Cursor ≤1.2.4), let an attacker swap a previously approved MCP entry for a malicious command with no re-prompt, giving persistent RCE via the trusted-tool descriptor channel. The structural lesson (Invariant Labs, TrueFoundry) is that tool descriptions are model-side instructions with ambient authority, so defense lives at the gateway: schema inspection, content-hash pinning of approved descriptors, re-approval on any descriptor diff, provenance signing, and stripping hidden instructions from descriptions before they reach the model.
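
Content-hash pinning with re-approval on diff can be sketched in a few lines. The example assumes descriptors are JSON-serializable dictionaries keyed by a name field; DescriptorPins, descriptor_hash, and the build_runner entry are hypothetical, and a production gateway would persist the pins and add provenance signing on top.

```python
import hashlib
import json


def descriptor_hash(descriptor: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order cannot change the hash."""
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DescriptorPins:
    """Remembers the hash of each approved descriptor and rejects any later diff."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}

    def approve(self, descriptor: dict) -> None:
        self._pins[descriptor["name"]] = descriptor_hash(descriptor)

    def is_trusted(self, descriptor: dict) -> bool:
        return self._pins.get(descriptor["name"]) == descriptor_hash(descriptor)


pins = DescriptorPins()
approved = {"name": "build_runner", "command": "npm", "args": ["test"]}
pins.approve(approved)

# Same name, different command: the swap pattern the CVE describes.
swapped = {"name": "build_runner", "command": "sh", "args": ["-c", "echo swapped"]}

print(pins.is_trusted(approved), pins.is_trusted(swapped))  # True False
```

Pinning on the whole descriptor rather than on the name alone is the point: an approval keyed only by name is exactly what a silent swap exploits.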

Tool Versioning

Treat tool contracts as public APIs under semver: MAJOR for any removed / renamed / required-arg change, MINOR for additive optional args, PATCH for description / behavior fixes. Industry telemetry attributes ~60% of production agent failures to tool-version churn vs ~40% model drift, driven by silently changing schemas, return shapes, or default values (NJ Raman). Mitigations: pin tool versions per agent build, expose tool_version in audit logs, run contract tests on every release, deprecate via N+1 dual-publish rather than in-place edits.
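
The bump rules can be enforced in a release check by diffing the old and new input schemas. This is a sketch under simplifying assumptions: it compares only top-level property names and the required list, so type changes, default changes, and return-shape changes would need additional checks. required_bump and bump are hypothetical helper names.

```python
def required_bump(old: dict, new: dict) -> str:
    """Classify a schema change as 'major', 'minor', or 'patch'."""
    old_props, new_props = set(old["properties"]), set(new["properties"])
    old_req, new_req = set(old.get("required", [])), set(new.get("required", []))
    if old_props - new_props or old_req != new_req:
        return "major"  # an arg was removed or renamed, or the required set changed
    if new_props - old_props:
        return "minor"  # only additive optional args
    return "patch"      # argument contract unchanged (description or behavior fix)


def bump(version: str, level: str) -> str:
    """Apply a semver bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if level == "major":
        return f"{major + 1}.0.0"
    if level == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


v1 = {"properties": {"city": {"type": "string"}}, "required": ["city"]}
v2 = {"properties": {"city": {"type": "string"}, "units": {"type": "string"}},
      "required": ["city"]}
v3 = {"properties": {"location": {"type": "string"}}, "required": ["location"]}

print(required_bump(v1, v2), required_bump(v1, v3))  # minor major
print(bump("1.4.2", required_bump(v1, v2)))          # 1.5.0
```

Running the classifier as a contract test makes an under-declared version bump fail the release instead of surfacing later as agent breakage.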

Unitt Default Tools Mapping

| Unitt Tool | Pattern Equivalent | Notes |
| --- | --- | --- |
| Auth | OAuth 2.1 + PKCE per OWASP MCP | Identity for user-on-behalf and service-to-service. |
| Comms | MCP transport (HTTP / stdio) | Remote vs local choice lives here. |
| Model | LLM gateway (Anthropic / OpenAI strict tools) | Pin model + strict-mode flag per agent. |
| File | MCP filesystem server | Sandbox + path allowlist. |
| Audit | Gateway log sink | user / agent / server / tool / args / result. |
| Shell | CodeAct executor (Python / bash sandbox) | Highest blast radius; ephemeral container. |
| Search | Managed MCP (web / RAG) | Description hygiene per Anthropic guidance. |
| Memory | MCP Resources + tool wrappers | Versioned schemas. |
| Scheduler | Cron / tool with idempotency keys | Replay-safe contracts. |
| Browser | Managed MCP (Playwright / Computer Use) | Strict-mode args mandatory. |
| Validation | Harness layer (schema + judge) | Pre-prod tool eval. |
| Sandbox | Container / seccomp boundary | Wraps Shell / CodeAct. |
| Monitor | Telemetry + drift detection | Catches the 60 / 40 failure mix. |

Anatomy Of A Tool

The convergent shape across Anthropic Skills, Claude Code sub-agents, Cloudflare Markdown for Agents, and Microsoft Agent Skills: a JSON / YAML declaration (name, semver, JSON-Schema input_schema, output schema, scopes / permissions, allowed bash patterns, model bindings) plus a Markdown usage file (purpose, when-to-use, examples, edge cases, validation / post-conditions) loaded on demand. The two-file split keeps the strict-mode schema mechanically validatable while the prose stays optimizable.
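
The two-file split can be modeled as a small declaration object whose schema is always in memory while the Markdown usage text is read only when first requested. This is an illustrative sketch rather than the format of any of the systems named above: ToolDeclaration, its field names, and the USAGE.md file name are assumptions.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any, Optional


@dataclass
class ToolDeclaration:
    """Machine-checked half of a tool, plus a pointer to its prose usage file."""
    name: str
    version: str
    input_schema: dict[str, Any]
    scopes: tuple[str, ...]
    usage_path: Path
    _usage: Optional[str] = field(default=None, repr=False)

    def usage(self) -> str:
        """Read the Markdown usage file on first access and cache it."""
        if self._usage is None:
            self._usage = self.usage_path.read_text(encoding="utf-8")
        return self._usage


with tempfile.TemporaryDirectory() as tmp:
    usage_file = Path(tmp) / "USAGE.md"
    usage_file.write_text("Use when the user asks for an invoice by number.\n", encoding="utf-8")

    tool = ToolDeclaration(
        name="billing_get_invoice",
        version="1.2.0",
        input_schema={"type": "object", "properties": {}, "required": [],
                      "additionalProperties": False},
        scopes=("billing:read",),
        usage_path=usage_file,
    )
    loaded_before = tool._usage is not None  # False: prose not read yet
    usage_text = tool.usage()                # first access reads the file

print(loaded_before, usage_text.strip())
```

Keeping the prose out of the declaration means description edits never touch the schema that contract tests pin.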

Selection Criteria

| Dimension | Custom Tool (In-Process) | Managed MCP Server | CodeAct |
| --- | --- | --- | --- |
| Shape | JSON function with strict schema | MCP tools/* over OAuth | Python (or sandboxed lang) cell |
| Declaration | YAML + Markdown in repo, semver-pinned | Registry entry, signed, OAuth scope | Tool surface = stdlib + injected SDK |
| Validation | Schema + unit + eval harness | Gateway schema inspect + contract tests | Sandbox test runner + output assertions |
| Security Posture | Highest control, in-trust-domain | Untrusted-domain isolation, gateway-mediated | Highest blast radius → mandatory sandbox |
| Best For | Stable, hot-path, latency-sensitive ops | Shared / 3rd-party capabilities, multi-agent reuse | Long compositional chains, data wrangling |
| Avoid When | Capability is widely reused | Latency-critical or sensitive secrets | Operation needs strict audited args |

Cross-References