Assembly Tools¶
The tool design patterns referenced throughout Assembly › Tools draw on the 2024-2026 agentic tool-design research lineage. This page catalogs the canonical declaration shapes, validation methods, security postures, and the selection criteria the platform uses when configuring tools for a given Unitt.
Tool Description Quality¶
Anthropic's Writing effective tools for AI agents (October 2025) frames tool descriptions as the highest-leverage knob; small description refinements (correcting Claude appending "2025" to web-search queries) produced larger accuracy gains than swapping model size. Recommendations: write for the agent reader, use distinct names with namespacing, return human-readable fields over raw IDs, cap responses near 25k tokens, evaluate with agentic loops measuring accuracy, runtime, tokens, and errors. Description quality dominates tool-selection accuracy in production.
Function-Calling And Strict Mode¶
OpenAI Structured Outputs (August 2024) introduces strict: true on tools[] and response_format: {type:"json_schema", strict:true} with constrained decoding guaranteeing schema conformance, but requires additionalProperties:false and every field marked required (optional fields use ["T","null"]). Anthropic shipped equivalent Structured Outputs in public beta on 2025-11-14, then GA across the 4.5 / 4.6 / 4.7 line, exposing tools[].strict for type-safe argument synthesis via grammar-compiled constrained decoding. Both vendors are now schema-converged.
CodeAct¶
CodeAct (Wang et al., ICML 2024) showed Python-as-action consolidates the action space, yielding up to 20% absolute task success and roughly 30% fewer steps and tokens vs JSON tool calls because code natively supports loops, conditionals, and variable reuse. It is now the default in Manus, OpenDevin, and Open Interpreter; Anthropic's Code execution with MCP (November 2025) reinforces the pattern for token-efficient MCP usage.
MCP Tools Primitive¶
Tools are the executable primitive of MCP alongside Resources and Prompts; clients enumerate via tools/list and invoke via tools/call, with runtime capability negotiation. The 2025-11-25 specification adds parallel tool calls, deprecates includeContext in favor of explicit capability declarations, and ties discovery to an OAuth 2.1 authorization framework with Protected Resource Metadata + OIDC discovery. Tool lists can be dynamic (changed-notification capability).
Tool Registries¶
Enterprise pattern (mcp-gateway-registry, Red Hat MCP Gateway): a central registry behind an MCP gateway accepting IdP-issued JWTs from Keycloak / Entra / Okta / Cognito / Auth0, session cookies, and service tokens, with group-restricted tool visibility (allowedGroups) layered on top of IAM scopes for per-agent allowlists. OWASP MCP Security Cheat Sheet mandates: treat each server as an independent trust domain, enforce server allowlisting (only signed / registered servers reachable from prod), require OAuth 2.1 + PKCE for remote auth, log every tool call with user / agent / server / policy for audit.
Tool Validation Before Runtime¶
The 2025 consensus (Datadog, LangChain, Braintrust, Atlan six-layer): the agent harness, not the model, is the binding reliability constraint. Validate tools against the exact model that will call them before promotion. A tool-evaluation harness combines deterministic checks (selection correctness, JSON-schema argument validation, format compliance, permission scope) with LLM-as-judge for response quality, plus self-verification hooks that re-prompt on schema failure with the validator's error. Argument-synthesis defects are almost always tool-description or system-prompt problems, not model defects.
Tool Poisoning Defense¶
CVE-2025-54136 MCPoison (disclosed July 2025, Cursor ≤1.2.4) let an attacker swap a previously approved MCP entry for a malicious command with no re-prompt; persistent RCE via the trusted-tool descriptor channel. The structural lesson (Invariant Labs, TrueFoundry): tool descriptions are model-side instructions with ambient authority, so defense lives at the gateway via schema inspection, content-hash pinning of approved descriptors, re-approval on any descriptor diff, provenance signing, and stripping hidden instructions from descriptions before they reach the model.
Tool Versioning¶
Treat tool contracts as public APIs under semver: MAJOR for any removed / renamed / required-arg change, MINOR for additive optional args, PATCH for description / behavior fixes. Industry telemetry attributes ~60% of production agent failures to tool-version churn vs ~40% model drift, driven by silently changing schemas, return shapes, or default values (NJ Raman). Mitigations: pin tool versions per agent build, expose tool_version in audit logs, run contract tests on every release, deprecate via N+1 dual-publish rather than in-place edits.
Unitt Default Tools Mapping¶
| Unitt Tool | Pattern Equivalent | Notes |
|---|---|---|
| Auth | OAuth 2.1 + PKCE per OWASP MCP | Identity for user-on-behalf and service-to-service. |
| Comms | MCP transport (HTTP / stdio) | Remote vs local choice lives here. |
| Model | LLM gateway (Anthropic / OpenAI strict tools) | Pin model + strict-mode flag per agent. |
| File | MCP filesystem server | Sandbox + path allowlist. |
| Audit | Gateway log sink | user / agent / server / tool / args / result. |
| Shell | CodeAct executor (Python / bash sandbox) | Highest blast radius; ephemeral container. |
| Search | Managed MCP (web / RAG) | Description hygiene per Anthropic guidance. |
| Memory | MCP Resources + tool wrappers | Versioned schemas. |
| Scheduler | Cron / tool with idempotency keys | Replay-safe contracts. |
| Browser | Managed MCP (Playwright / Computer Use) | Strict-mode args mandatory. |
| Validation | Harness layer (schema + judge) | Pre-prod tool eval. |
| Sandbox | Container / seccomp boundary | Wraps Shell / CodeAct. |
| Monitor | Telemetry + drift detection | Catches the 60 / 40 failure mix. |
Anatomy Of A Tool¶
The convergent shape across Anthropic Skills, Claude Code sub-agents, Cloudflare Markdown for Agents, and Microsoft Agent Skills: a JSON / YAML declaration (name, semver, JSON-Schema input_schema, output schema, scopes / permissions, allowed bash patterns, model bindings) plus a Markdown usage file (purpose, when-to-use, examples, edge cases, validation / post-conditions) loaded on demand. The two-file split keeps the strict-mode schema mechanically validatable while the prose stays optimizable.
Selection Criteria¶
| Dimension | Custom Tool (In-Process) | Managed MCP Server | CodeAct |
|---|---|---|---|
| Shape | JSON function with strict schema | MCP tools/* over OAuth |
Python (or sandboxed lang) cell |
| Declaration | YAML + Markdown in repo, semver-pinned | Registry entry, signed, OAuth scope | Tool surface = stdlib + injected SDK |
| Validation | Schema + unit + eval harness | Gateway schema inspect + contract tests | Sandbox test runner + output assertions |
| Security Posture | Highest control, in-trust-domain | Untrusted-domain isolation, gateway-mediated | Highest blast radius → mandatory sandbox |
| Best For | Stable, hot-path, latency-sensitive ops | Shared / 3rd-party capabilities, multi-agent reuse | Long compositional chains, data wrangling |
| Avoid When | Capability is widely reused | Latency-critical or sensitive secrets | Operation needs strict audited args |
Cross-References¶
- Assembly › Tools; developer-facing platform layer.
- Reference › Research › Assembly Connectors; MCP transport and credential vaulting tools wrap.
- Reference › Research › Assembly Skills; Skills versus Tools versus MCP distinction.