Assembly Tools¶

The tool design patterns referenced throughout Assembly › Tools draw on the 2024-2026 agentic tool-design research lineage. This page catalogs the canonical declaration shapes, validation methods, security postures, and the selection criteria the platform uses when configuring tools for a given Unitt.

Tool Description Quality¶

Anthropic's Writing effective tools for AI agents (October 2025) frames tool descriptions as the highest-leverage knob; small description refinements (correcting Claude appending "2025" to web-search queries) produced larger accuracy gains than swapping model size. Recommendations: write for the agent reader, use distinct names with namespacing, return human-readable fields over raw IDs, cap responses near 25k tokens, evaluate with agentic loops measuring accuracy, runtime, tokens, and errors. Description quality dominates tool-selection accuracy in production.

Function-Calling And Strict Mode¶

OpenAI Structured Outputs (August 2024) introduces strict: true on tools[] and response_format: {type:"json_schema", strict:true} with constrained decoding guaranteeing schema conformance, but requires additionalProperties:false and every field marked required (optional fields use ["T","null"]). Anthropic shipped equivalent Structured Outputs in public beta on 2025-11-14, then GA across the 4.5 / 4.6 / 4.7 line, exposing tools[].strict for type-safe argument synthesis via grammar-compiled constrained decoding. Both vendors are now schema-converged.

CodeAct¶

CodeAct (Wang et al., ICML 2024) showed Python-as-action consolidates the action space, yielding up to 20% absolute task success and roughly 30% fewer steps and tokens vs JSON tool calls because code natively supports loops, conditionals, and variable reuse. It is now the default in Manus, OpenDevin, and Open Interpreter; Anthropic's Code execution with MCP (November 2025) reinforces the pattern for token-efficient MCP usage.

MCP Tools Primitive¶

Tools are the executable primitive of MCP alongside Resources and Prompts; clients enumerate via tools/list and invoke via tools/call, with runtime capability negotiation. The 2025-11-25 specification adds parallel tool calls, deprecates includeContext in favor of explicit capability declarations, and ties discovery to an OAuth 2.1 authorization framework with Protected Resource Metadata + OIDC discovery. Tool lists can be dynamic (changed-notification capability).

Tool Registries¶

Enterprise pattern (mcp-gateway-registry, Red Hat MCP Gateway): a central registry behind an MCP gateway accepting IdP-issued JWTs from Keycloak / Entra / Okta / Cognito / Auth0, session cookies, and service tokens, with group-restricted tool visibility (allowedGroups) layered on top of IAM scopes for per-agent allowlists. OWASP MCP Security Cheat Sheet mandates: treat each server as an independent trust domain, enforce server allowlisting (only signed / registered servers reachable from prod), require OAuth 2.1 + PKCE for remote auth, log every tool call with user / agent / server / policy for audit.

Tool Validation Before Runtime¶

The 2025 consensus (Datadog, LangChain, Braintrust, Atlan six-layer): the agent harness, not the model, is the binding reliability constraint. Validate tools against the exact model that will call them before promotion. A tool-evaluation harness combines deterministic checks (selection correctness, JSON-schema argument validation, format compliance, permission scope) with LLM-as-judge for response quality, plus self-verification hooks that re-prompt on schema failure with the validator's error. Argument-synthesis defects are almost always tool-description or system-prompt problems, not model defects.

Tool Poisoning Defense¶

CVE-2025-54136 MCPoison (disclosed July 2025, Cursor ≤1.2.4) let an attacker swap a previously approved MCP entry for a malicious command with no re-prompt; persistent RCE via the trusted-tool descriptor channel. The structural lesson (Invariant Labs, TrueFoundry): tool descriptions are model-side instructions with ambient authority, so defense lives at the gateway via schema inspection, content-hash pinning of approved descriptors, re-approval on any descriptor diff, provenance signing, and stripping hidden instructions from descriptions before they reach the model.

Tool Versioning¶

Treat tool contracts as public APIs under semver: MAJOR for any removed / renamed / required-arg change, MINOR for additive optional args, PATCH for description / behavior fixes. Industry telemetry attributes ~60% of production agent failures to tool-version churn vs ~40% model drift, driven by silently changing schemas, return shapes, or default values (NJ Raman). Mitigations: pin tool versions per agent build, expose tool_version in audit logs, run contract tests on every release, deprecate via N+1 dual-publish rather than in-place edits.

Unitt Default Tools Mapping¶

Unitt Tool	Pattern Equivalent	Notes
Auth	OAuth 2.1 + PKCE per OWASP MCP	Identity for user-on-behalf and service-to-service.
Comms	MCP transport (HTTP / stdio)	Remote vs local choice lives here.
Model	LLM gateway (Anthropic / OpenAI strict tools)	Pin model + strict-mode flag per agent.
File	MCP filesystem server	Sandbox + path allowlist.
Audit	Gateway log sink	user / agent / server / tool / args / result.
Shell	CodeAct executor (Python / bash sandbox)	Highest blast radius; ephemeral container.
Search	Managed MCP (web / RAG)	Description hygiene per Anthropic guidance.
Memory	MCP Resources + tool wrappers	Versioned schemas.
Scheduler	Cron / tool with idempotency keys	Replay-safe contracts.
Browser	Managed MCP (Playwright / Computer Use)	Strict-mode args mandatory.
Validation	Harness layer (schema + judge)	Pre-prod tool eval.
Sandbox	Container / seccomp boundary	Wraps Shell / CodeAct.
Monitor	Telemetry + drift detection	Catches the 60 / 40 failure mix.

Anatomy Of A Tool¶

The convergent shape across Anthropic Skills, Claude Code sub-agents, Cloudflare Markdown for Agents, and Microsoft Agent Skills: a JSON / YAML declaration (name, semver, JSON-Schema input_schema, output schema, scopes / permissions, allowed bash patterns, model bindings) plus a Markdown usage file (purpose, when-to-use, examples, edge cases, validation / post-conditions) loaded on demand. The two-file split keeps the strict-mode schema mechanically validatable while the prose stays optimizable.

Selection Criteria¶

Dimension	Custom Tool (In-Process)	Managed MCP Server	CodeAct
Shape	JSON function with strict schema	MCP `tools/*` over OAuth	Python (or sandboxed lang) cell
Declaration	YAML + Markdown in repo, semver-pinned	Registry entry, signed, OAuth scope	Tool surface = stdlib + injected SDK
Validation	Schema + unit + eval harness	Gateway schema inspect + contract tests	Sandbox test runner + output assertions
Security Posture	Highest control, in-trust-domain	Untrusted-domain isolation, gateway-mediated	Highest blast radius → mandatory sandbox
Best For	Stable, hot-path, latency-sensitive ops	Shared / 3rd-party capabilities, multi-agent reuse	Long compositional chains, data wrangling
Avoid When	Capability is widely reused	Latency-critical or sensitive secrets	Operation needs strict audited args

Cross-References¶

Assembly › Tools; developer-facing platform layer.
Reference › Research › Assembly Connectors; MCP transport and credential vaulting tools wrap.
Reference › Research › Assembly Skills; Skills versus Tools versus MCP distinction.