Tools
Tools define the executable capabilities that an agent can use during runtime execution. A tool may be a local binary, CLI application, uploaded executable, GitHub-based utility, API wrapper, script, or other buildable runtime dependency that allows the agent to interact with systems, process data, perform automation, or execute specialized tasks. Tools act as controlled execution interfaces between the runtime and external functionality.
Tool design draws on current agentic tool-design research and security guidance, including:
- Anthropic's Writing Effective Tools For AI Agents, which frames tool descriptions as the highest-leverage knob, often a larger lever than model size.
- Anthropic Structured Outputs (GA across the 4.5 / 4.6 / 4.7 line) and OpenAI Structured Outputs with grammar-constrained sampling.
- CodeAct (Wang et al., ICML 2024), which reports up to 20% absolute task lift and roughly 30% fewer steps versus JSON tool calls.
- The MCP Tools primitive and the MCP Gateway & Registry pattern with OAuth scoping (Keycloak / Entra / Okta).
- The OWASP MCP Security Cheat Sheet and the tool-poisoning mitigations that followed CVE-2025-54136.
- Tool-evaluation harness research from Datadog and LangChain.

Selection criteria for tool design are documented in Reference › Research › Assembly Tools.
Default Core Tools
Auth
Interacts with the internal authentication API to retrieve scoped sessions, credentials, and vault-authorized access tokens required during runtime execution.
Comms
Interacts with the communication API to bridge the runtime to external systems, messaging layers, workflows, operators, and notification systems.
Model
Acts as the gateway to individual LLM providers and model runtimes, allowing the agent to perform reasoning, generation, validation, and structured execution tasks.
File
Provides controlled local file system operations such as search, read, write, edit, and validation functions required to maintain runtime state, configuration files, logs, and local workflows.
Audit
Stores, validates, and reviews internal runtime patterns, execution traces, governance checks, and behavioral consistency signals used to ensure the agent remains aligned during execution.
Additional Common Runtime Tools
Shell
Executes approved CLI commands and local automation tasks within controlled runtime boundaries.
Search
Performs indexed local or remote search operations across files, memory, APIs, documentation, or connected systems.
Memory
Provides access to runtime memory systems, context retrieval, embeddings, summaries, and state persistence layers.
Scheduler
Schedules, queues, retries, and monitors runtime jobs, workflow stages, delayed actions, and timed execution tasks within controlled operational boundaries.
Browser
Provides controlled web interaction capabilities for browsing, scraping, validation, research, or workflow automation tasks.
Validation
Performs schema checks, output validation, policy verification, confidence scoring, and execution integrity checks before actions occur.
Sandbox
Provides controlled access to sandbox environments where the agent can inspect, test, run, or operate isolated workloads without affecting production systems. Sandbox usage should remain scoped to approved environments, defined permissions, temporary files, runtime validation, and safe execution boundaries.
Monitor
Collects runtime telemetry, logs, execution metrics, health states, budget usage, and operational diagnostics across workflows and connectors.
Tool Description Quality
Anthropic's published engineering guidance frames tool descriptions as the highest-leverage knob in agent reliability; small description refinements often produce larger accuracy gains than swapping model size. The platform writes tool descriptions with the agent as the reader: distinct namespaced names, human-readable return fields rather than raw IDs, responses capped near 25k tokens, and evaluation through agentic loops that measure accuracy, runtime, tokens, and error rates. In practice, description quality is a primary driver of tool-selection accuracy in production.
flowchart LR
DSC[Tool Description] --> SEL[Tool Selection Accuracy]
SEL --> ARG[Argument Synthesis]
ARG --> EX[Execution]
EX --> RES[Result]
RES -. measure .-> EVL[Eval Loop: Accuracy / Runtime / Tokens / Errors]
EVL -. refine .-> DSC
classDef stage fill:#ffd541,stroke:#222021,color:#222021
class DSC,SEL,ARG,EX,RES,EVL stage
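The four loop metrics can be aggregated with very little code. The following is a minimal Python sketch, assuming a hypothetical EvalRun record per agentic trial; the record fields, the cap_response helper, and its four-characters-per-token estimate are illustrative assumptions rather than a platform API.

```python
from dataclasses import dataclass
from statistics import mean


# Hypothetical record for one agentic eval trial of a tool.
@dataclass
class EvalRun:
    expected_tool: str   # tool the task should have selected
    selected_tool: str   # tool the model actually selected
    runtime_s: float     # wall-clock time for the trial
    tokens: int          # total tokens consumed in the trial
    error: bool          # True if the tool call failed or returned an error


def summarize(runs: list[EvalRun]) -> dict[str, float]:
    """Aggregate the loop metrics: accuracy, runtime, tokens, error rate."""
    return {
        "accuracy": sum(r.selected_tool == r.expected_tool for r in runs) / len(runs),
        "mean_runtime_s": mean(r.runtime_s for r in runs),
        "mean_tokens": mean(r.tokens for r in runs),
        "error_rate": sum(r.error for r in runs) / len(runs),
    }


MAX_RESPONSE_TOKENS = 25_000  # response cap from the guidance above


def cap_response(text: str, chars_per_token: int = 4) -> str:
    """Truncate a tool response to roughly MAX_RESPONSE_TOKENS.

    Uses a crude characters-per-token estimate instead of a real tokenizer.
    """
    limit = MAX_RESPONSE_TOKENS * chars_per_token
    if len(text) <= limit:
        return text
    return text[:limit] + "\n[truncated: refine the query or paginate]"
```

A real harness would swap the character estimate for the target model's tokenizer and compare summarize() output before and after each description change.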
Anatomy Of A Tool
A tool is defined by two primary files: a JSON or YAML declaration (tool.yaml) and a markdown usage file (TOOL.md). The declaration initializes the tool inside the system and defines its status, commands, inputs, outputs, permissions, and programmatic execution rules. The markdown file is attached to the tool binary and explains when, how, where, and why the agent should use the tool during runtime execution.
flowchart LR
T[Tool] --> J[tool.yaml]
T --> M[TOOL.md]
J --> NAME[name + semver]
J --> IS[input_schema strict mode]
J --> OS[output_schema]
J --> PERM[scopes + permissions]
J --> MB[model bindings]
M --> WHEN[when to use]
M --> WHY[why and edge cases]
M --> EX[examples + post-conditions]
classDef stage fill:#ffd541,stroke:#222021,color:#222021
class T,J,M,NAME,IS,OS,PERM,MB,WHEN,WHY,EX stage
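A pre-registration check can confirm that both files are present and that the declaration carries the fields shown in the diagram. This is a sketch only: it assumes tool.yaml has already been parsed into a Python dict (for example with a YAML library), and the key names in REQUIRED_KEYS are hypothetical.

```python
import re
from typing import Optional

# Hypothetical top-level keys mirroring the diagram: name + semver,
# input/output schemas, scopes + permissions, and model bindings.
REQUIRED_KEYS = {"name", "version", "input_schema", "output_schema", "scopes", "model_bindings"}
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")  # MAJOR.MINOR.PATCH, no pre-release tags


def check_declaration(decl: dict, usage_md: Optional[str]) -> list[str]:
    """Return a list of problems; an empty list means the pair looks complete."""
    problems = [f"missing key: {k}" for k in sorted(REQUIRED_KEYS - set(decl))]
    if "version" in decl and not SEMVER.match(str(decl["version"])):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {decl['version']!r}")
    if not usage_md or not usage_md.strip():
        problems.append("TOOL.md is missing or empty")
    return problems
```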
Tool Declaration
The JSON declaration is the structured runtime definition for the tool. It tells the system whether the tool is active, which commands are available, which arguments are required, which permissions are needed, and how the tool should be called safely. The runtime uses this file to register, validate, expose, and control tool access. Argument schemas use grammar-constrained sampling (tools[].strict in Anthropic Structured Outputs and strict: true on OpenAI tool definitions), so generated arguments conform to the JSON Schema by construction rather than by prompt instruction alone.
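Strict mode accepts only a subset of JSON Schema. OpenAI's Structured Outputs documentation, for example, requires every object to set additionalProperties to false and to list every property in required, with optional values expressed as nullable types. The sketch below lints a schema for those two rules before registration. It is a conservative pre-check under that assumption, not a replacement for the provider's own validation, and the FILES_SEARCH_ARGS schema is hypothetical.

```python
from typing import Any


def strict_violations(schema: dict[str, Any], path: str = "$") -> list[str]:
    """Recursively flag object schemas that strict mode would reject.

    Checks two strict-mode rules: every object sets additionalProperties
    to false, and every property is listed in required.
    """
    problems: list[str] = []
    if schema.get("type") == "object":
        props = schema.get("properties", {})
        if schema.get("additionalProperties") is not False:
            problems.append(f"{path}: additionalProperties must be false")
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            problems.append(f"{path}: not required: {', '.join(missing)}")
        for name, sub in props.items():
            problems += strict_violations(sub, f"{path}.{name}")
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        problems += strict_violations(schema["items"], f"{path}[]")
    return problems


# Hypothetical argument schema for a namespaced search tool.
FILES_SEARCH_ARGS = {
    "type": "object",
    "properties": {
        "query": {"type": "string", "description": "Text to search for"},
        # Optional values are modeled as nullable rather than omitted.
        "max_results": {"type": ["integer", "null"]},
    },
    "required": ["query", "max_results"],
    "additionalProperties": False,
}
```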
Tool Usage Markdown
The markdown file explains the operational intent of the tool in plain language. It describes when the tool should be used, when it should not be used, expected inputs, expected outputs, failure behavior, safety limits, and examples of correct usage. This file helps the agent understand the tool's purpose beyond the raw command schema.
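Because the usage file is plain markdown, its coverage can be linted mechanically. The sketch below assumes each topic listed above appears as its own markdown heading; the exact heading titles in REQUIRED_SECTIONS are a hypothetical convention, not a required format.

```python
import re

# Hypothetical section headings a TOOL.md is expected to contain,
# taken from the topics the usage file should cover.
REQUIRED_SECTIONS = [
    "When to use",
    "When not to use",
    "Inputs",
    "Outputs",
    "Failure behavior",
    "Safety limits",
    "Examples",
]


def missing_sections(tool_md: str) -> list[str]:
    """Return required sections that have no matching markdown heading."""
    headings = {
        m.group(1).strip().lower()
        for m in re.finditer(r"^#{1,6}\s+(.+?)\s*#*\s*$", tool_md, flags=re.MULTILINE)
    }
    return [s for s in REQUIRED_SECTIONS if s.lower() not in headings]
```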
Tool Validation
Tools should be validated before runtime use by sending a test tool call to the same LLM model the agent will use. This allows the user to see how the model interprets the tool, what information it returns, what arguments it attempts to provide, and whether the tool instructions are clear enough for safe execution. Validation should log the model request, selected tool, generated arguments, returned output, errors, permissions used, and any missing information required before the tool can be safely enabled.
flowchart LR
AUT[Author: tool.yaml + TOOL.md] --> SCH[JSON Schema Lint]
SCH --> SCT[Strict-Mode Contract Test]
SCT --> HARN[Tool-Eval Harness]
HARN --> SC[Selection Correctness]
HARN --> AS[Argument Synthesis]
HARN --> OF[Output Format]
HARN --> PG[Permission Scope]
SC --> JDG[LLM-As-Judge Quality]
AS --> JDG
OF --> JDG
PG --> JDG
JDG -->|fail| LOOP[Refine Description / Schema]
LOOP --> SCH
JDG -->|pass| SIGN[Sign Descriptor + Pin Semver]
SIGN --> REG[Publish To Registry]
classDef stage fill:#ffd541,stroke:#222021,color:#222021
class AUT,SCH,SCT,HARN,SC,AS,OF,PG,JDG,LOOP,SIGN,REG stage
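The fields that validation should log map naturally onto one record per test call. A minimal sketch, assuming a hypothetical ValidationRecord type and a can_enable gate that treats errors, missing information, a wrong tool choice, or out-of-scope permissions as blocking:

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Any, Optional


@dataclass
class ValidationRecord:
    model_request: str                 # prompt sent to the agent's model
    selected_tool: Optional[str]       # tool the model chose, if any
    generated_args: dict[str, Any]     # arguments the model synthesized
    returned_output: Any               # what the tool returned
    errors: list[str] = field(default_factory=list)
    permissions_used: list[str] = field(default_factory=list)
    missing_info: list[str] = field(default_factory=list)


def can_enable(rec: ValidationRecord, expected_tool: str, declared_scopes: set[str]) -> bool:
    """Hypothetical gate: enable only on a clean, in-scope, complete test call."""
    return (
        rec.selected_tool == expected_tool
        and not rec.errors
        and not rec.missing_info
        and set(rec.permissions_used) <= declared_scopes
    )


def to_log_line(rec: ValidationRecord) -> str:
    """Serialize the record as one JSON log line."""
    return json.dumps(asdict(rec), sort_keys=True, default=str)
```

Writing each record with to_log_line() keeps the model request next to the arguments it produced in the audit trail.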
MCP Tools Primitive
The MCP Tools primitive is the platform's preferred declaration shape. Clients enumerate tools via tools/list and invoke them via tools/call, with runtime capability negotiation so an agent can use a previously unseen server. The 2025-11-25 specification added parallel tool calls, deprecated includeContext in favor of explicit capability declarations, and tied discovery to an OAuth 2.1 authorization framework with Protected Resource Metadata and OIDC discovery. Tool lists can be dynamic: servers that declare the listChanged capability send notifications/tools/list_changed, so registries can hot-swap tools without disrupting active sessions.
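MCP messages are JSON-RPC 2.0, so the two core calls and the change notification are small, fixed shapes. The sketch below builds them as Python dicts; the transport (stdio or Streamable HTTP), the initialization handshake, and response handling are omitted, and the files_search tool name is hypothetical.

```python
import itertools
from typing import Any, Optional

_ids = itertools.count(1)  # monotonically increasing JSON-RPC request ids


def tools_list(cursor: Optional[str] = None) -> dict[str, Any]:
    """Build a tools/list request; cursor continues a paginated listing."""
    msg: dict[str, Any] = {"jsonrpc": "2.0", "id": next(_ids), "method": "tools/list"}
    if cursor is not None:
        msg["params"] = {"cursor": cursor}
    return msg


def tools_call(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """Build a tools/call request for one named tool."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


def needs_refresh(message: dict[str, Any]) -> bool:
    """True when the server signals that its tool list changed."""
    return message.get("method") == "notifications/tools/list_changed"
```

A client that receives a message for which needs_refresh() returns True would re-issue tools/list, following nextCursor until it is absent, before the next model turn.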
CodeAct As An Alternative Action Shape
CodeAct (Wang et al., ICML 2024) showed Python-as-action consolidates the action space, yielding up to 20% absolute task lift and roughly 30% fewer steps and tokens versus JSON tool calls because code natively supports loops, conditionals, and variable reuse to compose tools in one turn. It is the default execution shape in Manus, OpenDevin, and Open Interpreter; Anthropic's November 2025 Code execution with MCP post reinforces the pattern for token-efficient MCP usage. The platform supports CodeAct as an opt-in execution channel for Unitts where chains are long or compositional; CodeAct calls run inside the Sandbox tool with strict resource boundaries.
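The step savings come from composition: one code action can loop over results and branch on them, where JSON tool calling typically needs a model turn per call. The sketch below illustrates that shape with two hypothetical stub tools. It uses plain exec() with a restricted namespace only to keep the example self-contained; exec() is not a sandbox, and on the platform such actions would run inside the Sandbox tool.

```python
# Hypothetical tools injected into the action namespace. Plain exec() below
# is NOT a sandbox; real CodeAct calls belong inside the Sandbox tool.
FILES = {"a.py": "x = 1  # TODO: rename", "b.py": "y = 2", "c.py": "# TODO: test"}
calls: list[str] = []


def files_search(pattern: str) -> list[str]:
    calls.append("files_search")
    return sorted(FILES)  # stub: ignores pattern and returns every file


def files_read(path: str) -> str:
    calls.append("files_read")
    return FILES[path]


# One model turn: a loop plus a conditional compose several tool calls.
CODE_ACTION = """
todo_files = []
for path in files_search("*.py"):
    if "TODO" in files_read(path):
        todo_files.append(path)
result = todo_files
"""

namespace = {"files_search": files_search, "files_read": files_read}
exec(CODE_ACTION, namespace)
print(namespace["result"], len(calls))  # ['a.py', 'c.py'] 4
```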
Tool Registry And Allowlist
Each agent's tool surface is governed by a registry-backed allowlist. The pattern is a central MCP Gateway and Registry that accepts IdP-issued JWTs from Keycloak, Entra, Okta, Cognito, or Auth0, plus session cookies and service tokens, with group-restricted tool visibility layered on top of IAM scopes for per-agent allowlists. The OWASP MCP Security Cheat Sheet recommends treating each server as an independent trust domain, enforcing server allowlisting (only signed and registered servers are reachable from production), and logging every tool call with user, agent, server, and policy attached for audit.
flowchart LR
BOOT[Agent Boot] --> IDP[Identity: Keycloak / Entra]
IDP --> GW[MCP Gateway]
GW --> ALW[Resolve Allowlist by Group + Semver]
ALW --> LIST[tools/list filtered]
LIST --> SEL[Model Selects Tool]
SEL --> ARG[Strict-Mode Argument Synthesis]
ARG --> INS[Gateway Schema Inspection]
INS --> HSH[Descriptor Hash Check]
HSH -->|mismatch| BLK[Block + Audit]
HSH -->|ok| PDP[Policy Decision]
PDP -->|deny| BLK
PDP -->|allow| EX[Execute]
EX --> AUD[Audit Sink]
classDef stage fill:#ffd541,stroke:#222021,color:#222021
class BOOT,IDP,GW,ALW,LIST,SEL,ARG,INS,HSH,PDP,EX,AUD,BLK stage
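The allowlist resolution step in the diagram can be read as a filter over registry entries. A minimal sketch, assuming the gateway has already validated the IdP token and extracted the caller's groups, and assuming a hypothetical per-agent ALLOWLIST that pins each tool to a MAJOR version:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    name: str
    version: str             # MAJOR.MINOR.PATCH
    groups: frozenset[str]   # IdP groups allowed to see this tool


# Hypothetical per-agent allowlist: tool name -> pinned MAJOR version.
ALLOWLIST = {"files_search": 1, "comms_notify": 2}


def major(version: str) -> int:
    return int(version.split(".")[0])


def visible_tools(registry: list[ToolEntry], user_groups: set[str]) -> list[str]:
    """Filter a registry down to what tools/list should expose to this agent."""
    return sorted(
        f"{t.name}@{t.version}"
        for t in registry
        if t.name in ALLOWLIST
        and major(t.version) == ALLOWLIST[t.name]
        and t.groups & user_groups  # at least one shared group
    )
```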
Tool Poisoning Defense
The CVE-2025-54136 MCPoison disclosure (July 2025, Cursor versions through 1.2.4) demonstrated that an attacker can swap a previously approved MCP entry for a malicious command with no re-prompt, yielding persistent remote code execution through the trusted-tool descriptor channel. The structural defense lives at the gateway: schema inspection, content-hash pinning of approved descriptors, re-approval on any descriptor diff, provenance signing, and stripping hidden instructions from descriptions before they reach the model. The platform applies all five mitigations by default.
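Two of the five mitigations, content-hash pinning with re-approval on any diff and stripping hidden content from descriptions, fit in a few lines. The sketch below hashes a canonical JSON form so key order does not affect the pin. The in-memory APPROVED store and the set of characters treated as hidden are assumptions; schema inspection and provenance signing are out of scope here.

```python
import hashlib
import json
import re


def descriptor_hash(descriptor: dict) -> str:
    """SHA-256 over canonical JSON, so key order does not change the hash."""
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical store of hashes recorded when a human approved each descriptor.
APPROVED: dict[str, str] = {}


def approve(descriptor: dict) -> None:
    APPROVED[descriptor["name"]] = descriptor_hash(descriptor)


def check(descriptor: dict) -> str:
    """Return 'ok', or 'reapprove' when the descriptor is new or differs from its pin."""
    pinned = APPROVED.get(descriptor["name"])
    return "ok" if pinned == descriptor_hash(descriptor) else "reapprove"


# Zero-width and bidi-control characters, plus HTML comments, are common
# places to hide instructions from a human reviewer (assumed, not exhaustive).
_HIDDEN = re.compile(r"[\u200b-\u200f\u202a-\u202e\u2066-\u2069\ufeff]|<!--.*?-->", re.DOTALL)


def sanitize_description(text: str) -> str:
    return _HIDDEN.sub("", text)
```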
Tool Versioning
Treat tool contracts as public APIs under semver: MAJOR for any removed / renamed / required-argument change, MINOR for additive optional arguments, PATCH for description or behavior fixes. Industry telemetry attributes roughly 60% of production agent failures to tool-version churn (versus roughly 40% from model drift), driven by silently changing schemas, return shapes, or default values. Mitigations: pin tool versions per agent build, expose tool_version in audit logs, run contract tests on every release, and deprecate via N+1 dual-publish rather than in-place edits.
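The bump rules above can be checked mechanically against two versions of an argument schema. A minimal sketch, assuming JSON-Schema-style argument schemas and comparing only top-level property names and the required list; type changes, default changes, and return-shape changes are not detected and still need contract tests.

```python
def required_bump(old: dict, new: dict) -> str:
    """Classify a schema change as MAJOR, MINOR, or PATCH per the rules above."""
    old_props, new_props = set(old.get("properties", {})), set(new.get("properties", {}))
    old_req, new_req = set(old.get("required", [])), set(new.get("required", []))
    if old_props - new_props:   # removed or renamed argument
        return "MAJOR"
    if new_req - old_req:       # newly required argument
        return "MAJOR"
    if new_props - old_props:   # additive optional argument
        return "MINOR"
    return "PATCH"              # description or behavior-only change


def bump(version: str, level: str) -> str:
    """Apply a MAJOR, MINOR, or PATCH bump to a MAJOR.MINOR.PATCH string."""
    major, minor, patch = (int(p) for p in version.split("."))
    if level == "MAJOR":
        return f"{major + 1}.0.0"
    if level == "MINOR":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"
```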
Tool Management
The Tools page allows developers to upload custom binaries, register local executables, or import GitHub libraries and CLI-based utilities that can be built and attached to the runtime. Each tool should define its purpose, execution method, permissions, required connectors, usage constraints, and optional tool-use documentation that explains how the agent is expected to interact with it during execution.
Runtime Philosophy
Agents should begin with only the minimum tools required to accomplish their primary objective. Additional tools should only be added as workflows expand or operational requirements evolve, helping reduce unnecessary complexity, permission exposure, runtime instability, and uncontrolled execution behavior. Tools may also be replaced, versioned, or removed over time as workflows, policies, or runtime objectives change.
Selection Heuristic
| Dimension | Custom Tool (In-Process) | Managed MCP Server | CodeAct |
|---|---|---|---|
| Shape | JSON function with strict schema | MCP tools/* over OAuth | Python (or sandboxed lang) cell |
| Declaration | YAML + Markdown in repo, semver-pinned | Registry entry, signed, OAuth scope | Tool surface = stdlib + injected SDK |
| Validation | Schema + unit + eval harness | Gateway schema inspect + contract tests | Sandbox test runner + output assertions |
| Security Posture | Highest control, in-trust-domain | Untrusted-domain isolation, gateway-mediated | Highest blast radius → mandatory sandbox |
| Best For | Stable, hot-path, latency-sensitive ops | Shared / 3rd-party capabilities, multi-agent reuse | Long compositional chains, data wrangling |
| Avoid When | Capability is widely reused | Latency-critical or sensitive secrets | Operation needs strict audited arguments |
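One way to apply the table during design review is to encode its Best For and Avoid When rows as an ordered set of checks. The sketch below is a single, hypothetical reading of the table; the Capability flags and the fallback to an in-process tool are assumptions, and edge cases still call for human judgment.

```python
from dataclasses import dataclass


@dataclass
class Capability:
    widely_reused: bool        # shared across agents or provided by a third party
    latency_critical: bool     # hot-path, latency-sensitive operation
    handles_secrets: bool      # touches sensitive secrets
    long_compositional: bool   # long chains, loops, data wrangling
    needs_audited_args: bool   # every argument must be strictly audited


def choose_shape(c: Capability) -> str:
    """One reading of the table above; checks run in priority order."""
    if c.long_compositional and not c.needs_audited_args:
        return "CodeAct"
    if c.widely_reused and not (c.latency_critical or c.handles_secrets):
        return "Managed MCP Server"
    return "Custom Tool (In-Process)"
```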
Cross-References
- Core supplies the policies and authorization scopes each tool inherits.
- Connectors provides the credential vaults and connection substrates many tools wrap.
- Skills declares which tool allowlists are bound to each skill activation.
- Fabric › Flow consumes tool calls inside multi-stage protection mechanisms.
- Reference › Research › Assembly Tools documents citations and selection criteria.