
Tools

Tools define the executable capabilities that an agent can use during runtime execution. A tool may be a local binary, CLI application, uploaded executable, GitHub-based utility, API wrapper, script, or other buildable runtime dependency that allows the agent to interact with systems, process data, perform automation, or execute specialized tasks. Tools act as controlled execution interfaces between the runtime and external functionality.

Tool design draws on the active agentic tool-design research lineage: Anthropic's "Writing Effective Tools for AI Agents" (which frames tool descriptions as the highest-leverage knob, often a larger lever than model size); Anthropic Structured Outputs (GA across the 4.5 / 4.6 / 4.7 line) and OpenAI Structured Outputs with grammar-constrained sampling; CodeAct (Wang et al., ICML 2024), which delivers up to 20% absolute task lift and roughly 30% fewer steps versus JSON tool calls; the MCP Tools primitive; the MCP Gateway & Registry pattern with OAuth scoping (Keycloak / Entra / Okta); the OWASP MCP Security Cheat Sheet; tool-poisoning mitigations after CVE-2025-54136; and tool-evaluation harness research from Datadog and LangChain. Selection criteria for tool design are documented in Reference › Research › Assembly Tools.

Default Core Tools

Auth

Interacts with the internal authentication API to retrieve scoped sessions, credentials, and vault-authorized access tokens required during runtime execution.

Comms

Interacts with the communication API to bridge the runtime to external systems, messaging layers, workflows, operators, and notification systems.

Model

Acts as the gateway to individual LLM providers and model runtimes, allowing the agent to perform reasoning, generation, validation, and structured execution tasks.

File

Provides controlled local file system operations such as search, read, write, edit, and validation functions required to maintain runtime state, configuration files, logs, and local workflows.

Audit

Stores, validates, and reviews internal runtime patterns, execution traces, governance checks, and behavioral consistency signals used to ensure the agent remains aligned during execution.

Additional Common Runtime Tools

Shell

Executes approved CLI commands and local automation tasks within controlled runtime boundaries.

Search

Performs indexed local or remote search operations across files, memory, APIs, documentation, or connected systems.

Memory

Provides access to runtime memory systems, context retrieval, embeddings, summaries, and state persistence layers.

Scheduler

Schedules, queues, retries, and monitors runtime jobs, workflow stages, delayed actions, and timed execution tasks within controlled operational boundaries.

Browser

Provides controlled web interaction capabilities for browsing, scraping, validation, research, or workflow automation tasks.

Validation

Performs schema checks, output validation, policy verification, confidence scoring, and execution integrity checks before actions occur.

Sandbox

Provides controlled access to sandbox environments where the agent can inspect, test, run, or operate isolated workloads without affecting production systems. Sandbox usage should remain scoped to approved environments, defined permissions, temporary files, runtime validation, and safe execution boundaries.

Monitor

Collects runtime telemetry, logs, execution metrics, health states, budget usage, and operational diagnostics across workflows and connectors.

Tool Description Quality

Anthropic's published engineering guidance frames tool descriptions as the highest-leverage knob in agent reliability; small description refinements often produce larger accuracy gains than swapping model size. The platform writes tool descriptions for the agent reader: distinct namespaced names, human-readable return fields rather than raw IDs, responses capped near 25k tokens, and evaluation through agentic loops that measure accuracy, runtime, tokens, and error rates. Description quality dominates tool-selection accuracy in production.

flowchart LR
    DSC[Tool Description] --> SEL[Tool Selection Accuracy]
    SEL --> ARG[Argument Synthesis]
    ARG --> EX[Execution]
    EX --> RES[Result]
    RES -. measure .-> EVL[Eval Loop: Accuracy / Runtime / Tokens / Errors]
    EVL -. refine .-> DSC

    classDef stage fill:#ffd541,stroke:#222021,color:#222021
    class DSC,SEL,ARG,EX,RES,EVL stage
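The measure step of the loop above can be sketched as a small aggregation over recorded eval runs. This is a minimal Python sketch, assuming a hypothetical Trial record per agentic-loop run; the field names are illustrative and are not a platform API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Trial:
    """One agentic-loop run of an eval task (hypothetical record shape)."""
    expected_tool: str
    selected_tool: Optional[str]  # None if the model called no tool
    runtime_s: float
    tokens: int
    error: bool


def summarize(trials: list[Trial]) -> dict[str, float]:
    """Aggregate the four eval-loop signals: accuracy, runtime, tokens, errors."""
    if not trials:
        raise ValueError("no trials to summarize")
    n = len(trials)
    return {
        # Fraction of runs where the model picked the tool the task expected.
        "accuracy": sum(t.selected_tool == t.expected_tool for t in trials) / n,
        "mean_runtime_s": sum(t.runtime_s for t in trials) / n,
        "mean_tokens": sum(t.tokens for t in trials) / n,
        "error_rate": sum(t.error for t in trials) / n,
    }
```

Comparing the summary before and after a description edit turns the refine arrow into a concrete keep-or-revert decision.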

Anatomy Of A Tool

A tool is defined by two primary files: a JSON / YAML declaration and a markdown usage file. The JSON declaration initializes the tool inside the system and defines its status, commands, inputs, outputs, permissions, and programmatic execution rules. The markdown file is attached to the tool binary and explains when, how, where, and why the agent should use the tool during runtime execution.

flowchart LR
    T[Tool] --> J[tool.yaml]
    T --> M[TOOL.md]
    J --> NAME[name + semver]
    J --> IS[input_schema strict mode]
    J --> OS[output_schema]
    J --> PERM[scopes + permissions]
    J --> MB[model bindings]
    M --> WHEN[when to use]
    M --> WHY[why and edge cases]
    M --> EX[examples + post-conditions]

    classDef stage fill:#ffd541,stroke:#222021,color:#222021
    class T,J,M,NAME,IS,OS,PERM,MB,WHEN,WHY,EX stage
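The two-file anatomy above can be checked mechanically before a tool is registered. This is a minimal sketch, assuming the declaration has already been parsed into a dict (for example by a YAML loader) and assuming hypothetical key names and TOOL.md section headings that mirror the diagram; the platform's actual field names may differ.

```python
import re

# Hypothetical required keys, mirroring the tool.yaml branches in the diagram.
REQUIRED_DECLARATION_KEYS = (
    "name", "version", "input_schema", "output_schema", "scopes", "model_bindings",
)
# Hypothetical TOOL.md headings for the when / why / examples branches.
REQUIRED_USAGE_SECTIONS = ("When to use", "Why", "Examples")
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def check_tool_anatomy(declaration: dict, usage_md: str) -> list[str]:
    """Return a list of problems; an empty list means both files look complete."""
    problems = []
    for key in REQUIRED_DECLARATION_KEYS:
        if key not in declaration:
            problems.append(f"tool.yaml missing '{key}'")
    version = declaration.get("version", "")
    if version and not SEMVER.match(str(version)):
        problems.append(f"version '{version}' is not MAJOR.MINOR.PATCH")
    # Collect markdown heading text, ignoring the leading '#' characters.
    headings = {
        line.lstrip("#").strip() for line in usage_md.splitlines() if line.startswith("#")
    }
    for section in REQUIRED_USAGE_SECTIONS:
        if section not in headings:
            problems.append(f"TOOL.md missing section '{section}'")
    return problems
```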

Tool Declaration

The JSON declaration is the structured runtime definition for the tool. It tells the system whether the tool is active, what commands are available, what arguments are required, what permissions are needed, and how the tool should be called safely. The runtime uses this file to register, validate, expose, and control tool access. Argument generation is bound to the declared schemas through grammar-constrained sampling (Anthropic Structured Outputs tools[].strict and OpenAI strict: true on tool definitions), so JSON Schema conformance is enforced at decode time rather than merely requested in the prompt.
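Strict mode only accepts a subset of JSON Schema, so a declaration can be linted before it is published. This sketch checks two object rules from OpenAI's strict-mode documentation: every object sets additionalProperties to false, and every property appears in required (optional fields are expressed as nullable types). Anthropic's strict tool use documents its own supported subset, so treat these rules as one provider's constraints, not a universal contract.

```python
def _types(schema: dict) -> list:
    """Normalize "type" to a list, since it may be a string or a list of strings."""
    t = schema.get("type")
    return t if isinstance(t, list) else [t]


def strict_mode_violations(schema: dict, path: str = "$") -> list[str]:
    """List places where a JSON Schema breaks OpenAI strict-mode object rules."""
    violations: list[str] = []
    types = _types(schema)
    if "object" in types:
        props = schema.get("properties", {})
        # Strict mode requires closed objects...
        if schema.get("additionalProperties") is not False:
            violations.append(f"{path}: additionalProperties must be false")
        # ...and every property listed as required (use a nullable type for optional).
        missing = sorted(set(props) - set(schema.get("required", [])))
        if missing:
            violations.append(f"{path}: not required: {', '.join(missing)}")
        for name, sub in props.items():
            violations.extend(strict_mode_violations(sub, f"{path}.{name}"))
    if "array" in types and isinstance(schema.get("items"), dict):
        violations.extend(strict_mode_violations(schema["items"], f"{path}[]"))
    return violations
```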

Tool Usage Markdown

The markdown file explains the operational intent of the tool in plain language. It describes when the tool should be used, when it should not be used, expected inputs, expected outputs, failure behavior, safety limits, and examples of correct usage. This file helps the agent understand the tool's purpose beyond the raw command schema.

Tool Validation

Tools should be validated before runtime use by sending a test tool call to the same model the agent will use. This allows the user to see how the model interprets the tool, what information it returns, what arguments it attempts to provide, and whether the tool instructions are clear enough for safe execution. Validation should log the model request, selected tool, generated arguments, returned output, errors, permissions used, and any missing information required before the tool can be safely enabled.

flowchart LR
    AUT[Author: tool.yaml + TOOL.md] --> SCH[JSON Schema Lint]
    SCH --> SCT[Strict-Mode Contract Test]
    SCT --> HARN[Tool-Eval Harness]
    HARN --> SC[Selection Correctness]
    HARN --> AS[Argument Synthesis]
    HARN --> OF[Output Format]
    HARN --> PG[Permission Scope]
    SC --> JDG[LLM-As-Judge Quality]
    AS --> JDG
    OF --> JDG
    PG --> JDG
    JDG -->|fail| LOOP[Refine Description / Schema]
    LOOP --> SCH
    JDG -->|pass| SIGN[Sign Descriptor + Pin Semver]
    SIGN --> REG[Publish To Registry]

    classDef stage fill:#ffd541,stroke:#222021,color:#222021
    class AUT,SCH,SCT,HARN,SC,AS,OF,PG,JDG,LOOP,SIGN,REG stage
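The validation log described above maps naturally onto a record plus an enable/hold decision. This is a minimal sketch: the record fields follow the list in the Tool Validation paragraph, while the class name, function name, and the rule that permissions used must be a subset of declared scopes are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ValidationRecord:
    """Fields a validation run should log (names are hypothetical)."""
    model_request: str
    selected_tool: Optional[str]
    generated_arguments: dict[str, Any]
    returned_output: Any
    errors: list[str] = field(default_factory=list)
    permissions_used: set[str] = field(default_factory=set)
    missing_information: list[str] = field(default_factory=list)


def enable_decision(record: ValidationRecord, expected_tool: str,
                    declared_scopes: set[str]) -> tuple[bool, list[str]]:
    """Return (safe_to_enable, reasons); any reason blocks enablement."""
    reasons: list[str] = []
    if record.selected_tool != expected_tool:
        reasons.append(f"model selected {record.selected_tool!r}, expected {expected_tool!r}")
    reasons.extend(f"error: {e}" for e in record.errors)
    # Permissions exercised during the test call must all be declared up front.
    undeclared = sorted(record.permissions_used - declared_scopes)
    if undeclared:
        reasons.append(f"undeclared permissions: {', '.join(undeclared)}")
    reasons.extend(f"missing: {m}" for m in record.missing_information)
    return (not reasons, reasons)
```

A failing decision corresponds to the fail edge in the diagram: the reasons list points at what to refine in the description or schema.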

MCP Tools Primitive

The MCP Tools primitive is the platform's preferred declaration shape. Clients enumerate tools via tools/list and invoke them via tools/call, with runtime capability negotiation so an agent can use a previously unseen server. The 2025-11-25 specification added parallel tool calls, deprecated includeContext in favor of explicit capability declarations, and tied discovery to an OAuth 2.1 authorization framework with Protected Resource Metadata and OIDC discovery. Tool lists can be dynamic: servers that declare the tools listChanged capability emit notifications/tools/list_changed, so registries can hot-swap tools without disrupting active sessions.
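The list/call/notify cycle above can be sketched as plain JSON-RPC 2.0 messages. The method names, the name/arguments parameters, the optional pagination cursor, and the notifications/tools/list_changed notification come from the MCP specification; the helper functions and the ToolCache class are hypothetical, and transport (stdio or Streamable HTTP) plus the initialize handshake are omitted.

```python
import itertools
from typing import Any, Optional

_request_ids = itertools.count(1)


def tools_list_request(cursor: Optional[str] = None) -> dict[str, Any]:
    """JSON-RPC 2.0 request for tools/list; cursor continues a paginated listing."""
    params = {"cursor": cursor} if cursor else {}
    return {"jsonrpc": "2.0", "id": next(_request_ids), "method": "tools/list", "params": params}


def tools_call_request(name: str, arguments: dict[str, Any]) -> dict[str, Any]:
    """JSON-RPC 2.0 request for tools/call with the tool name and its arguments."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


class ToolCache:
    """Caches one server's tool list and marks it stale on list_changed (hypothetical)."""

    def __init__(self) -> None:
        self.tools: list[dict[str, Any]] = []
        self.stale = True

    def on_message(self, message: dict[str, Any]) -> None:
        # Servers that advertise {"tools": {"listChanged": true}} send this notification.
        if message.get("method") == "notifications/tools/list_changed":
            self.stale = True

    def refresh(self, tools_list_result: dict[str, Any]) -> None:
        # Store the "tools" array from a tools/list result and clear the stale flag.
        self.tools = tools_list_result.get("tools", [])
        self.stale = False
```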

CodeAct As An Alternative Action Shape

CodeAct (Wang et al., ICML 2024) showed that Python-as-action consolidates the action space, yielding up to 20% absolute task lift and roughly 30% fewer steps and tokens versus JSON tool calls, because code natively supports loops, conditionals, and variable reuse to compose tools in one turn. It is the default execution shape in Manus, OpenDevin, and Open Interpreter; Anthropic's November 2025 "Code execution with MCP" post reinforces the pattern for token-efficient MCP usage. The platform supports CodeAct as an opt-in execution channel for Unitts where chains are long or compositional; CodeAct calls run inside the Sandbox tool with strict resource boundaries.
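A CodeAct turn is one model-emitted code cell executed out of process, with its output fed back as the observation. The sketch below, assuming a POSIX host, runs a cell in a separate isolated Python interpreter with a wall-clock timeout, a throwaway working directory, and a stripped environment. It is only an illustration of the execution shape: it is not a security sandbox and does not replace the Sandbox tool's resource boundaries.

```python
import os
import subprocess
import sys
import tempfile
from dataclasses import dataclass
from typing import Optional


@dataclass
class CellResult:
    stdout: str
    stderr: str
    returncode: Optional[int]  # None when the cell was killed on timeout
    timed_out: bool


def _as_text(value) -> str:
    # TimeoutExpired exposes captured output as bytes even when text=True.
    if isinstance(value, bytes):
        return value.decode("utf-8", errors="replace")
    return value or ""


def run_codeact_cell(code: str, timeout_s: float = 5.0) -> CellResult:
    """Run one model-emitted Python cell in a separate interpreter with a time limit."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-I", "-c", code],  # -I: isolated mode (no PYTHON* env, no user site)
                cwd=workdir,                          # scratch directory, deleted afterwards
                env={"PATH": os.defpath},             # do not leak the parent's environment
                capture_output=True,
                text=True,
                timeout=timeout_s,                    # run() kills the child when this expires
            )
        except subprocess.TimeoutExpired as exc:
            return CellResult(_as_text(exc.stdout), _as_text(exc.stderr), None, True)
        return CellResult(proc.stdout, proc.stderr, proc.returncode, False)
```

A single cell such as a loop over several inputs replaces what would otherwise be one JSON tool call per input, which is the consolidation CodeAct relies on.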

Tool Registry And Allowlist

Each agent's tool surface is governed by a registry-backed allowlist. The pattern is a central MCP Gateway and Registry that accepts IdP-issued JWTs from Keycloak, Entra, Okta, Cognito, or Auth0 plus session cookies and service tokens, with group-restricted tool visibility layered on top of IAM scopes for per-agent allowlists. The OWASP MCP Security Cheat Sheet mandates that each server is treated as an independent trust domain, that server allowlisting is enforced (only signed and registered servers are reachable from production), and that every tool call is logged with user, agent, server, and policy attached for audit.

flowchart LR
    BOOT[Agent Boot] --> IDP[Identity: Keycloak / Entra]
    IDP --> GW[MCP Gateway]
    GW --> ALW[Resolve Allowlist by Group + Semver]
    ALW --> LIST[tools/list filtered]
    LIST --> SEL[Model Selects Tool]
    SEL --> ARG[Strict-Mode Argument Synthesis]
    ARG --> INS[Gateway Schema Inspection]
    INS --> HSH[Descriptor Hash Check]
    HSH -->|mismatch| BLK[Block + Audit]
    HSH -->|ok| PDP[Policy Decision]
    PDP -->|deny| BLK
    PDP -->|allow| EX[Execute]
    EX --> AUD[Audit Sink]

    classDef stage fill:#ffd541,stroke:#222021,color:#222021
    class BOOT,IDP,GW,ALW,LIST,SEL,ARG,INS,HSH,PDP,EX,AUD,BLK stage
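The allowlist, filtered tools/list, policy decision, and audit steps of the flow above can be sketched in a few functions. This assumes the caller's groups have already been extracted from a verified IdP token (JWT validation is omitted), and the allowlist shape, group names, and audit field names are hypothetical; the audit entry carries the user, agent, server, and policy fields the OWASP guidance asks for.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolEntry:
    name: str
    version: str  # semver string as published to the registry
    server: str


def resolve_pins(groups: list[str], allowlist: dict[str, dict[str, str]]) -> dict[str, str]:
    """Merge per-group {tool: pinned_version} maps; later groups win on conflict."""
    pins: dict[str, str] = {}
    for group in groups:
        pins.update(allowlist.get(group, {}))
    return pins


def filter_tools_list(tools: list[ToolEntry], groups: list[str],
                      allowlist: dict[str, dict[str, str]]) -> list[ToolEntry]:
    """Expose only tools whose name and exact pinned version are allowlisted."""
    pins = resolve_pins(groups, allowlist)
    return [t for t in tools if pins.get(t.name) == t.version]


def authorize_call(user: str, agent: str, groups: list[str], tool: ToolEntry,
                   allowlist: dict[str, dict[str, str]], audit_log: list[dict]) -> bool:
    """Policy decision for one tools/call; every decision is written to the audit log."""
    allowed = resolve_pins(groups, allowlist).get(tool.name) == tool.version
    audit_log.append({
        "user": user,
        "agent": agent,
        "server": tool.server,
        "tool": tool.name,
        "tool_version": tool.version,
        "policy": "group-allowlist",
        "decision": "allow" if allowed else "deny",
    })
    return allowed
```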

Tool Poisoning Defense

The CVE-2025-54136 MCPoison disclosure (July 2025, Cursor versions through 1.2.4) demonstrated that an attacker can swap a previously approved MCP entry for a malicious command without triggering a re-prompt, gaining persistent remote code execution through the trusted-tool descriptor channel. The structural defense lives at the gateway: schema inspection, content-hash pinning of approved descriptors, re-approval on any descriptor diff, provenance signing, and stripping hidden instructions from descriptions before they reach the model. The platform applies all five mitigations by default.
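Two of the five mitigations, content-hash pinning with re-approval on diff and stripping hidden instructions, can be sketched with the standard library. This is a simplified illustration: the canonical JSON form used here (sorted keys, compact separators) is an assumption and not a full canonicalization scheme such as RFC 8785, and the stripping covers only HTML comments and Unicode format characters (category Cf, which includes zero-width and bidi-override characters).

```python
import hashlib
import json
import re
import unicodedata
from typing import Optional

_HTML_COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)


def descriptor_hash(descriptor: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not change the hash."""
    canonical = json.dumps(descriptor, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def strip_hidden_text(description: str) -> str:
    """Remove HTML comments and invisible Unicode format characters."""
    without_comments = _HTML_COMMENT.sub("", description)
    return "".join(ch for ch in without_comments if unicodedata.category(ch) != "Cf")


def admit(descriptor: dict, approved_hashes: dict[str, str]) -> Optional[dict]:
    """Return a sanitized descriptor, or None if it differs from the approved pin."""
    if descriptor_hash(descriptor) != approved_hashes.get(descriptor.get("name")):
        return None  # any diff blocks the tool until it is re-approved
    clean = dict(descriptor)
    clean["description"] = strip_hidden_text(descriptor.get("description", ""))
    return clean
```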

Tool Versioning

Treat tool contracts as public APIs under semver: MAJOR for any removed / renamed / required-argument change, MINOR for additive optional arguments, PATCH for description or behavior fixes. Industry telemetry attributes roughly 60% of production agent failures to tool-version churn (versus roughly 40% from model drift), driven by silently changing schemas, return shapes, or default values. Mitigations: pin tool versions per agent build, expose tool_version in audit logs, run contract tests on every release, and deprecate through N+1 dual-publishing (ship the new version alongside the old one for a transition window) rather than in-place edits.
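The bump rules above can be enforced in a contract test by diffing the old and new input schemas. This sketch compares top-level properties only; treating a changed property type as MAJOR is an added assumption beyond the rules stated above, and nested schemas and output shapes are not inspected.

```python
def required_bump(old: dict, new: dict) -> str:
    """Classify an input-schema change as MAJOR, MINOR, or PATCH (top level only)."""
    old_props = old.get("properties", {})
    new_props = new.get("properties", {})
    old_req = set(old.get("required", []))
    new_req = set(new.get("required", []))
    if set(old_props) - set(new_props):
        return "MAJOR"  # removed or renamed argument
    if new_req - old_req:
        return "MAJOR"  # an argument became required (new or existing)
    # Assumption: changing an existing argument's type is also breaking.
    if any(old_props[p].get("type") != new_props[p].get("type")
           for p in set(old_props) & set(new_props)):
        return "MAJOR"
    if set(new_props) - set(old_props):
        return "MINOR"  # additive optional argument
    return "PATCH"      # description-only or behavior fix
```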

Tool Management

The Tools page allows developers to upload custom binaries, register local executables, or import GitHub libraries and CLI-based utilities that can be built and attached to the runtime. Each tool should define its purpose, execution method, permissions, required connectors, usage constraints, and optional tool-use documentation that explains how the agent is expected to interact with it during execution.

Runtime Philosophy

Agents should begin with only the minimum tools required to accomplish their primary objective. Additional tools should only be added as workflows expand or operational requirements evolve, helping reduce unnecessary complexity, permission exposure, runtime instability, and uncontrolled execution behavior. Tools may also be replaced, versioned, or removed over time as workflows, policies, or runtime objectives change.

Selection Heuristic

| Dimension | Custom Tool (In-Process) | Managed MCP Server | CodeAct |
| --- | --- | --- | --- |
| Shape | JSON function with strict schema | MCP tools/* over OAuth | Python (or sandboxed language) cell |
| Declaration | YAML + Markdown in repo, semver-pinned | Registry entry, signed, OAuth scope | Tool surface = stdlib + injected SDK |
| Validation | Schema + unit + eval harness | Gateway schema inspect + contract tests | Sandbox test runner + output assertions |
| Security Posture | Highest control, in-trust-domain | Untrusted-domain isolation, gateway-mediated | Highest blast radius → mandatory sandbox |
| Best For | Stable, hot-path, latency-sensitive ops | Shared / 3rd-party capabilities, multi-agent reuse | Long compositional chains, data wrangling |
| Avoid When | Capability is widely reused | Latency-critical or sensitive secrets | Operation needs strict audited arguments |

Cross-References

  • Core supplies the policies and authorization scopes each tool inherits.
  • Connectors provides the credential vaults and connection substrates many tools wrap.
  • Skills declares which tool allowlists are bound to each skill activation.
  • Fabric › Flow consumes tool calls inside multi-stage protection mechanisms.
  • Reference › Research › Assembly Tools documents citations and selection criteria.