Skills

Skills define the specialized operational knowledge and execution behaviors that an agent can apply during runtime execution. Unlike tools, which provide executable capabilities, skills provide focused guidance that teaches the agent when, how, and why specific workflows, reasoning patterns, validation methods, or operational behaviors should be used. Skills act as reusable runtime knowledge modules that help maintain consistent execution quality across objectives, workflows, and runtime sessions.

Skills build on active research into agentic skill packaging and procedural memory. Sources include Anthropic Agent Skills (declared an open standard on December 18, 2025 and adopted by VS Code, GitHub, Cursor, Goose, Amp, and OpenCode); the SKILL.md specification; the procedural-memory tier of the CoALA cognitive architecture (Sumers et al. 2023); Voyager (Wang et al. 2023), whose skill library is indexed by embedding; langgraph-bigtool semantic skill retrieval; the MCP Prompts primitive; the Anthropic skill-creator v2 evaluation framework (March 2026) with its Create / Eval / Improve / Benchmark modes; the OWASP Agentic Skills Top 10; and recent supply-chain research such as Snyk ToxicSkills (36.82% of ClawHub skills had at least one security flaw). Selection criteria for skill packaging are documented in Reference › Research › Assembly Skills.

Skill Registration

Skills are registered as focused markdown files (SKILL.md) that describe a single operational capability or behavioral specialization. Each skill should remain narrowly scoped and clearly define what the skill does, when it should be activated, how it should be applied during execution, what dependencies or tools it may require, and what outcomes or validation conditions are expected. The goal is to keep skills explicit, modular, composable, and optimized for runtime retrieval.

The platform adopts the open Anthropic Agent Skills format directly: each skill is a folder containing a SKILL.md plus optional scripts, templates, and assets. The SKILL.md opens with YAML frontmatter where name (≤64 characters, must match the folder) and description (≤1024 characters, governs activation) are required, and optional fields include allowed-tools, license, and when_to_use. The same SKILL.md works unmodified across the platform and across the wider ecosystem (Claude Code, OpenAI Codex CLI, Gemini CLI, GitHub Copilot, Cursor, VS Code, Goose, Amp, OpenCode).
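
The frontmatter rules above can be checked mechanically. The following is a minimal sketch, not the platform's actual validator: the function names, the sample skill, and the `allowed-tools` value format are hypothetical, and the parser handles only flat single-line `key: value` pairs rather than full YAML.

```python
MAX_NAME = 64            # name limit stated in the text
MAX_DESCRIPTION = 1024   # description limit stated in the text

# Hypothetical sample skill, used only for illustration.
EXAMPLE_SKILL_MD = """---
name: incident-triage
description: Triage production incidents. Use when an alert fires.
allowed-tools: Read, Grep
---
# Incident Triage
1. Collect the alert payload.
"""


def parse_frontmatter(text: str) -> dict[str, str]:
    """Parse flat `key: value` pairs between the leading '---' fences.

    Simplification: single-line scalar values only, not a full YAML parser.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must open with a '---' frontmatter fence")
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        if not line.strip():
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed frontmatter line: {line!r}")
        fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' fence")


def validate_skill(folder_name: str, skill_md: str) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    fields = parse_frontmatter(skill_md)
    problems: list[str] = []
    name = fields.get("name", "")
    description = fields.get("description", "")
    if not name:
        problems.append("missing required field: name")
    elif len(name) > MAX_NAME:
        problems.append(f"name exceeds {MAX_NAME} characters")
    elif name != folder_name:
        problems.append(f"name {name!r} does not match folder {folder_name!r}")
    if not description:
        problems.append("missing required field: description")
    elif len(description) > MAX_DESCRIPTION:
        problems.append(f"description exceeds {MAX_DESCRIPTION} characters")
    return problems
```

Calling `validate_skill("incident-triage", EXAMPLE_SKILL_MD)` returns an empty list, while passing a different folder name reports the mismatch. A production check would more likely use a real YAML parser.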

flowchart LR
    DIR[Skill Folder] --> SM[SKILL.md]
    DIR --> AST[Assets]
    DIR --> SCR[Scripts]
    SM --> FM[YAML Frontmatter]
    FM --> N[name]
    FM --> D[description]
    FM --> AT[allowed-tools]
    FM --> WTU[when_to_use]
    SM --> BODY[Markdown Body]
    BODY --> WORK[Workflow Steps]
    BODY --> EX[Examples]
    BODY --> POL[Policy Text]

    classDef stage fill:#ffd541,stroke:#222021,color:#222021
    class DIR,SM,AST,SCR,FM,N,D,AT,WTU,BODY,WORK,EX,POL stage

Runtime Skill Usage

The runtime continuously evaluates objectives, workflows, context state, connectors, tools, and execution patterns to determine which skills should be loaded or activated during execution. Skills are designed to prevent capability drift and reduce the likelihood that important operational behaviors are forgotten during long-running or multi-stage workflows. By maintaining focused, isolated skill definitions, the platform can more reliably ensure that LLM calls utilize the correct operational knowledge at the correct stage of execution.

Activation follows the progressive disclosure pattern. At startup the agent pre-loads only the name and description of every skill into the system prompt (roughly 5k tokens for 50 skills, or about 100 tokens per skill). Only when the model judges a skill relevant does it read the full SKILL.md body into context; assets referenced by the body are loaded on demand. For libraries too large for even the metadata to fit, langgraph-bigtool runs semantic search over skill descriptions in an in-memory store and injects only the top-k matches at runtime, which supports hundreds to thousands of capabilities without exhausting the context window. The description string is therefore the load-bearing retrieval key.
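
The two-phase loading described above can be illustrated with a small sketch. The `Skill` and `SkillIndex` classes below are hypothetical names for this illustration, not platform APIs, and the `body` string stands in for reading SKILL.md from disk.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Skill:
    name: str
    description: str
    body: str  # stands in for the SKILL.md body stored on disk


@dataclass
class SkillIndex:
    skills: dict[str, Skill]
    loaded: set[str] = field(default_factory=set)

    def metadata_prompt(self) -> str:
        """Startup phase: expose only name + description for every skill."""
        return "\n".join(
            f"- {s.name}: {s.description}" for s in self.skills.values()
        )

    def activate(self, name: str) -> str:
        """On-demand phase: return the full body once a skill is judged relevant."""
        body = self.skills[name].body
        self.loaded.add(name)
        return body


index = SkillIndex({
    "incident-triage": Skill(
        name="incident-triage",
        description="Triage production incidents when an alert fires.",
        body="# Incident Triage\n1. Collect the alert payload.",
    ),
})
system_prompt_fragment = index.metadata_prompt()   # cheap, always present
full_instructions = index.activate("incident-triage")  # paid for only when needed
```

The design point is that the system prompt cost scales with the number of skills times the metadata size, while body cost is incurred only for skills that are actually activated.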

flowchart LR
    BOOT[Agent Boot] --> META[Index Skill Metadata]
    META --> SYS[Inject name + description Into System Prompt]
    SYS --> TURN[User Turn]
    TURN --> SEL{Library Size?}
    SEL -->|small N| MOD[Model Selects Skill]
    SEL -->|large N| BIG[langgraph-bigtool Semantic Top-K]
    BIG --> MOD
    MOD --> LOAD[Load Full SKILL.md + Assets]
    LOAD --> EX[Execute Workflow Under allowed-tools]
    EX --> TEL[Telemetry → Eval Store]

    classDef stage fill:#ffd541,stroke:#222021,color:#222021
    class BOOT,META,SYS,TURN,SEL,MOD,BIG,LOAD,EX,TEL stage

Skills As Procedural Memory

CoALA divides long-term memory into semantic, episodic, and procedural tiers; the procedural tier "encodes skills and procedures, often represented as code snippets, tool definitions, or implicitly within LLM parameters." The platform binds the Assembly Skills surface to the procedural tier of Emergence › Memory, so that Skills serve as the runtime's procedural-memory substrate: written once via promotion, indexed by embedding, and verified as code. Voyager is the canonical implementation. Each verified skill is indexed by the embedding of its natural-language description, and on a new task the top-5 skills are retrieved by cosine similarity and injected into the prompt. Execution stays deterministic code while retrieval stays fuzzy.
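
Retrieval by description embedding can be sketched as follows. This is an illustration under loud assumptions: `embed` is a toy bag-of-words vector standing in for a real embedding model, and the skill names and descriptions are invented for the example. It is not Voyager's code.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(task: str, descriptions: dict[str, str], k: int = 5) -> list[str]:
    """Return the names of the k skills whose descriptions best match the task."""
    query = embed(task)
    ranked = sorted(
        descriptions,
        key=lambda name: cosine(query, embed(descriptions[name])),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical skill library keyed by name, valued by description.
library = {
    "mine-iron": "mine iron ore with a stone pickaxe",
    "craft-sword": "craft an iron sword at the crafting table",
    "build-shelter": "build a wooden shelter before night",
}
best = top_k("mine iron ore", library, k=1)
```

Only the ranking step is fuzzy here; whatever the selected skill executes remains ordinary deterministic code.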

Complex Skills

Complex skills define multi-stage operational capabilities that combine workflows, tools, connectors, objectives, validation logic, execution patterns, and runtime behaviors into a reusable execution system. Unlike simple skills that provide focused behavioral guidance, complex skills coordinate multiple runtime components together to accomplish advanced operational tasks such as evaluations, deployments, security analysis, orchestration, infrastructure automation, research pipelines, or large-scale data processing workflows.

Complex Skill Structure

A complex skill is typically composed of multiple focused markdown skill files, workflow graphs, tool dependencies, validation stages, execution policies, and optional runtime constraints. These components work together as a modular execution package that can be activated by the runtime when specific objectives, workflows, environmental conditions, or execution states are detected. The goal is to break advanced operational behavior into structured, reusable runtime systems rather than relying on a single monolithic prompt.

flowchart LR
    CS[Complex Skill] --> SK1[Sub-Skill A]
    CS --> SK2[Sub-Skill B]
    CS --> SK3[Sub-Skill C]
    CS --> WF[Workflow Graph]
    CS --> TD[Tool Dependencies]
    CS --> POL[Policy Bundle]
    CS --> VAL[Validation Stages]

    WF -. orchestrates .-> SK1
    WF -. orchestrates .-> SK2
    WF -. orchestrates .-> SK3
    TD -. allowlist .-> SK1
    TD -. allowlist .-> SK2

    classDef stage fill:#ffd541,stroke:#222021,color:#222021
    class CS,SK1,SK2,SK3,WF,TD,POL,VAL stage

Runtime Activation

Complex skills are activated when the runtime determines that a workflow requires a specialized multi-stage operational process. During execution, the runtime may load supporting tools, connectors, policies, validation systems, memory context, and sub-skills associated with the complex skill while coordinating execution through the workflow engine. Activation is deny-by-default for tools: a sub-agent inherits the skill's tool allowlist only, applying least privilege as immutable config the model cannot widen.
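
One way to picture the deny-by-default rule is a frozen policy object handed to the sub-agent. The sketch below is an assumption about shape, not the platform's implementation; `ToolPolicy`, `policy_for_sub_agent`, and the tool names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolPolicy:
    """Immutable allowlist: anything not listed is denied."""

    allowed: frozenset[str]

    def check(self, tool: str) -> None:
        """Raise PermissionError unless the tool is explicitly allowed."""
        if tool not in self.allowed:
            raise PermissionError(f"tool {tool!r} is not in the skill allowlist")


def policy_for_sub_agent(skill_allowed_tools: list[str]) -> ToolPolicy:
    """The sub-agent receives the skill's allowlist and nothing else."""
    return ToolPolicy(frozenset(skill_allowed_tools))


policy = policy_for_sub_agent(["read_file", "run_tests"])
policy.check("read_file")  # permitted, returns None
```

Because the dataclass is frozen and the set is a `frozenset`, code holding the policy cannot reassign or extend the allowlist, which mirrors the "cannot widen" property in the text.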

Example Use Cases

Examples of complex skills may include:

  • MLOps evaluation pipelines
  • Infrastructure deployment systems
  • Security auditing workflows
  • Competitive market analysis
  • Multi-stage research pipelines
  • Automated testing and validation
  • CI / CD orchestration
  • Incident response coordination
  • Multi-agent workflow management

Skill Validation

Anthropic's skill-creator v2 (released March 2026) provides Create / Eval / Improve / Benchmark modes with four agents (Executor, Grader, Comparator, Analyzer); the platform integrates this evaluation loop into its skill registration pipeline. The framework distinguishes two failure classes that can appear after a model upgrade: regression (the model plus skill performs worse than it did before the upgrade) and outgrowth (the base model now passes the evaluations without the skill, so the recommendation is to retire it). CI runs the eval set on every published version.
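
The regression and outgrowth checks can be expressed as a small gate. This is a sketch only: `EvalRun`, `classify`, the 0.9 pass bar, and the 0.02 tolerance are hypothetical values chosen for illustration and are not taken from skill-creator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalRun:
    pass_rate_with_skill: float     # model + skill on the eval set
    pass_rate_without_skill: float  # base model alone on the same eval set


def classify(
    before: EvalRun,
    after: EvalRun,
    pass_bar: float = 0.9,    # hypothetical threshold for "passes the evals"
    tolerance: float = 0.02,  # hypothetical noise margin
) -> str:
    """Label the result of re-running a skill's evals after a model upgrade."""
    if after.pass_rate_with_skill < before.pass_rate_with_skill - tolerance:
        return "regression"  # model + skill got worse
    if after.pass_rate_without_skill >= pass_bar:
        return "outgrowth"   # base model no longer needs the skill
    return "healthy"


baseline = EvalRun(pass_rate_with_skill=0.95, pass_rate_without_skill=0.40)
verdict = classify(baseline, EvalRun(0.96, 0.93))
```

Regression is checked first so that a skill which both degrades and becomes unnecessary is surfaced as a regression rather than silently retired.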

flowchart LR
    PR[Skill PR] --> SCAN[Static Scan: skillfortify]
    SCAN --> SIGN[Sign + Trusted Publisher Key]
    SIGN --> EXE[Executor Runs Eval Prompts]
    EXE --> GRD[Grader Checks Assertions]
    GRD --> CMP[Comparator: Blind A/B vs Prior]
    CMP --> ANL[Analyzer: Propose Patches]
    ANL --> TTN[Trigger Tuner: Description 60/40]
    TTN --> RG[Regression Gate]
    RG --> OG[Outgrowth Gate]
    OG --> PUB[Publish + Semver Bump]
    PUB --> RPL[Replay Harness: Detect Skill Conflict]

    classDef stage fill:#ffd541,stroke:#222021,color:#222021
    class PR,SCAN,SIGN,EXE,GRD,CMP,ANL,TTN,RG,OG,PUB,RPL stage

Skill Safety

The platform treats skill safety with the same posture as tool safety. The Snyk ToxicSkills study (February 2026) found 36.82% of ClawHub-published skills had at least one security flaw and 13.4% had critical issues; malware delivery, prompt injection, and hard-coded secrets were the dominant categories. Skill poisoning is the analog of tool poisoning but evades CVE and SBOM tooling because the payload is in instructions rather than code dependencies. Mitigations: trusted-publisher signing (the signed-agent-card analog), reject-unsigned at runtime, static scan of the SKILL.md plus all assets (tools such as skillfortify produce an Agent-SBOM), and trust-score propagation through skill-to-skill dependency edges.
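
Reject-unsigned and trust-score propagation can be combined in one pass over the dependency graph. The sketch below assumes a particular propagation rule (a skill is only as trusted as its least-trusted transitive dependency) and scores in the range 0.0 to 1.0; the source does not specify either, and all names are hypothetical.

```python
def effective_trust(
    skill: str,
    own_score: dict[str, float],
    depends_on: dict[str, list[str]],
    signed: set[str],
    _visiting: frozenset[str] = frozenset(),
) -> float:
    """Minimum trust over a skill and everything it transitively depends on.

    Assumptions (not from the source): scores lie in [0.0, 1.0], unsigned
    skills score 0.0, and propagation takes the minimum along edges.
    """
    if skill not in signed:
        return 0.0  # reject-unsigned: an unsigned skill poisons its dependents
    if skill in _visiting:
        return 1.0  # already on the current path; its score is counted upstream
    score = own_score[skill]
    for dep in depends_on.get(skill, []):
        score = min(
            score,
            effective_trust(dep, own_score, depends_on, signed, _visiting | {skill}),
        )
    return score


scores = {"deploy": 0.9, "lint": 0.8, "fetch": 0.4}
edges = {"deploy": ["lint", "fetch"]}
trust = effective_trust("deploy", scores, edges, signed={"deploy", "lint", "fetch"})
```

Here `deploy` ends up with the score of its weakest dependency, `fetch`. A runtime could compare the result against a threshold before loading the skill.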

Skill Composition

Composition is supported by reference: a SKILL.md may instruct the agent to invoke another skill, producing a skill graph that is traversed at runtime via the same description-retrieval mechanism. Anthropic's skill-creator is itself a meta-skill, a skill that authors other skills. Published skill patterns fall into a best-practice taxonomy of five categories: discovery / selection, context economy, instruction calibration, workflow control, and executable code. Composition belongs to the workflow-control category.
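
Because skills reference each other by name, the resulting graph can contain cycles. The following sketch shows one possible registration-time check that walks the references and rejects cycles; the `resolve` function and the example skill names are hypothetical, and the text does not state that the platform performs this check.

```python
def resolve(skill: str, references: dict[str, list[str]]) -> list[str]:
    """Return the skill and everything it references, referenced skills first.

    Raises ValueError if the reference graph contains a cycle.
    """
    order: list[str] = []
    state: dict[str, str] = {}  # name -> "visiting" | "done"

    def visit(name: str, path: tuple[str, ...]) -> None:
        if state.get(name) == "done":
            return
        if state.get(name) == "visiting":
            raise ValueError("cycle: " + " -> ".join(path + (name,)))
        state[name] = "visiting"
        for ref in references.get(name, []):
            visit(ref, path + (name,))
        state[name] = "done"
        order.append(name)

    visit(skill, ())
    return order


# Hypothetical reference graph extracted from SKILL.md bodies.
refs = {"release": ["test", "changelog"], "test": ["lint"]}
plan = resolve("release", refs)
```

For the graph above, `plan` is `["lint", "test", "changelog", "release"]`.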

Skills Versus Tools Versus MCP

The platform's contract is: Skills teach how, Tools execute what, and MCP servers expose capabilities. MCP Prompts are reusable templates that a server offers over JSON-RPC; Skills are static folders that the host loads into the model's context, and they teach methodology rather than expose capability. Put differently, MCP gives an agent the ability to act, while Skills tell it how to act. The rule of thumb follows directly: if it teaches how, ship a Skill; if it executes what, ship a Tool or MCP server; if both, ship a complex Skill that declares its required tools.
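
The rule of thumb above reduces to a two-question decision. The sketch below encodes it directly; the function name and return strings are illustrative only.

```python
def packaging_for(teaches_how: bool, executes_what: bool) -> str:
    """Map the two questions from the contract to a packaging choice."""
    if teaches_how and executes_what:
        return "complex Skill declaring its required tools"
    if teaches_how:
        return "Skill"
    if executes_what:
        return "Tool or MCP server"
    raise ValueError("nothing to package: neither teaches nor executes")


choice = packaging_for(teaches_how=True, executes_what=False)
```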

Platform Philosophy

The Unitt platform is designed to treat complex skills as composable operational systems rather than isolated prompts. By separating workflows, skills, tools, policies, and validation logic into structured runtime components, the platform can optimize retrieval, reduce prompt bloat, improve execution consistency, and keep advanced operational capabilities reusable, governable, and able to evolve.

Selection Heuristic

| Need | Packaging Shape | Activation | Validation |
|---|---|---|---|
| Single deterministic capability | MCP Tool | Always-on, listed in system prompt | Tool unit test, contract test |
| External system access | MCP Server (Tools + Resources + Prompts) | Always-on connector | Integration test against sandbox |
| Reusable methodology / convention | Simple Skill (SKILL.md only) | Description-triggered, progressive disclosure | skill-creator Eval, trigger-tuning |
| Multi-stage workflow + scripts + policy | Complex Skill (SKILL.md + scripts + allowed-tools) | Description-triggered, then load assets | Eval + A/B + outgrowth check |
| 100s-1000s of capabilities | Skill library + bigtool | Semantic top-k retrieval | Retrieval accuracy + per-skill eval |
| Skill that builds skills | Meta-skill | Invoked by author / user | Output skill passes own evals |

Cross-References