State
State defines how the runtime curates the working context window on every turn. Where Memory is the durable substrate and System is the active reasoning loop, the State layer is the discipline of context engineering: deciding what goes into the next model call, in what order, under what cache key, and with what compaction or summarization applied. Modern agentic runtimes succeed or fail on context engineering: the most capable model still degrades when given a noisy, unordered, or oversized working set. Emergence treats state management as a first-class runtime concern with explicit patterns drawn from the active research lineage.
State patterns inside Emergence are informed by the published agentic-context-engineering literature from 2024 through 2026, including the Anthropic Effective Context Engineering framework, Anthropic Contextual Retrieval, Anthropic prompt caching, the MemGPT virtual-context kernel, StreamingLLM attention sinks, LongRoPE position interpolation, the LangChain context-engineering taxonomy (write / select / compress / isolate), and durable-state systems such as LangGraph persistence. Selection criteria for picking a context strategy for a given Unitt workload are documented in Research › Context & State.
Context Window Layout
Every model call assembled by Emergence is structured as an ordered layout of segments. Order is load-bearing: prompt caching depends on a byte-stable prefix, while attention sinks and the lost-in-the-middle effect depend on where content sits in the sequence. The recommended layout places stable, governance-relevant segments at the prefix and the volatile current turn at the tail, so prefix caches remain warm across consecutive runtime steps.
flowchart LR
T[Tools Declaration] --> SYS[System Prompt]
SYS --> EX[Few-Shot Examples]
EX --> WM[Compacted History]
WM --> RT[Recent Turns]
RT --> U[Current User Turn]
U --> M[Model Call]
T -. cache breakpoint .-> CB1[Cache: tools+system+examples]
WM -. cache breakpoint .-> CB2[Cache: + compacted_history]
RT -. volatile tail .-> CB3[Uncached: recent + user]
classDef seg fill:#ffd541,stroke:#222021,color:#222021
class T,SYS,EX,WM,RT,U,M seg
The tools declaration, system prompt, and few-shot examples form the long-lived prefix that should never be re-ordered or rewritten. Compacted history sits behind a second cache breakpoint and is only updated when the State layer commits a new compaction. Recent turns and the current user turn form the volatile tail and are not cached. This layering maximizes prompt-cache reuse across consecutive runtime steps.
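The layout can be sketched as a small assembly function. This is a minimal illustration, not the Emergence implementation: the segment names, `LAYOUT_ORDER`, `CACHE_BREAKPOINTS_AFTER`, and `assemble` are hypothetical names chosen to mirror the diagram.

```python
from __future__ import annotations
from dataclasses import dataclass

# Segment order mirrors the diagram; the identifiers are illustrative assumptions.
LAYOUT_ORDER = [
    "tools", "system", "examples",
    "compacted_history", "recent_turns", "user_turn",
]
# A cache breakpoint is placed after each of these segments (assumed config).
CACHE_BREAKPOINTS_AFTER = {"examples", "compacted_history"}


@dataclass(frozen=True)
class Segment:
    name: str
    text: str


def assemble(segments: dict[str, str]) -> tuple[list[Segment], list[int]]:
    """Return segments in layout order plus the indices that end a cached prefix."""
    ordered = [Segment(n, segments[n]) for n in LAYOUT_ORDER if n in segments]
    breakpoints = [
        i for i, seg in enumerate(ordered) if seg.name in CACHE_BREAKPOINTS_AFTER
    ]
    return ordered, breakpoints
```

Whatever order the caller supplies segments in, the output order is fixed, so the bytes before each breakpoint stay identical from step to step unless a prefix segment itself changes.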
Context Engineering Disciplines
The LangChain context-engineering taxonomy provides a clean separation of the four disciplines that the State layer implements. Each discipline is independently configurable per Unitt and per workflow stage, allowing the platform to tune fidelity, latency, and cost separately.
flowchart LR
EV[Runtime Event] --> WRT[Write]
WRT --> SEL[Select]
SEL --> CMP[Compress]
CMP --> ISO[Isolate]
ISO --> M[Model Call]
classDef disc fill:#ffd541,stroke:#222021,color:#222021
class EV,WRT,SEL,CMP,ISO,M disc
Write
Write is the discipline of choosing what to persist beyond the current turn. The State layer decides whether a tool result, observation, or reflection becomes a turn entry, a scratchpad note, a working-memory summary, an episodic-memory write, or a discarded transient. Write policies are scoped by the governance layer so that sensitive observations never escape the working window.
Select
Select is the discipline of choosing which items from durable memory, scratchpad, or attached files are loaded into the next context window. The recall pipeline in Memory returns ranked candidates; the State layer applies the per-step selection budget, deduplicates against items already in the window, and orders the chosen items for cache friendliness. Just-in-time retrieval is preferred over upfront stuffing: load file paths and queries in the prefix, and fetch contents only when a step needs them.
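The budget, deduplication, and ordering steps can be sketched as follows. This is a minimal illustration under stated assumptions: the candidate tuple shape, the greedy highest-score-first policy, and sorting by item ID as the "cache-friendly" order are all hypothetical choices, not documented Emergence behavior.

```python
from __future__ import annotations


def select_items(
    candidates: list[tuple[str, float, int]],
    in_window_ids: set[str],
    token_budget: int,
) -> list[str]:
    """Pick items greedily by score within a token budget.

    candidates: (item_id, relevance_score, token_count) in any order.
    """
    chosen: list[str] = []
    seen = set(in_window_ids)  # items already in the window are never re-added
    used = 0
    for item_id, _score, tokens in sorted(
        candidates, key=lambda c: c[1], reverse=True
    ):
        if item_id in seen:
            continue  # deduplicate against the window and earlier picks
        if used + tokens > token_budget:
            continue  # skip items that would overflow the budget
        chosen.append(item_id)
        seen.add(item_id)
        used += tokens
    # Assumed ordering rule: sort by ID so the same selection always
    # serializes to the same bytes.
    return sorted(chosen)
```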
Compress
Compress is the discipline of reducing the byte footprint of context without losing the information the next step needs. Compression strategies include rolling summarization, hierarchical summarization, tool-output truncation, scratchpad pruning, and reversible compression. The compression policy is explicit so that information loss is observable and recoverable from the audit trail.
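Tool-output truncation is the simplest of these strategies to illustrate. The sketch below keeps the head and tail of an oversized output and returns the number of dropped characters so the loss can be logged. The function name, the character-based limits, and the marker format are assumptions made for illustration.

```python
from __future__ import annotations


def truncate_tool_output(
    text: str, keep_head: int = 1500, keep_tail: int = 500
) -> tuple[str, int]:
    """Keep the first keep_head and last keep_tail characters.

    Returns (possibly truncated text, number of characters dropped).
    """
    if len(text) <= keep_head + keep_tail:
        return text, 0
    dropped = len(text) - keep_head - keep_tail
    # The marker makes the information loss visible to the model and the audit trail.
    marker = f"\n[... {dropped} chars truncated ...]\n"
    # Slice from an explicit index so keep_tail == 0 yields an empty tail.
    tail = text[len(text) - keep_tail:]
    return text[:keep_head] + marker + tail, dropped
```

Returning the dropped count alongside the text lets the caller record the loss explicitly rather than leaving it implicit in the output.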
Isolate
Isolate is the discipline of keeping noisy or oversized work out of the main context window by routing it to a Subunit. The parent passes a bounded task brief; the subunit operates inside a private context window with its own tool allowlist; the parent receives only a structured summary back. Isolation is the primary mechanism for protecting the parent context from search-and-scrape, log-triage, and exploratory workloads.
Prompt Caching And Cache-Aware Ordering
Prompt caching is the single largest cost and latency lever available to an agentic runtime. Anthropic prefix caching caches a token prefix up to an explicit cache_control breakpoint; the cache key is the exact byte prefix, so any change earlier in the layout invalidates the cache. Emergence enforces cache-aware ordering on every model call.
flowchart LR
R1[Step N: Tools + Sys + History] -->|cache hit| M[Model]
R2[Step N+1: Tools + Sys + History + new turn] -->|cache hit on prefix| M
R3[Step N+2: Tools changed] -->|cache miss, full rebuild| M
classDef stage fill:#ffd541,stroke:#222021,color:#222021
class R1,R2,R3,M stage
The platform applies five cache-friendliness rules to every Unitt automatically:
- Tool declarations are sorted into a stable order at boot and not rewritten mid-session.
- System prompt and identity blocks are immutable for the duration of a runtime session.
- Compacted history is only re-emitted when a new compaction commits, never re-summarized in place.
- Volatile material (timestamps, request IDs, ephemeral counters) is kept out of the cacheable prefix.
- Long-TTL blocks always precede shorter-TTL blocks within the layout.
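The effect of the first, second, and fourth rules can be modeled locally with a hash over the prefix. This is an illustrative model only: providers compute their cache keys internally, and `prefix_cache_key` is a hypothetical helper, not part of any provider SDK.

```python
from __future__ import annotations
import hashlib
import json


def prefix_cache_key(tools: list[dict], system: str) -> str:
    """Hash the stable prefix the way an exact-prefix cache would see it."""
    # Sort tool declarations by name so declaration order cannot perturb the prefix.
    stable_tools = sorted(tools, key=lambda t: t["name"])
    payload = json.dumps(
        {"tools": stable_tools, "system": system}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With stable sorting, two sessions that register the same tools in a different order share a key, while a timestamp embedded in the system prompt produces a new key on every call.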
Compaction
Compaction is the State layer's response to context pressure. When the working window exceeds its configured budget, the runtime replaces older turn segments with a structured summary that preserves completed work, current state, files modified, in-progress work, next steps, and outstanding constraints. Compaction can be invoked proactively (typically around 60% capacity) or automatically (around 95% capacity).
flowchart LR
H[History Buffer] --> Q{Token Budget?}
Q -->|under| P[Pass-Through]
Q -->|over| SM[Summarizer]
SM --> S[Compacted Summary Block]
S --> NX[Next Step Context]
P --> NX
SM -. archive .-> EM[Episodic Memory]
classDef stage fill:#ffd541,stroke:#222021,color:#222021
class H,Q,P,SM,S,NX,EM stage
Every compaction event is auditable. The pre-compaction history is archived into episodic memory so that the original execution trace can be replayed or referenced. Compacted summaries are tagged with the runtime step at which they were produced, allowing the WorldSim replay engine to regenerate the exact context any prior runtime call observed.
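The budget check, summarization, and archive steps in the diagram can be sketched as below. This is a minimal illustration under assumptions: `maybe_compact`, `CompactionResult`, the `keep_recent` parameter, and the caller-supplied `count_tokens` and `summarize` callables are hypothetical, and the 0.6 default simply mirrors the proactive threshold mentioned above.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class CompactionResult:
    context: list[str]                                   # what the next step sees
    archived: list[str] = field(default_factory=list)    # sent to episodic memory


def maybe_compact(
    history: list[str],
    count_tokens: Callable[[str], int],
    budget: int,
    summarize: Callable[[list[str]], str],
    keep_recent: int = 2,
    threshold: float = 0.6,
) -> CompactionResult:
    """Replace older turns with a summary once usage passes threshold * budget."""
    used = sum(count_tokens(turn) for turn in history)
    split = len(history) - keep_recent
    if used <= threshold * budget or split <= 0:
        return CompactionResult(context=list(history))  # pass-through
    older, recent = history[:split], history[split:]
    summary = summarize(older)
    # The original turns are returned for archiving so the trace stays replayable.
    return CompactionResult(context=[summary] + recent, archived=older)
```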
Retrieval-Augmented Context
When the active workload depends on a corpus larger than the working window (documentation, prior tickets, codebases, customer history), the State layer assembles a retrieval block from the Memory recall pipeline. The recommended pattern is the hybrid two-stage funnel described in the recent Anthropic Contextual Retrieval work.
flowchart LR
Q[Step Query] --> B[BM25 Index]
Q --> D[Dense Embedding Index]
B --> RRF[Reciprocal Rank Fusion]
D --> RRF
RRF --> RR[Cross-Encoder Rerank]
RR --> CTX[Retrieval Block]
CTX --> M[Model Call]
classDef stage fill:#ffd541,stroke:#222021,color:#222021
class Q,B,D,RRF,RR,CTX,M stage
BM25 catches exact identifiers (error codes, SKUs, function names, ticket numbers) that pure dense embeddings tend to miss. Dense retrieval catches paraphrase and semantic similarity. Reciprocal Rank Fusion combines the two ranked lists, a cross-encoder rerank refines the top N, and the top K are injected into a dedicated retrieval block in the working window. When chunks lose meaning in isolation, the indexer prepends a 50 to 100 token chunk-specific context preamble at write time, which the Anthropic Contextual Retrieval work reports reduces failed retrievals.
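The fusion step is compact enough to show directly. The sketch below uses the standard Reciprocal Rank Fusion formula, where each document scores the sum of 1 / (k + rank) over the lists it appears in. The constant k = 60 is the value commonly used in the literature, not a documented Emergence setting, and the document IDs in the usage note are made up.

```python
from __future__ import annotations
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document IDs into one list, best first."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Break score ties by ID so the output is deterministic.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a BM25 list `["err-42", "doc-a", "doc-b"]` with a dense list `["doc-a", "doc-c", "err-42"]` ranks `doc-a` first, because it sits near the top of both lists, ahead of `err-42`, which is first in one list but third in the other.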
Long Context And Lost-In-The-Middle
Nominal context length overstates usable context. Even frontier models degrade on multi-needle sequential tasks at 128k+ tokens. The State layer mitigates this by placing critical information at the start or end of the working window, by anchoring the model with explicit <critical> markers around must-attend segments, and by validating long-context fidelity with needle-in-a-haystack (NIAH) style evals before relying on raw context length for a workload.
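The edge-placement and marker mitigations can be sketched together. This is an illustrative assumption about how such a pass might work: `place_critical_at_edges` and the split of critical items between the start and end of the window are hypothetical, and only the `<critical>` tag comes from the text above.

```python
from __future__ import annotations


def place_critical_at_edges(items: list[tuple[str, bool]]) -> list[str]:
    """Move critical items to the start and end of the window and tag them.

    items: (text, is_critical) pairs in their original order.
    """
    critical = [f"<critical>{text}</critical>" for text, flag in items if flag]
    normal = [text for text, flag in items if not flag]
    # Front-load the first half (rounded up); the rest go to the tail.
    half = (len(critical) + 1) // 2
    return critical[:half] + normal + critical[half:]
```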
When a workload genuinely requires extended context, Emergence supports position-interpolation runtimes such as LongRoPE and serving-layer techniques such as StreamingLLM attention sinks for real-time streaming workloads. Pattern selection for long-context strategies is documented in Research › Context & State.
Durable State And Checkpointing
Working-window curation is only one half of state management. The other half is durable state: the runtime checkpoint that allows a session to be paused, resumed, replayed, time-traveled, or interrupted for human approval. Emergence persists a durable state snapshot per runtime step keyed by session ID, allowing any session to be recovered after a crash, paused for an external event, or rewound to a prior step for inspection.
flowchart LR
S0[Step 0 State] --> S1[Step 1 State]
S1 --> S2[Step 2 State]
S2 --> S3[Step 3 State]
S3 --> S4[Step 4 State]
S0 -. checkpoint .-> CK[(Checkpoint Store)]
S1 -. checkpoint .-> CK
S2 -. checkpoint .-> CK
S3 -. checkpoint .-> CK
S4 -. checkpoint .-> CK
CK -. replay .-> RP[Replay Engine]
CK -. time-travel .-> TT[Time Travel]
CK -. resume .-> RS[Resume After Interrupt]
classDef stage fill:#ffd541,stroke:#222021,color:#222021
class S0,S1,S2,S3,S4,CK,RP,TT,RS stage
Durable state pairs naturally with the WorldSim replay engine: any production execution can be replayed against an updated Unitt configuration to verify that the change does not regress prior validated outcomes. For long-running workflows, Emergence pairs the checkpoint store with an external workflow engine so that tool invocation has at-least-once semantics across crashes and restarts.
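A per-step checkpoint store can be sketched in memory as follows. This is a minimal illustration, not the Emergence store: `CheckpointStore` and its methods are hypothetical, and a real deployment would persist snapshots durably and encrypt them at rest.

```python
from __future__ import annotations
import copy
from typing import Any, Optional


class CheckpointStore:
    """In-memory stand-in for a durable store keyed by (session_id, step)."""

    def __init__(self) -> None:
        self._snapshots: dict[tuple[str, int], Any] = {}

    def save(self, session_id: str, step: int, state: Any) -> None:
        # Deep-copy so later mutation of live state cannot alter the snapshot.
        self._snapshots[(session_id, step)] = copy.deepcopy(state)

    def load(self, session_id: str, step: int) -> Any:
        """Return a copy of the snapshot, for resume or time travel."""
        return copy.deepcopy(self._snapshots[(session_id, step)])

    def latest_step(self, session_id: str) -> Optional[int]:
        """Return the most recent step saved for a session, or None."""
        steps = [s for (sid, s) in self._snapshots if sid == session_id]
        return max(steps) if steps else None
```

Resuming after a crash amounts to `load(session_id, latest_step(session_id))`; rewinding for inspection is the same call with an earlier step.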
Session Segmentation
Long-running Unitts cannot operate inside a single runtime session forever. The State layer detects three signals that drive session segmentation: token pressure (compact at the configured threshold, typically 60%), goal completion (commit a summary to long-term memory and start a fresh thread), and topic shift (the embedding distance between recent turns and prior turns exceeds the configured threshold). Episode summaries are cheaper to retrieve than turn-level logs, so the platform always commits an episode summary at every session boundary.
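The topic-shift signal can be sketched with cosine distance between mean turn embeddings. This is one plausible reading of "embedding distance", stated as an assumption: the mean-pooling step, the function names, and the 0.5 default threshold are hypothetical, and the embeddings themselves would come from whatever model the deployment uses.

```python
from __future__ import annotations
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean_vector(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equally sized, non-empty vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def topic_shifted(
    prior: list[list[float]], recent: list[list[float]], threshold: float = 0.5
) -> bool:
    """True when recent turns have drifted past the threshold from prior turns."""
    return cosine_distance(mean_vector(prior), mean_vector(recent)) > threshold
```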
Selection Heuristic
The strategies above are independently selectable and frequently composed. The recommended starting point for a new Unitt is the layered window with prompt caching and basic compaction. Add hybrid retrieval when corpora exceed the working window. Add subunit isolation when individual workflow stages would otherwise pollute the parent context. Add MemGPT-style tiering only when multi-session continuity is a strict requirement. Add long-context runtimes only after NIAH-style validation confirms the workload benefits.
| Strategy | When To Select |
|---|---|
| Layered window + caching | Default starting point for any Unitt with repeated tool calls. |
| Compaction | Single-session workflows that exceed the working window. |
| Hybrid retrieval | Corpora larger than the working window or exact-match recall required. |
| Contextual retrieval preamble | Chunks lose meaning in isolation (legal, financial, code). |
| MemGPT-style tiering | Multi-session continuity, per-user durable identity. |
| StreamingLLM | Real-time streaming workloads (voice, log monitoring). |
| LongRoPE long-context | Workload validated to require >128k effective context. |
| Subunit isolation | Workflow stages whose raw output would blow the working budget. |
| Checkpointing | Any session > 1 minute, async waits, or human approvals. |
| Episode rollover | Multi-session Unitts serving the same user across days. |
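A subset of the table can be encoded as a simple rule function. This is a sketch for illustration only: the `Workload` fields and `recommend` function are hypothetical, the rules cover only some rows, and the 60-second and 128k-token cutoffs are read directly from the table rather than from any Emergence API.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Workload:
    corpus_exceeds_window: bool = False
    noisy_stages: bool = False            # stage output would blow the budget
    multi_session: bool = False
    session_seconds: float = 0.0
    needs_human_approval: bool = False
    validated_context_tokens: int = 0     # effective context confirmed by evals


def recommend(w: Workload) -> list[str]:
    """Return strategies to enable, starting from the recommended default."""
    strategies = ["layered window + caching", "compaction"]
    if w.corpus_exceeds_window:
        strategies.append("hybrid retrieval")
    if w.noisy_stages:
        strategies.append("subunit isolation")
    if w.multi_session:
        strategies.append("MemGPT-style tiering")
    if w.session_seconds > 60 or w.needs_human_approval:
        strategies.append("checkpointing")
    if w.validated_context_tokens > 128_000:
        strategies.append("LongRoPE long-context")
    return strategies
```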
Governance Of Context
The State layer is a primary surface where governance enforces what the agent is permitted to see, remember, and act on. Sensitive observations are filtered before they reach the working window. Retrieved context is policy-scoped before it is injected into the prompt. Compaction outputs are reviewed against the same sensitivity gates as memory writes. Every context-management decision is recorded in the audit trail so that the exact bytes any model call observed can be reconstructed later.
State Governance Requirements
- Every context assembly records the layout, cache keys, retrieval IDs, and compaction events used.
- Sensitive observations are filtered before they enter the working window.
- Retrieved context is policy-scoped against the current connector and credential scope.
- Compaction summaries are validated against governance policies before they are committed.
- Durable checkpoints are encrypted at rest and access-scoped to the originating Unitt.
Cross-References
- Memory supplies the durable substrate that the State layer draws from.
- System is the runtime pattern that consumes the curated context each turn.
- Subunits describes context isolation across delegated sub-agents.
- WorldSim uses durable state and checkpoints to drive deterministic replay.
- Research › Context & State documents the selection criteria and citations.