Memory Systems

The memory architectures referenced throughout Emergence › Memory draw on the active agentic-memory research lineage from 2023 through 2026. This page catalogs the canonical systems, their key innovations, and the selection criteria the platform uses when configuring memory tiers for a given Unitt workload.

Reference Systems

MemGPT / Letta

MemGPT (Packer et al., 2023) treats the LLM context window as RAM and external stores as disk. The model self-manages a three-tier hierarchy (main context, recall memory, and archival memory) by calling OS-style memory functions that edit its in-context memory and that insert into or search the external stores. The key innovation is that the model issues these memory "syscalls" itself via function calling, which gives a fixed-window LLM an effectively unbounded context. Letta is the productionized framework around MemGPT, adding persistent agent state, REST APIs, and the Letta Leaderboard for read / write / update fidelity. Ideal use case: long-running conversational agents and personal assistants needing multi-session continuity with explicit governance over what enters core context. Tradeoffs: every memory operation costs an LLM round-trip, so latency and token cost scale with memory activity.
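
The paging idea can be illustrated with a toy store. This is a minimal sketch, not Letta's API: the class `PagedMemory`, the tool names `archival_insert` and `archival_search`, the whitespace token count, and the keyword-overlap search are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PagedMemory:
    """Toy three-tier store: main context, recall memory, archival memory."""
    max_main_tokens: int = 50
    main_context: list = field(default_factory=list)
    recall: list = field(default_factory=list)    # messages paged out of context
    archival: list = field(default_factory=list)  # facts the model chose to store

    @staticmethod
    def _tokens(text: str) -> int:
        # Crude whitespace count; a real system would use the model tokenizer.
        return len(text.split())

    def append(self, message: str) -> None:
        """Add a message, then page the oldest ones out to recall while over budget."""
        self.main_context.append(message)
        while (sum(self._tokens(m) for m in self.main_context) > self.max_main_tokens
               and len(self.main_context) > 1):
            self.recall.append(self.main_context.pop(0))

    def archival_insert(self, fact: str) -> None:
        self.archival.append(fact)

    def archival_search(self, query: str, k: int = 3) -> list:
        """Keyword-overlap search standing in for embedding search (assumption)."""
        terms = set(query.lower().split())
        scored = [(len(terms & set(f.lower().split())), f) for f in self.archival]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [f for _, f in scored[:k]]


# Hypothetical tool table the model would reach through function calling.
memory = PagedMemory(max_main_tokens=8)
TOOLS = {"archival_insert": memory.archival_insert,
         "archival_search": memory.archival_search}

memory.append("user: my deploy target is eu-west-1")
memory.append("assistant: noted, I will use that region")  # pages the first message out
TOOLS["archival_insert"]("deploy target is eu-west-1")
hits = TOOLS["archival_search"]("which deploy target")
```

In a real deployment each call in `TOOLS` would be triggered by a model-generated function call, which is where the per-operation round-trip cost noted above comes from.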

A-MEM

A-MEM (Xu et al., 2025) applies the Zettelkasten method to agent memory. Each new observation is written as a structured note with keywords, tags, and a contextual description; the agent then proposes links to existing notes, and older linked notes are re-evaluated and updated as new ones arrive, producing an evolving self-organizing knowledge graph rather than a static vector log. Ideal use case: research and analyst agents and long-horizon tasks where conceptual relationships matter more than raw recall. Tradeoffs: write-side cost is high (multiple LLM calls per note for tagging, linking, and neighbor updates); benchmark gains depend on model strength. Source code: agiresearch/A-mem.
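
The write path described above (create a note, link it, then update the linked neighbors) might look roughly like the following. It is a sketch under stated assumptions, not the agiresearch/A-mem implementation: `Note`, `NoteGraph`, and the keyword-overlap rule are hypothetical stand-ins for the LLM calls that propose links and rewrite neighbor notes.

```python
from dataclasses import dataclass, field
from typing import Iterable


@dataclass
class Note:
    note_id: int
    content: str
    keywords: set
    tags: set = field(default_factory=set)
    context: str = ""
    links: set = field(default_factory=set)


class NoteGraph:
    def __init__(self, min_shared_keywords: int = 1):
        self.notes = {}
        self.min_shared = min_shared_keywords

    def add(self, content: str, keywords: Iterable[str],
            tags: Iterable[str] = ()) -> Note:
        note = Note(len(self.notes), content, set(keywords), set(tags))
        for other in self.notes.values():
            shared = note.keywords & other.keywords
            if len(shared) >= self.min_shared:
                # Link in both directions (assumption: overlap replaces LLM judgment).
                note.links.add(other.note_id)
                other.links.add(note.note_id)
                # Neighbor update: the older note absorbs tags and new context.
                other.tags |= note.tags
                other.context = f"related to note {note.note_id} via {sorted(shared)}"
        self.notes[note.note_id] = note
        return note


graph = NoteGraph()
first = graph.add("Retry budget exhausted on payments API",
                  {"retry", "payments"}, {"incident"})
second = graph.add("Payments API adds idempotency keys",
                   {"payments", "idempotency"}, {"design"})
```

After the second write, the first note has gained a link, a tag, and a context string, which is the "evolving" behavior that distinguishes this pattern from an append-only vector log.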

Mem0

Mem0 (Chhikara et al., 2025) is a production-oriented memory layer with a two-stage extract / consolidate pipeline: an LLM pass distills durable facts ("user prefers Python", "timezone CET"), then a consolidation step decides ADD / UPDATE / DELETE against existing memories. The 2026 token-efficient variant moves to single-pass ADD-only extraction with entity linking and multi-signal retrieval (semantic + BM25 + entity match fused), hitting 91.6 on LoCoMo and 93.4 on LongMemEval at roughly 7k tokens per call versus 25k+ for full-context. Ideal use case: SaaS / B2C agents needing scoped per-user memory tiers (user, session, agent) with a clean API and low per-call token cost. Tradeoffs: flat fact store is weaker on multi-hop associativity than graph-native systems; aggressive extraction can lose nuance. Source code: mem0ai/mem0.
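
The consolidation step of the two-stage pipeline can be sketched as a decision over entity-keyed facts. This is illustrative only and does not use the mem0ai/mem0 API: the `(subject, attribute, value)` fact shape, the `consolidate` function, and the `NOOP` outcome for unchanged facts are assumptions, and an exact-match comparison stands in for the LLM's decision.

```python
from enum import Enum
from typing import Optional


class Op(Enum):
    ADD = "ADD"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    NOOP = "NOOP"  # assumption: nothing to change


def consolidate(store: dict, subject: str, attribute: str,
                value: Optional[str]) -> Op:
    """Apply one extracted fact to the store; value=None means a retraction."""
    key = (subject, attribute)
    if value is None:
        if key in store:
            del store[key]
            return Op.DELETE
        return Op.NOOP
    if key not in store:
        store[key] = value
        return Op.ADD
    if store[key] != value:
        store[key] = value
        return Op.UPDATE
    return Op.NOOP


store = {}
ops = [
    consolidate(store, "user", "language", "Python"),
    consolidate(store, "user", "timezone", "CET"),
    consolidate(store, "user", "language", "Rust"),  # contradicts the first fact
    consolidate(store, "user", "timezone", None),    # user retracts the timezone
    consolidate(store, "user", "language", "Rust"),  # duplicate, nothing to do
]
```

Keying on `(subject, attribute)` is what makes the store flat: it keeps lookups cheap but carries no edges between facts, which matches the multi-hop tradeoff noted above.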

HippoRAG / HippoRAG 2

HippoRAG (Gutiérrez et al., NeurIPS 2024) models hippocampal indexing. An LLM (neocortex) extracts OpenIE triples into a knowledge graph (hippocampal index), and Personalized PageRank propagates activation from query-anchored entities to retrieve associatively linked passages. HippoRAG 2 (ICML 2025) adds recognition-memory filtering and tighter passage-triple linking, improving multi-hop and sense-making while keeping single-step retrieval 10-30× cheaper than iterative RAG. Ideal use case: knowledge-intensive QA, technical / medical / legal assistants needing multi-hop recall over large corpora. Tradeoffs: offline indexing cost (OpenIE + graph build) is significant; less suited for highly mutable conversational state. Source code: OSU-NLP-Group/HippoRAG.
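
To make the retrieval step concrete, here is a small pure-Python Personalized PageRank over a hand-written entity graph. It is a sketch, not code from OSU-NLP-Group/HippoRAG: the graph, the passage-to-entity mapping, and the function name are hypothetical, and in the real system the graph would come from OpenIE triples.

```python
def personalized_pagerank(graph: dict, seeds: set, damping: float = 0.85,
                          iterations: int = 50) -> dict:
    """Power iteration where the restart mass returns only to the seed nodes."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbors in graph.items():
            if not neighbors:
                # Dangling node: hand its mass back to the seeds.
                for s in seeds:
                    nxt[s] += damping * rank[node] / len(seeds)
                continue
            share = damping * rank[node] / len(neighbors)
            for m in neighbors:
                nxt[m] += share
        rank = nxt
    return rank


# Hypothetical undirected entity graph (adjacency lists).
graph = {
    "alice": ["acme", "project_x"],
    "acme": ["alice", "bob"],
    "project_x": ["alice"],
    "bob": ["acme", "hiking"],
    "hiking": ["bob"],
}
# Entities matched in a query such as "who at acme works on project_x?"
scores = personalized_pagerank(graph, seeds={"acme", "project_x"})

# Rank passages by the summed score of the entities they mention.
passages = {"p1": ["alice", "project_x"], "p2": ["bob", "hiking"]}
best = max(passages, key=lambda p: sum(scores[e] for e in passages[p]))
```

The node `alice` is not named in the query, yet it receives activation from both seeds; that associative spread is what lets a single retrieval step reach multi-hop evidence.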

Generative Agents

Generative Agents (Park et al., 2023) introduced the now-canonical memory stream: a chronological log of natural-language observations, each timestamped with creation and last-access times. Retrieval scores combine recency (exponential decay), importance (LLM-rated 1-10 salience at write time), and relevance (cosine similarity to the query embedding). Reflections are higher-order summaries that the agent periodically generates when accumulated importance crosses a threshold, making them the prototype of "reflection-as-write." Ideal use case: simulation, NPCs, and persona-driven agents where behavioral coherence over time matters more than factual precision. Tradeoffs: linear scan and LLM-scored importance do not scale past thousands of memories without sharding; importance ratings are noisy.
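
The three-part retrieval score can be sketched as follows. The decay rate, the equal weighting, the division of importance by 10, and the two-dimensional embeddings are assumptions chosen to keep the example small; they are not presented as the paper's exact settings.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    importance: int            # 1-10 salience, rated at write time
    hours_since_access: float  # derived from the last-access timestamp
    embedding: list


def cosine(a: list, b: list) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def score(mem: Memory, query_embedding: list, decay: float = 0.995) -> float:
    recency = decay ** mem.hours_since_access  # exponential decay per hour
    importance = mem.importance / 10.0         # scale into [0.1, 1.0]
    relevance = cosine(mem.embedding, query_embedding)
    return recency + importance + relevance    # equal weights (assumption)


def retrieve(stream: list, query_embedding: list, k: int = 1) -> list:
    # Linear scan over the whole stream, as in the tradeoff noted above.
    ranked = sorted(stream, key=lambda m: score(m, query_embedding), reverse=True)
    return ranked[:k]


stream = [
    Memory("ate breakfast", importance=2, hours_since_access=1.0,
           embedding=[1.0, 0.0]),
    Memory("argued with a close friend", importance=9, hours_since_access=48.0,
           embedding=[0.0, 1.0]),
]
top = retrieve(stream, query_embedding=[0.0, 1.0])
```

Here the older memory still wins because its importance and relevance outweigh the recency penalty, which is the behavior the combined score is meant to produce.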

CoALA Cognitive Architecture

CoALA (Sumers, Yao, Narasimhan, Griffiths, 2023) ports SOAR / ACT-R abstractions to LLM agents: explicit working memory, distinct episodic / semantic / procedural long-term stores, a structured action space split into internal (retrieve, reason, learn) and external (grounding) actions, and a decision cycle that proposes, evaluates, and selects each step. CoALA is a descriptive framework rather than a runtime, but it has become the reference vocabulary the field uses to compare systems (MemGPT ≈ episodic + working OS; A-MEM ≈ self-modifying semantic; Voyager ≈ procedural skill library). Emergence adopts the CoALA typology as the canonical tier vocabulary in Emergence › Memory.

Memory-Augmented Retrieval Patterns

The dominant 2025-2026 production pattern is hybrid: vector ANN for fuzzy recall, knowledge graph for relational / multi-hop reasoning, rolling summaries for compression, and reflection passes for abstraction. Representative systems include Zep (temporal knowledge graph, arXiv:2501.13956), AriGraph (knowledge-graph world model, IJCAI 2025), R³Mem (reversible compression), and MemR³ (router selects retrieve / reflect / answer). Reflection-as-write (using an LLM to generate new memory entries that summarize, abstract, or contradict existing ones) is now standard; the open problem is when to trigger reflection without introducing summarization drift.

Eviction, Forgetting, And Consolidation

Common write triggers observed across 2024-2026 systems include importance thresholds (Park), turn count or token budget (MemGPT), salience classifiers (Mem0), and graph novelty (A-MEM, HippoRAG). Compression strategies include rolling-window summarizers, reflection passes, and reversible compression (R³Mem) to mitigate drift. Decay is typically an exponential recency penalty applied at retrieval time. Merging strategies include similarity-based KV merge, entity-keyed consolidation (Mem0, Zep), and LLM contradiction reconciliation (A-MEM). Selective forgetting remains an open problem: MemoryAgentBench shows that most systems fail to forget on demand, which is a growing governance and privacy concern.
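
As one concrete instance of a token-budget trigger paired with a rolling-window summarizer, consider the sketch below. The class `RollingWindow`, the whitespace token count, the "evict the older half" policy, and the `truncate_summarizer` placeholder (which stands in for an LLM call) are all assumptions for illustration.

```python
from typing import Callable


def truncate_summarizer(previous_summary: str, turns: list) -> str:
    """Placeholder for an LLM summarizer: keeps the first five words of each turn."""
    clipped = [" ".join(t.split()[:5]) for t in turns]
    return " | ".join(part for part in [previous_summary, *clipped] if part)


class RollingWindow:
    def __init__(self, token_budget: int,
                 summarize: Callable[[str, list], str] = truncate_summarizer):
        self.token_budget = token_budget
        self.summarize = summarize
        self.summary = ""
        self.turns = []

    def _used(self) -> int:
        # Whitespace token count (assumption; use the model tokenizer in practice).
        return sum(len(t.split()) for t in self.turns)

    def add(self, turn: str) -> bool:
        """Append a turn; return True if the budget trigger fired."""
        self.turns.append(turn)
        if self._used() <= self.token_budget:
            return False
        # Fold the older half of the window into the running summary.
        cut = max(1, len(self.turns) // 2)
        evicted, self.turns = self.turns[:cut], self.turns[cut:]
        self.summary = self.summarize(self.summary, evicted)
        return True


window = RollingWindow(token_budget=6)
fired = [window.add(t) for t in ["a b c", "d e f", "g h i"]]
```

Each compression pass summarizes a summary plus new turns, so errors can compound over time; that is the drift that reversible compression is meant to address.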

Selection Criteria

The platform selects a memory architecture per Unitt by reading the workload profile (scale, latency, multi-session continuity, governance, recall fidelity requirement) and matching against the table below. The same selection criteria are surfaced to the WorldSim evolutionary loop as the mutation axes for memory configuration.

| System | Scale (memories) | Latency (read) | Recall fidelity | Governance | Statefulness | Ideal Workload |
|---|---|---|---|---|---|---|
| MemGPT / Letta | 10⁵-10⁷ | Medium (LLM tool calls) | High on recent, agent-managed | Strong (explicit blocks, audit) | Persistent across sessions | Long-lived assistants, coding agents, voice agents |
| A-MEM | 10⁴-10⁶ | Medium-high (graph + LLM) | High associative, evolves | Medium (notes auditable, links opaque) | Persistent, self-organizing | Research, analyst, knowledge-work agents |
| Mem0 | 10⁶-10⁸ | Low (~7k tokens / call) | High factual, weaker multi-hop | Strong (per-user / session scopes, REST) | Persistent, scoped tiers | Multi-tenant SaaS chatbots, personalization |
| HippoRAG 2 | 10⁶-10⁹ docs | Low online, high offline indexing | Highest multi-hop / sense-making | Medium (KG inspectable) | Stateless retrieval over corpus | KB QA, technical / legal / medical RAG |
| Generative Agents | 10³-10⁵ | High (linear scan + LLM scoring) | Medium, behaviorally coherent | Weak (free-text stream) | Persistent simulation state | NPCs, simulation, persona agents |
| Custom CoALA hybrid | Configurable | Configurable | Configurable | Strongest (designed tiers) | Configurable | Bespoke production stacks |

Picking Heuristic

  • Start with Mem0 for personalization at scale.
  • Pick HippoRAG 2 for multi-hop knowledge retrieval over a large corpus.
  • Pick Letta / MemGPT when the agent must self-manage memory and persist identity across sessions.
  • Pick A-MEM when conceptual linkage matters more than raw recall.
  • Use Generative Agents patterns for simulation, persona agents, and NPC populations.
  • Use CoALA as the design lens to decide which tiers a Unitt actually needs before reaching for a framework.
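
The heuristic above could be encoded as a simple rule chain. This is a hypothetical sketch rather than the platform's selector: the `WorkloadProfile` fields, the order in which rules are checked, and the fallback to a custom CoALA hybrid are assumptions.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    multi_hop_over_corpus: bool = False
    self_managed_memory: bool = False
    conceptual_linkage: bool = False
    simulation: bool = False
    personalization_at_scale: bool = False


def pick_memory_system(profile: WorkloadProfile) -> str:
    """Return the first matching system; rule order is an assumption."""
    if profile.multi_hop_over_corpus:
        return "HippoRAG 2"
    if profile.self_managed_memory:
        return "Letta / MemGPT"
    if profile.conceptual_linkage:
        return "A-MEM"
    if profile.simulation:
        return "Generative Agents"
    if profile.personalization_at_scale:
        return "Mem0"
    # Nothing matched: design the tiers explicitly through the CoALA lens.
    return "Custom CoALA hybrid"


choice = pick_memory_system(WorkloadProfile(personalization_at_scale=True))
```

A first-match chain is the simplest possible encoding; a workload that satisfies several conditions would need a scoring scheme or a hybrid instead.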

Cross-References