Memory Systems
The memory architectures referenced throughout Emergence › Memory draw on the active agentic-memory research lineage from 2023 through 2026. This page catalogs the canonical systems, their key innovations, and the selection criteria the platform uses when configuring memory tiers for a given Unitt workload.
Reference Systems
MemGPT / Letta
MemGPT (Packer et al., 2023) treats the LLM context window as RAM and external stores as disk, exposing OS-style paging tools (page_in, page_out, archival_search) so the model self-manages a three-tier hierarchy: main context, recall memory, and archival memory. The key innovation is letting the model issue its own memory syscalls via function calling, enabling unbounded effective context on fixed-window LLMs. Letta is the productionized framework around MemGPT, adding persistent agent state, REST APIs, and the Letta Leaderboard for read / write / update fidelity. Ideal use case: long-running conversational agents and personal assistants needing multi-session continuity with explicit governance over what enters core context. Tradeoffs: every memory operation costs an LLM round-trip, so latency and token cost scale with memory activity.
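The paging idea can be illustrated with a toy three-tier store. This is a minimal sketch, not Letta's or MemGPT's actual API: the `TieredMemory` class, the `dispatch` helper, the item-count budget (standing in for a token budget), and the substring matching are all assumptions made for illustration. Only the tool names `page_in`, `page_out`, and `archival_search` come from the description above.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    """Toy three-tier store: main context, recall memory, archival memory."""

    max_context_items: int = 4  # stand-in for a token budget (assumption)
    main_context: deque = field(default_factory=deque)
    recall: list = field(default_factory=list)
    archival: list = field(default_factory=list)

    def append(self, message: str) -> None:
        """Add a message to main context, paging out the oldest on overflow."""
        self.main_context.append(message)
        while len(self.main_context) > self.max_context_items:
            self.page_out()

    def page_out(self) -> str:
        """Evict the oldest main-context item into recall memory."""
        evicted = self.main_context.popleft()
        self.recall.append(evicted)
        return evicted

    def page_in(self, query: str) -> list:
        """Move recall items matching the query back into main context."""
        hits = [m for m in self.recall if query.lower() in m.lower()]
        self.recall = [m for m in self.recall if m not in hits]
        for message in hits:
            self.append(message)
        return hits

    def archival_search(self, query: str) -> list:
        """Substring search over archival memory (a real system would embed)."""
        return [f for f in self.archival if query.lower() in f.lower()]


def dispatch(memory: TieredMemory, call: dict):
    """Route a model-issued function call to the matching memory tool."""
    tools = {
        "page_in": memory.page_in,
        "page_out": memory.page_out,
        "archival_search": memory.archival_search,
    }
    return tools[call["name"]](**call.get("arguments", {}))


memory = TieredMemory(max_context_items=2)
for text in ["My name is Ada", "I like Rust", "What is the weather?"]:
    memory.append(text)
recalled = dispatch(memory, {"name": "page_in", "arguments": {"query": "name"}})
print(recalled, list(memory.main_context))
```

In a real deployment each `dispatch` call would be preceded by an LLM round-trip that decides which tool to invoke, which is where the latency and token cost noted above come from.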
A-MEM
A-MEM (Xu et al., 2025) applies the Zettelkasten method to agent memory. Each new observation is written as a structured note with keywords, tags, and a contextual description; the agent then proposes links to existing notes, and older linked notes are re-evaluated and updated as new ones arrive, producing an evolving self-organizing knowledge graph rather than a static vector log. Ideal use case: research and analyst agents and long-horizon tasks where conceptual relationships matter more than raw recall. Tradeoffs: write-side cost is high (multiple LLM calls per note for tagging, linking, and neighbor updates); benchmark gains depend on model strength. Source code: agiresearch/A-mem.
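The note-and-link write path might look roughly like the sketch below. It is not the agiresearch/A-mem implementation: keyword overlap stands in for the LLM's link proposals, and "evolution" is reduced to older neighbors absorbing the new note's tags. The `Note` and `NoteGraph` names and the `min_shared_keywords` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Note:
    note_id: int
    content: str
    keywords: set
    tags: set = field(default_factory=set)
    links: set = field(default_factory=set)  # ids of linked notes


class NoteGraph:
    """Toy Zettelkasten: each write links to, and updates, related notes."""

    def __init__(self, min_shared_keywords: int = 1):
        self.notes = {}
        self.min_shared = min_shared_keywords

    def add_note(self, content: str, keywords, tags=()) -> Note:
        note = Note(len(self.notes), content, set(keywords), set(tags))
        for other in self.notes.values():
            # Keyword overlap is a stand-in for an LLM-proposed link.
            if len(note.keywords & other.keywords) >= self.min_shared:
                note.links.add(other.note_id)
                other.links.add(note.note_id)
                # Simplified "evolution": the older note absorbs new tags.
                other.tags |= note.tags
        self.notes[note.note_id] = note
        return note


graph = NoteGraph()
first = graph.add_note("PPR ranks graph nodes", {"pagerank", "graph"}, {"retrieval"})
second = graph.add_note("Knowledge graphs store triples", {"graph", "triples"}, {"kg"})
print(first.links, first.tags)
```

Even in this reduced form, every write touches existing notes, which mirrors why the write-side cost is the main tradeoff.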
Mem0
Mem0 (Chhikara et al., 2025) is a production-oriented memory layer with a two-stage extract / consolidate pipeline: an LLM pass distills durable facts ("user prefers Python", "timezone CET"), then a consolidation step decides ADD / UPDATE / DELETE against existing memories. The 2026 token-efficient variant moves to single-pass ADD-only extraction with entity linking and multi-signal retrieval (semantic + BM25 + entity match fused), hitting 91.6 on LoCoMo and 93.4 on LongMemEval at roughly 7k tokens per call versus 25k+ for full-context. Ideal use case: SaaS / B2C agents needing scoped per-user memory tiers (user, session, agent) with a clean API and low per-call token cost. Tradeoffs: flat fact store is weaker on multi-hop associativity than graph-native systems; aggressive extraction can lose nuance. Source code: mem0ai/mem0.
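The consolidation decision in the two-stage pipeline can be sketched as a small function over a key-value fact store. This is an illustration under stated assumptions, not the mem0ai/mem0 API: the `consolidate` function, the string keys, the use of `None` to signal a retraction, and the extra `NOOP` outcome for unchanged facts are all hypothetical, and a real system would have an LLM make the decision rather than an equality check.

```python
from typing import Optional


def consolidate(store: dict, key: str, value: Optional[str]) -> str:
    """Apply one extracted fact to the store and return the operation taken."""
    if value is None:  # the extractor signalled that the fact no longer holds
        if key in store:
            del store[key]
            return "DELETE"
        return "NOOP"
    if key not in store:
        store[key] = value
        return "ADD"
    if store[key] != value:
        store[key] = value
        return "UPDATE"
    return "NOOP"  # fact already stored unchanged


facts = {}
operations = [
    consolidate(facts, "language", "Python"),
    consolidate(facts, "timezone", "CET"),
    consolidate(facts, "language", "Rust"),
    consolidate(facts, "timezone", None),
]
print(operations, facts)
```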
HippoRAG / HippoRAG 2
HippoRAG (Gutiérrez et al., NeurIPS 2024) models hippocampal indexing. An LLM (neocortex) extracts OpenIE triples into a knowledge graph (hippocampal index), and Personalized PageRank propagates activation from query-anchored entities to retrieve associatively linked passages. HippoRAG 2 (ICML 2025) adds recognition-memory filtering and tighter passage-triple linking, improving multi-hop and sense-making while keeping single-step retrieval 10-30× cheaper than iterative RAG. Ideal use case: knowledge-intensive QA, technical / medical / legal assistants needing multi-hop recall over large corpora. Tradeoffs: offline indexing cost (OpenIE + graph build) is significant; less suited for highly mutable conversational state. Source code: OSU-NLP-Group/HippoRAG.
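Personalized PageRank over an entity graph can be sketched in plain Python with power iteration. This is a simplified illustration rather than the OSU-NLP-Group/HippoRAG code: the adjacency-dict graph, the entity names, the `passages` mapping, and the damping and iteration defaults are assumptions, and the OpenIE extraction step is taken as already done.

```python
def personalized_pagerank(graph, seeds, damping=0.85, iterations=50):
    """Power-iteration PPR; `graph` maps each node to its neighbor list."""
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in graph}
    rank = dict(restart)
    for _ in range(iterations):
        # Restart mass always flows back to the query-anchored seed nodes.
        updated = {n: (1.0 - damping) * restart[n] for n in graph}
        for node, neighbors in graph.items():
            if not neighbors:  # dangling node: hand its mass back to seeds
                for seed in seeds:
                    updated[seed] += damping * rank[node] / len(seeds)
                continue
            share = damping * rank[node] / len(neighbors)
            for neighbor in neighbors:
                updated[neighbor] += share
        rank = updated
    return rank


# Hypothetical entity graph built from extracted triples.
graph = {
    "stanford": ["thomas"],
    "alzheimers": ["thomas"],
    "thomas": ["stanford", "alzheimers"],
    "mit": ["alice"],
    "alice": ["mit"],
}
rank = personalized_pagerank(graph, seeds={"stanford", "alzheimers"})

# Score each passage by summing the rank of the entities it mentions.
passages = {"p1": ["thomas", "stanford"], "p2": ["alice", "mit"]}
best = max(passages, key=lambda p: sum(rank[e] for e in passages[p]))
print(best)
```

Activation reaches `thomas` in a single retrieval step because it is linked to both query entities, while the disconnected `mit` and `alice` nodes receive none.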
Generative Agents
Generative Agents (Park et al., 2023) introduced the now-canonical memory stream: a chronological log of natural-language observations, each timestamped with creation and last-access times. Retrieval scores combine recency (exponential decay), importance (LLM-rated 1-10 salience at write time), and relevance (cosine similarity to the query embedding). Reflections are higher-order summaries that the agent generates periodically, whenever accumulated importance crosses a threshold; they are the prototype of "reflection-as-write." Ideal use case: simulation, NPCs, and persona-driven agents where behavioral coherence over time matters more than factual precision. Tradeoffs: linear scan and LLM-scored importance do not scale past thousands of memories without sharding; importance ratings are noisy.
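The three-term retrieval score can be sketched as follows. This is a minimal illustration, not the authors' code: the `MemoryRecord` fields, the per-hour decay default, the equal weights, and the choice to scale importance by dividing by 10 (rather than normalizing across the whole stream) are assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryRecord:
    text: str
    embedding: list
    importance: int            # salience rated 1-10 at write time
    hours_since_access: float  # time since the memory was last retrieved


def cosine(a, b) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieval_score(memory, query_embedding, decay=0.995, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of recency, importance, and relevance."""
    recency = decay ** memory.hours_since_access  # exponential decay
    importance = memory.importance / 10.0         # scale to [0.1, 1.0]
    relevance = cosine(memory.embedding, query_embedding)
    w_recency, w_importance, w_relevance = weights
    return w_recency * recency + w_importance * importance + w_relevance * relevance


stream = [
    MemoryRecord("ate breakfast", [1.0, 0.0], importance=2, hours_since_access=1.0),
    MemoryRecord("planned the party", [0.0, 1.0], importance=8, hours_since_access=48.0),
]
query = [0.0, 1.0]
ranked = sorted(stream, key=lambda m: retrieval_score(m, query), reverse=True)
print([m.text for m in ranked])
```

Here the older memory still ranks first because its importance and relevance outweigh the recency penalty. Scoring every record on every query is also the linear scan that limits scale.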
CoALA Cognitive Architecture
CoALA (Sumers, Yao, Narasimhan, Griffiths) ports SOAR / ACT-R abstractions to LLM agents: explicit working memory; distinct episodic, semantic, and procedural long-term stores; a structured action space split into internal (retrieve, reason, learn) and external (grounding) actions; and a decision cycle that proposes, evaluates, and selects an action at each step. CoALA is a descriptive framework rather than a runtime, but it has become the reference vocabulary the field uses to compare systems (MemGPT ≈ episodic + working OS; A-MEM ≈ self-modifying semantic; Voyager ≈ procedural skill library). Emergence adopts the CoALA typology as the canonical tier vocabulary in Emergence › Memory.
Memory-Augmented Retrieval Patterns
The dominant 2025-2026 production pattern is hybrid: vector ANN for fuzzy recall, knowledge graph for relational / multi-hop reasoning, rolling summaries for compression, and reflection passes for abstraction. Representative systems include Zep (temporal knowledge graph, arXiv:2501.13956), AriGraph (knowledge-graph world model, IJCAI 2025), R³Mem (reversible compression), and MemR³ (router selects retrieve / reflect / answer). Reflection-as-write, in which an LLM generates new memory entries that summarize, abstract, or contradict existing ones, is now standard. The open problem is when to trigger reflection without introducing summarization drift.
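One common way to combine ranked results from several retrievers is reciprocal rank fusion (RRF). The sketch below uses RRF purely as an illustrative fusion step; none of the systems named above is claimed to use it, and the retriever outputs, memory ids, and the constant `k = 60` are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for position, memory_id in enumerate(ranking, start=1):
            scores[memory_id] += 1.0 / (k + position)
    # Sort by descending score, breaking ties by id for determinism.
    return sorted(scores, key=lambda memory_id: (-scores[memory_id], memory_id))


vector_hits = ["m1", "m2", "m3"]  # hypothetical vector ANN results
graph_hits = ["m3", "m1", "m4"]   # hypothetical knowledge-graph results
print(reciprocal_rank_fusion([vector_hits, graph_hits]))
```

Items returned by both retrievers rise to the top, which is the behavior a hybrid stack relies on when fuzzy and relational recall agree.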
Eviction, Forgetting, and Consolidation
Common write triggers observed across 2024-2026 systems include importance thresholds (Park), turn count or token budget (MemGPT), salience classifiers (Mem0), and graph novelty (A-MEM, HippoRAG). Compression strategies include rolling-window summarizers, reflection passes, and reversible compression (R³Mem), the last of which is intended to mitigate drift. Decay is typically an exponential recency penalty applied at retrieval time. Merging strategies include similarity-based KV merge, entity-keyed consolidation (Mem0, Zep), and LLM contradiction reconciliation (A-MEM). Selective forgetting remains an open problem: MemoryAgentBench shows that most systems fail to forget on demand, which is a growing governance and privacy concern.
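A token-budget trigger feeding a rolling-window summarizer can be sketched as below. This is an illustration under stated assumptions: the `compact` function is hypothetical, whitespace splitting stands in for a real tokenizer, and the `summarize` callback stands in for an LLM call. The budget applies only to messages kept verbatim, so the summary's own tokens are not counted.

```python
def count_tokens(text: str) -> int:
    """Crude whitespace token count (a real system would use a tokenizer)."""
    return len(text.split())


def compact(messages, token_budget, summarize):
    """Fold the oldest messages into one summary once the budget is exceeded."""
    kept = list(messages)
    evicted = []
    # Always keep at least the newest message verbatim.
    while len(kept) > 1 and sum(count_tokens(m) for m in kept) > token_budget:
        evicted.append(kept.pop(0))
    if not evicted:
        return kept  # under budget: nothing to compress
    return [summarize(evicted)] + kept


def stub_summarizer(evicted_messages):
    """Placeholder for an LLM summarization call."""
    return f"[summary of {len(evicted_messages)} messages]"


history = ["a b c", "d e", "f g h i", "j"]
print(compact(history, token_budget=5, summarize=stub_summarizer))
```

Repeatedly summarizing an earlier summary is where drift creeps in, which is why reversible or reflection-based schemes are of interest.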
Selection Criteria
The platform selects a memory architecture per Unitt by reading the workload profile (scale, latency, multi-session continuity, governance, recall fidelity requirement) and matching against the table below. The same selection criteria are surfaced to the WorldSim evolutionary loop as the mutation axes for memory configuration.
| System | Scale (memories) | Latency (read) | Recall fidelity | Governance | Statefulness | Ideal Workload |
|---|---|---|---|---|---|---|
| MemGPT / Letta | 10⁵-10⁷ | Medium (LLM tool calls) | High on recent, agent-managed | Strong (explicit blocks, audit) | Persistent across sessions | Long-lived assistants, coding agents, voice agents |
| A-MEM | 10⁴-10⁶ | Medium-high (graph + LLM) | High associative, evolves | Medium (notes auditable, links opaque) | Persistent, self-organizing | Research, analyst, knowledge-work agents |
| Mem0 | 10⁶-10⁸ | Low (~7k tokens / call) | High factual, weaker multi-hop | Strong (per-user / session scopes, REST) | Persistent, scoped tiers | Multi-tenant SaaS chatbots, personalization |
| HippoRAG 2 | 10⁶-10⁹ docs | Low online, high offline indexing | Highest multi-hop / sense-making | Medium (KG inspectable) | Stateless retrieval over corpus | KB QA, technical / legal / medical RAG |
| Generative Agents | 10³-10⁵ | High (linear scan + LLM scoring) | Medium, behaviorally coherent | Weak (free-text stream) | Persistent simulation state | NPCs, simulation, persona agents |
| Custom CoALA hybrid | Configurable | Configurable | Configurable | Strongest (designed tiers) | Configurable | Bespoke production stacks |
Picking Heuristic
- Start with Mem0 for personalization at scale.
- Pick HippoRAG 2 for multi-hop knowledge retrieval over a large corpus.
- Pick Letta / MemGPT when the agent must self-manage memory and persist identity across sessions.
- Pick A-MEM when conceptual linkage matters more than raw recall.
- Use Generative Agents patterns for simulation, persona agents, and NPC populations.
- Use CoALA as the design lens to decide which tiers a Unitt actually needs before reaching for a framework.
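The heuristic above can be written down as a simple first-match rule list. This is a sketch only, not the platform's selection logic: the `WorkloadProfile` flags, the precedence given by list order, and the fallback to a custom CoALA hybrid are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class WorkloadProfile:
    """Hypothetical workload flags; field names are illustrative only."""

    personalization_at_scale: bool = False
    multi_hop_over_corpus: bool = False
    self_managed_persistent_identity: bool = False
    conceptual_linkage: bool = False
    simulation_or_persona: bool = False


def pick_memory_system(profile: WorkloadProfile) -> str:
    """Return the first system whose rule matches, in heuristic order."""
    rules = [
        (profile.personalization_at_scale, "Mem0"),
        (profile.multi_hop_over_corpus, "HippoRAG 2"),
        (profile.self_managed_persistent_identity, "MemGPT / Letta"),
        (profile.conceptual_linkage, "A-MEM"),
        (profile.simulation_or_persona, "Generative Agents"),
    ]
    for matched, system in rules:
        if matched:
            return system
    # No rule matched: design the tiers explicitly using the CoALA vocabulary.
    return "Custom CoALA hybrid"


print(pick_memory_system(WorkloadProfile(multi_hop_over_corpus=True)))
```

A workload that matches several rules would in practice call for a hybrid rather than the first match, so the ordering here should be read as a starting point.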
Cross-References
- Emergence › Memory: the developer-facing platform layer that consumes these systems.
- Emergence › State: context curation that consumes the recall pipeline.
- Emergence › WorldSim: empirical validation of memory configuration.