Poster in Workshop: New Frontiers in Associative Memories
Hierarchical Episodic Memory in LLMs via Multi-Scale Event Organization
Martin A Benfeghoul · Haitham Bou Ammar · Jun Wang · Zafeirios Fountas
A major limitation of contemporary large language models (LLMs) is their significant performance degradation when processing long contexts, primarily due to self-attention dilution and context-window constraints. Recent work on retrieval-augmented LLMs has shown that integrating human-inspired episodic memory formation and retrieval into Transformers, an architecture termed EM-LLM, enables pre-trained models to process up to 10M tokens while consistently outperforming their full-context counterparts using only a fraction of the computational resources. A crucial feature of EM-LLM is the segmentation of the model's KV-cache into human-like events based on token-level surprise. However, this approach overlooks the hierarchical nature of human episodic memory, which exhibits nested timescale organization across multiple levels of abstraction. Here, we introduce two novel head-level event segmentation methods that leverage the inherent hierarchical processing in Transformer layers, combining similarity-based boundary detection with coordinated event hierarchies. Our experiments suggest that these structures are likely not only to improve retrieval performance but also to mirror the nested event hierarchies observed in human cognition, providing both practical advances in LLM capabilities and insights into memory organization across artificial and biological systems.
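The abstract does not spell out the segmentation procedure, so the following is a minimal sketch of surprise-based boundary detection paired with a similarity-based refinement step, assuming a rolling mean-plus-standard-deviation surprise threshold and per-head key vectors from the KV-cache. The function names, the `gamma`, `window`, and `radius` parameters, and the refinement rule are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def surprise_boundaries(token_logprobs: torch.Tensor,
                        gamma: float = 1.0,
                        window: int = 128) -> torch.Tensor:
    """Flag positions whose surprise (-log p) exceeds a rolling
    mean + gamma * std threshold over the preceding `window` tokens."""
    surprise = -token_logprobs                        # shape: (seq_len,)
    flags = torch.zeros_like(surprise, dtype=torch.bool)
    for t in range(1, surprise.numel()):
        ctx = surprise[max(0, t - window):t]
        if ctx.numel() < 2:                           # need a spread estimate
            continue
        flags[t] = surprise[t] > ctx.mean() + gamma * ctx.std()
    return flags

def refine_boundary(keys: torch.Tensor, t: int, radius: int = 8) -> int:
    """Shift a candidate boundary t to the point of lowest cosine
    similarity between consecutive key vectors within `radius` tokens,
    so events break where a head's representations change most."""
    lo = max(1, t - radius)
    hi = min(keys.size(0), t + radius)
    sims = F.cosine_similarity(keys[lo:hi], keys[lo - 1:hi - 1], dim=-1)
    return lo + int(sims.argmin())

# Toy usage: segment a 1,000-token sequence for one attention head.
logprobs = torch.log(torch.rand(1000))                # stand-in log-probs
keys = torch.randn(1000, 64)                          # stand-in key vectors
events = [refine_boundary(keys, int(t)) for t in surprise_boundaries(logprobs).nonzero()]
```

Applying the refinement per head, rather than once for the whole model, is one plausible reading of "head-level" segmentation: different heads would then carve the same context into events at different granularities, which is the kind of multi-scale structure the abstract describes.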