FreqKV: Key-Value Compression in Frequency Domain for Context Window Extension
Abstract
Existing key-value (KV) cache compression methods for large language models (LLMs) often rely on token eviction, which risks losing critical local information in both long prefilling and decoding scenarios. When extrapolating beyond the pretrained context length, their performance degrades sharply on long-context benchmarks. Motivated by the observation in the frequency domain that the context information is concentrated in the low-frequency components, we propose FreqKV, a parameter-free and architecture-agnostic approach. It iteratively compresses the increasing KV cache in the frequency domain, allowing models to process lengthy contexts efficiently. With minimal training at 8K length, FreqKV extends the context window of LLaMA-2-7B up to 256K tokens while maintaining stable perplexity. Extensive experiments on both prefilling and decoding stages demonstrate that FreqKV enables robust context window extension and consistently outperforms existing KV cache compression methods, highlighting its effectiveness for both understanding and generation in long contexts.