LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference

Guangtao Wang ⋅ Shubhangi Upasani ⋅ Chen Wu ⋅ Darshan Gandhi ⋅ Jonathan Li ⋅ Changran Hu ⋅ Bo Li ⋅ Urmish Thakker

Abstract
