LLMs Know What to Drop: Self-Attention Guided KV Cache Eviction for Efficient Long-Context Inference

Guangtao Wang · Shubhangi Upasani · Chen Wu · Darshan Gandhi · Jonathan Li · Changran Hu · Bo Li · Urmish Thakker

Abstract
