

Poster

SqueezeAttention: 2D Management of KV-Cache in LLM Inference via Layer-wise Optimal Budget

Zihao Wang ⋅ Bin CUI ⋅ Shaoduo Gan
2025 Poster


