Acceleration Multiple Heads Decoding for LLM via Dynamic Tree Attention

Zhendong Zhang

Abstract
