Oral in Workshop: How Far Are We From AGI

DEFT: FLASH TREE-ATTENTION WITH IO-AWARENESS FOR EFFICIENT TREE-SEARCH-BASED LLM INFERENCE

Jinwei Yao · Kexun Zhang · Kaiqi Chen · Jiaxuan You · Zeke Wang · Binhang Yuan · Tao Lin

Abstract
