

Poster

DynamicInfer: Runtime-Aware Sparse Offloading for LLMs Inference on a Consumer-Grade GPU

Zhui Zhu · Weichen Zhang · Zhenghan Zhou · Yunhao Liu · Fan Dang

Abstract
