

Poster

The Tool Decathlon: Benchmarking Language Agents for Diverse, Realistic, and Long-Horizon Task Execution

Junlong Li · Wenshuo Zhao · Jian Zhao · Weihao Zeng · Haoze Wu · Xiaochen Wang · Rui Ge · Yuxuan Cao · Yuzhen Huang · Wei Liu · Junteng LIU · Zhaochen Su · Yiyang Guo · FAN ZHOU · Lueyang Zhang · Juan Michelini · Xingyao Wang · Xiang Yue · Shuyan Zhou · Graham Neubig · Junxian He

Abstract
