

Poster

WildBench: Benchmarking LLMs with Challenging Tasks from Real Users in the Wild

Bill Yuchen Lin · Yuntian Deng · Khyathi Chandu · Abhilasha Ravichander · Valentina Pyatkin · Nouha Dziri · Ronan Le Bras · Yejin Choi
2025 Poster

Abstract

