

Poster

SORRY-Bench: Systematically Evaluating Large Language Model Safety Refusal

Tinghao Xie ⋅ Xiangyu Qi ⋅ Yi Zeng ⋅ Yangsibo Huang ⋅ Udari Sehwag ⋅ Kaixuan Huang ⋅ Luxi He ⋅ Boyi Wei ⋅ Dacheng Li ⋅ Ying Sheng ⋅ Ruoxi Jia ⋅ Bo Li ⋅ Kai Li ⋅ Danqi Chen ⋅ Peter Henderson ⋅ Prateek Mittal
2025 Poster

Abstract

