Poster

Safety Instincts: LLMs Learn to Trust Their Internal Compass for Self-Defense

Guobin Shen · Dongcheng Zhao · Haibo Tong · Jindong Li · Feifei Zhao · Yi Zeng

Abstract
