Poster

Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization

Wenkai Yang ⋅ Shiqi Shen ⋅ Guangyao Shen ⋅ Wei Yao ⋅ Yong Liu ⋅ Gong Zhi ⋅ Yankai Lin ⋅ Ji-Rong Wen
2025 Poster
