

Poster

Language Models Learn to Mislead Humans via RLHF

Jiaxin Wen · Ruiqi Zhong · Akbir Khan · Ethan Perez · Jacob Steinhardt · Minlie Huang · Sam Bowman · He He · Shi Feng
2025 Poster

Abstract

