Neural Chameleons: Language Models Can Learn to Hide Their Thoughts from Unseen Activation Monitors

Max McGuinness ⋅ Alex Serrano ⋅ Luke Bailey ⋅ Scott Emmons

Abstract
