Poster

Beyond Linear Probes: Dynamic Safety Monitoring for Language Models

James Oldfield · Philip Torr · Ioannis Patras · Adel Bibi · Fazl Barez

Abstract
