

Poster

Towards Best Practices of Activation Patching in Language Models: Metrics and Methods

Fred Zhang · Neel Nanda
2024 Poster


