ICLR Poster Re-calibrating Feature Attributions for Model Interpretation

Virtual presentation / top 25% paper

Peiyu Yang · NAVEED AKHTAR · Zeyi Wen · Mubarak Shah · Ajmal Mian

Keywords: [ Deep Learning and representational learning ] [ Explainable Artificial Intelligence ] [ Feature Attribution ]

Abstract:

The ability to interpret machine learning models is critical for high-stakes applications. Due to its desirable theoretical properties, path integration is a widely used scheme for feature attribution to interpret model predictions. However, the methods implementing this scheme currently rely on absolute attribution scores to eventually provide sensible interpretations. This not only contradicts the premise that features with larger attribution scores are more relevant to the model prediction, but also conflicts with the theoretical settings for which the desirable properties of the attributions are proven. We address this by devising a method that first computes an appropriate reference for the path integration scheme. This reference further helps in identifying valid interpolation points on a desired integration path. The reference is computed in a gradient ascent direction on the model's loss surface, while the interpolations are performed by analyzing the model gradients and the variations between the reference and the input. The eventual integration is effectively performed along a non-linear path. Our scheme can be incorporated into existing integral-based attribution methods. We also devise an effective sampling and integration procedure that enables our scheme to be used efficiently with multi-reference path integration. Enhancing a range of integral-based attribution methods with our scheme yields a marked performance boost on both local and global evaluation metrics. Our extensive results also show improved sensitivity, sanity preservation and model robustness when the attribution techniques are re-calibrated with our method.
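
For readers unfamiliar with path integration, the following is a minimal sketch of the generic scheme the abstract builds on, not the authors' method. It assumes a toy logistic model with made-up weights (`W`, `B`), a plain straight-line path, and a hypothetical helper `ascend_reference` that takes a few gradient ascent steps on the loss to obtain a reference. The paper's actual reference computation, interpolation-point selection, and non-linear path are not reproduced here.

```python
import math

# Toy differentiable "model": logistic regression with made-up weights (assumption).
W = [1.5, -2.0, 0.5]
B = 0.1

def model(x):
    """Probability assigned to the positive class."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    return 1.0 / (1.0 + math.exp(-z))

def model_grad(x):
    """Analytic gradient of model(x) with respect to the input x."""
    p = model(x)
    return [p * (1.0 - p) * w for w in W]

def loss_grad(x):
    """Gradient of the loss -log(model(x)) with respect to x."""
    p = model(x)
    return [-(1.0 - p) * w for w in W]

def ascend_reference(x, step=0.5, n_steps=20):
    """Hypothetical helper: move x uphill on the loss to obtain a reference.

    This is only an illustrative stand-in for 'a reference computed in a
    gradient ascent direction'; it is not the procedure from the paper.
    """
    ref = list(x)
    for _ in range(n_steps):
        g = loss_grad(ref)
        ref = [r + step * gi for r, gi in zip(ref, g)]
    return ref

def integrated_gradients(x, ref, n_steps=200):
    """Midpoint Riemann-sum approximation of straight-line path integration."""
    attributions = [0.0] * len(x)
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps
        # Interpolated point on the straight line from ref to x.
        point = [r + alpha * (xi - r) for xi, r in zip(x, ref)]
        g = model_grad(point)
        for i in range(len(x)):
            attributions[i] += g[i] * (x[i] - ref[i]) / n_steps
    return attributions

x = [1.0, -0.5, 2.0]
ref = ascend_reference(x)
attr = integrated_gradients(x, ref)

# Completeness: attributions should sum to model(x) - model(ref),
# up to the numerical error of the Riemann sum.
gap = abs(sum(attr) - (model(x) - model(ref)))
print(attr, gap)
```

The completeness check at the end is one of the theoretical properties that make path integration attractive: the signed attributions account for the full change in model output between the reference and the input.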
