Oral presentation at the Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions
Debugging Concept Bottlenecks through Intervention: Shortcut Removal + Retraining
Eric Enouen · Sainyam Galhotra
Keywords: [ prototypes ] [ robustness ] [ concept bottleneck ] [ shortcut learning ] [ interpretability ]
Machine learning models often learn unintended shortcuts (spurious correlations) that do not reflect the true causal structure of a task and thus degrade dramatically under subpopulation shift. This problem becomes especially severe in high-stakes domains, where the cost of relying on misaligned shortcuts is prohibitive. To address this challenge, concept bottlenecks explicitly factor predictions into high-level concepts and a simple decision layer, enabling experts to diagnose whether learned concepts align with their domain knowledge. Yet simply removing undesirable concepts after training is insufficient to prevent shortcuts when the concept encoder is incomplete or entangled. In this work, we propose CBDebug, a novel framework for debugging concept bottlenecks to achieve robustness under subpopulation shift. First, a domain expert identifies and removes spurious concepts using model explanations (the Removal step). Then, leveraging this human feedback, we disentangle or replace the removed shortcuts by retraining on a dataset rebalanced according to the causal graph (the Retraining step). Empirically, CBDebug significantly outperforms existing concept-based methods. Overall, our work demonstrates how expert-guided debugging of concept bottlenecks can achieve both interpretability and robustness, aligning a model's internal reasoning with how humans reason.
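As a rough illustration of the two steps above, here is a minimal Python sketch of a Removal + Retraining loop on a toy concept bottleneck. It assumes concept predictions are already available as a matrix `C` and shows only the linear decision layer; the group-balanced reweighting used for the Retraining step is one plausible instantiation of causal-graph-based rebalancing, and the helper names (`remove_concepts`, `rebalance_weights`) are hypothetical, not from the paper.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def remove_concepts(C, spurious_idx):
    """Removal step: drop the concept columns an expert flagged as spurious."""
    keep = [j for j in range(C.shape[1]) if j not in set(spurious_idx)]
    return C[:, keep]

def rebalance_weights(y, s):
    """Retraining step (one plausible instantiation): weight samples so each
    (label, spurious-attribute) group contributes equal total weight,
    breaking the correlation between y and s in the training signal."""
    groups = list(zip(y.tolist(), s.tolist()))
    counts = {g: groups.count(g) for g in set(groups)}
    return np.array([len(groups) / (len(counts) * counts[g]) for g in groups])

# --- toy data: one causal concept, one shortcut concept ---
rng = np.random.default_rng(0)
n = 1000
core = rng.normal(size=n)                        # truly causal concept
y = (core > 0).astype(int)
spur = np.where(rng.random(n) < 0.9, y, 1 - y)   # spuriously correlated with y
C = np.column_stack([core, spur + 0.1 * rng.normal(size=n)])

# A naive decision layer happily leans on the shortcut (column 1).
naive = LogisticRegression().fit(C, y)

# Removal: expert flags concept 1; Retraining: rebalance (y, spur) groups.
C_clean = remove_concepts(C, spurious_idx=[1])
weights = rebalance_weights(y, spur)
debugged = LogisticRegression().fit(C_clean, y, sample_weight=weights)
```

In a full concept bottleneck model, the concept encoder itself would also be retrained on the reweighted data, so that entangled shortcut features are unlearned rather than merely masked at the decision layer.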