

Spotlight in Workshop: Backdoor Attacks and Defenses in Machine Learning

Rethinking the Necessity of Labels in Backdoor Removal

Zidi Xiong · Dongxian Wu · Yifei Wang · Yisen Wang


Abstract:

Since training a model from scratch requires massive computational resources, it has recently become popular to download pre-trained backbones from third-party platforms and deploy them in various downstream tasks. While convenient, this practice also introduces potential security risks such as backdoor attacks, which cause targeted misclassification of any input image containing a specifically defined trigger (i.e., backdoored examples). Current backdoor defense methods rely on clean labeled data, which means that safely deploying a pre-trained model in downstream tasks still demands these costly or hard-to-obtain labels. In this paper, we focus on purifying a backdoored backbone with only unlabeled data. To evoke the backdoor patterns without labels, we propose to leverage an unsupervised contrastive loss to search for backdoors in the feature space. Surprisingly, we find that adversarial examples crafted with the contrastive loss can mimic backdoored examples, and that adversarial finetuning on them erases the backdoor. We therefore name our method Contrastive Backdoor Defense (CBD). Against several backdoored backbones from both supervised and self-supervised learning, extensive experiments demonstrate that our method, without using labels, achieves defense comparable to or even better than label-based backdoor defenses. Our method thus allows practitioners to safely deploy pre-trained backbones on downstream tasks without extra labeling costs.
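The abstract describes a two-step procedure: craft label-free adversarial examples by attacking a contrastive loss, then finetune the backbone on them. Below is a minimal PyTorch sketch of that idea, assuming an InfoNCE-style loss and a PGD-style perturbation; all function names, the exact loss form, and hyperparameters (temperature, eps, alpha, steps) are illustrative assumptions and not the authors' official implementation.

```python
# Hedged sketch of contrastive-loss-based adversarial finetuning on unlabeled data.
# All names, hyperparameters, and the exact loss form are assumptions for
# illustration; they are not taken from the paper's official code.
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.5):
    """InfoNCE / NT-Xent loss between two batches of embeddings (positives on the diagonal)."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature               # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)

def contrastive_adv_examples(backbone, x1, x2, eps=8/255, alpha=2/255, steps=5):
    """PGD-style perturbation of view x1 that maximizes the contrastive loss,
    intended to surface backdoor-like feature shifts without any labels."""
    delta = torch.zeros_like(x1).uniform_(-eps, eps).requires_grad_(True)
    for _ in range(steps):
        loss = info_nce(backbone(x1 + delta), backbone(x2))
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + alpha * grad.sign()).clamp(-eps, eps).detach().requires_grad_(True)
    return (x1 + delta).clamp(0, 1).detach()

def finetune_step(backbone, optimizer, x1, x2):
    """One adversarial-finetuning step: minimize the contrastive loss on the
    crafted examples so the backbone unlearns trigger-aligned features."""
    x_adv = contrastive_adv_examples(backbone, x1, x2)
    loss = info_nce(backbone(x_adv), backbone(x2))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In this sketch, `x1` and `x2` are two augmented views of the same unlabeled batch and `backbone` maps images to feature embeddings; iterating `finetune_step` over an unlabeled dataset stands in for the label-free purification loop the abstract outlines.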
