Poster in Workshop: Neural Network Weights as a New Data Modality

ProDiF: Protecting Domain-Invariant Features to Secure Pre-Trained Models Against Extraction

Tong Zhou · Shijin Duan · Gaowen Liu · Charles Fleming · Ramana Kompella · Shaolei Ren · Xiaolin Xu

Keywords: [ Model protection ] [ TEE ] [ Unauthorized transfer ] [ Extraction attacks ]


Abstract:

Pre-trained models are valuable intellectual property, capturing both domain-specific and domain-invariant features within their weight spaces. However, model extraction attacks threaten these assets by enabling unauthorized source-domain inference and facilitating cross-domain transfer through the exploitation of domain-invariant features. In this work, we introduce ProDiF, a novel framework that leverages targeted weight space manipulation to secure pre-trained models against extraction attacks. ProDiF quantifies the transferability of filters and perturbs the weights of critical filters in unsecured memory, while preserving the actual critical weights in a Trusted Execution Environment (TEE) for authorized users. A bi-level optimization further ensures resilience against adaptive fine-tuning attacks. Experimental results demonstrate that ProDiF reduces source-domain accuracy to near-random levels and decreases cross-domain transferability by 74.65%, providing robust protection for pre-trained models. This work offers comprehensive security for pre-trained DNN models and highlights the potential of weight space manipulation as a novel approach to model security.
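
The split between perturbed weights in unsecured memory and true weights in the TEE can be illustrated with a minimal sketch. It is not the authors' implementation. The per-filter `scores` are hypothetical placeholders for a transferability measure, which the abstract does not specify. The Gaussian noise stands in for ProDiF's perturbation, and a plain dictionary stands in for TEE storage. The bi-level optimization against fine-tuning is not shown. The function names `protect_filters` and `restore_filters` are also assumptions.

```python
import random

def protect_filters(filters, scores, k, seed=0):
    """Return an exposed copy with k filters perturbed, plus the protected originals."""
    rng = random.Random(seed)
    # Rank filters by the caller-supplied (hypothetical) score; the top k are critical.
    critical = sorted(range(len(filters)), key=lambda i: scores[i], reverse=True)[:k]
    # Stand-in for TEE storage: true weights of the critical filters only.
    secure_store = {i: list(filters[i]) for i in critical}
    # Exposed copy: critical filters are overwritten with Gaussian noise
    # (an assumption; the abstract does not say how perturbations are chosen).
    exposed = [list(f) for f in filters]
    for i in critical:
        exposed[i] = [rng.gauss(0.0, 1.0) for _ in filters[i]]
    return exposed, secure_store

def restore_filters(exposed, secure_store):
    """Authorized path: merge the protected weights back into the exposed copy."""
    restored = [list(f) for f in exposed]
    for i, true_weights in secure_store.items():
        restored[i] = list(true_weights)
    return restored

# Toy layer with four flattened filters and made-up scores.
filters = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8]]
scores = [0.9, 0.1, 0.5, 0.7]
exposed, store = protect_filters(filters, scores, k=2)
restored = restore_filters(exposed, store)
```

In this toy setup an attacker who copies `exposed` gets noise in the two highest-scoring filters, while an authorized user who can read `store` recovers the original layer exactly.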
