Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation
Abstract
Dataset distillation (DD) aims to construct a compact synthetic dataset that serves as a proxy for a large real dataset under a fixed storage budget, thereby reducing storage burden and training cost. Prior works assume that the full dataset is available upfront and distill it all at once, whereas real datasets are in practice collected incrementally over time. To bridge this gap, we introduce a new problem setting, Domain Incremental Dataset Distillation, which continually distills datasets from different domains into a single synthetic dataset. Conventional DD processes the arriving datasets sequentially, overwriting old knowledge with new knowledge and causing catastrophic forgetting. To overcome this drawback, we propose an Asymmetric Synthetic Data Update strategy that adjusts per-sample update rates for the synthetic dataset to balance the stability-plasticity trade-off. Specifically, we design a bi-level optimization method based on a meta-learning framework to estimate the optimal update rates, which allow each sample to focus on either stability or plasticity and thereby strike a balance between the two. Experimental results demonstrate that our approach effectively mitigates catastrophic forgetting and achieves superior dataset distillation performance across continually arriving datasets compared with existing methods.
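To make the setting concrete, below is a minimal PyTorch-style sketch (not the authors' implementation) of asymmetric per-sample update rates estimated with a bi-level meta-learning loop. The toy distillation objective, the stability term (distance to a snapshot of the previous synthetic dataset), the sigmoid parameterization of the rates, the inner step size, and all tensor shapes are illustrative assumptions rather than details taken from the paper.

    import torch

    torch.manual_seed(0)
    n_syn, dim = 10, 32                                   # synthetic budget and feature dimension
    syn = torch.randn(n_syn, dim, requires_grad=True)     # shared synthetic dataset (learnable)
    rate_logits = torch.zeros(n_syn, requires_grad=True)  # one learnable update rate per sample
    old_syn = syn.detach().clone()                        # snapshot taken before the new domain arrives
    new_domain = torch.randn(200, dim) + 2.0              # incoming new-domain real data (toy)

    def distill_loss(synthetic, real):
        # Toy stand-in for a distillation objective: pull every synthetic
        # sample toward the mean of the new-domain real data.
        return (synthetic - real.mean(0)).pow(2).mean()

    meta_opt = torch.optim.Adam([rate_logits], lr=1e-2)

    for step in range(100):
        rates = torch.sigmoid(rate_logits)                # per-sample rates in (0, 1)

        # Inner step: move each synthetic sample toward the new domain,
        # scaled by its own update rate (kept differentiable w.r.t. the rates).
        grad = torch.autograd.grad(distill_loss(syn, new_domain), syn, create_graph=True)[0]
        syn_updated = syn - rates.unsqueeze(1) * 0.1 * grad

        # Outer (meta) step: trade plasticity (fit the new domain) against
        # stability (stay close to the pre-update synthetic data) to learn the rates.
        plasticity = distill_loss(syn_updated, new_domain)
        stability = (syn_updated - old_syn).pow(2).mean()
        meta_loss = plasticity + stability
        meta_opt.zero_grad()
        meta_loss.backward()
        meta_opt.step()

    # Apply the learned asymmetric update: samples with small rates preserve
    # old-domain knowledge (stability), samples with large rates adapt to the
    # new domain (plasticity).
    rates = torch.sigmoid(rate_logits).detach()
    grad = torch.autograd.grad(distill_loss(syn, new_domain), syn)[0]
    with torch.no_grad():
        syn -= rates.unsqueeze(1) * 0.1 * grad

In a realistic pipeline, the toy objective above would be replaced by an actual distillation loss (e.g., gradient or trajectory matching), and the stability term would be measured on data or statistics retained from previous domains; the sketch only illustrates how per-sample rates can be meta-learned through the inner update.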