Pi-CCA: Prompt-Invariant CCA Certificates for Replay-Free Vision–Language Continual Learning
Abstract
When deployed on non-stationary data streams, foundation vision–language models require continual updates without access to past data. However, naive fine-tuning undermines their zero-shot recognition capabilities and prompt robustness. We seek a replay-free principle that preserves pre-trained cross-modal generalization under domain/prompt shifts. We introduce Prompt-Invariant CCA Certificates (Pi-CCA), a geometry-first approach that summarizes image–text alignment with a compact certificate capturing the top-k canonical spectrum and subspace. During adaptation, we match this summary using only mini-batch statistics and induce prompt robustness via averaging over perturbations. Across MTIL, X-TAIL, VLCL, and ConStruct-VL, Pi-CCA achieves state-of-the-art performance among replay-free methods. By optimizing alignment invariants rather than proxy signals, Pi-CCA provides a simple, generator-free, constant-memory path to continual adaptation with strong zero-shot retention and resilience to prompt/style shifts.
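The abstract does not specify how the certificate is computed; as a hedged illustration only, the sketch below shows one standard way a top-k canonical spectrum and subspace could be extracted from mini-batch image and text features via regularized CCA (whitened cross-covariance followed by an SVD). All names, the regularization `eps`, and the synthetic data are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def cca_certificate(img_feats, txt_feats, k=4, eps=1e-3):
    """Illustrative top-k CCA summary from mini-batch statistics.

    Returns the top-k canonical correlations (spectrum) and the
    corresponding canonical directions (subspace) for each modality.
    """
    X = img_feats - img_feats.mean(axis=0)
    Y = txt_feats - txt_feats.mean(axis=0)
    n = X.shape[0]
    # Regularized batch covariances (eps keeps the whitening stable).
    Cxx = X.T @ X / (n - 1) + eps * np.eye(X.shape[1])
    Cyy = Y.T @ Y / (n - 1) + eps * np.eye(Y.shape[1])
    Cxy = X.T @ Y / (n - 1)

    def inv_sqrt(C):
        # Inverse matrix square root via eigendecomposition.
        w, V = np.linalg.eigh(C)
        return V @ np.diag(w ** -0.5) @ V.T

    # SVD of the whitened cross-covariance gives canonical correlations.
    M = inv_sqrt(Cxx) @ Cxy @ inv_sqrt(Cyy)
    U, s, Vt = np.linalg.svd(M)
    return s[:k], U[:, :k], Vt[:k].T

# Synthetic aligned features sharing an 8-dim latent (illustration only).
rng = np.random.default_rng(0)
Z = rng.normal(size=(256, 8))
img = Z @ rng.normal(size=(8, 32)) + 0.1 * rng.normal(size=(256, 32))
txt = Z @ rng.normal(size=(8, 16)) + 0.1 * rng.normal(size=(256, 16))
spec, U_img, U_txt = cca_certificate(img, txt, k=4)
print(spec)  # top-4 canonical correlations, in descending order
```

A certificate-matching loss during adaptation could then penalize divergence of this summary from its pre-trained value, which is one way to read "matching using only mini-batch statistics."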