Sculpting Subspaces: Constrained Full Fine-Tuning in LLMs for Continual Learning
Abstract
Continual learning in large language models (LLMs) is prone to catastrophic forgetting, where adapting to new tasks significantly degrades performance on previously learned ones. Existing parameter-efficient methods often limit model expressivity or introduce new parameters per task, creating scalability issues. To address these limitations, we introduce Orthogonal Subspace Fine-Tuning (OSFT), a constrained full fine-tuning approach for continual learning that adds no task-specific parameters. OSFT leverages adaptive singular value decomposition (SVD) to dynamically identify and preserve critical, high-rank parameter subspaces that encode prior knowledge. All updates for new tasks are constrained to be strictly orthogonal to these preserved subspaces, which minimizes interference while keeping the parameter count fixed and avoiding the need to store task-specific gradients. We evaluate OSFT extensively on standard continual learning benchmarks using both encoder-decoder (T5-Large) and decoder-only (LLaMA-2 7B, Mistral-7B) models across diverse tasks. Empirically, our method achieves a state-of-the-art trade-off between learnability and knowledge retention, dominating the Pareto frontier with up to 7\% higher average accuracy than recent baselines such as O-LoRA, while reducing forgetting to near-negligible levels. Notably, it maintains the model's general linguistic capabilities, instruction-following ability, and safety throughout sequential training. OSFT provides a practical, theoretically grounded, and scalable solution that effectively balances model plasticity and knowledge retention for continual learning in LLMs.
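To make the orthogonality constraint described above concrete, the following is a minimal sketch, not the authors' implementation: it assumes a PyTorch setting, and the rank cutoff \texttt{k}, the choice of the top-$k$ left singular vectors as the preserved basis, and the helper names \texttt{preserved\_subspace} and \texttt{project\_out} are illustrative assumptions.

\begin{verbatim}
import torch

def preserved_subspace(weight: torch.Tensor, k: int) -> torch.Tensor:
    """Orthonormal basis (d_out x k) for the directions of a weight matrix
    treated as critical for prior tasks: here, its top-k left singular
    vectors from an SVD (the adaptive choice of k is method-specific)."""
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    return U[:, :k]

def project_out(grad: torch.Tensor, U_k: torch.Tensor) -> torch.Tensor:
    """Remove the gradient component lying in span(U_k), so the resulting
    update is orthogonal to the preserved subspace."""
    return grad - U_k @ (U_k.T @ grad)

# Hypothetical use inside a training step for a new task:
#   for name, p in model.named_parameters():
#       if name in preserved_bases:          # bases computed before this task
#           p.grad = project_out(p.grad, preserved_bases[name])
#   optimizer.step()
\end{verbatim}

Because every gradient is projected onto the orthogonal complement of the preserved subspace before the optimizer step, the full weight matrix is still updated (no new parameters are introduced), while movement along the directions assumed to encode prior-task knowledge is suppressed.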