Poster in Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
Exploring the dual lottery ticket hypothesis in finetuning through specialised sparsification
Sampreeth R S · Arindam Biswas · Pabitra Mitra · Biswajit Basu
Adapting foundation models to new tasks often involves modifying all model weights, leading to destructive interference such as catastrophic forgetting and degraded multi-task performance. Sparse adaptation methods like Lottery Ticket Adaptation (LoTA) mitigate these issues by optimizing only sparse subnetworks, achieving better results and enabling model merging across dissimilar tasks. Concurrently, the Dual Lottery Ticket Hypothesis (DLTH) states that randomly selected subnetworks can be transformed into trainable subnetworks that match the performance of winning tickets. In this work, our goal is to explore the DLTH in sparse transformer finetuning tasks. We introduce a novel approach that employs expander graph masks, rather than random selection, to obtain the initial sparse subnetwork. In the first stage, the expander masks maintain a high spectral gap in the selected subnetwork; subsequently, Random Sparse Network Transformation (RST) transforms this subnetwork into a trainable one. This method not only improves accuracy over random pruning but also uses the same mask across all layers, simplifying the adaptation process. Our approach demonstrates that expander-based initial pruning enhances sparse adaptation in foundation models, with the potential to address multi-task learning challenges without destructive interference.
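The abstract describes the masking step only at a high level. As an illustration, the sketch below shows one way an expander-style sparse mask could be constructed and scored by its spectral gap before being reused across layers. This is not the authors' implementation: the matrix shape, the per-row degree `d`, the random d-out bipartite construction, and the singular-value gap criterion are all assumptions made for the example.

```python
import numpy as np

def random_d_out_mask(rows, cols, d, rng):
    """Build a 0/1 mask whose bipartite support graph keeps exactly d
    columns per row. Such sparse random bipartite graphs are expanders
    with high probability (hypothetical construction for illustration)."""
    mask = np.zeros((rows, cols), dtype=np.float32)
    for r in range(rows):
        keep = rng.choice(cols, size=d, replace=False)
        mask[r, keep] = 1.0
    return mask

def spectral_gap(mask):
    """Gap between the two largest singular values of the biadjacency
    matrix; a larger gap suggests better expansion (well-connected support)."""
    s = np.linalg.svd(mask, compute_uv=False)
    return s[0] - s[1]

rng = np.random.default_rng(0)
rows, cols, d = 128, 128, 8            # ~6% density; sizes are placeholders
best_mask, best_gap = None, -np.inf
for _ in range(20):                    # keep the candidate with the largest gap
    m = random_d_out_mask(rows, cols, d, rng)
    g = spectral_gap(m)
    if g > best_gap:
        best_mask, best_gap = m, g

print(f"density={best_mask.mean():.3f}, spectral gap={best_gap:.3f}")
# The same mask could then be applied to every weight matrix of matching
# shape, with only the unmasked entries updated during finetuning, e.g.
#   W -= lr * (grad * best_mask)
```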