Poster in Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks
Jialin Zhao · Yingtao Zhang · Xinghang Li · Huaping Liu · Carlo Vittorio Cannistraci
The increasing GPU memory demands of large language models call for more memory-efficient training methods. Existing approaches like LoRA struggle with low-rank constraints in pre-training, while ReLoRA suffers from saddle point issues. We propose Sparse Spectral Training (SST), a memory-efficient pre-training framework that updates all singular values, selectively updates singular vectors via multinomial sampling, and leverages singular value decomposition (SVD) for initialization and periodic reinitialization, reducing distortion compared to other low-rank methods. Across tasks including language generation, machine translation, and graph learning, SST outperforms existing memory-efficient training methods and is often comparable to full-rank training. On LLaMA-1.3B, SST reduces the perplexity gap to full-rank training by 97.4%, demonstrating its effectiveness for scalable, memory-efficient model pre-training. Our code is available at https://anonymous.4open.science/r/sparsespectraltraining-6A2C/.
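To make the abstract's description concrete, the sketch below shows one possible reading of the three ingredients it names: a layer parameterized by its SVD factors in which all singular values are trained, a multinomially sampled subset of singular-vector pairs receives gradients each round, and the factorization is periodically rebuilt via SVD. This is a minimal illustration assuming a PyTorch linear layer; the class name SSTLinear, the methods resample() and reinit(), and all hyperparameters are assumptions for illustration, not the authors' implementation (see the linked repository for that).

```python
# Minimal sketch of the ideas described in the abstract (not the authors' code).
# Assumptions: SSTLinear, resample(), reinit(), and the parameter k are illustrative.
import torch
import torch.nn as nn


class SSTLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, k: int):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w)
        # SVD initialization: W = U diag(S) V^T.
        u, s, vh = torch.linalg.svd(w, full_matrices=False)
        self.U = nn.Parameter(u)        # left singular vectors
        self.S = nn.Parameter(s)        # all singular values are trainable
        self.V = nn.Parameter(vh.t())   # right singular vectors
        self.k = k                      # singular-vector pairs updated per round (k <= rank)
        self.register_buffer("active", torch.zeros(s.numel(), dtype=torch.bool),
                             persistent=False)
        self.resample()

    @torch.no_grad()
    def resample(self):
        # Multinomial sampling: pairs with larger singular values are more
        # likely to be selected for gradient updates this round.
        probs = self.S.abs() / self.S.abs().sum()
        idx = torch.multinomial(probs, self.k, replacement=False)
        mask = torch.zeros_like(self.S, dtype=torch.bool)
        mask[idx] = True
        self.active = mask

    @torch.no_grad()
    def reinit(self):
        # Periodic reinitialization: merge the factors and redo the SVD so the
        # factorization stays well conditioned, then draw a fresh sample.
        w = self.U @ torch.diag(self.S) @ self.V.t()
        u, s, vh = torch.linalg.svd(w, full_matrices=False)
        self.U.copy_(u)
        self.S.copy_(s)
        self.V.copy_(vh.t())
        self.resample()

    def forward(self, x):
        # Detach the inactive singular-vector pairs so only the sampled subset
        # (plus all singular values) receives gradients.
        U = torch.where(self.active.unsqueeze(0), self.U, self.U.detach())
        V = torch.where(self.active.unsqueeze(0), self.V, self.V.detach())
        return (x @ V) * self.S @ U.t()   # equals x @ (U diag(S) V^T)^T
```

In a training loop one would call resample() at the start of each sampling round and reinit() every so many steps; both schedules are hyperparameters of the method and are not specified in the abstract.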