

Poster in Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference

Sparse Spectral Training and Inference on Euclidean and Hyperbolic Neural Networks

Jialin Zhao · Yingtao Zhang · Xinghang Li · Huaping Liu · Carlo Vittorio Cannistraci


Abstract:

The increasing GPU memory demands of large language models call for more memory-efficient training methods. Existing approaches like LoRA struggle with low-rank constraints in pre-training, while ReLoRA suffers from saddle point issues. We propose Sparse Spectral Training (SST), a memory-efficient pre-training framework that updates all singular values, selectively updates singular vectors via multinomial sampling, and leverages singular value decomposition (SVD) for initialization and periodic reinitialization, reducing distortion compared to other low-rank methods. Across tasks including language generation, machine translation, and graph learning, SST outperforms existing memory-efficient training methods and is often comparable to full-rank training. On LLaMA-1.3B, SST reduces the perplexity gap to full-rank training by 97.4%, demonstrating its effectiveness for scalable, memory-efficient model pre-training. Our code is available at https://anonymous.4open.science/r/sparsespectraltraining-6A2C/.
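The following is a minimal PyTorch sketch of the training loop described in the abstract: a layer parameterized by its SVD factors, where all singular values are updated every step, singular vectors are selected for update via multinomial sampling, and the factorization is periodically reinitialized from a fresh SVD. The class name `SSTLinear`, the sampling distribution (proportional to singular-value magnitude), the sample size `k`, and the reinitialization interval are illustrative assumptions, not the authors' exact implementation.

```python
# Hedged sketch of the SST idea; hyperparameters and names are assumptions.
import torch
import torch.nn as nn


class SSTLinear(nn.Module):
    """Linear layer W ~ U diag(S) V^T trained in its spectral parameterization."""

    def __init__(self, in_features, out_features, rank):
        super().__init__()
        W = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(W)
        U, S, Vh = torch.linalg.svd(W, full_matrices=False)  # SVD initialization
        self.U = nn.Parameter(U[:, :rank])
        self.S = nn.Parameter(S[:rank])
        self.Vh = nn.Parameter(Vh[:rank, :])
        self.rank = rank

    def forward(self, x):
        # x -> x V diag(S) U^T, equivalent to x W^T with W = U diag(S) V^T
        return ((x @ self.Vh.T) * self.S) @ self.U.T


def sample_active_directions(layer, k):
    """Pick k spectral directions via multinomial sampling.

    Assumption: sampling probability is proportional to singular-value
    magnitude, so dominant directions are updated more often.
    """
    probs = layer.S.detach().abs()
    return torch.multinomial(probs / probs.sum(), k, replacement=False)


def mask_inactive_gradients(layer, active):
    """Keep gradients only for the sampled singular vectors; S is always updated."""
    mask = torch.zeros(layer.rank, dtype=torch.bool)
    mask[active] = True
    if layer.U.grad is not None:
        layer.U.grad[:, ~mask] = 0.0
    if layer.Vh.grad is not None:
        layer.Vh.grad[~mask, :] = 0.0


@torch.no_grad()
def reinitialize(layer):
    """Periodic SVD reinitialization: recompose W and refactor it."""
    W = (layer.U * layer.S) @ layer.Vh
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    layer.U.copy_(U[:, :layer.rank])
    layer.S.copy_(S[:layer.rank])
    layer.Vh.copy_(Vh[:layer.rank, :])


# Toy usage: regression with sampled spectral updates and periodic re-SVD.
layer = SSTLinear(64, 64, rank=16)
opt = torch.optim.Adam(layer.parameters(), lr=1e-3)
x, y = torch.randn(8, 64), torch.randn(8, 64)
for step in range(100):
    active = sample_active_directions(layer, k=4)
    loss = ((layer(x) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    mask_inactive_gradients(layer, active)  # only sampled U/V directions move
    opt.step()
    if (step + 1) % 50 == 0:
        reinitialize(layer)  # assumed interval for illustration
```

In this sketch the memory pattern mirrors the abstract's description: the trainable state lives in the low-rank factors rather than the full weight matrix, and the periodic SVD keeps the factorization well-conditioned between sampling rounds.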
