Poster in Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference
LoRAM: Low-Rank Adaptation of Large Language Models on Manifold
Xiaowen Jiang · Xun Wang · Sebastian Stich
Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning (PEFT) method, has gained remarkable popularity in recent years. By freezing the pretrained weights and injecting the product of two trainable low-rank matrices into certain layers of the model, LoRA significantly reduces the number of trainable parameters while introducing no additional inference latency. From an optimization perspective, the original domain consists of bounded-rank matrices, which LoRA parametrizes using the standard low-rank factorization. However, this parametrization has unfavorable theoretical properties, including a highly non-smooth optimization landscape and the absence of fast local convergence guarantees. In this work, we explore two alternative techniques with stronger theoretical properties for fine-tuning large models: (i) direct optimization over the set of fixed-rank matrices and (ii) optimization over bounded-rank matrices using a smooth parametrization via desingularization. Both approaches leverage well-established Riemannian manifold geometry, and we employ Riemannian Adam with coordinate-wise stepsizes as the optimization algorithm. The resulting methods have memory and computational complexity comparable to LoRA optimized with Adam, and we demonstrate their superior performance when fine-tuning LLaMA on commonsense reasoning tasks.
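The contrast between LoRA's factorized parametrization and direct optimization over fixed-rank matrices can be made concrete with a short sketch. The NumPy snippet below is illustrative only, not the authors' implementation: the variable names, dimensions, and stepsize are assumptions, and it uses plain Riemannian gradient descent (tangent-space projection followed by an SVD-based retraction) in place of the paper's Riemannian Adam.

```python
# Minimal sketch (assumed names/values; plain Riemannian GD stands in
# for the paper's Riemannian Adam).
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 64, 32, 4          # weight shape and target rank (illustrative)
lr = 1e-2                    # stepsize (illustrative)

# --- LoRA: W = W0 + B @ A, with only the factors B and A trainable ---
W0 = rng.standard_normal((m, n))
B = np.zeros((m, r))                     # common init: B = 0
A = rng.standard_normal((r, n)) * 0.01

def lora_forward(x):
    # The update B @ A can be merged into W0, so inference cost is unchanged.
    return (W0 + B @ A) @ x

# --- Fixed-rank manifold: keep Delta = U @ S @ V.T at rank exactly r ---
U, _ = np.linalg.qr(rng.standard_normal((m, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
S = np.diag(rng.standard_normal(r) * 0.01)

def riemannian_step(G):
    """One Riemannian gradient-descent step on the rank-r manifold:
    project the Euclidean gradient G onto the tangent space at
    Delta = U S V^T, move, then retract via a truncated SVD."""
    global U, S, V
    # Tangent projection: P(G) = U U^T G + G V V^T - U U^T G V V^T
    UtG = U.T @ G
    GV = G @ V
    tangent = U @ UtG + GV @ V.T - U @ (UtG @ V) @ V.T
    # Retraction: rank-r truncated SVD of the updated point
    Y = U @ S @ V.T - lr * tangent
    Uf, sf, Vtf = np.linalg.svd(Y, full_matrices=False)
    U, S, V = Uf[:, :r], np.diag(sf[:r]), Vtf[:r, :].T

# Example: one step against a random Euclidean gradient keeps rank r
y = lora_forward(rng.standard_normal(n))
riemannian_step(rng.standard_normal((m, n)))
print("rank of Delta:", np.linalg.matrix_rank(U @ S @ V.T))  # -> 4
```

The key design point the sketch highlights: the manifold method updates the adapter matrix itself while a retraction enforces the rank constraint, whereas LoRA bakes the constraint into the B @ A factorization, which is where its non-smooth landscape originates.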