Poster in Workshop: Workshop on Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference

LoRAM: Low-Rank Adaptation of Large Language Models on Manifold

Xiaowen Jiang · Xun Wang · Sebastian Stich


Abstract:

Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning (PEFT) method, has gained remarkable popularity in recent years. By freezing the pretrained weights and injecting the product of two trainable low-rank matrices into certain layers of the model, LoRA significantly reduces the number of trainable parameters while introducing no additional inference latency. From an optimization perspective, the original domain consists of bounded-rank matrices, which LoRA parametrizes using the standard low-rank factorization. However, this parametrization has unfavorable theoretical properties, including a highly non-smooth optimization landscape and the absence of fast local convergence guarantees. In this work, we explore two alternative techniques with stronger theoretical properties for fine-tuning large models: (i) direct optimization over the set of fixed-rank matrices and (ii) optimization over bounded-rank matrices using a smooth parametrization via desingularization. Both approaches leverage well-established Riemannian manifold geometry, and we employ Riemannian Adam with coordinate-wise stepsizes as the optimization algorithm. The resulting methods match the memory and computational complexity of LoRA optimized with Adam. We demonstrate their superior performance when fine-tuning LLaMA on commonsense reasoning tasks.
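For reference, here is a minimal sketch of the standard LoRA parametrization the abstract contrasts against: a frozen pretrained linear map plus a trainable rank-r update BA, scaled by alpha/r. The class name LoRALinear and the hyperparameters r and alpha are illustrative choices, not the authors' implementation.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen pretrained linear layer with a trainable low-rank update:
    y = base(x) + (alpha / r) * x A^T B^T, where A and B have rank r."""

    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # freeze the pretrained weights
        out_f, in_f = base.weight.shape
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)  # down-projection
        self.B = nn.Parameter(torch.zeros(out_f, r))        # up-projection, zero-init
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.A.T @ self.B.T) * self.scale

# Usage: only A and B receive gradients; the initial update is zero,
# so fine-tuning starts exactly at the pretrained model.
layer = LoRALinear(nn.Linear(128, 128), r=8, alpha=16.0)
y = layer(torch.randn(4, 128))
```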

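To convey the geometric idea behind alternative (i), here is a toy sketch of one update over the set of rank-r matrices: a Euclidean gradient step followed by retraction back onto the manifold via truncated SVD (a common metric-projection retraction). This is an assumption-laden illustration, not the paper's Riemannian Adam with coordinate-wise stepsizes.

```python
import torch

def fixed_rank_step(W: torch.Tensor, grad: torch.Tensor,
                    lr: float, r: int) -> torch.Tensor:
    """One toy update on the manifold of rank-r matrices: take a
    Euclidean gradient step, then retract onto the manifold by
    keeping only the top-r singular triplets (truncated SVD)."""
    W_step = W - lr * grad
    U, S, Vh = torch.linalg.svd(W_step, full_matrices=False)
    return U[:, :r] @ torch.diag(S[:r]) @ Vh[:r, :]

# Example: keep a 64x64 adapter matrix at rank 8 throughout training.
W = torch.zeros(64, 64)
grad = torch.randn(64, 64)  # stand-in for a true loss gradient
W = fixed_rank_step(W, grad, lr=1e-2, r=8)
assert torch.linalg.matrix_rank(W) <= 8
```

Storing the truncated factors U, S, Vh instead of the dense W keeps the memory footprint comparable to LoRA's two low-rank matrices.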