Arbitrary-Order Block SignSGD for Memory-Efficient LLM Fine-Tuning
Abstract
We propose \textbf{ABSignSGD}, a block-coordinate variant of sign-based descent with flexible block selection that enables memory- and runtime-efficient full-parameter fine-tuning of large language models. We present a unified convergence analysis under mild conditions, covering both the base method and a \textit{majority-vote} extension for distributed training. The latter improves communication efficiency by aggregating only gradient signs rather than averaging full gradients. Experiments on Qwen3-8B, Llama3-8B, and Qwen3-32B, spanning mathematical reasoning and general instruction-following tasks, show that ABSignSGD converges faster per iteration and delivers superior downstream performance while reducing both runtime and memory usage compared to existing methods. Ablation studies further indicate that the memoryless sign-based update naturally complements block-wise updates, explaining the method’s strong empirical performance.
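For orientation, the following is a minimal sketch of the update the abstract refers to, written in standard signSGD notation; the block-selection rule, step sizes, and symbols ($B_t$, $\eta_t$, $f$, $f_m$, $M$) are illustrative assumptions rather than the paper's own definitions. At iteration $t$, only the coordinates in the selected block $B_t$ are moved by the sign of their partial gradient:
% Illustrative sketch only: B_t, \eta_t, f, f_m, and M are assumed notation,
% not the paper's definitions.
\[
  \theta^{(t+1)}_{i} \;=\;
  \begin{cases}
    \theta^{(t)}_{i} - \eta_t \,\operatorname{sign}\!\big(\nabla_{i} f(\theta^{(t)})\big), & i \in B_t,\\
    \theta^{(t)}_{i}, & i \notin B_t,
  \end{cases}
\]
and, in the distributed majority-vote variant with $M$ workers holding local objectives $f_m$, only sign vectors are exchanged:
\[
  \theta^{(t+1)}_{i} \;=\; \theta^{(t)}_{i} - \eta_t \,\operatorname{sign}\!\Big(\sum_{m=1}^{M}\operatorname{sign}\big(\nabla_{i} f_m(\theta^{(t)})\big)\Big), \qquad i \in B_t .
\]
Because such a step uses only the sign of the current block's gradient, no optimizer state or full-gradient buffer needs to persist across iterations, which is the source of the memory savings claimed above.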