A Tale of Two Smoothness Notions: Adaptive Optimizers and Non-Euclidean Descent
Abstract
Adaptive optimizers can reduce to normalized steepest descent (NSD) when they adapt only to the current gradient, suggesting a close connection between the two algorithmic families. A key distinction in their analyses, however, lies in the smoothness assumptions they rely on: in the convex setting, the analysis of adaptive optimizers is governed by a stronger adaptive smoothness condition, whereas NSD relies only on the standard notion of smoothness. We extend the theory of adaptive smoothness to the nonconvex setting and show that it precisely characterizes the convergence of adaptive optimizers. Moreover, we establish that adaptive smoothness enables acceleration of adaptive optimizers with Nesterov momentum, a guarantee that is unattainable under standard smoothness. We further develop an analogous comparison for stochastic optimization by introducing adaptive variance, a condition that parallels adaptive smoothness and leads to qualitatively stronger guarantees than the standard notion of variance.
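As a concrete illustration of the reduction mentioned in the opening sentence (a sketch under specific assumptions, not the paper's own construction), consider an Adam-style update with its moment accumulators disabled, $\beta_1 = \beta_2 = 0$ and $\epsilon = 0$, so that the step depends only on the current gradient $g_t$:
\[
x_{t+1} \;=\; x_t - \eta\,\frac{g_t}{\sqrt{g_t \odot g_t}} \;=\; x_t - \eta\,\mathrm{sign}(g_t) \;=\; x_t - \eta \operatorname*{arg\,max}_{\|u\|_\infty \le 1} \langle g_t, u\rangle,
\]
where the square root and division are taken elementwise and, assuming no zero gradient coordinates, $\mathrm{sign}(g_t)$ maximizes $\langle g_t, u\rangle$ over the $\ell_\infty$ unit ball. The update thus coincides with normalized steepest descent under the $\ell_\infty$ norm.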