The best algorithm for adversarial training
Abstract
What algorithm should one choose for adversarially training a neural network? Many successful supervised learning algorithms, such as Support Vector Machines, separate the training points with as large a margin as possible. The thesis of this work is that, for adversarially robust classification, the margin of separation must be increased in the geometry imposed by the threat model (i.e., the allowed perturbation set). We highlight how previous approaches fall short in this regard due to the hidden, implicit bias of gradient descent, and discuss how generalization bounds then predict substantially worse generalization performance for the resulting solutions. In contrast, for linear models, steepest descent with respect to the dual norm of the threat model converges to the maximum-margin solution in the \textit{right geometry} and provably optimizes a much more favorable generalization bound. Numerical simulations on synthetic data further elucidate the effect of the choice of algorithm on the statistical and computational properties of adversarial training.
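To make the algorithm named in the abstract concrete, the following minimal sketch (our illustration, not code from the paper) adversarially trains a linear classifier under an $\ell_\infty$ threat model using steepest descent with respect to the dual $\ell_1$ norm; in the $\ell_1$ geometry the normalized steepest descent step reduces to updating a single coordinate. All function names and hyperparameters (\texttt{eps}, \texttt{lr}, \texttt{steps}) are illustrative assumptions.

\begin{verbatim}
import numpy as np

def linf_adversarial_examples(X, y, w, eps):
    # For a linear model the worst-case l_inf attack is closed form:
    # x_adv = x - eps * y * sign(w), shrinking each margin by eps*||w||_1.
    return X - eps * y[:, None] * np.sign(w)[None, :]

def adv_train_steepest_dual(X, y, eps=0.1, lr=0.05, steps=5000):
    # Adversarial training with steepest descent w.r.t. the l_1 norm
    # (the dual of the l_inf threat model): coordinate-wise updates.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        X_adv = linf_adversarial_examples(X, y, w, eps)
        margins = y * (X_adv @ w)
        # Gradient of the exponential loss on the adversarial points
        # (by Danskin's theorem the inner maximizer is treated as fixed).
        grad = -(X_adv * (y * np.exp(-margins))[:, None]).mean(axis=0)
        # Normalized steepest descent direction in l_1: move only along
        # the coordinate with the largest partial derivative.
        j = np.argmax(np.abs(grad))
        w[j] -= lr * np.sign(grad[j])
    return w

# Toy usage on synthetic linearly separable data with a sparse ground truth.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
w_star = np.zeros(20); w_star[:3] = 1.0
y = np.sign(X @ w_star)
w = adv_train_steepest_dual(X, y, eps=0.05)
robust_margin = (y * (X @ w) - 0.05 * np.abs(w).sum()).min()
print(f"robust margin: {robust_margin:.3f}, ||w||_1 = {np.abs(w).sum():.3f}")
\end{verbatim}

The sketch combines two ingredients: a closed-form inner maximization, which is exact for linear models, and a single-coordinate update, which is what aligns the optimization geometry with the dual norm of the threat model.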