Matching multiple experts: on the exploitability of multi-agent imitation learning
Antoine Bergerault · Volkan Cevher · Negar Mehr
Abstract
Multi-agent imitation learning (MA-IL) aims to learn optimal policies from expert demonstrations in multi-agent interactive domains. Despite existing guarantees on the performance of the extracted policy, characterizations of its distance to a Nash equilibrium are missing for offline MA-IL. In this paper, we demonstrate impossibility and hardness results for learning low-exploitability policies in general $n$-player Markov Games. We do so by providing examples where even exact occupancy-measure matching fails, and we present the challenges that arise in the practical setting of approximation errors. We then show how these challenges can be overcome under strategic dominance assumptions on the expert equilibrium, assuming a behavior cloning (BC) error of $\epsilon_{\text{BC}}$. Specifically, for dominant-strategy expert equilibria, this yields a Nash imitation gap of $\mathcal{O}\left(n\epsilon_{\text{BC}}/(1-\gamma)^2\right)$ for a discount factor $\gamma$. We generalize this result with a new notion of best-response continuity, and argue that this property is implicitly encouraged by standard regularization techniques.