Efficient Morphology–Control Co-Design via Stackelberg PPO under Non-Differentiable Leader–Follower Interfaces
Abstract
Morphology-control co-design concerns the coupled optimization of an agent’s body structure and control policy. A key challenge is that evaluating each candidate morphology requires extensive rollouts to re-optimize its controller and assess its quality, leading to high computational cost and slow convergence. This challenge is compounded by the non-differentiable interaction between morphology and control: discrete design choices and rollout-based evaluation block gradient flow across the morphology-control interface and force reliance on costly rollout-driven optimization. To address these challenges, we show that the co-design problem can be formulated as a novel variant of a Stackelberg Markov game, a hierarchical framework in which the leader specifies the morphology and the follower adapts the control. Building on this formulation, we propose \emph{Stackelberg Proximal Policy Optimization (Stackelberg PPO)}, a policy gradient method that exploits the intrinsic coupling between leader and follower to reduce repeated control re-optimization and to optimize more efficiently across the non-differentiable interface. Experiments on diverse co-design tasks demonstrate that Stackelberg PPO outperforms standard PPO in both stability and final performance.
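To make the leader-follower structure concrete, the co-design problem summarized above can be sketched as a bilevel objective; the notation below (morphology $\phi$, control policy $\pi$, expected discounted return $J$) is illustrative and does not follow the paper's own symbols:
\[
\max_{\phi \in \Phi} \; J\!\bigl(\phi, \pi^{*}(\phi)\bigr)
\quad \text{s.t.} \quad
\pi^{*}(\phi) \in \arg\max_{\pi} \; \mathbb{E}_{\tau \sim p(\tau \mid \phi, \pi)}\!\Bigl[\textstyle\sum_{t \ge 0} \gamma^{t}\, r(s_t, a_t; \phi)\Bigr],
\]
where the leader chooses the morphology $\phi$ and the follower best-responds with a control policy $\pi^{*}(\phi)$. In this sketch, the non-differentiability discussed in the abstract arises because $\Phi$ contains discrete design choices and $J$ must be estimated from rollouts, so the outer objective cannot be differentiated through the inner solution.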