DOPPLER: Dual-Policy Learning for Device Assignment in Asynchronous Dataflow Graphs
Abstract
We study the problem of assigning operations in a dataflow graph to devices to minimize execution time in a work-conserving system, with emphasis on complex machine learning workloads. Prior learning-based approaches face three limitations: (1) reliance on bulk-synchronous frameworks that under-utilize devices, (2) learning a single placement policy without modeling the system dynamics, and (3) depending solely on reinforcement learning in pre-training while ignoring optimization during deployment. We propose Doppler, a three-stage framework with two policies—SEL for selecting operations and PLC for placing them on devices. Doppler consistently outperforms baselines by reducing execution time and improving sampling efficiency through faster per-episode training. Our results show that DOPPLER achieves up to 52.7% lower execution times than the best baseline.