RD-HRL: Generating Reliable Sub-Goals for Long-Horizon Sparse-Reward Tasks
Abstract
Long-horizon sparse-reward tasks, such as goal-conditioned and robot manipulation tasks, remain challenging in offline reinforcement learning due to the credit assignment problem. Hierarchical methods tackle this problem by introducing sub-goal planning guided by value functions, which in principle shortens the effective planning horizon for both the high-level and low-level planners and thereby alleviates the credit assignment problem. However, we demonstrate that this sub-goal selection mechanism is unreliable: it relies on value functions that suffer from generalization noise, which distorts value estimates and leads to sub-optimal sub-goals. To provide more reliable sub-goals, we introduce a reliability-driven decision mechanism and propose Reliability-Driven HRL (RD-HRL). The reliability-driven decision mechanism derives decision-level targets from transition regions, yielding a noise-immune decision space for the high-level policy and ensuring the reliability of the resulting sub-goals (termed action-level targets in this paper). Comprehensive experiments show that RD-HRL outperforms baseline methods across multiple benchmarks, highlighting its competitive advantages. Our code is anonymously available at \url{https://anonymous.4open.science/r/RD-HRL-243D}.
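To make the contrast in the abstract concrete, the following is a minimal, purely illustrative sketch (not the authors' implementation): a high-level policy that ranks candidate sub-goals by a noisy learned value function, versus a reliability-driven variant that first restricts the decision space to candidates flagged as reliable (e.g. lying in transition regions of the dataset). All names, the noise model, and the reliability labels are hypothetical assumptions for illustration only.

```python
import numpy as np

def noisy_value(state, goal, rng):
    # Hypothetical learned value estimate: a distance-based value corrupted by
    # additive noise, standing in for generalization noise of an offline-trained V.
    true_value = -np.linalg.norm(np.asarray(goal) - np.asarray(state))
    return true_value + rng.normal(scale=0.5)

def select_subgoal(state, candidates, value_fn, rng):
    # Standard value-guided high-level step: pick the candidate sub-goal with the
    # highest estimated value; noise in the estimates can flip the ranking.
    scores = [value_fn(state, g, rng) for g in candidates]
    return candidates[int(np.argmax(scores))]

def select_subgoal_reliable(state, candidates, reliable_mask, value_fn, rng):
    # Reliability-driven variant (sketch): restrict the decision space to
    # candidates flagged as reliable, then rank only those by value.
    reliable = [g for g, ok in zip(candidates, reliable_mask) if ok]
    if not reliable:           # fall back to the full set if nothing is flagged
        reliable = candidates
    return select_subgoal(state, reliable, value_fn, rng)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([0.0, 0.0])
    candidates = [np.array([1.0, 0.0]), np.array([5.0, 5.0]), np.array([2.0, 1.0])]
    reliable_mask = [True, False, True]   # hypothetical reliability labels
    print("value-guided:      ", select_subgoal(state, candidates, noisy_value, rng))
    print("reliability-driven:", select_subgoal_reliable(state, candidates, reliable_mask, noisy_value, rng))
```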