Poster

Online Target Q-learning with Reverse Experience Replay: Efficiently finding the Optimal Policy for Linear MDPs

Naman Agarwal ⋅ Syomantak Chaudhuri ⋅ Prateek Jain ⋅ Dheeraj Nagaraj ⋅ Praneeth Netrapalli
2022 Poster
