Do you want to get the most out of your samples, but increasing the number of update steps just destabilizes RL training? Our #ICLR2025 spotlight paper shows that using the values of unseen actions causes instability in continuous state-action domains, and how to combat this problem with learned models!
For more details, come chat with us in #Singapore!
https://openreview.net/forum?id=6RtRsg8ZV1