Do you want to get the most out of your samples, but find that increasing the update steps just destabilizes RL training? Our #ICLR2025 spotlight πŸŽ‰ paper shows that using the values of unseen actions causes instability in continuous state-action domains, and shows how to combat this problem with learned models!
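
For the gist: below is a minimal sketch of the failure mode, assuming a standard off-policy actor-critic critic update (the names, network sizes, and hyperparameters are illustrative, not from the paper). The TD target bootstraps from Q(s', π(s')), where π(s') is an action the critic never saw in the replay data; raising the update-to-data (UTD) ratio queries those unseen-action values more often per fresh sample, so errors there can compound.

```python
# Sketch of a standard off-policy critic update (not the paper's method):
# the TD target bootstraps from Q(s', pi(s')), and pi(s') is an action that
# never appears in the replay data. All shapes/values here are illustrative.
import torch
import torch.nn as nn

obs_dim, act_dim, gamma = 8, 2, 0.99
q = nn.Sequential(nn.Linear(obs_dim + act_dim, 64), nn.ReLU(), nn.Linear(64, 1))
pi = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                   nn.Linear(64, act_dim), nn.Tanh())
opt = torch.optim.Adam(q.parameters(), lr=3e-4)

def critic_update(s, a, r, s2, done):
    with torch.no_grad():
        a2 = pi(s2)  # unseen action: proposed by the policy, not drawn
                     # from the replay buffer, so Q was never fit there
        target = r + gamma * (1 - done) * q(torch.cat([s2, a2], -1)).squeeze(-1)
    loss = ((q(torch.cat([s, a], -1)).squeeze(-1) - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# A high UTD ratio re-uses the same batch for many critic updates, repeatedly
# bootstrapping from Q at actions with no data support.
batch = 32
s, s2 = torch.randn(batch, obs_dim), torch.randn(batch, obs_dim)
a = torch.rand(batch, act_dim) * 2 - 1
r, done = torch.randn(batch), torch.zeros(batch)
for _ in range(8):  # 8 updates per batch of fresh samples
    critic_update(s, a, r, s2, done)
```

As the post says, the paper's remedy grounds those unseen-action values with learned models rather than trusting the raw bootstrapped estimates; see the paper for the actual method.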
