Beautiful work, thank you for the overview!

I haven’t done RL work in half a decade. I thought delayed updates with an exponential moving average of the target networks helped deal with these instabilities. Given your paper, I assume that’s not enough.
1/N
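
For anyone following along, here is a rough sketch of what I mean by delayed EMA target updates. The function names and the `tau` and `update_every` values are my own placeholders, not anything taken from the paper, and the gradient step on the online network is left out.

```python
def ema_update(target, online, tau=0.005):
    """Soft (Polyak) update: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


def delayed_target(online, target, steps, update_every=2, tau=0.005):
    """Refresh the target only every `update_every` steps (hypothetical loop)."""
    for step in range(1, steps + 1):
        # A gradient step on the online parameters would go here.
        if step % update_every == 0:
            target = ema_update(target, online, tau)
    return target
```

A small `tau` makes the target lag far behind the online network, which is the part I remember helping with bootstrapping instability.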
