Beautiful work, thank you for the overview!
I haven’t done RL work in half a decade — I thought delayed updates with an exponential moving average for the target networks helped deal with these instabilities. I assume that’s not enough, given your paper.
1/N
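For context, the stabilizer I have in mind is the soft (Polyak/EMA) target-network update used in methods like DDPG and SAC. A minimal sketch, with `tau` as an assumed smoothing rate:

```python
import numpy as np

def ema_update(target_params, online_params, tau=0.005):
    """Polyak / EMA soft update: target <- (1 - tau) * target + tau * online.

    Applied after each gradient step so the target network trails the
    online network slowly, which is the usual instability mitigation.
    """
    return [(1.0 - tau) * t + tau * o for t, o in zip(target_params, online_params)]

# Toy illustration: the target drifts toward the online parameters.
target = [np.zeros(3)]
online = [np.ones(3)]
for _ in range(1000):
    target = ema_update(target, online)
print(target[0])  # close to 1.0 after many updates
```

(The parameter names here are illustrative, not from the paper under discussion.)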
Comments
Could you explain, in some more detail over the course of this conversation, how information is propagated across layers in your approach vs. DenseNet?
How do your approaches scale when we need to do RL on, say, LLMs?
2/2 (for now) Thanks again!