Maybe obvious, but has anyone laid out the argument for why RL signals are fundamentally better at reinforcing myopia than longer-term planning, and how that explains the human tendency toward short-term thinking?
Comments
Today’s RL algorithms are usually not great at long-term planning in complex environments, mainly because long-term planning in complex environments is a hard problem, e.g. combinatorial explosion of possibilities. (So much the worse for today’s RL algorithms!) But I don’t think that relates to humans 1/3
Whenever you see a human ignoring short-term pleasure for long-term goals or values, you can thank human brain RL for that, just as much as you can thank human brain RL for when it happens the other way around. 2/3
I still think this misses something: the combinatorial explosion problem affects human minds' learning process as well, and it seems to mostly be solved via mimesis / cultural defaults / System 2 planning, all of which, notably, aren't direct learning.