Maybe obvious, but has anyone laid out the argument for why RL signals are fundamentally better at reinforcing myopia than longer-term planning, and how that explains the human tendency toward short-term thinking?
Comments
Today’s RL algorithms are usually not great at long-term planning in complex environments, mainly because long-term planning in complex environments is a hard problem, e.g. combinatorial explosion of possibilities. (So much the worse for today’s RL algorithms!) But I don’t think that relates to humans 1/3
Whenever you see a human ignoring short-term pleasure for long-term goals or values, you can thank human brain RL for that, just as much as you can thank human brain RL for when it happens the other way around. 2/3
I still think this misses something: the combinatorial explosion problem affects human minds' learning process as well, and it seems to mostly be solved via mimesis / cultural defaults / System 2 planning, all of which, notably, aren't direct learning.