aurelium.me • 37 days ago

my take is that Reinforcement Learning Works, in that you can make current-day LLMs really good at basically any task if you can find enough problems and create a robust enough reward signal

but the dream of a generalist model being Smart Enough to do these things emergently is far off

Comments

aurelium.me•37 days ago

i wish i had finished formally writing out my thoughts on this 4 months ago bc i'd look kinda prescient now - big labs are releasing increasingly task-tuned models as RL generalization proves somewhat domain-specific

i just hope my other prediction, "let a million micro-labs bloom", also comes true

reachartwork.bsky.social•37 days ago

unironically it's going to just look like megaman battle network without any of the anime stuff

aurelium.me•37 days ago

the sample efficiency of modern RL is such that a motivated person could train a classifier on their personal preferences for an AI and have their own tuned model for well under $100. if small models get good enough or fast RAM gets cheap enough this might even be practical

reachartwork.bsky.social•37 days ago

you have a little guy in your phone who knows you by name and the best way to talk to you and helps you think better and move through the world more easily. and then he delegates to like two thousand other smaller more well-trained guys invisibly

im.giovanh.com•37 days ago

I always thought it was silly that most people just had generic netnavis but I just got it

ajvh.bsky.social•37 days ago

My take on emergence is that you can't know how far off the next breakthrough is. Could be in 10 years, could be tomorrow.

aurelium.me•37 days ago

i would be more sympathetic to this if the number of macro-scale innovations in the LLM space contributing to current capabilities was "several" rather than "like, two".

1. parameter scaling goes up to about 600B-1000B before it stops making sense
2. reinforcement learning works pretty well

reachartwork.bsky.social•37 days ago

yeah i don't really buy "anything could happen tomorrow". anything could happen tomorrow. sam altman could get hit by lightning. the Unknown Factor doesn't really factor into my predictions

aurelium.me•37 days ago

anything could happen over the course of 12-36 months, during which time it slowly becomes apparent that a new technology is important and generalizes well as increasingly ambitious experiments repeatedly come back successful and its exact strengths and weaknesses are worked out

ajvh.bsky.social•37 days ago

Sorry, it was more a joke about "emergently", as by definition that is unpredictable. And that does conflict with the linear thinking we like to do. But it was also a joke because all the capabilities of LLMs so far "emerged" in reinforcement learning.
