It's fine to ignore these tools, but I actually think trying them out in the context of something you know well is worth doing. The mistakes these things make will only get harder to spot as OpenAI and others slap more coats of paint on them.
Comments
So along with all the energy and water they use, we now have to waste our time teaching them. Just cut out the middle-man and go and ask an expert in whatever you need to know. At least (you'd hope) they won't lie or give you a bullshit answer they just made up.
I mean, by all means, don't use them.
The energy required to run these is an issue, but the models will get smaller through quantization and architectural refinements. I also don't think everyone can know or ask an expert. But even if they should, people are using these and will continue to do so.
Architectural refinements might shrink the resource cost per weight, but AFAIK the expectation is still that qualitative leaps require multiplying the number of parameters, which necessarily multiplies the resource costs to both query and train
It’ll happen on multiple fronts. Quantization is already very helpful, and hardware for improving the power efficiency of running neural networks will advance a lot. It’s why Sam Altman is out there trying to raise a trillion $. Think about how Bitcoin mining on GPUs moved to FPGAs.
And you’ll probably have different models within the MoE running in different places (locally for some, in data centers for others)
Because the energy cost is largely about the hardware it’s running on. Although I’m pretty sure models will also get smaller in terms of the number of weights over time
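To make the quantization point above concrete, here's a minimal sketch of why it helps: storing the same weights in int8 instead of float32 cuts the memory footprint roughly 4x, at the cost of a small round-trip error. The layer size here is made up for illustration.

```python
# Minimal sketch: symmetric linear quantization of a weight matrix.
# int8 storage is 1/4 the size of float32; the layer shape is illustrative.
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((4096, 4096)).astype(np.float32)

# Map the float range onto int8 via a single scale factor.
scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)

# Dequantize to estimate what the round trip costs in accuracy.
deq = q_weights.astype(np.float32) * scale
err = np.abs(weights - deq).max()

print(f"float32: {weights.nbytes / 2**20:.1f} MiB")   # ~64 MiB
print(f"int8:    {q_weights.nbytes / 2**20:.1f} MiB") # ~16 MiB
print(f"max round-trip error: {err:.5f}")
```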
Bitcoin is the archetypal O(1/n) problem so perhaps not the best comparison.
While I haven't kept up w/ the research, AIUI the fundamental bottlenecks are that a query is at least linear in the number of weights and unavoidably *super*linear in context size, both of which are fundamental to the LLM's utility
(sorry, not trying to harp on you personally at all, just had lots of remarks)
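As a rough illustration of the scaling claims in this thread, here's a back-of-envelope sketch. All the concrete numbers are illustrative, not measurements: per-token cost grows linearly with parameter count, while the attention work for a full sequence grows quadratically with context length.

```python
# Back-of-envelope sketch of the scaling claims: per-token cost is roughly
# 2 FLOPs per weight (linear in parameter count), while attention adds a
# term that, summed over a full sequence, is quadratic in context length.
# All concrete numbers here are illustrative, not measurements.

def flops_per_token(n_params: int, ctx: int, d_model: int, n_layers: int) -> float:
    matmul = 2 * n_params                # weight multiplies: linear in the weights
    attn = 2 * n_layers * ctx * d_model  # attending over ctx prior tokens
    return matmul + attn

# Doubling the parameter count roughly doubles per-token cost...
for n in (7e9, 14e9):
    g = flops_per_token(int(n), ctx=4096, d_model=4096, n_layers=32) / 1e9
    print(f"{n / 1e9:.0f}B params: ~{g:.1f} GFLOPs/token")

# ...while total attention work per sequence grows quadratically with context.
for ctx in (4096, 8192):
    total_attn = 2 * 32 * 4096 * ctx * ctx  # n_layers * d_model * ctx tokens * ctx context
    print(f"context {ctx}: ~{total_attn / 1e12:.1f} TFLOPs of attention per sequence")
```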
Verifiable correctness is an insanely strong property in general, even for normal code. I can't conceive how you'd get anywhere close with LLMs, given they're black boxes
No worries. And agreed. There is a whole field related to it, and I don’t intend to do more than use some of its primitives to see if I can improve the current state of LLMs, not to formally verify LLMs themselves.
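For what that generate-then-check approach might look like in practice, here's a minimal sketch; `llm_generate` is a hypothetical stand-in for whatever model API is in use, and the JSON checker is just one example of an executable specification.

```python
# Minimal sketch of the generate-then-check pattern: instead of verifying
# the model itself, verify each output against an executable specification
# and retry on failure. `llm_generate` is a hypothetical stand-in for a
# real model API.
import json
from typing import Callable

def llm_generate(prompt: str) -> str:
    raise NotImplementedError("stand-in for a real model call")

def generate_verified(prompt: str,
                      check: Callable[[str], bool],
                      max_attempts: int = 3) -> str:
    """Return the first output that passes the checker, or raise."""
    for _ in range(max_attempts):
        candidate = llm_generate(prompt)
        if check(candidate):  # the checker, not the model, is what we trust
            return candidate
    raise ValueError(f"no output passed verification in {max_attempts} attempts")

# Example spec: the output must parse as valid JSON.
def is_valid_json(s: str) -> bool:
    try:
        json.loads(s)
        return True
    except ValueError:
        return False
```

The point of the pattern is that the trust lives in a small, auditable checker rather than in the black-box model.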
Agreed. I think they are making the determination that no matter what happens, they will be out-competed if they don't build them now. This, to me, has less to do with the models and where things may go than markets and first-mover advantage.
It's why it would be nice to have rules like banning methane generators from powering data centers, or requiring new construction to rely more on renewables each year, but that's not the current regulatory climate, to say the least
For me, the only broader implication of LLM use I find interesting is a harm reduction strategy. And the open-sourcing of models (weights and all)