mechanicaldirk.bsky.social
Training big models at @ai2.bsky.social.
49 posts 501 followers 241 following
comment in response to post
I think for the moment we're competing on a different axis. They do quite well on impact per GPU hour. We do well on impact per person hour.
comment in response to post
In ML, you can get surprisingly far without ever looking at your training data, and yet you'll always be limited. Thus, "look at the data" means: "Don't just stir the pot of linear algebra, find out what's really happening."
comment in response to post
Meanwhile, OLMo is now the citation for QK norm, which we definitely didn't invent? You win some, you lose some.
comment in response to post
After ICML, I decided all conferences should be in Vienna from now on.
comment in response to post
It costs $90k. The $1,000 is just a down payment.
comment in response to post
Biggest one yet! Best one yet! Plus, some fun training stories at the bottom of the blog post (allenai.org/blog/olmo2-32B).
comment in response to post
When I played Civilization, I always named my religion "PDF", so I could convert cities to PDF.
comment in response to post
That seems like a completely normal sleep schedule for a 3 month old. Source: My kid.
comment in response to post
#humblebrag
comment in response to post
I used Thinkmate. I wanted to roughly pick my own specs while knowing nothing about compatibility. I don't like RGB lights everywhere, and I wanted the thing to be reliable. No regrets.
comment in response to post
You posted about AI.
comment in response to post
I haven't read it. But I did listen to an AI-generated conversation about its contents...
comment in response to post
Maybe Nvidia could have given us $699M worth of GPUs, and we'd give them Beaker?
comment in response to post
If there were any hope that these kids would become superintelligent within a year, the money would flow.
comment in response to post
Needed the right incantation to rsync something to a cloud storage provider without hashing the contents, comparing just file sizes. The ChatGPT results were pure fabulation.
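A plausible version of that incantation, assuming the transfer goes through rsync: its --size-only flag skips the content comparison and matches files on size alone (rclone has a flag of the same name if the destination is an object store). This is only a sketch, wrapped in Python for illustration; the paths and host are placeholders, not the author's actual command.

```python
import subprocess

# Sync a directory while comparing files by size only, never hashing contents.
# rsync's --size-only flag treats a file as unchanged whenever its size matches.
# Source path and destination host below are placeholders, not the real setup.
subprocess.run(
    [
        "rsync",
        "--archive",    # preserve permissions, mtimes, symlinks
        "--size-only",  # skip files whose sizes already match; no checksums
        "--progress",
        "checkpoints/",
        "backup-host:/mnt/bucket/checkpoints/",
    ],
    check=True,
)
```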
comment in response to post
Had me in the first half, ngl
comment in response to post
I don't like it, but Jaguar needs broad strokes to survive. This is exactly what they need to do.
comment in response to post
How many GPUs did you do this with? I want to put "115k tokens per second" into context.
comment in response to post
If we're lucky, @yishan.bsky.social will see this and explain why social media is hard.
comment in response to post
We actually try to filter out non-English though. What do you think, multilingual OLMo3?
comment in response to post
I know what some university labs work with 🙀. I think @ai2.bsky.social is GPU middle class.
comment in response to post
Absolutely incredible effort. Many fun war stories to come (in the write-up!), but the best part of this job is how the team comes together when it's crunch time.
comment in response to post
NGE
comment in response to post
www.youtube.com/watch?v=y8pc...
comment in response to post
Is Claude really that much better?
comment in response to post
Not that it would be an option for me, but what exactly is a "part-time position" with 40 hours a week?
comment in response to post
Every time I play Factorio, I think I might as well write some code, and then I go do that.
comment in response to post
Challenge: I got these LLM weights from a shadowy source, but I don't know what tokenizer it's for. Can we reconstruct the tokenizer?
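A hedged sketch of a first step for that challenge, assuming the weights arrive as a standard Transformer checkpoint in safetensors format (the filename and tensor-name patterns below are guesses, not anything from the post): the shape of the token-embedding matrix gives the vocabulary size, which already rules out most candidate tokenizers before any token-by-token probing.

```python
from safetensors import safe_open

# Recover the vocabulary size from the embedding matrix of a mystery checkpoint.
# Real checkpoints use names like "model.embed_tokens.weight",
# "tok_embeddings.weight", or "wte.weight"; the patterns below are heuristics.
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        if "embed" in name or "wte" in name:
            shape = f.get_slice(name).get_shape()  # e.g. [vocab_size, hidden_dim]
            print(f"{name}: shape={shape}")
```

Matching that vocabulary size (and a few decoded sample strings) against publicly released tokenizers would be the natural next step.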