mechanicaldirk.bsky.social
Training big models at @ai2.bsky.social.
49 posts 501 followers 241 following
comment in response to post
I think for the moment we're competing on a different axis. They do quite well on impact per GPU hour. We do well on impact per person hour.
comment in response to post
In ML, you can get surprisingly far without ever looking at your training data, and yet you'll always be limited. Thus, "look at the data" means: "Don't just stir the pot of linear algebra, find out what's really happening."
comment in response to post
Meanwhile, OLMo is now the citation for QK norm, which we definitely didn't invent? You win some, you lose some.
comment in response to post
After ICML, I decided all conferences should be in Vienna from now on.
comment in response to post
It costs $90k. The $1,000 is just a down payment.
comment in response to post
Biggest one yet! Best one yet! Plus, some fun training stories at the bottom of the blog post (allenai.org/blog/olmo2-32B).
comment in response to post
When I played Civilization, I always named my religion "PDF", so I could convert cities to PDF.
comment in response to post
That seems like a completely normal sleep schedule for a 3 month old. Source: My kid.
comment in response to post
#humblebrag
comment in response to post
I used Thinkmate. I wanted to roughly pick my own specs while knowing nothing about compatibility. I don't like RGB lights everywhere, and I wanted the thing to be reliable. No regrets.
comment in response to post
You posted about AI.
comment in response to post
I haven't read it. But I did listen to an AI-generated conversation about its contents...
comment in response to post
Maybe Nvidia could have given us $699M worth of GPUs, and we'd give them Beaker?
comment in response to post
If there were any hope that these kids would become superintelligent within a year, the money would flow.
comment in response to post
Needed the right incantation to rsync something to a cloud storage provider without hashing the contents, comparing just file sizes. The ChatGPT results were pure fabulation.
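A plausible version of that incantation, assuming the transfer goes through rsync: its --size-only flag skips the content comparison and matches files on size alone (rclone has a flag of the same name if the destination is an object store). This is only a sketch, wrapped in Python for illustration; the paths and host are placeholders, not the author's actual command.

```python
import subprocess

# Sync a directory while comparing files by size only, never hashing contents.
# rsync's --size-only flag treats a file as unchanged whenever its size matches.
# Source path and destination host below are placeholders, not the real setup.
subprocess.run(
    [
        "rsync",
        "--archive",    # preserve permissions, mtimes, symlinks
        "--size-only",  # skip files whose sizes already match; no checksums
        "--progress",
        "checkpoints/",
        "backup-host:/mnt/bucket/checkpoints/",
    ],
    check=True,
)
```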
comment in response to post
Had me in the first half, ngl
comment in response to post
I don't like it, but Jaguar needs broad strokes to survive. This is exactly what they need to do.
comment in response to post
How many GPUs did you do this with? I want to put "115k tokens per second" into context.
comment in response to post
If we're lucky, @yishan.bsky.social will see this and explain why social media is hard.
comment in response to post
We actually try to filter out non-English though. What do you think, multilingual OLMo3?
comment in response to post
I know what some university labs work with 🙀. I think @ai2.bsky.social is GPU middle class.
comment in response to post
Absolutely incredible effort. Many fun war stories to come (in the write-up!), but the best part of this job is how the team comes together when it's crunch time.
comment in response to post
NGE
comment in response to post
www.youtube.com/watch?v=y8pc...
comment in response to post
Is Claude really that much better?
comment in response to post
Not that it would be an option for me, but what exactly is a "part-time position" with 40 hours a week?
comment in response to post
Every time I play Factorio, I think I might as well write some code, and then I go do that.
comment in response to post
Challenge: I got these LLM weights from a shadowy source, but I don't know what tokenizer it's for. Can we reconstruct the tokenizer?
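A hedged sketch of a first step for that challenge, assuming the weights arrive as a standard Transformer checkpoint in safetensors format (the filename and tensor-name patterns below are guesses, not anything from the post): the shape of the token-embedding matrix gives the vocabulary size, which already rules out most candidate tokenizers before any token-by-token probing.

```python
from safetensors import safe_open

# Recover the vocabulary size from the embedding matrix of a mystery checkpoint.
# Real checkpoints use names like "model.embed_tokens.weight",
# "tok_embeddings.weight", or "wte.weight"; the patterns below are heuristics.
with safe_open("model.safetensors", framework="pt") as f:
    for name in f.keys():
        if "embed" in name or "wte" in name:
            shape = f.get_slice(name).get_shape()  # e.g. [vocab_size, hidden_dim]
            print(f"{name}: shape={shape}")
```

Matching that vocabulary size (and a few decoded sample strings) against publicly released tokenizers would be the natural next step.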