mechanicaldirk.bsky.social
Training big models at @ai2.bsky.social.
49 posts 501 followers 241 following

This project is a perfect model of an OLMo contribution. Well scoped, practical, sound theoretical underpinnings, and @lambdaviking.bsky.social submitted the paper 24h before the deadline 😍. It's integrated into the OLMo trainer here: github.com/allenai/OLMo...

Finally, OLMo 1B. This is the most commonly requested OLMo feature, and it's finally here.

I'm in Singapore for @iclr-conf.bsky.social ! Come check out our spotlight paper on the environmental impact of training OLMo (link in next tweet) during the Saturday morning poster session from 10-12:30 -- happy to chat about this or anything else! DMs should be open, email works too

Came across arxiv.org/pdf/2504.05058 today. What a cool example of work you can do when LLM training data is open!

Ever wonder how LLM developers choose their pretraining data? It’s not guesswork: all AI labs create small-scale models as experiments, but the models and their data are rarely shared. DataDecide opens up the process: 1,050 models, 30k checkpoints, 25 datasets & 10 benchmarks 🧵

Today we're unveiling OLMoTrace, a tool that enables everyone to understand the outputs of LLMs by connecting them to their training data. We do this at unprecedented scale and in real time: finding matching text between model outputs and 4 trillion training tokens within seconds. ✨
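For intuition, here is a toy sketch of the underlying idea of surfacing verbatim overlaps between a model's output and training text. This is not OLMoTrace's actual index (which works over trillions of tokens in seconds); the corpus, span lengths, and helper function below are made up for illustration.

```python
# Toy sketch: find spans of a model's output that occur verbatim in training text.
# A real system needs a pre-built index over trillions of tokens; here we simply
# brute-force n-grams over a tiny made-up "corpus".
def verbatim_spans(output, corpus, min_n=3, max_n=6):
    corpus_tokens = corpus.split()
    out_tokens = output.split()
    # Pre-index all corpus n-grams for the span lengths we care about.
    corpus_ngrams = {
        n: {tuple(corpus_tokens[i:i + n]) for i in range(len(corpus_tokens) - n + 1)}
        for n in range(min_n, max_n + 1)
    }
    matches = []
    for n in range(max_n, min_n - 1, -1):          # longest spans first
        for i in range(len(out_tokens) - n + 1):
            span = tuple(out_tokens[i:i + n])
            if span in corpus_ngrams[n]:
                matches.append(" ".join(span))
    return matches

corpus = "the quick brown fox jumps over the lazy dog while the slow dog watches"
output = "a quick brown fox jumps high over the lazy dog"
print(verbatim_spans(output, corpus))
# ['quick brown fox jumps', 'over the lazy dog', ...]
```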

The fact that my Bsky feed is all tariffs and no Llama 4 means the platform is pretty much cooked for research purposes.

We created SuperBPE🚀, a *superword* tokenizer that includes tokens spanning multiple words. When pretraining at 8B scale, SuperBPE models consistently outperform the BPE baseline on 30 downstream tasks (+8% MMLU), while also being 27% more efficient at inference time.🧵
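A toy illustration of what "superword" tokens mean in practice, using the Hugging Face tokenizers library. This is not the SuperBPE training recipe, just a sketch of how BPE merges can cross word boundaries once whitespace pre-tokenization is relaxed; the corpus and vocabulary size are made up.

```python
from tokenizers import Tokenizer, models, trainers, pre_tokenizers

corpus = ["by the way, by the way, by the way"] * 1000  # tiny synthetic corpus

# Baseline BPE: whitespace pre-tokenization keeps merges inside single words.
bpe = Tokenizer(models.BPE())
bpe.pre_tokenizer = pre_tokenizers.Whitespace()
bpe.train_from_iterator(corpus, trainers.BpeTrainer(vocab_size=300))

# "Superword"-style: no whitespace boundary, so merges may absorb spaces
# and produce tokens that span several words.
superword = Tokenizer(models.BPE())
superword.train_from_iterator(corpus, trainers.BpeTrainer(vocab_size=300))

print(bpe.encode("by the way").tokens)        # e.g. ['by', 'the', 'way']
print(superword.encode("by the way").tokens)  # merged tokens can contain spaces, e.g. ['by the way']
```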

Error bars! @hails.computer will be so proud!

Introducing olmOCR, our open-source tool to extract clean plain text from PDFs! Built for scale, olmOCR handles many document types with high throughput. Run it on your own GPU for free: at over 3,000 tokens/s, that's equivalent to $190 per million pages, or 1/32 the cost of GPT-4o!
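A rough back-of-envelope showing how throughput translates into cost per page. The GPU price and tokens-per-page figures below are assumptions for illustration, not numbers from the olmOCR release, but they land in the same ballpark as the quoted $190 per million pages.

```python
# Back-of-envelope: throughput -> cost per million pages.
tokens_per_second = 3000      # throughput quoted above
tokens_per_page = 1000        # ASSUMPTION: average output tokens per PDF page
gpu_cost_per_hour = 2.00      # ASSUMPTION: hourly rental price of one GPU, in USD

pages_per_hour = tokens_per_second * 3600 / tokens_per_page
cost_per_million_pages = gpu_cost_per_hour / pages_per_hour * 1e6
print(f"~${cost_per_million_pages:,.0f} per million pages")  # ~$185 under these assumptions
```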

We took our most efficient model and made an open-source iOS app 📱. But why? As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime. Learn more from @soldaini.net 👇 youtu.be/rEK_FZE5rqQ

14.8T tokens in 2.8M GPU-hours is about 1,500 tokens per second per GPU. That's a very good number for 37B active parameters, but by no means unbelievable.
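The arithmetic, for anyone who wants to check it:

```python
# Sanity check: total tokens divided by total GPU-seconds gives per-GPU throughput.
tokens = 14.8e12      # total pretraining tokens
gpu_hours = 2.8e6     # total GPU-hours
print(f"{tokens / (gpu_hours * 3600):,.0f} tokens per GPU-second")  # ~1,468
```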

Behind the scenes with what it's like to build language models and pursue (hopefully) cutting-edge AI research. Interviewing the OLMo 2 leads: open secrets of training language models, what we have learned, and what we are going to do next. YouTube: https://buff.ly/40IlSFF Podcast / notes:

In November, every post here was about NLP. Now it's all about TikTok. We're doing the Twitter speed run.

A few days ago, we did finally release the OLMo 2 tech report: arxiv.org/pdf/2501.00656. There is a lot of good stuff in there, but the stability work we did over the summer makes me particularly proud.

Everyone wants open-source language models but no one wants to lift these heavy ass weights. We just released our paper "2 OLMo 2 Furious". Can't stop us in 2025. Links below.

Some people seem to believe that LLMs give inoffensive, milquetoast answers because of overblown safety concerns ("Because of the woke!"). But that's not it. LLMs give bland answers because they produce the average of what anyone would have said on the Internet.

It seems to me the second most common language spoken in the halls of NeurIPS is German.

Made a list of resources for open source language models with @soldaini.net ahead of the tutorial tomorrow at 9:30 AM. github.com/allenai/awes...

Want to predict the task performance of LMs before pretraining them? We develop task scaling laws and model ladders, which predict the accuracy on individual tasks by OLMo 2 7B & 13B models within 2 points of absolute error. The cost is 1% of the compute used to pretrain them.
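A minimal sketch of the two-step "model ladder" idea as I understand it: fit a power law from model size and token count to task loss on a ladder of small runs, fit a sigmoid from task loss to accuracy, then chain the two to extrapolate to the target scale. The functional forms, constants, and synthetic ladder data below are illustrative, not the paper's exact setup.

```python
import numpy as np
from scipy.optimize import curve_fit

def task_loss(ND, A, alpha, B, beta, E):
    """Power law in model parameters N and training tokens D -> task loss."""
    N, D = ND
    return A / N**alpha + B / D**beta + E

def task_accuracy(L, lo, hi, k, L0):
    """Sigmoidal link from task loss -> task accuracy."""
    return lo + (hi - lo) / (1 + np.exp(k * (L - L0)))

# Synthetic "ladder" of small runs standing in for real measurements.
rng = np.random.default_rng(0)
N = np.repeat([190e6, 370e6, 760e6, 1.3e9], 3)   # ladder model sizes (parameters)
D = N * np.tile([5, 20, 80], 4)                  # three token budgets per size
loss = task_loss((N, D), 2e2, 0.27, 4e2, 0.30, 0.55) + rng.normal(0, 0.005, N.size)
acc = task_accuracy(loss, 0.25, 0.80, 4.0, 2.0) + rng.normal(0, 0.005, N.size)

# Step 1: scale -> task loss.   Step 2: task loss -> accuracy.
p_loss, _ = curve_fit(task_loss, (N, D), loss, p0=[1e2, 0.3, 1e2, 0.3, 0.5], maxfev=50_000)
p_acc, _ = curve_fit(task_accuracy, loss, acc, p0=[0.3, 0.8, 3.0, 2.0], maxfev=50_000)

# Chain the two fits to predict a target-scale model (e.g. ~7B params on ~4T tokens)
# without training it.
pred_loss = task_loss((7e9, 4e12), *p_loss)
print(f"predicted loss {pred_loss:.3f}, predicted accuracy {task_accuracy(pred_loss, *p_acc):.3f}")
```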

I'll be at NeurIPS from Wednesday until Sunday! Do you think about pre-training? GPUs? What makes a foundation model good? If you have questions or answers, let's find a time to chat!

We just updated the OLMo repo at github.com/allenai/OLMo! There are now several training configs that together reproduce the training runs that led to the final OLMo 2 models. In particular, all the training data is available, tokenized and shuffled exactly as we trained on it!