hamishivi.bsky.social
I (try to) do NLP research. Antipodean abroad. currently doing PhD @uwcse, prev @usyd @ai2 🇦🇺🇨🇦🇬🇧 ivison.id.au
61 posts 1,162 followers 373 following

Excited to be back home in Australia (Syd/Melb) for most of April! Email or DM if you want to grab a coffee :)

@vwxyzjn.bsky.social and @hamishivi.bsky.social have uploaded intermediate checkpoints for our recent RL models at Ai2. Folks should do research into how RL finetuning impacts the weights! Models with checkpoints: OLMo 2 7B, 13B, 32B Instruct; Tulu 3 and 3.1 8B; Tulu 3 405B
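If you want to poke at these, here's a minimal sketch of the kind of weight-diffing this enables, assuming the checkpoints are exposed as revisions on the Hugging Face Hub (the repo id and revision name below are placeholders; check the Ai2 model pages for the real ones):

```python
import torch
from transformers import AutoModelForCausalLM

repo = "allenai/OLMo-2-1124-7B-Instruct"  # placeholder repo id
before = AutoModelForCausalLM.from_pretrained(repo, revision="rl-step-0")  # hypothetical pre-RL revision
after = AutoModelForCausalLM.from_pretrained(repo)  # final RL checkpoint

with torch.no_grad():
    # Relative L2 change per parameter tensor, largest movers first.
    deltas = {
        name: ((p1 - p0).norm() / (p0.norm() + 1e-8)).item()
        for (name, p0), (_, p1) in zip(before.named_parameters(), after.named_parameters())
    }

for name, d in sorted(deltas.items(), key=lambda kv: -kv[1])[:10]:
    print(f"{name}\t{d:.4f}")
```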

How well do data-selection methods work for instruction-tuning at scale? Turns out, when you look at large, varied data pools, lots of recent methods lag behind simple baselines, and a simple embedding-based method (RDS) does best! More below ⬇️ (1/8)
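Rough idea of RDS-style selection, as a hedged sketch: embed every candidate in the pool, score it against a few examples of the target task, and keep the top-k. The actual method uses the LM's own hidden states; a small sentence encoder and mean-pooling stand in here for brevity.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

model_name = "sentence-transformers/all-MiniLM-L6-v2"  # illustrative encoder choice
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

@torch.no_grad()
def embed(texts):
    batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state      # [batch, tokens, dim]
    mask = batch["attention_mask"].unsqueeze(-1)
    pooled = (hidden * mask).sum(1) / mask.sum(1)  # mean-pool over real tokens
    return F.normalize(pooled, dim=-1)

pool = ["Explain photosynthesis.", "Write a haiku about rain.", "Solve 12 * 7."]  # big mixed pool
targets = ["What is 15% of 80?"]                   # few examples of the target task

scores = (embed(pool) @ embed(targets).T).max(dim=-1).values  # best match per candidate
selected = [pool[i] for i in scores.topk(k=2).indices]        # keep the top-k candidates
print(selected)
```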

(1/8) Excited to share some new work: TESS 2! TESS 2 is an instruction-tuned diffusion LM that performs close to its autoregressive (AR) counterparts on general QA tasks, trained by adapting an existing pretrained AR model. 📜 Paper: arxiv.org/abs/2502.13917 🤖 Demo: huggingface.co/spaces/hamis... More below ⬇️

GRPO makes everything better 😌

We took our most efficient model and made an open-source iOS app📱but why? As phones get faster, more AI will happen on device. With OLMoE, researchers, developers, and users can get a feel for this future: fully private LLMs, available anytime. Learn more from @soldaini.net👇 youtu.be/rEK_FZE5rqQ

li'l holiday project from the tulu team :) Scaling up the Tulu recipe to 405B works pretty well! We mainly see this as confirmation that open-instruct scales to large-scale training -- more exciting and ambitious things to come!

Seems like a good time to share this: a poster from a class project diving a little deeper into Tulu 3's RLVR. The DeepSeek R1 release today shows that scaling this sort of approach up can be very, very effective!

Excited to see Tulu 3 sits in between Llama 3.1 and 3.3 instruct on the chatbot arena leaderboard right now! Particularly happy it is top 20 for Math and Multi-turn prompts :) All the details and data on how to train a model this good are right here: arxiv.org/abs/2411.15124

We released the OLMo 2 report! Ready for some more RL curves? 😏 This time, we applied RLVR iteratively! Our initial RLVR checkpoint, trained on the full RLVR dataset mix, had a low GSM8K score, so we ran another round of RLVR on GSM8K only, then another on MATH only 😆. And it works! A thread 🧵 1/N

More OLMo! More performance! More details! We applied Tulu post-training to OLMo 2 as well, so you can get strong model performance AND see what your model was actually trained on.

UW News put out a Q&A about our recent work on Variational Preference Learning, a technique for personalizing Reinforcement Learning from Human Feedback (RLHF) washington.edu/news/2024/12...

Want to predict the task performance of LMs before pretraining them? We develop task scaling laws and model ladders, which predict the accuracy on individual tasks by OLMo 2 7B & 13B models within 2 points of absolute error. The cost is 1% of the compute used to pretrain them.
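A hedged sketch of the two-step ladder idea: fit task loss as a power law in compute over the small ladder models, then map loss to accuracy with a sigmoid. All numbers below are made-up placeholders, and the functional forms are illustrative, not the paper's exact parameterization.

```python
import numpy as np
from scipy.optimize import curve_fit

# Made-up ladder measurements: compute (units of 1e19 FLOPs), task loss, accuracy.
compute = np.array([1.0, 3.0, 10.0, 30.0])
task_loss = np.array([1.30, 1.18, 1.07, 0.98])
task_acc = np.array([0.31, 0.38, 0.47, 0.55])

def power_law(c, a, alpha, e):      # step 1: task loss as a power law in compute
    return a * c ** (-alpha) + e

def loss_to_acc(l, lo, hi, k, l0):  # step 2: map loss to accuracy via a sigmoid
    return lo + (hi - lo) / (1.0 + np.exp(k * (l - l0)))

p1, _ = curve_fit(power_law, compute, task_loss, p0=[0.5, 0.5, 0.8])
p2, _ = curve_fit(loss_to_acc, task_loss, task_acc, p0=[0.25, 0.9, 8.0, 1.1], maxfev=20_000)

target_compute = 500.0              # e.g. a 7B-scale run, in the same units
predicted_loss = power_law(target_compute, *p1)
print("predicted accuracy:", loss_to_acc(predicted_loss, *p2))
```

The fits are done per task; the point is that the ladder of small models costs about 1% of the target model's pretraining compute.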

OpenAI's new RL finetuning API reminds me a lot of RLVR, which we used for Tülu 3 (arxiv.org/abs/2411.15124). Using RL to train against verifiable labels is a simple idea, but very effective (>10pt gains just using the GSM8K train set). It's implemented for you to use in Open-Instruct 😉: github.com/allenai/open...
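To show how simple the verifiable-rewards idea is, a toy sketch (mine, not the Open-Instruct implementation): extract the final answer and hand out a binary reward against the gold label. The "#### <answer>" pattern matches GSM8K-style solutions; other domains need their own extractors.

```python
import re

def extract_answer(completion: str) -> str | None:
    # GSM8K-style solutions end with "#### <number>".
    match = re.search(r"####\s*(-?[\d,.]+)", completion)
    return match.group(1).replace(",", "") if match else None

def verifiable_reward(completion: str, gold: str) -> float:
    # Binary reward: 1.0 iff the extracted answer matches the gold label.
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == gold.replace(",", "") else 0.0

assert verifiable_reward("Adding them up gives #### 1,234", "1234") == 1.0
assert verifiable_reward("I am not sure.", "42") == 0.0
```

In RLVR, a reward like this replaces the learned reward model during RL training.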

OpenAI announced a new RL finetuning API. You can do this on open models with the repo we used to train Tulu 3. Expanding reinforcement learning with verifiable rewards to more domains, with better answer extraction, is on our near-term roadmap. https://buff.ly/3V4JEIJ

Curious about all this inference-time scaling hype? Attend our NeurIPS tutorial: Beyond Decoding: Meta-Generation Algorithms for LLMs (Tue. 1:30)! We have a top-notch panelist lineup. Our website: cmu-l3.github.io/neurips2024-...

I’m on the academic job market this year! I’m completing my @uwcse.bsky.social @uwnlp.bsky.social Ph.D. (2025), focusing on overcoming LLM limitations like hallucinations, by building new LMs. My Ph.D. work focuses on Retrieval-Augmented LMs to create more reliable AI systems 🧵

We're hiring another predoctoral researcher for my team at Ai2/OLMo next year. The goal of this position is to mentor and grow future academic stars of NLP/AI over 1-2 years before grad school. In practice, that's usually someone finishing a BS or MS who wants to continue to a PhD soon. https://buff.ly/49nuggo

Excited to be at #NeurIPS next week in 🇨🇦! Please reach out if you want to chat about LM post-training (Tülu!), data curation, or anything else :) I'll be around all week, with two papers you should go check out (see image or next tweet):

I know it doesn't know much, if anything, about me, but this was surprisingly good!

Watching RL training curves is too addictive... begging my models to yap more and get more reward 🙏

🍲

What's that? A fully open LM competitive with Gemma and Qwen*? Happy to have helped a bit with this release (Tulu 3 recipe used here)! OLMo-2 13B actually beats Tulu 3 8B on these evals, making it a SOTA fully open LM!!! (*on the benchmarks we looked at, see tweet for more)

open source tulu 3 model recreation! rivals the original SFT and other models in its size range huggingface.co/allura-org/T...

Meet Tülu 3, a set of state-of-the-art instruct models with fully open data, eval code, and training algorithms. We invented new methods for fine-tuning language models with RL and built upon best practices to scale synthetic instruction and preference data. Demo, GitHub, paper, and models 👇