
hamishivi.bsky.social • 70 days ago

(1/8) Excited to share some new work: TESS 2!
TESS 2 is an instruction-tuned diffusion LM that can perform close to AR counterparts for general QA tasks, trained by adapting from an existing pretrained AR model.
📜 Paper: https://arxiv.org/abs/2502.13917
🤖 Demo: https://huggingface.co/spaces/hamishivi/tess-2-demo

More below ⬇️

Comments

hamishivi.bsky.social•70 days ago

(2/8) We find that TESS 2 performs well in QA, but lags in reasoning-heavy tasks (GSM8k, BBH). However, when we train on GSM8k-specific data, we beat AR models!
It may be that instruction-tuning mixtures need to be adjusted for diffusion models (we just used Tulu 2/3 off the shelf).

hamishivi.bsky.social•70 days ago

(3/8) We train TESS 2 in two stages: (1) 200k steps of diffusion adaptation training, then (2) instruction tuning on Tulu. We found that adapting Mistral models (v0.1/0.3) worked much better than adapting Llama!

hamishivi.bsky.social•70 days ago

(4/8) We can also improve performance without any additional training in two key ways:
(1) Using more diffusion steps
(2) Using reward guidance
Explained below 👇

hamishivi.bsky.social•70 days ago

(5/8) First, as we increase diffusion steps, GSM8k scores improve consistently! AlpacaEval scores improve at first and then drop, as the model's generations become more repetitive.
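
To make the "more diffusion steps" knob concrete, here is a minimal sketch of a generic iterative denoising loop in which the step count is purely an inference-time parameter. This is not TESS 2's actual sampler: `toy_denoiser`, `timesteps`, `sample`, the scalar state, and the target value are all hypothetical stand-ins chosen for illustration.

```python
import random


def timesteps(num_steps):
    """Evenly spaced noise levels, from 1.0 (pure noise) down towards 0.0."""
    return [1.0 - i / num_steps for i in range(num_steps)]


def toy_denoiser(x, t):
    """Hypothetical stand-in for the diffusion LM (assumption, not the real model).

    Predicts the clean value; the prediction is less accurate at high noise
    levels t, which is purely illustrative.
    """
    target = 3.0
    return target + t * (x - target)


def sample(num_steps, seed=0):
    """Run the denoising loop; returns the final state and the number of model calls."""
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)  # start from noise
    ts = timesteps(num_steps)
    calls = 0
    for i, t in enumerate(ts):
        x0_hat = toy_denoiser(x, t)  # one model call per diffusion step
        calls += 1
        t_next = ts[i + 1] if i + 1 < len(ts) else 0.0
        # Re-noise the prediction to the next (lower) noise level.
        x = x0_hat + t_next * rng.gauss(0.0, 1.0)
    return x, calls
```

Calling `sample(10)` versus `sample(100)` uses the same "model" and only changes how finely the noise schedule is discretised, at the cost of proportionally more model calls.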

hamishivi.bsky.social•70 days ago

(6/8) Second, we apply classifier guidance with an off-the-shelf reward model (we call this reward guidance). Increasing the weight of the RM guidance improves AlpacaEval win rate. But if you set the guidance weight really high, you get high-reward but nonsensical generations (reward hacking!).
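
As a rough illustration of the idea, the sketch below nudges a single position's logits along the gradient of a reward, scaled by a guidance weight. It assumes a toy "reward model" that assigns a fixed score to each token, so the gradient can be written by hand. The real setup uses a learned reward model over whole sequences, and the names `token_rewards`, `reward_gradient`, and `guide` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_reward(logits, token_rewards):
    """Expected reward under softmax(logits), for a toy per-token reward."""
    probs = softmax(logits)
    return sum(p * r for p, r in zip(probs, token_rewards))


def reward_gradient(logits, token_rewards):
    """Analytic gradient of the expected reward with respect to the logits.

    d/dl_j sum_i p_i r_i = p_j * (r_j - expected reward).
    """
    probs = softmax(logits)
    baseline = sum(p * r for p, r in zip(probs, token_rewards))
    return [p * (r - baseline) for p, r in zip(probs, token_rewards)]


def guide(logits, token_rewards, weight):
    """One guidance step: move the logits up the reward gradient."""
    grad = reward_gradient(logits, token_rewards)
    return [v + weight * g for v, g in zip(logits, grad)]


# Toy usage: three tokens, only the last one is rewarded.
logits = [0.0, 0.0, 0.0]
token_rewards = [0.0, 0.0, 1.0]
guided = guide(logits, token_rewards, weight=9.0)
```

With `weight=0.0` the logits are unchanged. As the weight grows, probability mass shifts towards the rewarded token regardless of what the model originally preferred, which is a small-scale analogue of the reward hacking described above.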

hamishivi.bsky.social•70 days ago

(7/8) Please check out the paper for more! We release our code and model weights. I think there's a lot of interesting work to be done here!

📜 Paper: https://arxiv.org/abs/2502.13917
🧑‍💻 Code: https://github.com/hamishivi/tess-2
🤖 Demo: https://huggingface.co/spaces/hamishivi/tess-2-demo
🧠 Models: https://huggingface.co/collections/hamishivi/tess-2-677ea36894e38f96dfc7b590
