Ever wanted to train your own 13B Llama2 model from scratch on a 24GB GPU? Or fine-tune one without compromising performance compared to full training? 🦙
You now can, with LoQT: Low-Rank Adapters for Quantized Pretraining! https://arxiv.org/abs/2405.16528
1/4
LoQT reduces memory for gradients, optimizer states, and weights, even when pretraining from scratch.
2/4
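To give a feel for where the savings come from, here is a minimal sketch (not the paper's implementation, and with illustrative names and a stand-in int8 quantization scheme): the full-rank weight is kept quantized and frozen, so gradients and optimizer states only exist for two small low-rank factors.

```python
import torch
import torch.nn as nn

class LowRankQuantizedLinear(nn.Module):
    """Illustrative layer: frozen quantized full-rank weight + trainable low-rank factors."""
    def __init__(self, in_features, out_features, rank=32):
        super().__init__()
        # Frozen, quantized full-rank weight (int8 as a stand-in quantization scheme);
        # stored as a buffer, so it carries no gradients or optimizer state.
        w = torch.randn(out_features, in_features)
        self.scale = w.abs().max() / 127
        self.register_buffer("w_q", torch.round(w / self.scale).to(torch.int8))
        # Only these low-rank factors are trainable.
        self.A = nn.Parameter(torch.zeros(out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, in_features) * 0.01)

    def forward(self, x):
        w = self.w_q.float() * self.scale        # dequantize the frozen weight
        return x @ (w + self.A @ self.B).t()     # full-rank weight plus low-rank update

layer = LowRankQuantizedLinear(4096, 4096, rank=64)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(f"trainable params: {trainable:,}")  # the two factors only, not the 4096x4096 weight
```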
We show LoQT works for both LLM pre-training and downstream task adaptation 📊.
3/4
Great collaboration with @mabeto5p, @mjkastoryano, @sergebelongie.bsky.social, @vesteinns.bsky.social
Code: https://github.com/sebulo/LoQT 💻
Paper: https://arxiv.org/abs/2405.16528 📄
This research was funded by @DataScienceDK and @AiCentreDK, and is a collaboration between @DIKU_Institut, @ITUkbh, and @csaudk.