note to the gpu poor: you can still train
Reposted from Serge Belongie
LoQT, a new approach to training models in memory-constrained settings, enables pre-training of a 13B LLM on a 24 GB GPU with no model parallelism, checkpointing, or offloading during training.
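The 24 GB figure is the notable part. Here is a rough back-of-envelope sketch (illustrative assumptions only, not numbers from the LoQT paper: 4-bit quantized frozen weights, trainable low-rank factors assumed to be ~2% of the parameter count, Adam optimizer state in fp32) of why a 13B model does not fit under standard pre-training, while a quantized base with small low-rank factors can:

```python
# Back-of-envelope GPU memory estimate (illustrative assumptions, not LoQT's exact numbers).
# Standard Adam pre-training keeps bf16 weights, bf16 grads, and fp32 Adam m/v state;
# a quantized-base + low-rank-adapter scheme keeps ~4-bit frozen weights and trains
# only small low-rank factors (assumed ~2% of the full parameter count here).

def gib(num_bytes: float) -> float:
    return num_bytes / 1024**3

params = 13e9  # 13B parameters

# Standard setup: bf16 weights (2 B) + bf16 grads (2 B) + fp32 Adam m and v (4 B + 4 B).
standard = params * (2 + 2 + 4 + 4)

# Hypothetical quantized-base setup: 4-bit frozen weights (0.5 B/param) plus
# grads and optimizer state only for the small trainable low-rank factors.
trainable = 0.02 * params
quantized = params * 0.5 + trainable * (2 + 2 + 4 + 4)

print(f"standard Adam pre-training: ~{gib(standard):.0f} GiB")  # ~145 GiB
print(f"quantized base + low-rank:  ~{gib(quantized):.0f} GiB")  # ~9 GiB, before activations
```

Even before counting activations, the standard setup is far past a 24 GB card, which is what makes schemes of this kind interesting for single-GPU pre-training.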
Code: github.com/sebulo/LoQT