Note to the GPU poor: you can still train
Reposted from Serge Belongie
LoQT is a new approach to training models in memory-constrained settings. It enables pre-training of a 13B-parameter LLM on a single 24 GB GPU without model parallelism, checkpointing, or offloading strategies during training.
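
For intuition, here is a minimal sketch of the general idea behind this family of methods: keep a quantized, frozen base weight and train only a small low-rank update, periodically merging it back into the re-quantized base so optimizer state stays tiny. The module name, the toy int8 quantizer, and the merge interval below are illustrative assumptions, not LoQT's actual implementation; see the repo for the real thing.

```python
# Toy illustration: quantized frozen base weight + trainable low-rank factors,
# periodically merged and re-quantized. Not LoQT's actual code.
import torch
import torch.nn as nn


def quantize_int8(w: torch.Tensor):
    """Symmetric per-tensor int8 quantization (stand-in for a real quantizer)."""
    scale = w.abs().max() / 127.0
    q = torch.clamp((w / scale).round(), -127, 127).to(torch.int8)
    return q, scale


def dequantize_int8(q: torch.Tensor, scale: torch.Tensor):
    return q.float() * scale


class LowRankQuantLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, rank: int = 16):
        super().__init__()
        w = torch.empty(out_features, in_features)
        nn.init.kaiming_uniform_(w)
        q, scale = quantize_int8(w)
        # Frozen quantized base weight: stored as int8 buffers, no gradients.
        self.register_buffer("q_weight", q)
        self.register_buffer("scale", scale)
        # Trainable low-rank factors: only these get gradients and optimizer state.
        self.A = nn.Parameter(torch.zeros(out_features, rank))
        self.B = nn.Parameter(torch.randn(rank, in_features) * 0.01)

    def forward(self, x):
        w = dequantize_int8(self.q_weight, self.scale) + self.A @ self.B
        return x @ w.t()

    @torch.no_grad()
    def merge_and_reset(self):
        """Fold the low-rank update into the base weight and re-quantize it."""
        w = dequantize_int8(self.q_weight, self.scale) + self.A @ self.B
        q, scale = quantize_int8(w)
        self.q_weight.copy_(q)
        self.scale.copy_(scale)
        self.A.zero_()


# Usage: the optimizer only sees the low-rank factors, so its state is small.
layer = LowRankQuantLinear(1024, 1024, rank=16)
opt = torch.optim.AdamW(layer.parameters(), lr=1e-3)
for step in range(200):
    x = torch.randn(8, 1024)
    loss = layer(x).pow(2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    if (step + 1) % 100 == 0:  # merge interval is an arbitrary illustrative choice
        layer.merge_and_reset()
```

The memory savings come from two places: the base weights live in low precision, and gradients plus Adam moments exist only for the rank-r factors rather than the full weight matrices.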

Code: github.com/sebulo/LoQT
