Just published by Hugging Face: the Ultra-Scale Playbook, a guide to how large-model training is optimized on GPU clusters.

A lot of this is normally passed around as received wisdom and hands-on experience, so it's good to have everything explained in one place.

https://huggingface.co/spaces/nanotron/ultrascale-playbook
