New NanoGPT training speed record: 3.28 FineWeb val loss in 4.66 minutes

Previous record: 5.03 minutes
Changelog:
- FlexAttention blocksize warmup
- hyperparameter tweaks
Post image

Comments