New NanoGPT training speed record: 3.28 FineWeb val loss in 4.66 minutes
Previous record: 5.03 minutes
Changelog:
- FlexAttention blocksize warmup
- hyperparameter tweaks
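
For readers unfamiliar with the first changelog item, here is a rough sketch of what a blocksize warmup could look like. The record's folder name (WindowWarmup) suggests the attention window starts small and grows during training. The function name `window_size`, the linear shape, and the 64/1792 endpoints are all assumptions for illustration, not taken from this post; the linked log has the actual schedule.

```python
def window_size(step: int, total_steps: int,
                start: int = 64, end: int = 1792, block: int = 64) -> int:
    """Hypothetical warmup: grow the attention window linearly with training
    progress, rounded down to a whole number of blocks.

    start, end, and block are illustrative values, not confirmed by the post.
    """
    # Fraction of training completed, clamped to [0, 1].
    frac = min(max(step / total_steps, 0.0), 1.0)
    # Linear interpolation between the starting and final window.
    raw = start + frac * (end - start)
    # Snap down to a multiple of the block size, never below one block.
    return max(block, int(raw // block) * block)


if __name__ == "__main__":
    total = 1000  # hypothetical number of training steps
    for step in (0, 250, 500, 750, 1000):
        print(step, window_size(step, total))
```

Rounding to a block multiple keeps the window aligned with block-sparse attention masks, which is presumably why the changelog speaks of a "blocksize" rather than a token count.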
Comments
https://github.com/KellerJordan/modded-nanogpt/blob/master/records/112424_WindowWarmup/ba299b7e-a36a-4fd8-a268-25bb772010dd.txt