Ooh, new FineWeb dataset just dropped: FineWeb 2 - 3T tokens of highly multilingual, top-quality filtered data, permissively licensed!

https://huggingface.co/datasets/HuggingFaceFW/fineweb-2

Apologies to the GPU-poor (like me!) who can only imagine what we could build with it, if only we had 10^25 FLOPs lying around.
