The FineWeb team is happy to finally release "FineWeb2" 🥂🥳
FineWeb 2 extends the data-driven approach to pre-training dataset design introduced in FineWeb 1 to now cover 1893 languages/scripts
Details: https://huggingface.co/datasets/HuggingFaceFW/fineweb-2
A detailed open-science tech report is coming soon