Xet infra now backs 1000s of repos on @hf.co, which means we get to put on our researcher hats and peer into the bytes 👀 🤓
Xet clients chunk files (~64KB) and skip uploads of duplicate content, but what if those chunks are already in _another_ repo? We skip those too.
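For the curious, here's a rough sketch of chunk-level dedup in Python. This is not Xet's code. The gear-style rolling hash, the 8 KiB/128 KiB min/max chunk sizes, the SHA-256 chunk IDs, and the in-memory `ChunkStore` are all assumptions made for illustration. The idea it shows: chunk boundaries come from the content itself, so the same bytes produce the same chunks no matter which repo uploads them, and one shared index of chunk hashes lets a second repo skip them.

```python
import hashlib
import random

# Assumed, illustrative parameters (only the ~64 KiB target comes from the post).
MIN_CHUNK = 8 * 1024
MAX_CHUNK = 128 * 1024
# Cut a boundary when the top 16 bits of the hash are zero: ~1 in 65,536 bytes.
BOUNDARY_SHIFT = 48
MASK64 = (1 << 64) - 1

# Gear table: 256 pseudo-random 64-bit values, fixed seed so chunking is deterministic.
_rng = random.Random(42)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def chunk(data: bytes) -> list[bytes]:
    """Split data at content-defined boundaries using a gear rolling hash."""
    chunks = []
    start = 0
    h = 0
    for i, b in enumerate(data):
        h = ((h << 1) + GEAR[b]) & MASK64
        size = i - start + 1
        # Cut on a hash match (after the minimum size) or when forced by the maximum.
        if (size >= MIN_CHUNK and (h >> BOUNDARY_SHIFT) == 0) or size >= MAX_CHUNK:
            chunks.append(data[start : i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Hypothetical global chunk index shared by every repo."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def upload(self, data: bytes) -> tuple[list[str], int]:
        """Return the file's chunk IDs and how many bytes actually had to be sent."""
        ids, sent = [], 0
        for c in chunk(data):
            cid = hashlib.sha256(c).hexdigest()
            if cid not in self.chunks:  # never seen in any repo: upload it
                self.chunks[cid] = c
                sent += len(c)
            ids.append(cid)
        return ids, sent
```

Upload the same file from a second repo and nothing gets sent. Change a few bytes in the middle and only the chunk(s) around the edit are new; everything before it still dedupes. Fixed-size chunks would break on an insertion, since every later boundary shifts, which is why content-defined boundaries are the usual choice.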
- Nodes = repositories
- Edges = shared chunks
- Edge thickness = how much they overlap
It's a byte-level map of the Hub.
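Once you know which chunks each repo references, deriving that graph is straightforward. Here's a minimal sketch, assuming a hypothetical `repo_chunks` mapping of repo → {chunk hash: size in bytes} and taking "how much they overlap" to mean shared bytes. The real viz may weight edges differently.

```python
from collections import defaultdict
from itertools import combinations


def build_overlap_graph(
    repo_chunks: dict[str, dict[str, int]],
) -> dict[tuple[str, str], int]:
    """Return edges {(repo_a, repo_b): shared_bytes} for repos sharing any chunk."""
    # Inverted index: chunk hash -> set of repos that reference it.
    owners: dict[str, set[str]] = defaultdict(set)
    sizes: dict[str, int] = {}
    for repo, chunks in repo_chunks.items():
        for cid, size in chunks.items():
            owners[cid].add(repo)
            sizes[cid] = size

    # Every pair of repos holding the same chunk gets that chunk's bytes on its edge.
    edges: dict[tuple[str, str], int] = defaultdict(int)
    for cid, repos in owners.items():
        for a, b in combinations(sorted(repos), 2):
            edges[(a, b)] += sizes[cid]
    return dict(edges)


# Hypothetical example: a fine-tune reusing base-model weights, plus an isolated repo.
repo_chunks = {
    "org/base-model": {"c1": 65536, "c2": 65536, "c3": 40000},
    "user/finetune": {"c1": 65536, "c2": 65536, "c9": 30000},
    "user/dataset": {"c3": 40000},
    "other/unrelated": {"c7": 1000},
}
edges = build_overlap_graph(repo_chunks)
```

Going through a chunk → repos index avoids comparing every pair of repos. The cost instead grows with how many repos share each chunk. Repos with no edges (like `other/unrelated` here) would show up as lone nodes.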
The result is a beautiful visualization from Saba Noorassa and @reverius42.bsky.social that I’ve already lost way too much time to.