okay my CAR repo for this account is 11MB
there's ~30 million users and assuming 1 million users are as active as me, 10 million are half as active as me and the other 20 million don't exist
it would take 66 terabytes to archive every CAR repo
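That 66TB figure checks out under the stated assumptions; a quick sanity check (the sizes are the thread's figures, not measurements):

```python
MB = 1_000_000  # decimal megabytes

my_repo = 11 * MB                          # one active user's CAR repo, per the thread
active = 1_000_000 * my_repo               # 1M users as active as OP
half_active = 10_000_000 * (my_repo // 2)  # 10M users half as active; rest assumed inactive

total = active + half_active
print(total / 1e12, "TB")                  # → 66.0 TB
```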
Reposted from soapito
wait a second
if i have a list of every did:plc there's actually nothing stopping me from exporting a CAR from every user and getting a dataset of every bsky post, right?
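Right, nothing stops you. A minimal sketch of one such export, assuming the public relay at bsky.network serves `com.atproto.sync.getRepo` (the host is an assumption; any PDS holding the repo also works):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def repo_car_url(did, host="https://bsky.network"):
    # com.atproto.sync.getRepo exports a full repo as a single CAR file;
    # the relay host is an assumption, not the only option
    return host + "/xrpc/com.atproto.sync.getRepo?" + urlencode({"did": did})

def fetch_repo_car(did):
    # network call: returns the raw CAR bytes for one did:plc
    with urlopen(repo_car_url(did), timeout=60) as resp:
        return resp.read()
```

Loop that over the did:plc list and you have the dataset, modulo rate limits and 66TB of disk.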
Comments
just post data would be less than that, can multiply the post number from https://bsky.jazco.dev/stats by like 500 bytes to get an upper end estimate
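That back-of-envelope looks like this (the post count below is a placeholder; read the real one off bsky.jazco.dev/stats):

```python
def post_dataset_upper_bound_tb(post_count, avg_bytes=500):
    # 500 bytes/post is the comment's upper-end per-record estimate
    return post_count * avg_bytes / 1e12

# hypothetical count of 400M posts, for illustration only
print(post_dataset_upper_bound_tb(400_000_000), "TB")  # → 0.2 TB
```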
i guess i have to work out how to extract only posts before actually saving anything
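One way to skip CAR parsing entirely, assuming the hosting PDS is reachable: `com.atproto.repo.listRecords` pages through a single collection as JSON, so filtering to posts is just the `collection=app.bsky.feed.post` query parameter. A sketch (the bsky.social host is an assumption, use the repo's actual PDS):

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

def list_records_url(repo, collection="app.bsky.feed.post",
                     cursor=None, host="https://bsky.social"):
    # listRecords returns one collection's records as JSON, no CAR parsing needed
    params = {"repo": repo, "collection": collection, "limit": 100}
    if cursor:
        params["cursor"] = cursor
    return host + "/xrpc/com.atproto.repo.listRecords?" + urlencode(params)

def iter_posts(repo):
    # network calls: yield every app.bsky.feed.post record in one repo
    cursor = None
    while True:
        with urlopen(list_records_url(repo, cursor=cursor), timeout=30) as r:
            page = json.load(r)
        for rec in page.get("records", []):
            yield rec
        cursor = page.get("cursor")
        if not cursor:
            break
```

The tradeoff vs. a raw CAR export is one HTTP round trip per 100 records instead of one per repo.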
I basically did this with https://github.com/appview-wg-bsky/backfill-bsky/tree/main/src/backfill/main.ts over ~3 days, wouldn't be tooo hard to filter it to just post records
> On a compressed FS the whole database takes up about 270GB, without compression - almost 3 times as much.
really old readme so assume it has doubled/tripled since then, maybe
guess i'm sending it