I heard people wanted a million Bluesky posts? So I made an open-source script that allows anyone to scrape the Bluesky firehose and collect everything people post.
This will be a useful resource to anyone who wants to archive this data or train generative AI. Have fun!
https://github.com/deepfates/bsky-scraper
Comments
🤡
from atproto import FirehoseSubscribeReposClient as F,parse_subscribe_repos_message as p,CAR as C
c=F()
c.start(lambda s:[print('Author DID:',m.repo,'\nPost text:',t,'\n')for m in[p(s)]if hasattr(m,'blocks')for r in C.from_bytes(m.blocks).blocks.values()if(t:=r.get('text',''))])
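For anyone who finds the one-liner hard to follow, here is an unrolled sketch of the same idea. It is not taken from the linked repository. The names `extract_post_texts`, `run`, and `on_message` are made up for illustration, and the guards against empty block data and non-dict blocks are defensive assumptions rather than documented requirements of the library.

```python
def extract_post_texts(blocks):
    """Return the non-empty 'text' field of every decoded block that has one."""
    texts = []
    for record in blocks.values():
        # Assumption: decoded blocks are dicts; skip anything else defensively.
        if isinstance(record, dict):
            text = record.get("text", "")
            if isinstance(text, str) and text:
                texts.append(text)
    return texts


def run():
    # Imported lazily so the helper above can be used without the atproto package.
    from atproto import CAR, FirehoseSubscribeReposClient, parse_subscribe_repos_message

    client = FirehoseSubscribeReposClient()

    def on_message(message):
        commit = parse_subscribe_repos_message(message)
        blocks = getattr(commit, "blocks", None)
        if not blocks:
            return  # not a commit message, or a commit without block data
        for text in extract_post_texts(CAR.from_bytes(blocks).blocks):
            print("Author DID:", commit.repo)
            print("Post text:", text)
            print()

    client.start(on_message)  # blocks until interrupted


# Quick check of the helper with hand-made data (no network needed).
sample = {"cid1": {"text": "hello"}, "cid2": {"text": ""}, "cid3": {"e": []}}
print(extract_post_texts(sample))
```

Call `run()` to start streaming. Like the one-liner, this prints any record that carries a `text` field, so records other than posts can show up too.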