cameronwilson.bsky.social • 64 days ago

The Wikimedia Foundation, which owns Wikipedia, says its bandwidth costs have gone up 50% since Jan 2024 — a rise they attribute to AI crawlers.

AI companies are killing the open web by stealing visitors from the sources of information and making them pay for the privilege

Comments

maddogeco.bsky.social•63 days ago

I had to shut down my little hobby site reviewing apple cider. I provided pairing recommendations and supported Aussie cider makers. But AI bots crawled the site and gave people my recommendations without showing them the ads or even the full story. After 10 years it was shut down in February.

mtthgn.bsky.social•63 days ago

Your site sounds incredible. It sucks that today’s internet is so hostile towards regular individuals wanting to share what they enjoy on their own terms 😞

maddogeco.bsky.social•63 days ago

It taught me about DNS management, WordPress, and SEO, and improved my writing skills; it certainly helped me get a job. It was fun but it was time to move on.

inferknow.co•63 days ago

How were the AI bots giving your recommendations?

maddogeco.bsky.social•63 days ago

When someone searched for a particular cider product. Google generated the summary at the top of the search results.

jjonesyb.bsky.social•63 days ago

posting it in the chat session of the shitty gen ai service

duckduckf.bsky.social•63 days ago

Sounds like we have to go old school and send newsletters, zines, and paper journals

darkswordsman.com•63 days ago

I'm curious if there's any way to detect AI or bot crawlers?

que.one•63 days ago

Isn't wiki fronted by a CDN?

clonalantibody.bsky.social•63 days ago

Put in human verification checks to prevent scraping by AI and other automated bots.

layer8problem.bsky.social•64 days ago

It can't be overstated just how dependent LLMs are on Wikipedia. It's a free resource full of generally reliable information about almost everything, usually curated by experts in their fields.

The models are no doubt built to heavily weigh it and would be much less accurate without the data.

nikkijayne.bsky.social•63 days ago

And yet, the AI summaries still make SO MUCH shit up!

daotoad.bsky.social•64 days ago

It seems to me that their vector database of all their weights is infected by the Wikipedia license.

It’s a true copyleft license and derivative works must be shared.

https://en.m.wikipedia.org/wiki/Wikipedia:Text_of_the_Creative_Commons_Attribution-ShareAlike_4.0_International_License

daotoad.bsky.social•64 days ago

It’s pretty unambiguous:

Share Alike—If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

atypicalhippy.bsky.social•63 days ago

I want to see that court case happen, and if anyone is crowd-funding to run the case I'd contribute.

sanakism.bsky.social•63 days ago

Seconded. If the Wikimedia Foundation could legally stab OpenAI et al. in the heart it may be their actual greatest gift to humanity.

veryoddrequest.bsky.social•63 days ago

Block them. Block them wholesale.

christinesayshi.bsky.social•62 days ago

I would like to but how?

odouglasprice.bsky.social•63 days ago

@alt-text.bsky.social

alt-text.bsky.social•63 days ago

Alt text retrieved

keeptheskyblue.bsky.social•63 days ago

LLM technology is the absolute worst and it's still mostly useless. 😔

iainhosking.bsky.social•63 days ago

I donate monthly to Wikipedia. It's such a valuable resource

charbelelhani.bsky.social•63 days ago

AI robs the work of others, devalues workers, and destroys valuable things like Wikipedia. Boycott AI as much as you can!

victorytwin.bsky.social•64 days ago

@kint.bsky.social as you were saying!!

shriketron.bsky.social•63 days ago

Maybe they can get into a subscription deal with AI companies to share the bandwidth costs.

Win win for both, as AI is a better search engine, and Wikipedia helps AI with authentic and curated sourcing - helping the people searching.

shriketron.bsky.social•63 days ago

This!
https://bsky.app/profile/antaboga.bsky.social/post/3llt4m3qeq226

mancalledhorse.bsky.social•63 days ago

Simplistic solution that's been tried many times and repeatedly been evaded by the selfish thieves in the AI industry

mancalledhorse.bsky.social•63 days ago

AI is not a better search engine. Wikipedia is already incredibly easy to search both directly within Wikipedia and with any external search engine without needing AI getting in the way. The search engines already have the sourcing in their core data and they've been curating those links for decades

carlnyberg312.bsky.social•63 days ago

Tax AI.

garnet.horse•63 days ago

You know you're the problem when even a company like Cloudflare is working on a service designed to stop you from hiring their customers.

lamb-dev.bsky.social•63 days ago

that's a lot 😵‍💫

babs05.bsky.social•63 days ago

@oxfordastrologer.bsky.social ☹️

ravenwings.bsky.social•63 days ago

I'm just waiting for the other shoe to drop.
Betting actual money that the DDoS situations on online games are not actual DDoS attempts but just that people have shitty ai written code for web crawlers that think there's data to be scraped in a goddamn game server.

radiohusky.us•63 days ago

Tbh there is... In the form of models/avatars

But uh

That wouldn't be a crazy amount of traffic so.... WTF are they even after?

ominouspooltoy.ing•63 days ago

Anything and everything. The more an LLM knows about something, the more accurately it can replicate it.

weary-mermaid.bsky.social•63 days ago

I don't understand this, but I do donate to Wikipedia occasionally. I hope it helps. I hate AI.

khazad-dum.bsky.social•64 days ago

I have never seen or heard anything more overrated than AI.

Fuck me, EVERY time I hit the internet, accidentally glimpse network TV, hear a snippet of radio, or read something in the dead media, someone is DEMANDING I respond to some bullshit about AI.

Just fuck off about AI. Goddamn.

scrispin.bsky.social•63 days ago

📌

davidjayindie.bsky.social•63 days ago

I wonder how those ai bros will enjoy themselves when every piece of media is ai generated. Imagine that. They'll say that's exactly what they want, but I get the feeling once reality hits them, they won't be laughing anymore.

navigatorbr.bsky.social•63 days ago

It's fucking infuriating as an editor to know my work is being stolen *and* it's actively fucking over the project.

It's such fucking bullshit.

ltrippe.bsky.social•63 days ago

Side note: got to continue annual donations to Wikimedia Fdn. Despite its flaws, it’s immensely useful.

willowminder.bsky.social•63 days ago

What else would you expect from parasites?

cameronwilson.bsky.social•64 days ago

I have been writing at @thesizzle.com.au about this a lot: people across the internet say they aren't going to put things online or they are taking things offline because AI companies do not give a fuck and are plundering everything without permission https://bsky.app/profile/ketanjoshi.co/post/3lksmahc4vk2b

scrambledmegs.bsky.social•63 days ago

As an artist in animation, why would anyone share their work anymore if it’s just going to be raped by AI and used to make derivative slop. Artists will meet up in person to share work, we already have figure drawing studios, we existed before the internet. Lazy plebs can rot. Make your own memes.

majorpaincake.bsky.social•63 days ago

But I don’t get to see it bc of that 😞

urnash.com•63 days ago

I guess this works if you have a studio gig but I'm a freelance artist. How'm I supposed to get work, fans, and patrons without posting it online?

Fuck AI.

jcastroarnaud.bsky.social•63 days ago

It's time to invent an encrypted means of sharing things online, in such a way that AI bots can't read it - or, ideally, can't even detect it as valid content.

#cryptography #sharing #p2p

merelydovely.bsky.social•63 days ago

anything you can do that blocks AI crawling would block all the useful parts of the internet, like 'ability to copy-paste', 'ability to search', 'ability for screenreaders to read text', 'alt text', 'structural HTML best practices' etc.

jcastroarnaud.bsky.social•63 days ago

I think that a site that hides its contents from bots, and is still functional, is possible, but the resulting #WebApp will be far from trivial. Here's the outline of an idea I had today for such a site: it's a long thread.

(1/n)

jcastroarnaud.bsky.social•63 days ago

Suppose that the site is for art sharing and comments on them, a bit like DeviantArt.

Every new user, signed on using e-mail and password, gets their own section of the site: /user/, and a pair of public/private keys for encryption/decryption.

(2/n)

jcastroarnaud.bsky.social•63 days ago

Suppose that user "claire" wants to see the section of user "shaolin". claire will use the following URL:

/user/shaolin/

And receive shaolin's section as a big encrypted blob:

(3/n)

zuthal.bsky.social•31 days ago

Put a trap in your website that won't be sprung if the bot obeys robots.txt, but that will do the worst things possible to one that doesn't

merelydovely.bsky.social•63 days ago

it's like how you can't prevent music piracy if the piracy's done by simply recording the music when it's played as sound. at some point the information has to become available to the audience, and on the internet, that's everyone, including crawlers.

merelydovely.bsky.social•63 days ago

you could password protect everything, but that just means the people who own crawlers would get the crawler accounts for all the major content sites and keep going. at a certain point you just have to legally mandate respect for robots.txt-style flagging.

callie.rebelwrath.co.uk•63 days ago

I've had spam comments appear on password protected pages on my site, so it seems like some bots at least can just ignore passwords

merelydovely.bsky.social•63 days ago

AI isn't reading your DMs. but we don't come to the internet just to talk in DMs!

shirley0401.bsky.social•63 days ago

Everyone who isn't one of them is an NPC to them.

a-701.bsky.social•63 days ago

AI techbros: "What if we made something that was like if the Borg assimilated the Torment Nexus?"
Assimilate things just to crush all joy within it.

fastbreak78.bsky.social•63 days ago

AI and bitcoin mining suck so fucking bad.

donno13.bsky.social•64 days ago

It’s so hard in a business landscape to have the concerns voiced. Every conference I attend, I can guarantee there will be a session from someone telling me how AI will change my life, boost productivity etc., but any discussion around IP theft, environmental impacts etc. is always overlooked.

nikkijayne.bsky.social•63 days ago

Oh my god this. I’m a copywriter and my company seems to have finally realised it’s nowhere near good enough to replace the creative team, but all this BS about productivity is no good if your job is anything more than basic spreadsheets and meetings.

nikkijayne.bsky.social•63 days ago

‘It will make you more productive.’ When I ask how and for a solid example, crickets. You can’t even rely on the summing up of meetings because of the hallucinations problem.

ragin-agnst-gop.bsky.social•63 days ago

I refuse to use any LLM AI to do research on the Internet. Using AI for information gathering is like asking some entitled blowhard who thinks they know everything about everything to give me the most surface-level explanation of something, based on what they heard someone else say about the matter at hand.

robertramsay.org•63 days ago

Surely they can ban the AI crawlers - even if it's a bit Whack-A-Mole

josiedraus.bsky.social•63 days ago

At present the whacks are more costly than the moles. Perhaps that balance of power will shift. Hopefully not further in the bad direction.

robertramsay.org•63 days ago

Unless you used AI to spot the AI crawlers 😁

josiedraus.bsky.social•63 days ago

Far more costly than using "A""I" to scrape data from a low-bloat, high SNR website such as Wikipedia. It's almost as if no good deed goes unpunished. What we deserve for living in an entropic universe, I guess.

robertramsay.org•63 days ago

I was joking, and entropy is very much misunderstood

saridder.bsky.social•63 days ago

The web is still open and Wikipedia is still there, don't post nonsense.

crs1.bsky.social•63 days ago

another tragedy of the commons

we need to send the commons to military preparedness training so it can defend itself

fridayharbor.bsky.social•63 days ago

Nice!

doctecazoid.bsky.social•64 days ago

Surely those crawlers can be identified and blocked ... ?

colbosch.bsky.social•64 days ago

Not anymore. They use basically DDOS tactics, where requests come from a vast number of IP addresses at once. They're specifically designed to bypass any safeguards. Even if you shut down access from an IP range (which in Wikipedia's case means blocking real people) the bots will shift to another.

colbosch.bsky.social•64 days ago

And this isn't one or two entities doing these crawls. Wikipedia can easily handle, say, OpenAI and Grok doing a dive each day. But for every legitimate company, there are literally thousands of bad actors and idiot hobbyists. Their crawls outnumber the "real" AI companies by orders of magnitude.

tonydolan.bsky.social•64 days ago

Yeah it’s basically a fulltime whack-a-mole job to keep them out.

doctecazoid.bsky.social•64 days ago

I guess this is the down side of an open internet, anyone with sufficient smarts and tools can game the system.

le sigh

colbosch.bsky.social•64 days ago

There are ways to combat it, but it would require a significant, multinational effort. And far too many governments have been captured by bad tech actors, are woefully ignorant, like having their own teams of cybercriminals, or some combination of all three.

doctecazoid.bsky.social•64 days ago

One wonders if (when?) the time will come that the internet just collapses in on itself.

colbosch.bsky.social•64 days ago

Over 90% of all internet sites and communications are, to some extent, inauthentic. It's horrifying, and yeah, it's leading to a lot of problems that don't have easy solutions.

wilhanel.bsky.social•63 days ago

It can be done with data analysis after the fact. But trying to identify it just-in-time is nigh impossible.

jsantos.eu•64 days ago

There's an arms race going on between websites and the AI companies, since the latter is doing everything in their power to impersonate legitimate users - i.e. human users and search engine crawlers. And until this kind of behaviour is legally considered fraud, they are not going to change.

doctecazoid.bsky.social•64 days ago

sigh

posivista.bsky.social•63 days ago

It is not artificial intelligence. It is artificial imitation of stolen intellectual property.

maxoakland.bsky.social•63 days ago

This is infuriating

lankfried.bsky.social•63 days ago

The early internet was anarchical with people producing work and creativity without any profit motive. Without oversight, that monetization gap was always going to be filled. It’s the tragedy of the commons all over again and we’re not going to be able to go back to that rules-free temporary state.

lankfried.bsky.social•63 days ago

What we desperately need in the next decade is a comprehensive set of regulations for the internet. Protections and public funding for commons like with libraries in the 20th century and a tax or fee system for crawlers. And a transparent system with profit sharing for monetizing information.

nickwolfebrown1.bsky.social•63 days ago

Always add -ai at the end of your Google search

thmazing.bsky.social•63 days ago

https://www.reddit.com/r/uBlockOrigin/comments/1ct5mpt/heres_how_to_disable_googles_new_forced_ai/

nikkijayne.bsky.social•63 days ago

Or switch to DDG which allows you to turn this feature off.

sluggo55.bsky.social•63 days ago

If you use Wikipedia then please make a donation to support them.

antaboga.bsky.social•64 days ago

So rate limit access to just slightly more than any human operating a keyboard would do and then make automated queries at a faster rate only available via an API which you charge for.

This really is simple and lots of providers of "free" data do this already.
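
A minimal sketch of the per-client rate limiting described above, using a token bucket. The class names, the `client_id` key, and the default rate and burst values are all hypothetical, chosen only for illustration. It is not how Wikimedia or any particular provider does it, and, as the replies point out, a scraper spread across thousands of proxied addresses can stay under any per-client limit.

```python
import time


class TokenBucket:
    """Allows short bursts up to `capacity`, refilled at `rate` tokens per second."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.updated = time.monotonic() if now is None else now

    def allow(self, now=None):
        now = time.monotonic() if now is None else now
        # Refill in proportion to the time elapsed since the last request.
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class RateLimiter:
    """Keeps one bucket per client (hypothetical key: an IP address or API token)."""

    def __init__(self, rate=1.0, capacity=5):
        self.rate = rate
        self.capacity = capacity
        self.buckets = {}

    def allow(self, client_id, now=None):
        now = time.monotonic() if now is None else now
        bucket = self.buckets.get(client_id)
        if bucket is None:
            bucket = self.buckets[client_id] = TokenBucket(self.rate, self.capacity, now)
        return bucket.allow(now)
```

A request handler would call `limiter.allow(client_id)` and, when it returns `False`, answer with an HTTP 429 or point the caller at the paid API.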

matthewdeaners.bsky.social•63 days ago

Modern scrapers are scraper farm services. They operate thousands of bots proxied throughout the world and are designed to access websites at human-operating rates to avoid detection. How is that simple?

antaboga.bsky.social•63 days ago

Scrapers have a different signature to humans, so having detected one you think is suspect just put up a captcha. Again this is all existing tech in common usage and nobody needs to invent anything new.

rspian.bsky.social•63 days ago

Tell that to the open source community who are having to develop new tools to stop them? https://techcrunch.com/2025/03/27/open-source-devs-are-fighting-ai-crawlers-with-cleverness-and-vengeance/

jaoswald.bsky.social•63 days ago

Captchas are not free, they raise barriers for humans, and the AI are getting better than humans at them anyway.

frankieffunk.bsky.social•63 days ago

And a lot of them use AI slop pictures, besides.

matthewdeaners.bsky.social•63 days ago

Bro, there is a massive, massive financial incentive for scraping services to exist. It’s not simple. It’s an arms race.

viperx83.bsky.social•63 days ago

And yet the whole internet has been scraped by AI assholes. Perhaps it’s not as easy as you think?

rfctabs.bsky.social•64 days ago

AI crawler - that's #MuskRat, right?

cybogoblin.bsky.social•64 days ago

Not just him. OpenAI will have their own crawlers, as will any of the other main developers.

jcc333.bsky.social•64 days ago

What’s incredibly silly about this with Wiki specifically is that they offer big archive dumps and the software to read them!

bugwebs.bsky.social•63 days ago

It's not silly. This is just one of those "why we can't have nice things" kind of a situation

pantojacoder.bsky.social•63 days ago

The Wikimedia Foundation is the non profit that maintains the infrastructure of Wikipedia but it doesn’t own it. Nobody does.

aegixdrakan.bsky.social•64 days ago

Gods, GenAi is a scourge.

tonymanzullo.bsky.social•63 days ago

i hope they start adding nonsense on every page either in the footnotes or as transparent text to fk with the AI scraping

marijocook.bsky.social•64 days ago

Then my idiot browser gives me an "AI summary" of the topic I searched, just above the link to the Wikipedia page.

mallardgryph.bsky.social•64 days ago

I know I'm being an annoying Internet dweeb by saying so, but please consider Firefox... it won't do u dirty like this

shriketron.bsky.social•63 days ago

Firefox now includes a side-panel to host an AI you choose to use.

I used Google's Gemini in it. Then the integration lets Firefox ask AI when you highlight any words in the browser.

sheepbop.bsky.social•63 days ago

Ick

fairlytalladam.bsky.social•64 days ago

And from what I've seen those AI "summaries" are frequently guff that combines multiple sources incorrectly and/or draws incorrect conclusions.

profsusan.bsky.social•63 days ago

I asked it to write a paragraph about my career. Version 1 was very general so I asked for specifics. It told me I worked on technology I did not, with people that didn't exist and won awards that didn't exist when I would have been eligible. Basically it got about 20% correct!

heavenlyfodder.bsky.social•64 days ago

So I suppose you're going to tell me the Underground Railroad isn't a literal railroad?

fairlytalladam.bsky.social•64 days ago

Yep. I'm a real fun vacuum.

Which is to say I remove the joy from things, not that I'm an authentic air-suction cleaning appliance that is particularly entertaining and enjoyable to use.

andylew.bsky.social•63 days ago

If you include "-ai" in your search query, you'll get results without the AI summary.

poptart2nd.bsky.social•63 days ago

How do I get the old search results before Google ruined it to sell more ads, necessitating the AI summary in the first place?

kittensceilidh.bsky.social•62 days ago

Actually they made the -ai thing stop working. You can get the ai to ignore your search by using cuss words in it though!

fairlytalladam.bsky.social•63 days ago

I appreciate you, but I'd also get results without the AI summary if big tech stopped trying to shoehorn AI into everything with a fucking on/off switch.

wincingdad.bsky.social•63 days ago

I hate this timeline.

orpach.bsky.social•64 days ago

This is incredibly bizarre.
Wikimedia has APIs for doing this properly, and torrents (with web seeds) of pretty much all the content.

There's an easy and correct way to do this that's being ignored for some reason.

campriceaustin.bsky.social•64 days ago

100% this, it’s weird as hell. They publish the damn snapshots.

atypicalhippy.bsky.social•63 days ago

I imagine there would have to be a process for updating without re-downloading everything?

dinorocket.bsky.social•63 days ago

Because either A: the people making these really don't know what the fuck they're doing or B: they just don't give a shit.

It's B. We all know it's B. Although most of them are probably being run by A as well.

zefiris.bsky.social•63 days ago

I think it's C

Hurting wikipedia and other sites is intentional. They're competition. Why wouldn't human filth like Altman destroy competing sources of information if he can? Sure, it'll screw AI in the future, but why does he care, he'll have robbed billions by then. Burn and loot.

maskfairy.bsky.social•63 days ago

Wikimedia has those APIs and torrents, but I’m guessing most websites don’t. These scrapers don’t want to have to do something different for a wiki site than they do any other website, that would be more work.

renee-draws.bsky.social•64 days ago

What do you mean? APIs don't negate the cost of hosting or traffic.

orpach.bsky.social•64 days ago

It's way less than repeatedly scraping pages or hotlinking.

neonchinchilla.bsky.social•63 days ago

It's obviously time to kill AI.

🫡

julievegan.bsky.social•63 days ago

How can we help? What can we do?

concernedobserver.bsky.social•63 days ago

We need a philanthropist/benefactor to merge Wikipedia with an independent AI. It would be a valuable tool for research and education.

hughster.bsky.social•63 days ago

Generative AI is the worst possible tool for research and education. It hallucinates facts routinely, it's always out of date, it has no understanding of context, etc.

If you want a tool you can talk to, all you need is straightforward voice recognition to look up and read from Wikipedia pages.

concernedobserver.bsky.social•63 days ago

AI has already made huge breakthroughs in modelling the 3D structures of proteins and has identified novel therapeutic drugs. There are many ways to use it as a tool in research and in education as well; a few that come to mind are grading, creating custom curricula, and resource creation.

mancalledhorse.bsky.social•63 days ago

No we don't and no it wouldn't.

inferknow.co•63 days ago

Wikipedia is offered up as a full download. I’d think that’s preferable to scraping it page by page. Seems strange.

inferknow.co•63 days ago

Ahhh. It’s for images - that I’d buy.

jdwear.bsky.social•62 days ago

I guess it would at least help if we donated to the foundation. I am shy about recurring donations because I tend to lose track of how many I'm in for, but I just did a one-time: https://donate.wikimedia.org

wordilocks.bsky.social•63 days ago

I bought some Wikipedia merch recently, a travel mug and Tshirt. :-)

fridayharbor.bsky.social•63 days ago

EFF article on the topic.
This looks like a major problem with folks' sites basically being hijacked, including their bandwidth. A cost that AI is not paying for... And no body to incarcerate...

https://www.eff.org/deeplinks/2025/03/eff-thanks-fastly-donated-tools-help-keep-our-website-secure-0

fridayharbor.bsky.social•63 days ago

Is there a work around?

zuthal.bsky.social•31 days ago

Websites need to push back against those crawlers. Idk, put a malicious file that will crash the crawler in a directory that human users won't access and that is forbidden by robots.txt? That should only hit bots that deserve it.
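
A milder variant of this trap idea can be sketched without any malicious payload: treat a request for a path that robots.txt forbids, and that no human-facing page links to, as a sign the client ignores the rules, then stop serving it. This swaps the "crash the crawler" step for simple flagging. The `/trap/` prefix, the `TrapTracker` name, and the `client_id` key are hypothetical, and a crawler that rotates addresses would evade it.

```python
TRAP_PREFIX = "/trap/"  # hypothetical path, assumed to be listed under Disallow in robots.txt


class TrapTracker:
    """Flags clients that request the trap path and refuses them afterwards."""

    def __init__(self):
        self.flagged = set()

    def observe(self, client_id, path):
        """Record one request; return True if the client should still be served."""
        if path.startswith(TRAP_PREFIX):
            # Only a client that ignored robots.txt should ever end up here.
            self.flagged.add(client_id)
        return client_id not in self.flagged
```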

kanna.zip•63 days ago

Friend of mine had to shut down their website because it was basically being DDoS'ed by crawlers despite their robots.txt specifically restricting it. They had reached their data cap 3x over.

wishcandy.bsky.social•63 days ago

Yep! I was looking into hosting my own website because squaresp4ce uses a///iii and you can’t opt out of all of the crawlers. Web designers and programmers said a///i scrapers take up too much bandwidth and bills have skyrocketed. Lot of people’s websites going down from illegal scraping

wishcandy.bsky.social•63 days ago

I basically no longer have a web portfolio, and an e-commerce site alone is costing the same if not more than having scrapey squaresp4ce. I hate all of this

mikecook82.bsky.social•63 days ago

Is there not some kind of technology that can guard a site against AI crawlers? If not, someone should invent that shit quick. I would also love some kind of browser and social media filter that blocks AI generated text and images from display. Get to inventing smart people!

catovitch.bsky.social•63 days ago

Problem is, at least one AI company seems to be using a distributed bot net of hacked machines with randomised client identifiers to make it very difficult to block all their requests.

It's time lawless, thieving AI companies were brought into line, i.e. jail.

https://www.mythic-beasts.com/blog/2025/04/01/abusive-ai-web-crawlers-get-off-my-lawn/

mikecook82.bsky.social•63 days ago

I’m not a techie, so all this shit is beyond me. I just think Gen AI, as it’s been deployed, is a tool of fascism that serves no purpose but to destroy human expression, creativity, knowledge, and education. Not to mention jobs/environment. It’s bad and I hate it. Can’t wait till the bubble bursts.

rspian.bsky.social•63 days ago

There's a standard where you can specify which web crawlers are allowed to access which pages, but the AI scrapers ignore it.
https://en.wikipedia.org/wiki/Robots.txt

It's hard to block botnets if they are using IP addresses of legitimate users, and each IP only makes a few requests.
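
For readers who have not seen the standard, here is a small sketch of what a well-behaved crawler is expected to do with robots.txt, using Python's standard `urllib.robotparser`. The rules, the `ExampleAIBot` user agent, and the `example.org` URLs are made up for illustration. Nothing enforces the check: a scraper that never runs it is not slowed down at all, which is the complaint above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one named crawler is shut out entirely,
# everyone else is only kept out of /private/.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def may_fetch(user_agent, url):
    """Return True if the parsed rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

With these rules, `may_fetch("ExampleAIBot", "https://example.org/wiki/Cider")` is refused, while the same URL is allowed for any other user agent.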

btwnthesepages.bsky.social•63 days ago

📌

trythisagain.bsky.social•64 days ago

They should manage crawler access better. it isn’t that hard to do as an admin.

atypicalhippy.bsky.social•63 days ago

That's particularly silly given that you can download the whole of wikipedia without crawling it. I don't know if you can get that from torrents, but it would make sense.

goldavelez.org•63 days ago

yeah proof of humanity and some gating is unfortunately increasingly necessary

walflour.bsky.social•62 days ago

This might help
https://anubis.techaro.lol/docs/design/why-proof-of-work/
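
The general proof-of-work idea behind that link can be sketched as a hashcash-style puzzle: the server hands out a challenge, the visitor must find a nonce whose hash meets a difficulty target, and the server verifies it with a single hash. This is an illustration of the technique only, not Anubis's actual implementation; the challenge format, the function names, and the difficulty measured in leading hex zeros are assumptions.

```python
import hashlib
from itertools import count


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Cheap server-side check: one SHA-256 hash per submitted answer."""
    digest = hashlib.sha256(f"{challenge}:{nonce}".encode()).hexdigest()
    return digest.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> int:
    """Costly client-side search: try nonces until the hash has enough leading zeros."""
    for nonce in count():
        if verify(challenge, nonce, difficulty):
            return nonce
```

Each extra hex digit of difficulty multiplies the expected client work by 16 while the server's check stays one hash, so a single human visit costs little but scraping at bulk scale becomes expensive.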

josephcappadonia.bsky.social•63 days ago

Huh..
I don't understand

johnweeks.bsky.social•64 days ago

@jesseorndorff.bsky.social

free-bird.bsky.social•64 days ago

Can't Wikipedia offer a licensing fee to these companies? That's a thing right?

caradelaney.bsky.social•63 days ago

They could, but why would those companies pay for something they can just take for free?

There was zero reason Meta had to pirate all those books. They could've at the very least bought each of them once. They chose not to. Actively decided to torrent them instead.

They dgaf about licenses.

mancalledhorse.bsky.social•63 days ago

Wikipedia content is freely licensed and so would need the consent of all the copyright owners (ie millions of individual editors) to change the licenses to allow WMF to charge for commercial use. Wikipedia terms of service prohibit what the crawlers are doing. AI firms don't care about compliance.

cybogoblin.bsky.social•64 days ago

Why would an AI company pay a fee when they can clumsily scrape the data for free?
