Meta considered licensing books to train AI—but opted instead to pirate LibGen, a database that currently contains more than 7.5 million books and 81 million research papers, Alex Reisner writes.
Comments
Can anyone explain how this might fall under the "fair use" exception to the general requirement that users of copyrighted works obtain consent from the original authors or creators?
I hope everyone reading this article knows that The Atlantic partners with OpenAI, which has been *aggressively* lobbying the US government to classify AI training on copyrighted data as "fair use." I commented and asked The Atlantic to clarify, but they didn't bother to reply. Best wishes to their partnership!!👍
To be precise, when an ordinary citizen does it, it's felony copyright infringement: you can go to prison for up to five years and pay up to $250,000. When a Facebook employee does it, it's a civil action only.
AI is not functional or viable without stealing from others. Remember the lawsuits against Google by the news outlets that weren't getting paid by a billion-dollar company. This time they are stealing from every contributor and writer.
I have to ask, does the reporting value of "naming" libgen justify the number of people who might be encouraged to use it themselves after reading the article?
I don't subscribe to the idea that training AI on copyrighted works is "theft," but on STOLEN works, yeah, that's theft.
To me, piracy and copyright infringements are not the main problem.
My concern is what AI does with the content. The key word here is decontextualisation.
Every work has a historical context reflected in the personal expression of the author. Works are unique and should be interpreted in the context of their creation. From a democratic perspective, the question arises as to how the AI user should interact with authors who are no longer recognisable.
1. I don't agree with Big Tech stealing content from pirate platforms to train gen AI (this appears to be the new norm in the U.S. - do what you want, don't ask, deal with the consequences later). However, it's important to understand their arguments in order to fight back.
2. Big Tech says its use of content from book/publication pirate platforms constitutes "fair use" (an illogical argument because pirate platforms have already faced lawsuits); Big Tech's argument is what's before the U.S. courts.
3. In OpenAI's case, they are pushing this argument hard, saying the U.S. will fall behind China in the "success of democratic AI" if they are not allowed to continue taking content without permission or, it seems, compensation to authors/creators. https://futurism.com/openai-over-copyrighted-work
Craig, did you ever read anything written by someone else?
How did you compensate them for "training" yourself on their work?
(To be clear, I agree that acquiring content illegally, e.g., via libgen, is theft. I do NOT agree that training AI on legitimately-acquired works is theft.)
The industrial, automated scale of backhoes has essentially put human ditch-diggers out of work. Is that fair? If LLMs do the same for writers, why is that different? The fact that some people don't like progress because it threatens their jobs doesn't mean that progress is wrong.
Backhoes are better at digging ditches than humans. Systems regurgitating content based on algorithms and presenting it as fact, without any actual intelligence and fact checking being involved, aren't better than humans. Backhoe companies aren't making money off the back of ditches dug by humans.
If the algorithms aren't better, then you have nothing to worry about. Their inferior product will not make any money for anyone. And if they make no money, it obviously won't be "off the backs" of anyone.
Incidentally, do you believe human fact-checking is more reliable? It isn't.
The problem with your argument is that 99.9% of humans can't spit back an entire literary work word-for-word. A computer can. Training AI and human learning are apples and oranges and comparing the two isn't any defense.
And your problem is that Generative AI DOESN'T DO THAT!
Try it. I did. I took a NYT article from a few months ago. I converted its title + subtitle, VERBATIM, into a ChatGPT Query. I have no doubt that ChatGPT "read" the NYT article, yet its output was not even close to plagiarism.
I challenge you to use any AI system to produce anything that would qualify as plagiarism. These systems draw from numerous sources, just as a human author would, but they produce unique text, just as a human author would.
When you write something, where does the underlying knowledge come from? Do you make it up? Hopefully not. Most likely a great deal of it comes from READING the works of others. Maybe some comes from direct observation, but when you PUBLISH that information, it then becomes PUBLIC - something any person or AI can read and learn from.
And to repeat myself AGAIN, I AGREE that using pirated work is theft, whether it's for a person or a computer.
Anyone training an AI system should have to legitimately acquire one copy of any work used for the training. But I break with the zealots who say it's STILL theft even then.
The backhoe argument misses the point. If your backhoe needs to use my property to operate, it either has to pay me for the use, or it doesn’t get to operate.
If I use a creator's work to make something new, particularly for commercial purposes, I must license (pay for) and credit that work. I expect these companies to do the same. If I failed to do that, I would be stealing.
It is simply not the same as an individual reading a book and remembering whatever bits they remember. Can you read a million books in seconds and retain everything?
The fact that kids were getting sued for sharing a few songs while Meta has yet to face any consequences shows that copyright law in the U.S. is a joke.
More concerned that the Dogebags have taken every bit of data from every individual, business, government entity, education, health, science, justice…for Musk’s private AI company.
This article is from 2013. A mere decade ago, this type of rampant piracy of intellectual property would have meant years in jail. Now it's a tech bro's right to steal and profit while stepping on necks. And the DOJ sends protection to Silicon Valley. What a world. https://nymag.com/intelligencer/2013/01/jstor-hacker-aaron-swartz-commits-suicide.html
It's never been a surprise that someone, somewhere scanned every book ever and put it online for others to collate in a database.
F12 ✊
fuck meta
https://www.reuters.com/technology/artificial-intelligence/french-publishers-authors-file-lawsuit-against-meta-ai-case-2025-03-12/
"…but I don’t mean that in a bad way…"
Dom Irrera might agree.
AI companies are going on about whether training is “fair use”
But they’re silent on the fact that they acquired the books illegally.
The latter makes it an open-and-shut case. If you steal, nothing you do with stolen property is "fair use."
“Soylent green IS people!”
I always do, unless it's a loan from a library or a friend.
I agree that ACQUIRING a work improperly is theft. But if the work is acquired properly, training an AI system on it is not.
so you also *usually* do it, because sometimes you borrow it (or, I would imagine, receive it as a gift), just like me 😅
the whole point of the article here is that genAI is training itself on tons of pirated work, so, theft
Someone?
https://nymag.com/intelligencer/2013/01/jstor-hacker-aaron-swartz-commits-suicide.html