Training LLMs on everything was not a “mistake” but a way for unethical companies to exploit unpaid labour and make a lot of unearned profit, while burning through Earth’s resources and not giving a damn how this affects less privileged people.
Comments
I keep hearing how companies are getting rich off AI and also how companies are going broke because AI isn't useful for anything, often from the same people talking about the same companies.
(To be clear, I'm not saying companies are actually good. I don't trust companies, which is part of why I'm in academia. I'd rather talk about science openly than develop it in secret and stop others from using it via secrets/patents/regulations.)
100% and it happens structurally, not necessarily with engineer intent.
E.g. Eng may be given tools to scrape, store, analyze Internet at scale but no $ for licenses nor any mention, accountability, or even path to discuss such.
The outcome is to use what they are given, i.e., tools for theft.
Then, when the systems prove "useful" (for whatever definition the company adopts) they are declared to be too large & important to change, and are instead part of the cost of "innovation".
It probably depends a little on the nature of M. If M is an LLM only, the exploitation is probably first and foremost imperialistic and/or settler. If it is GAI, i.e., actually intelligent, then surely the ethics become complicated by its sentience? I don’t think we should hurt it, for example.
But assuming it is “unexplainable decision making”, then the answer is likely going to apply to a means/ends distinction. If it develops a proposal to solve a significant problem like poverty, and the outcome is applied, and it works, then probably it’s fine. If it is used for profit only, then
I think personally it all falls within fair use and I suspect/think that plaintiffs are going to have a very difficult time showing otherwise. Beyond the legal question, however, I think the notion that material available for free on the internet can be stolen is a sort of odd claim.
AFAIK, copyright (and moral obligation to authors) does not apply only if a work is sold or paywalled.
Beyond that, there are 100ks or Ms of copyrighted *paid* works in training databases. At least one of my books among them, according to the Atlantic's database. Was never made freely available.
Fair use applies to everything. Universities would basically be unable to function if it didn’t. And I think proving that fair use was violated will be damn hard. The law review articles on this topic build analogous cases across multiple systems. My gut says almost no one will be paid.
Last I checked, a corporation copying my entire book, for a profit making system, replicating much of its purpose, would not count as "fair use" in current understanding.
I do agree that the principles are often violated, perhaps even equally violated by academia and industry ... and that courts may redefine them.
Meanwhile I remember with appreciation (philosophy) profs who explicitly acquired rights to chapters & articles for their photocopy packets in grad school
But that’s going to be the question: did they copy it? Because converting an entire book/corpus/archive to probabilistic relationships between words and then eliminating all information except the probabilities doesn’t conform to the definition of “copy” that most of us have in mind.
Lest I be misunderstood: there’s a distinct justification for saying, as Jonathan Franzen once did to Oprah readers, “I didn’t write it for you,” and thinking that there’s an injustice here. But claiming it is theft is another matter and seems extreme.
My own preference would be to use the Internet Archive model and make all information as free as possible by creating libraries like the ones that organization tried to build. Arguably one of the worst problems people face is that much good knowledge is behind paywalls.
The point you are making is generally true of our relationship to technology (this is what technology is ontologically), which is perhaps why life in 2024 is shallow. The internet of things is another example; AI isn’t unique; I believe it just shows us the nature of technological progress as an ethics.
You are a historian. Maybe look into the history of AI winters, hype cycles, etc. AI has gone through this more than once. Only the scale of resource burning is new(ish).
The answer shows whose perspective is centered.
Exploitation is useful for the exploiter and the beneficiary, not for the exploited.
That explains a lot. Bowing out.
Authors and publishers were perhaps allowed the benefits and expectations of copyright and attribution ... until more powerful forces had an interest.
Fails 2/4, arguably 4/4, of tests here: https://www.copyright.gov/fair-use/
Universities do buy books and journal subscriptions.