Pro-AI types like myself like to distinguish learning/assimilation from verbatim reproduction, and I think we’re right that LLMs mostly do the former.
But I wonder if we feel that learning itself poses no threat because humans have never been able to assimilate several million books.
Comments
I would still be able to say “well, my use is nonprofit,” but it’s a weaker argument than it sounds.
https://copyright.columbia.edu/basics/fair-use.html
I don’t think they add up to a simple escape clause for nonprofits. E.g., if Meta establishes that there is a market for “training AI on a corpus,” that tends to support the claim that unlicensed training harms the market for the work.
Less selfishly, I also think it’s a bad precedent to establish, for the future of open inquiry, that all intellectual content is property and that any form of learning from a book requires a payment to the creator.
but it would also keep more diffuse forms of learning free and within fair use
But technology & law don’t care at all about my moral feelings.
Instead of trying to compensate people whenever they’re echoed (i.e., always), we might focus on the rarer, clearer moments of actual innovation.
Neat thought experiment! Lots to think about.
But knowledge also counts. The sheer *breadth* of LLMs is one respect in which they are already superhuman, and it may be opening new perspectives.