I’m tempted to write a tutorial on how to train a fully copyright-free LLM on your own content, just to see the shitstorm that will inevitably ensue once a workflow for ethical LLM training actually exists.
Comments
Just start with the OLMo models and you're 80% done
There's literally a tutorial there
I keep wanting to make a similar article but just can't be bothered... mostly because I know the replies will be full of people losing their absolute shit claiming it stole their art or whatever
I would definitely want them to run fully locally, if possible.
"Open this Colab Notebook and load the 7B model in 4bit mode and train a LoRA" might be too much friction.
https://allenai.org/blog/olmo2-32B
https://huggingface.co/collections/PleIAs/common-models-674cd0667951ab7c4ef84cc4