the tldr if anyone is unfamiliar: to store something with maximum efficiency, you have to understand it. things you understand better are more predictable, so you need less information to recreate them. there is a lot of work along these lines, and chatgpt builds on it
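a toy way to see the prediction/compression link (made-up numbers, nothing from any real system): the ideal code length of a string under a probability model is the sum of -log2 p over its symbols, so a model that predicts the string well pays far fewer bits than one that doesn't

```python
import math

def bits_needed(text, prob):
    # ideal code length under a per-character model: an entropy coder can get
    # within a fraction of a bit of -sum(log2 p(ch)) over the whole string
    return -sum(math.log2(prob(ch)) for ch in text)

text = "aaaaaaaaab"  # highly predictable: nine a's, one b

informed = lambda ch: 0.9 if ch == "a" else 0.1 / 25   # "understands" the text
clueless = lambda ch: 1 / 26                           # uniform over 26 letters

print(bits_needed(text, informed))  # ~9.3 bits
print(bits_needed(text, clueless))  # ~47.0 bits
```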
lossless over some training data? yes. lossless over training data that covers most things you care about? probably. over the string you are going to send it, which probably did not exist when it was trained? likely not
You can use an LLM to implement a lossless compressor for arbitrary text, see e.g.: https://bellard.org/nncp/
The encoder and decoder share the same predictive model, so the encoder can spend fewer bits on more predictable symbols and more bits on less predictable ones
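a crude illustration of the shared-model idea (not how nncp is actually implemented, just the shape of it): both sides run the identical predictor, so the encoder only has to transmit each symbol's rank under that shared prediction, and a good model makes most ranks 0, which a downstream entropy coder stores in very few bits

```python
def ranked_guesses(prefix):
    # stand-in predictive model: guesses characters in a fixed frequency order
    # and ignores the prefix; a real system would condition an LSTM/transformer
    # on everything decoded so far
    return list("etaoinshrdlucmfwypvbgkjqxz ")

def encode(text):
    ranks = []
    for i, ch in enumerate(text):
        guesses = ranked_guesses(text[:i])
        ranks.append(guesses.index(ch))   # send only the rank of the true char
    return ranks

def decode(ranks):
    out = ""
    for r in ranks:
        guesses = ranked_guesses(out)     # identical model, identical predictions
        out += guesses[r]
    return out

msg = "the rate sonnet"
assert decode(encode(msg)) == msg         # lossless round trip
```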
you can just overtrain a standard decoder-only model on a single block of text until it can reproduce the entire thing verbatim under greedy (max) sampling, and at that point it has compressed that text
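a hedged sketch of that trick, with a tiny GRU character model standing in for the decoder-only transformer and arbitrary sizes/step counts, just to make the point concrete: once the model's argmax matches every next character, the weights plus the first character effectively are the text

```python
import torch
import torch.nn as nn

text = "the quick brown fox jumps over the lazy dog. "
vocab = sorted(set(text))
stoi = {c: i for i, c in enumerate(vocab)}
ids = torch.tensor([stoi[c] for c in text])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, hidden=64):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, vocab_size)

    def forward(self, x, h=None):
        y, h = self.rnn(self.emb(x), h)
        return self.out(y), h

model = TinyLM(len(vocab))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
x, y = ids[:-1].unsqueeze(0), ids[1:].unsqueeze(0)

for step in range(2000):                  # deliberately overtrain on one block
    logits, _ = model(x)
    loss = nn.functional.cross_entropy(logits.squeeze(0), y.squeeze(0))
    opt.zero_grad(); loss.backward(); opt.step()

# greedy ("max") sampling from the first character should now replay the text
out, h, tok = [ids[0].item()], None, ids[:1].unsqueeze(0)
for _ in range(len(text) - 1):
    logits, h = model(tok, h)
    tok = logits[:, -1].argmax(-1, keepdim=True)
    out.append(tok.item())
print("".join(vocab[i] for i in out) == text)  # expect True after enough steps
```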
the hutter prize fucking rules and marcus hutter deserves hall of fame status for putting his money on "intelligence is compression" in like 2005, way way downcourt.
it's extremely hardware constrained however, so it doesn't measure current state of the art in llms or similar
The scoffing I see is more of "this isn't state of the art, it's just its own thing now" than an attack on the merits, so this checks out and Hutter needs his jersey retired
I was legitimately talking about human memory and comprehension as just heuristic compression during an informational interview I had last week!
What do folks think they're doing when they speak or listen? In speech therapy it's literally called encoding and decoding as it concerns reading/writing
If you can guess the pattern to the digits ...627262464195387 and extend backwards reliably, you both learn about the rule and happen to compress the string
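the specific rule behind that digit string isn't spelled out here, so here's the generic move with a stand-in rule: store the generator plus a length instead of the raw digits, and you can recreate an arbitrarily long string from a few bytes of rule

```python
from itertools import count, islice

def rule_digits():
    # stand-in rule (not the one behind the digits above): the decimal digits
    # of successive powers of 2, concatenated
    for n in count(1):
        yield from str(2 ** n)

# the "compressed" form is just (this tiny function, a length); the raw digits
# can be regenerated on demand, however long you want the string to be
print("".join(islice(rule_digits(), 40)))  # 2481632641282565121024204840968192...
```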
That helps. But I’m so unversed in the philosophy and discourse of…digitalia that I’m going to have to start way back. Has anybody written clearly for the layperson on these ideas?
the core idea is information theory, which is its own field with a lot of math and not a lot of "state the core ideas clearly in english normal people can read". this stuff is just ... biting the bullet super hard that information theory describes how minds work, and using it for ai
"Intelligence is just compression"
(I don't know if I believe this but… I don't not believe it?)
at the strong end of the spectrum is "ai is just compression, and that is what intelligence is"
this does make me realize there is a gap here though, thank you
if you want to know what i mean, see e.g. this which is roughly our canonical reference: http://prize.hutter1.net/