Not quite, and in the OpenReview version the authors have since tried to add some clarifications to that effect (e.g. replacing "Autoregressive models" with "Large language models"). But it seems that some skepticism remains among the reviewers: https://openreview.net/forum?id=RDFkGZ9Dkh
Also in the abstract! While a cool observation, it's not particularly practical that one's state space has to grow exponentially for each request to the model-modeled-as-Markov-chain.
Show me something that isn’t a Markov Chain and I’ll show you that it is, except it has a transition matrix of rank > the number of atoms in the universe and therefore modelling it as a Markov Chain is a waste of time.
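Back-of-envelope, with purely illustrative numbers (a ~32k-token vocabulary and a 2,048-token context window, not figures taken from the paper): each state of the chain is one possible context, so the state count is roughly the vocabulary size raised to the context length.

```python
import math

# Illustrative numbers only -- not taken from the paper.
vocab_size = 32_000   # e.g. a Llama-style tokenizer
context_len = 2_048   # context window in tokens

# Each state is one possible token sequence filling the context,
# so the state count is about vocab_size ** context_len.
log10_states = context_len * math.log10(vocab_size)
print(f"~10^{log10_states:.0f} states")  # on the order of 10^9000

# For comparison, the observable universe has roughly 10^80 atoms.
```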
This number is huge, but **finite**! Working with Markov chains on a finite state space really gives non-trivial mathematical insights (for example, existence and uniqueness of a stationary distribution).
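As a minimal sketch (not the paper's construction): take the degenerate case where the context is a single previous token, so the chain's states are just the tokens themselves, and power iteration recovers the unique stationary distribution.

```python
import numpy as np

# Toy "LLM" whose context is one previous token: the Markov chain's
# states are simply the tokens. Rows here are random; for a real model
# they would be softmax(logits) at each context.
rng = np.random.default_rng(0)
vocab = ["a", "b", "c"]
K = len(vocab)

P = rng.random((K, K))
P /= P.sum(axis=1, keepdims=True)  # row i = next-token distribution given token i

# Power iteration: for a finite, irreducible, aperiodic chain this
# converges to the unique stationary distribution pi with pi = pi @ P.
pi = np.full(K, 1.0 / K)
for _ in range(1000):
    pi = pi @ P
print({tok: round(p, 3) for tok, p in zip(vocab, pi)})
```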
This equivalence between LLMs and Markov chains seems useless, but it isn't! Among its contributions, the paper establishes bounds thanks to this equivalence and verifies how the terms of those bounds behave on recent LLMs!
I invite you to take a look at the other contributions of the paper 🙂
Hi @vickiboykis.com, thanks for your interest. Don't hesitate to ask if you have any questions about the paper; @ozekri.bsky.social and I would be happy to help :)
I'm fascinated to see how this goes. I haven't actually read the article yet (but I will, promise!).
But could this be said to be supporting the more skeptical predictions for the state of AI?
i.e., that based on the results so far, it is unlikely we should expect them to get exponentially better.
I cannot express how excited I was seeing that! I had been looking into Markov processes a while ago, and then some recent looks into LLM sampling methods got me thinking that these are just some swole Markov processes. Or at least that, empirically, we could maybe treat them that way.
I feel like we've really come full circle with transformer equivalences now. Transformers are Hopfield networks are Markov chains are CNNs are Transformers, etc...