Not quite, and in the OpenReview version the authors have since tried to add some clarifications to that effect (e.g. replacing "Autoregressive models" with "Large language models"). But it seems that some skepticism remains among the reviewers: https://openreview.net/forum?id=RDFkGZ9Dkh
Also in the abstract! While a cool observation, it's not particularly practical that one's state space has to grow exponentially for each request to the model-modeled-as-Markov-chain.
Show me something that isn’t a Markov Chain and I’ll show you that it is, except it has a transition matrix of rank > the number of atoms in the universe and therefore modelling it as a Markov Chain is a waste of time.
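Back-of-envelope, with purely illustrative numbers (a ~32k-token vocabulary and a 2,048-token context window, not figures taken from the paper): each state of the chain is one possible context, so the state count is roughly the vocabulary size raised to the context length.

```python
import math

# Illustrative numbers only -- not taken from the paper.
vocab_size = 32_000   # e.g. a Llama-style tokenizer
context_len = 2_048   # context window in tokens

# Each state is one possible token sequence filling the context,
# so the state count is about vocab_size ** context_len.
log10_states = context_len * math.log10(vocab_size)
print(f"~10^{log10_states:.0f} states")  # on the order of 10^9000

# For comparison, the observable universe has roughly 10^80 atoms.
```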
This number is huge, but **finite**! Working with Markov chains on a finite state space really gives non-trivial mathematical insights (for example, existence and uniqueness of a stationary distribution).
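As a minimal sketch (not the paper's construction): take the degenerate case where the context is a single previous token, so the chain's states are just the tokens themselves, and power iteration recovers the unique stationary distribution.

```python
import numpy as np

# Toy "LLM" whose context is one previous token: the Markov chain's
# states are simply the tokens. Rows here are random; for a real model
# they would be softmax(logits) at each context.
rng = np.random.default_rng(0)
vocab = ["a", "b", "c"]
K = len(vocab)

P = rng.random((K, K))
P /= P.sum(axis=1, keepdims=True)  # row i = next-token distribution given token i

# Power iteration: for a finite, irreducible, aperiodic chain this
# converges to the unique stationary distribution pi with pi = pi @ P.
pi = np.full(K, 1.0 / K)
for _ in range(1000):
    pi = pi @ P
print({tok: round(p, 3) for tok, p in zip(vocab, pi)})
```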
This equivalence between LLMs and Markov chains seems useless, but it isn't! Among its contributions, the paper establishes bounds thanks to this equivalence and verifies how the terms of those bounds behave on recent LLMs!
I invite you to take a look at the other contributions of the paper 🙂
Hi @vickiboykis.com, thanks for your interest. Don't hesitate to ask if you have any questions about the paper; @ozekri.bsky.social and I would be happy to help :)
I'm fascinated to see how this goes. I haven't actually read the article yet (but I will, promise!).
But could this be said to be supporting the more skeptical predictions for the state of AI?
i.e., that based on the results so far, it is unlikely we should expect them to get exponentially better.
I cannot express how excited I was seeing that! I had been looking into Markov processes a while ago, and then some recent looks into LLM sampling methods got me thinking that these are just some swole Markov processes. Or at least that, empirically, we could maybe treat them that way.
I feel like we've really come full circle with transformer equivalences now. Transformers are Hopfield networks are Markov chains are CNNs are Transformers, etc...