The next paper I saw was on continuous chain of thought, which replaces discrete reasoning tokens with latent thoughts that are much more expressive and let the model compress its thinking by an order of magnitude.
https://arxiv.org/abs/2412.06769
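To make the mechanism concrete, here's a rough sketch of the core inference-time idea as I understand it: instead of sampling a token at each reasoning step, the model's last hidden state is fed straight back in as the next input embedding for a few "latent" steps before decoding resumes. This is only an illustration, not the paper's method; the actual work trains the model for this with a staged curriculum, and the model choice (gpt2), prompt, and number of latent steps below are my own placeholders.

```python
# Minimal sketch of a continuous-chain-of-thought decode loop (assumptions:
# gpt2 as a stand-in model, 4 latent steps, toy prompt). The real paper
# trains the model to use these latent thoughts; this only shows the loop.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Q: If I have 3 apples and buy 2 more, how many do I have? A:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

# Start from the prompt's token embeddings.
embeds = model.get_input_embeddings()(input_ids)

num_latent_steps = 4  # how many continuous "thoughts" to unroll
with torch.no_grad():
    for _ in range(num_latent_steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        # The continuous thought: the last layer's hidden state at the
        # final position (same width as the embedding space in GPT-2),
        # appended as the next input instead of a sampled token.
        thought = out.hidden_states[-1][:, -1:, :]
        embeds = torch.cat([embeds, thought], dim=1)

    # After the latent steps, decode a normal token from the logits.
    logits = model(inputs_embeds=embeds).logits
    next_token = logits[:, -1, :].argmax(dim=-1)

print(tokenizer.decode(next_token.tolist()))
```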
https://arxiv.org/abs/2412.08821
https://ai.meta.com/research/publications/byte-latent-transformer-patches-scale-better-than-tokens/