This is what I've been saying. The power of chain-of-thought (CoT) doesn't come from enabling the model to reason. Instead, it gives the model somewhere to dump intermediate tokens or token groups into its context, which can then increase the accuracy of the tokens that follow.
https://www.kaggle.com/datasets/Cornell-University/arxiv