Secondly, note that it is the foundation model that captures the statistics. But the new thing is a level of abstraction above that. It turns out the model captures a high-dimensional semantic space in which distances between words can be easily calculated. So, more than statistics.
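To make the "distances in semantic space" idea concrete, here is a minimal sketch with made-up toy vectors (real models use hundreds or thousands of dimensions; these embeddings are hypothetical, purely for illustration):

```python
import numpy as np

# Hypothetical 3-dimensional "embeddings" -- invented for this sketch,
# not taken from any real model.
embeddings = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.1]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine_similarity(a, b):
    # Closeness in the semantic space, measured as the angle between vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_similarity(embeddings["king"], embeddings["queen"]))  # near 1: "close" words
print(cosine_similarity(embeddings["king"], embeddings["apple"]))  # smaller: "far" words
```

Related words end up near each other, unrelated words far apart; that geometry is what sits "above" the raw statistics.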
Comments
I am fascinated by the explainability research Anthropic is doing. If I were a philosopher, I would focus on that...
and the inference engine is based on matrix algebra that does a sort of multidimensional weighted search ("attention") over the input prompt and the foundation model. (Loosely speaking.)
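The "matrix algebra" part can be sketched in a few lines. This is a bare-bones scaled dot-product attention (the standard formulation; the shapes and random inputs here are just for illustration):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: scores = Q K^T / sqrt(d), then a
    # softmax-weighted sum over the value vectors -- all matrix algebra.
    d = Q.shape[-1]
    weights = softmax(Q @ K.T / np.sqrt(d))
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 query positions, dimension 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, w = attention(Q, K, V)
print(out.shape)       # (4, 8)
print(w.sum(axis=1))   # each row of attention weights sums to 1
```

Each output position is a weighted mixture of the value vectors, with the weights computed from query/key similarity; "loosely speaking, a search" is a fair gloss.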
I think it is pretty easy to understand that a model could generate different outputs based on different generation strategies (https://huggingface.co/docs/transformers/generation_strategies#decoding-strategies).
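A toy illustration of why the decoding strategy changes the output even when the model's distribution is fixed (the vocabulary and logits below are invented for this sketch):

```python
import numpy as np

# Made-up next-token logits over a tiny vocabulary -- not from a real model.
vocab = ["the", "cat", "sat", "mat"]
logits = np.array([2.0, 1.5, 0.3, 0.1])

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def greedy(logits):
    # Greedy decoding: always pick the argmax token -> deterministic.
    return vocab[int(np.argmax(logits))]

def sample(logits, temperature=1.0, rng=None):
    # Temperature sampling: reshape the distribution, then draw at random.
    rng = rng or np.random.default_rng()
    p = softmax(logits / temperature)
    return vocab[int(rng.choice(len(vocab), p=p))]

print(greedy(logits))       # "the", every time
print(sample(logits, 1.5))  # varies from run to run
```

Same logits, different outputs: greedy is deterministic, sampling is not, and higher temperature spreads probability onto the less likely tokens.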
I'm reminded of suitcase words:
https://alexvermeer.com/unpacking-suitcase-words/