I suspect that foundation model pre-training will continue to divide and specialize. The past year saw the introduction of the annealing phase, and I suspect this allows for a rougher main pre-training phase that takes advantage of low precision and/or structured matrices.
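To make the idea concrete, here is a minimal sketch of how a run might be split into a rough main phase and a final annealing phase. Everything in it is an assumption for illustration: the names `PhaseSettings` and `settings_for_step`, the 10% annealing fraction, the linear learning-rate decay, and the `"fp8"`/`"bf16"` precision labels are hypothetical and not taken from any particular training recipe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseSettings:
    """Hypothetical per-step settings; field names are illustrative only."""
    name: str
    learning_rate: float
    precision: str  # label only, nothing here actually changes numerics


def settings_for_step(step, total_steps, peak_lr=3e-4, anneal_fraction=0.1):
    """Return the settings for a given step of a two-phase run.

    Assumed layout: a "rough" main phase at constant learning rate and
    low precision, followed by an "anneal" phase at higher precision in
    which the learning rate decays linearly toward zero.
    """
    if not 0 <= step < total_steps:
        raise ValueError("step must be in [0, total_steps)")

    # Number of steps reserved for annealing (at least one).
    anneal_steps = max(1, round(total_steps * anneal_fraction))
    anneal_start = total_steps - anneal_steps

    if step < anneal_start:
        return PhaseSettings("rough", peak_lr, "fp8")

    # Fraction of the annealing phase still remaining, in (0, 1].
    remaining = (total_steps - step) / anneal_steps
    return PhaseSettings("anneal", peak_lr * remaining, "bf16")
```

With `total_steps=1000`, steps 0 to 899 would fall in the rough phase and steps 900 to 999 in the annealing phase. A real setup would also decide what data to show in each phase, which this sketch leaves out.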
