This is new - Moonshot AI (i.e., https://kimi.ai) released two open-weight models.

Moonlight: a 3B/16B MoE model (3B activated, 16B total parameters) trained with the Muon optimizer on 5.7T tokens, advancing the Pareto frontier with better performance at fewer FLOPs.

https://huggingface.co/moonshotai