This is new: Moonshot AI (https://kimi.ai) released two open-weight models.
Moonlight: a Mixture-of-Experts (MoE) model with 3B activated / 16B total parameters, trained with the Muon optimizer on 5.7T tokens. It advances the Pareto frontier, with better performance at fewer training FLOPs.
https://huggingface.co/moonshotai