
adinayakup.bsky.social • 44 days ago

InternLM3-8B🔥 Trained on just 4T tokens, it outperforms Llama3.1-8B and Qwen2.5-7B in reasoning tasks, at 75% lower cost!
https://huggingface.co/collections/internlm/internlm3-67875827c377690c01a9131d
