DeepSeek, an LLM trained for a fraction of the cost of GPT-x models: two months, about $6 million, on limited GPUs due to export restrictions, and it competes head to head. This is crazy.
It's not the AI part I'm excited about, it's the level of efficiency. https://github.com/deepseek-ai/DeepSeek-V3
Comments
If anything, it should make it cheaper to build and buy for everyone, which seems like a good thing.
Models like Microsoft Phi and Google Gemma are indications the big players are already in the small model space.
Thanks
Also, I wonder if anyone has tried the "Tiananmen test" on the available V3 weights:
https://old.reddit.com/r/LocalLLaMA/comments/1ctiggk/if_you_ask_deepseekv2_through_the_official_site/