DeepSeek, an LLM trained for a fraction of the cost of GPT-x models: two months, about $6 million, on limited GPUs due to export restrictions, and it competes head to head. This is crazy.
It's not the AI part I'm excited about, it's the level of efficiency. https://github.com/deepseek-ai/DeepSeek-V3
Comments
If anything, it should make it cheaper to build and buy for everyone, which seems like a good thing.
Models like Microsoft Phi and Google Gemma are indications the big players are already in the small model space.
Thanks
Also, I wonder if anyone has tried the "Tiananmen test" on the available V3 weights:
https://old.reddit.com/r/LocalLLaMA/comments/1ctiggk/if_you_ask_deepseekv2_through_the_official_site/