Are there any high-level but technical explanations of DeepSeek out there that don't talk about Nvidia stock?
How did they build something seemingly competitive / superior with much fewer means?
Comments
- chain-of-thought models (like o1)
- native FP8, with higher precision kept at a few key points in the network
- multi-token prediction (I don't get this one, but apparently it is not new)
- an original "Multi-head Latent Attention" that saves a lot of VRAM
- a Mixture-of-Experts architecture
- something about GPU utilization 🤷‍♂️
in the section "The Theoretical Threat" in:
https://youtubetranscriptoptimizer.com/blog/05_the_short_case_for_nvda
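
To make the Mixture-of-Experts point a bit more concrete, here is a minimal toy sketch of top-k expert routing in plain Python. It is not DeepSeek's actual router. The expert count, the value of k, the scalar "experts", and the names `route_top_k` and `moe_forward` are all made up for illustration. The only idea it shows is that a router scores every expert but only the k best ones actually run for a given token, so the compute per token is a fraction of the total parameter count.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_scores, k):
    """Return [(expert_index, weight)] for the k highest-scoring experts.

    Weights are renormalized so they sum to 1 over the chosen experts.
    """
    probs = softmax(router_scores)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_forward(x, experts, router_scores, k):
    """Weighted sum of the outputs of only the selected experts."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_scores, k))


# Hypothetical toy setup: 8 scalar "experts", 2 active per token.
NUM_EXPERTS, TOP_K = 8, 2
experts = [lambda x, s=i + 1: s * x for i in range(NUM_EXPERTS)]

# In a real model the router scores come from a learned layer;
# here they are random numbers, purely as a stand-in.
rng = random.Random(0)
router_scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

chosen = route_top_k(router_scores, TOP_K)
output = moe_forward(1.0, experts, router_scores, TOP_K)
active_fraction = TOP_K / NUM_EXPERTS  # share of experts evaluated per token

print("chosen experts and weights:", chosen)
print("output:", output)
print("fraction of experts run per token:", active_fraction)
```

In this toy only 2 of the 8 experts are evaluated per token, which is the rough intuition for why a very large MoE model can be cheaper to run than a dense model with the same total parameter count.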