I'm not deep into the subject matter, but the premise of DeepSeek (so far unverifiable) is that it was done faster and on a smaller budget than OpenAI and other competitors. That sounds as if they have a few (non-open-source?) tricks up their sleeves.
Comments
As far as I understand, instead of human-reinforced learning (RLHF), they used existing LLMs to validate the training of their own. Definitely sounds like a trick, definitely nothing to do with OSS.
And building on the shoulders of giants. It might be commoditized, but there's a lot going on in this advancement that isn't so easily portrayed as "Team B humiliates Team A."
DeepSeek's claims are unverified and not indicative of the total system cost. Whether training costs are really coming down is debatable: digital technology has a habit of shifting the goalposts and negating efficiency gains. o1/R1 will look quaint in a few months' time, with new systems requiring more, not fewer, resources.
Much the same happened in the cryptocurrency/blockchain world (which the DeepSeek team came from; a lot of the LLM world is crypto-adjacent). Much was promised, breakthroughs were trumpeted, and not much changed from the initial offering.
Also, the DeepSeek team programmed 20 of the 132 streaming multiprocessors (SMs) on their interconnect-constrained H800 GPUs to manage cross-chip communication, writing PTX (NVIDIA's assembly-like GPU intermediate language) rather than plain CUDA to do it.
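For anyone curious what that looks like in shape: it's block/SM specialization, where one slice of the GPU's thread blocks runs a communication role while the rest do the math, all inside a single kernel. A minimal CUDA sketch of the pattern (not DeepSeek's actual code; the block counts, the toy flag, and the single inline-PTX store are purely illustrative):

```
// Block/SM specialization sketch: reserve a fixed subset of thread blocks
// for "communication" and leave the rest for compute, inside one kernel.
// Compile with: nvcc -arch=sm_90 sm_split.cu   (relaxed.sys needs sm_70+)
#include <cstdio>
#include <cuda_runtime.h>

#define TOTAL_BLOCKS 132  // one block per SM on an H100/H800-class GPU
#define COMM_BLOCKS   20  // blocks playing the communication role

__device__ int comm_flag = 0;  // toy stand-in for a cross-GPU signal

__global__ void fused_kernel(float* data, int n) {
    if (blockIdx.x < COMM_BLOCKS) {
        // Communication role: in the real system these SMs run hand-tuned
        // PTX that shuttles data between GPUs; here, one representative
        // inline-PTX primitive: a relaxed, system-scope 32-bit store.
        if (threadIdx.x == 0) {
            asm volatile("st.relaxed.sys.global.s32 [%0], %1;"
                         :: "l"(&comm_flag), "r"(1) : "memory");
        }
    } else {
        // Compute role: the remaining blocks do the actual math
        // (grid-stride loop over the non-communication blocks).
        int i = (blockIdx.x - COMM_BLOCKS) * blockDim.x + threadIdx.x;
        int stride = (TOTAL_BLOCKS - COMM_BLOCKS) * blockDim.x;
        for (; i < n; i += stride) data[i] *= 2.0f;
    }
}

int main() {
    const int n = 1 << 20;
    float* d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    cudaMemset(d, 0, n * sizeof(float));
    fused_kernel<<<TOTAL_BLOCKS, 256>>>(d, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    printf("done\n");
    return 0;
}
```

The real implementation is far more involved, of course; the sketch just shows the core idea of a compute/communication split living inside one kernel, so the SMs doing communication never touch the math.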
A truly insane level of optimization