It's an open-source model, so you can at least check the results. On the inputs, it's hard to say whether they used only what they said they did, but see the other comment about Stanford getting similar results. China also probably doesn't mind training on US copyrighted data.
Incorrect. They have to pretend and hide through obfuscation. They have to make it _seem_ like they didn't use that information. I'm not being facetious or pedantic; it really is harder to do that.
No truly intelligent person will tell you the truth, especially if they understand technology, politics, business, or anything else. The future will be cyber wars fought to position countries and regions and to obtain high-level information. You must study and reason on your own, and you will discover the truth.
Not long ago, Stanford trained a similar model called Alpaca on only 52k examples created by GPT. They proved you can use the larger models to train a smaller model with comparable performance, which is likely what they did.
https://crfm.stanford.edu/2023/03/13/alpaca.html
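For anyone curious what that recipe looks like in practice, here's a minimal sketch of Alpaca-style distillation: use a big model to generate instruction/response pairs, then fine-tune a smaller model on them with ordinary supervised learning. The `query_teacher` stub and file name are hypothetical placeholders, not Stanford's or DeepSeek's actual pipeline.

```python
# Sketch of Alpaca-style distillation: harvest (instruction, response)
# pairs from a larger "teacher" model, then fine-tune a small model on
# them. `query_teacher` is a hypothetical stand-in for a real API call.
import json

def query_teacher(prompt: str) -> str:
    # Replace with a call to the teacher model's API.
    return f"<teacher's answer to: {prompt}>"

seed_instructions = [
    "Explain the difference between a list and a tuple in Python.",
    "Summarize the causes of the French Revolution in three sentences.",
]

dataset = [
    {"instruction": ins, "output": query_teacher(ins)}
    for ins in seed_instructions
]

# The resulting JSONL is then used for plain supervised fine-tuning of
# the smaller "student" model (e.g. with LoRA) -- no RL required.
with open("distilled.jsonl", "w") as f:
    for row in dataset:
        f.write(json.dumps(row) + "\n")
```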
Based on the DeepSeek paper, this is roughly what they did. Their big innovation was having the model teach itself by repeatedly trying to answer the same question and evaluating its own responses.
I suspect we'll see more models built this way soon!
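To make that concrete, here's a toy sketch of the loop: sample several answers to the same question, score them with a verifiable reward, and reinforce the above-average ones (the group-relative idea behind GRPO in the R1 paper). `sample_answer` and `reward` are toy stand-ins, not DeepSeek's code.

```python
# Toy version of the R1-style self-improvement loop: sample a group of
# answers to one question, score each with a checkable reward, and
# compute a group-relative advantage (the quantity that would weight a
# real policy-gradient update). All stubs are hypothetical.
import random

def sample_answer(question: str) -> str:
    # Stand-in for sampling from the policy model at temperature > 0.
    return str(random.randint(2, 6))

def reward(answer: str) -> float:
    # Verifiable reward: 1 if the final answer is correct, else 0.
    return 1.0 if answer == "4" else 0.0

question = "What is 2 + 2?"
group = [sample_answer(question) for _ in range(8)]
rewards = [reward(a) for a in group]
baseline = sum(rewards) / len(rewards)

# Better-than-average samples get reinforced; worse ones pushed down.
for answer, r in zip(group, rewards):
    print(f"answer={answer!r}  reward={r}  advantage={r - baseline:+.2f}")
```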
Apple has also been working on similar things, allowing models to run directly off storage for a pretty substantial efficiency increase. LLMs from MS, OpenAI, and Google simply haven't been optimized to the same degree.
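To illustrate the basic primitive behind "running models off storage" (not Apple's actual, much more sophisticated technique, e.g. their "LLM in a flash" work): memory-mapping the weight file means only the pages you actually touch get pulled from disk.

```python
# Crude illustration of weights-off-storage: memory-map a weight file so
# pages are faulted in from disk on demand instead of loading the whole
# model into RAM. Real systems (e.g. Apple's research) go much further.
import numpy as np

rows, cols = 8_192, 1_024
np.save("weights.npy", np.random.rand(rows, cols).astype(np.float32))

w = np.load("weights.npy", mmap_mode="r")   # no bulk read into RAM yet
x = np.random.rand(cols).astype(np.float32)

# Only the pages backing these 100 rows are actually read from disk.
y = w[:100] @ x
print(y.shape)  # (100,)
```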
Here's a good overview of the current AI market which also, towards the end of the video, outlines potential reasons to be skeptical of Deepseek's claims:
https://youtu.be/GqcCvvFZsi4?si=pblcYt_Ih4LsnuXD
The claims that are easily verified have been; the rest are in the process of being verified and are fairly credible. There's always a chance of something fishy, but it's not the most likely explanation. This is a field in which people often make untrustworthy claims, so evaluation is part of the process.
But perhaps the more important piece: which part are you most concerned with verifying? The performance of the model, or the cost and the way they describe having gotten there? Most verification so far has indeed focused on the former. The latter is harder to verify.
Yes, and while it's hard to verify concretely right away, I think people are treating this part as fairly credible partly because:
* It’s a new organization and the budget is about right
* It's based in an environment with hardware restrictions that seem likely to spawn exactly this kind of efficiency work
You can literally go download the model and compare it to other models. https://huggingface.co/deepseek-ai/DeepSeek-R1-Zero
If DeepSeek were not close to what the reports claim (i.e., if it were propaganda), the community would have been able to refute it immediately.
The stock market did what it did because there was no refutation.
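On "go download the model and compare it": here's a sketch using Hugging Face transformers. The full R1 needs serious hardware, so this assumes one of the small distilled checkpoints; the model ID is the one published on the hub at the time of writing, so verify it yourself.

```python
# Sketch: pull a distilled R1 checkpoint and query it locally.
# Requires `pip install transformers accelerate torch`.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"  # check on the hub
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "How many prime numbers are there below 30?"
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(tok.decode(out[0], skip_special_tokens=True))
```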
they tell you in their technical reports what they did. the compute figures add up. the performance of the model is indisputable; you can download it and run it yourself. there are _many_ replications in progress; the RL process described works for learning CoT/self-reflection even on small models
here's an r1 replication being done by HuggingFace, for example.
you can be sure most tech companies and lots of individuals are doing this also, the kind of post-training they claim to have done is cheap (& a much bigger deal than the pretrain cost)
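On "the compute figures add up": here's the standard back-of-envelope check using the ~6·N·D training-FLOPs rule of thumb and the numbers from DeepSeek's own report (37B activated params, 14.8T tokens, ~2.788M H800 GPU-hours). These are their claimed figures, not independently verified; the point is only that the implied utilization lands in a plausible range rather than requiring magic.

```python
# Back-of-envelope check of "the compute figures add up", using the
# standard ~6*N*D training-FLOPs rule of thumb and the figures from
# DeepSeek's own report (claims, not independently verified):
# ~37B activated params, ~14.8T tokens, ~2.788M H800 GPU-hours.
activated_params = 37e9
tokens = 14.8e12
gpu_hours = 2.788e6

train_flops = 6 * activated_params * tokens      # ~3.3e24 FLOPs
sustained = train_flops / (gpu_hours * 3600)     # FLOP/s per GPU
h800_peak_bf16 = 9.9e14                          # ~990 TFLOPS dense, rough

print(f"total training FLOPs ~ {train_flops:.2e}")
print(f"implied per-GPU throughput ~ {sustained / 1e12:.0f} TFLOP/s")
print(f"implied utilization ~ {sustained / h800_peak_bf16:.0%}")
# Lands around one-third of peak -- an unremarkable utilization, i.e.
# the claimed budget doesn't require any magic.
```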
It's open source... meaning you can dissect its operations down to its foundations. Now, of course they built on already pre-existing hardware, but everyone does... it's what ChatGPT was supposed to be: open source. I'll caveat that with: it might be open source, but the CCP still controls the owner!
This, 100%. How can we reason with an electorate and public that is all ethos and pathos and no logos? We should be very concerned if/when the prices really do go down, because it will probably signal a recession more than anything else.
Given the ongoing AI competition between the U.S. and China, there may be incentives to exaggerate capabilities or downplay costs for strategic reasons.
I am not an expert but DeepSeek is open source and their distilled models can be downloaded and run locally; I've seen people claiming to have run it on their own Mac Pros
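For reference, the usual local route people mean is something like Ollama. Here's a sketch with its Python client; the `deepseek-r1:7b` tag is the one listed in Ollama's library at the time of writing, and the client API may have shifted, so double-check both.

```python
# Sketch: query a locally running distilled R1 via Ollama's Python
# client. Assumes Ollama is installed and `ollama pull deepseek-r1:7b`
# has been run. Requires `pip install ollama`.
import ollama

response = ollama.chat(
    model="deepseek-r1:7b",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(response["message"]["content"])
```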
If I can convince you with my techno babble, then I'd also encourage you to dump your entire retirement into a once-in-a-lifetime opportunity called $MANDERCOIN. In fact, just wire me the money directly, it'll be easier.
Perhaps you should pitch a "reality" show to PBS entitled "The Dueling AIs" and ask them all to put up or shut up in public and with a relatively disengaged if skeptical "jury"...
It’s open source. People can look at the code. Also people can install the model locally on a variety of devices and are getting good performance given the hardware limitations. Keeping data local and focused is appealing.
They may have polished some corners of how they got to the end model, but combine i) the transparency of open source, ii) the intense global interest from other experts, and iii) the consequences for a Chinese team (in particular) if debunked...
Partly because you can download the model yourself and play with it. Training costs are somewhat correlated with runtime costs, and people are showing great performance on very small hardware. A decent number of people played with it over the weekend and the results are very good. https://github.com/deepseek-ai/DeepSeek-R1
https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
DeepSeek was tested against other models and matched or beat them on most tasks.
beware, there will be lots of FUD either way
by chance, I can actually answer this for you with what appears to be an objective test:
https://www.nytimes.com/2025/01/23/technology/ai-test-humanitys-last-exam.html?searchResultPosition=1
https://www.theguardian.com/business/2025/jan/27/what-is-deepseek-and-why-did-us-tech-stocks-fall
There's no reason I can come up with to believe it didn't.
The claims are plausible in light of other performance.
Oh wait nvm
Help me better understand the pros and cons of A vs B. When is one or the other best used? What about each system allows for better use?
Deepseek was faster. OpenAI offered more information.
You connect the dots.
Spoiler: it’s a glowingly positive review.
https://www.wired.com/story/deepseek-china-model-ai/
- model is open source
- paper describes their methodology
- limits on GPUs
- they charge less
The price they paid for the research is the only thing we have to take their word on, from what I can tell.
https://github.com/deepseek-ai/DeepSeek-R1