Some questions triggered by the release of DeepSeek R1 on January 20. I frame them as questions because I do not know the answers, and most of the answers may only become clear over time.
Comments
To the best of my understanding, all of the leading companies are following essentially the same playbook (with the small difference that Meta releases its models partially as open source).
These companies are unwilling to consider approaches other than foundation models pre-trained as next-word predictors on massive data sets and, for the most part, anything other than diffusion models and chatbots aimed at performing human tasks.
While DeepSeek is not reinventing the wheel and is broadly within the same agenda, it appears to have relied much more heavily on reinforcement learning and mixture-of-experts methods, and to have refined chain-of-thought reasoning very effectively.
As widely reported, it has also done so at a fraction of the cost of the leading companies' models: about $5.5 million, compared with sums running into hundreds of millions of dollars.
I agree that using this approach alone is unlikely to result in “AGI”, whatever that is. We think in images, symbols, and all sorts of representations of the world, and we use abstract concepts to try to understand it better. These models do appear to have a small amount of emergent understanding, though.
US AI investment is massive. Goldman Sachs estimates that the tech sector is set to spend $1 trillion on generative AI: https://goldmansachs.com/insights/articles/will-the-1-trillion-of-generative-ai-investment-pay-off