Very good (technical) explainer answering "How has DeepSeek improved the Transformer architecture?". Aimed at readers already familiar with Transformers.

https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture
