Very interesting read about DeepSeek’s augmented transformer architecture.
https://epoch.ai/gradient-updates/how-has-deepseek-improved-the-transformer-architecture