maxkannen.bsky.social • 70 days ago

The DeepSeek-V3 paper is out, and the training setup is very interesting.
1. They used multi-token prediction during training, a technique Meta published a paper on a few months ago.
2. They used their R1 reasoning models to distill reasoning ability into V3.
https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf
