DeepSeek R1’s training method (GRPO) is now fully reproducible—the entire codebase is on GitHub. gist.github.com/willccbb/467...

akinunver.bsky.social • 24 days ago

DeepSeek R1’s training method (GRPO) is now fully reproducible—the entire codebase is on GitHub. https://gist.github.com/willccbb/4676755236bb08cab5f4e54a0475d6fb
