Two days ago, DeepSeek surprised everyone with an "undefined-behavior" PTX optimization that speeds up particular ML workloads in GPU kernels on NVIDIA's Hopper architecture.
Let's reverse engineer the hack, implement it ourselves, and benchmark the speedup on an H100.
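For reference, here is a minimal sketch of how a custom PTX load can be embedded in CUDA with inline assembly. The specific instruction is an assumption on my part, based on DeepSeek's DeepEP repository, which documents the read-only load ld.global.nc.L1::no_allocate.L2::256B applied to volatile data as its "undefined-behavior" PTX trick; treat this as illustrative, not as the exact benchmarked kernel.

    // Hedged sketch, not DeepSeek's actual kernel. The instruction is assumed
    // from DeepSeek's DeepEP repo, which flags it as undefined behavior.
    // Build (Hopper): nvcc -arch=sm_90 -o ptx_demo ptx_demo.cu
    #include <cstdio>
    #include <cuda_runtime.h>

    // ".nc" takes the non-coherent (read-only) data path, "L1::no_allocate"
    // skips allocating the line in L1, and "L2::256B" hints a 256-byte L2
    // prefetch. Reading memory that another agent may concurrently write
    // through ".nc" is the undefined-behavior part.
    __device__ __forceinline__ int ld_nc_no_allocate(const int *ptr) {
        int v;
        asm volatile("ld.global.nc.L1::no_allocate.L2::256B.b32 %0, [%1];"
                     : "=r"(v) : "l"(ptr) : "memory");
        return v;
    }

    __global__ void fill_ones(int *p, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) p[i] = 1;
    }

    // Toy kernel: sum the array through the custom load.
    __global__ void sum_custom(const int *__restrict__ in, int *out, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) atomicAdd(out, ld_nc_no_allocate(in + i));
    }

    int main() {
        const int n = 1 << 20;
        int *in, *out;
        cudaMalloc(&in, n * sizeof(int));
        cudaMalloc(&out, sizeof(int));
        cudaMemset(out, 0, sizeof(int));
        fill_ones<<<(n + 255) / 256, 256>>>(in, n);
        sum_custom<<<(n + 255) / 256, 256>>>(in, out, n);
        int total = 0;
        cudaMemcpy(&total, out, sizeof(int), cudaMemcpyDeviceToHost);
        printf("sum = %d (expect %d)\n", total, n);
        cudaFree(in);
        cudaFree(out);
        return 0;
    }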
https://www.youtube.com/watch?v=iEda8_Mvvo4
https://github.com/LaurieWired/BenchmarkCustomPTX