This is a down-and-dirty look at building your own high-performance #AI #LLM inference engine, from raw #CUDA kernels on up. The result? Beating top-shelf libraries at their own game. Still probably best to use a supported library in production, though.
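To give a flavor of what "from raw #CUDA kernels on up" means in practice (this is a hypothetical sketch, not code from the article): the bottom layer of such an engine is hand-written kernels for the basic linear-algebra ops, like this warp-per-row matrix-vector multiply, the core operation of single-batch LLM decoding.

```cuda
// Illustrative only -- a minimal "raw" building block, not the article's code.
#include <cuda_runtime.h>

// y = W * x, with W stored row-major as [rows x cols].
// One warp computes one output row; lanes split the dot product.
__global__ void matvec_f32(const float* __restrict__ W,
                           const float* __restrict__ x,
                           float* __restrict__ y,
                           int rows, int cols) {
    int warps_per_block = blockDim.x / 32;
    int row  = blockIdx.x * warps_per_block + threadIdx.x / 32;
    int lane = threadIdx.x % 32;
    if (row >= rows) return;

    // Each lane accumulates a strided slice of the row dot product.
    float acc = 0.0f;
    for (int c = lane; c < cols; c += 32)
        acc += W[(size_t)row * cols + c] * x[c];

    // Warp shuffle reduction combines the 32 partial sums.
    for (int offset = 16; offset > 0; offset >>= 1)
        acc += __shfl_down_sync(0xffffffff, acc, offset);

    if (lane == 0) y[row] = acc;
}
```

Launched with, say, 128-thread blocks (4 warps each), a primitive like this is what attention and MLP layers get assembled from before any fusion or tuning; beating the big libraries comes from specializing and fusing these pieces for one model and one GPU, which is exactly why a supported library is still the safer choice in production.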
Comments
Only had time to skim it, but the thought process is similar to how I had to approach writing homomorphic implementations of neural network operators: it’s a ground-up rewrite of each operator from first principles.