This is a down-and-dirty look at building your own high-performance #AI #LLM inference engine, from raw #CUDA kernels on up. The result? Beating top-shelf libraries at their own game. Still probably best to use a supported library in production, though.
