Now cooking the config layer and streaming.

Middleware for streaming might require some trial and error, but hey, that's the fun part!

The results could be nice: imagine streaming line by line instead of by an arbitrary number of tokens. The latency stays low and you can easily act on the produced text! A rough sketch of the idea is below.
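A minimal sketch of what such middleware could look like (this is just an assumption of the approach, not the library's actual API): a generator that buffers incoming text chunks and yields only complete lines.

    def stream_lines(chunks):
        """Buffer text chunks from a token stream and yield complete lines."""
        buffer = ""
        for chunk in chunks:
            buffer += chunk
            # emit every complete line accumulated so far
            while "\n" in buffer:
                line, buffer = buffer.split("\n", 1)
                yield line
        # flush whatever is left once the stream ends
        if buffer:
            yield buffer

    # hypothetical usage with any iterator of text chunks:
    # for line in stream_lines(client.stream(prompt)):
    #     handle(line)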
Reposted from Data Geek
My answer to AiSuite - focused even more on the GPU-poor.
It's still in early development, but you can take a look here:
github.com/mobarski/ai-...

All feedback is more than welcome (comments, likes, GitHub stars, reposts).