Claimed 1000x inference cost reduction by converting Qwen/Llama models to the RWKV architecture without retraining from scratch.

Big if true (and without any major issues). Definitely worth checking out.

https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1
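For context, here's a toy sketch (not recursal's actual conversion pipeline, and all names/shapes here are made up for illustration) of why RWKV-style linear attention is so much cheaper at inference time: instead of attending over a growing KV cache (cost per token grows with context length), it folds the context into a fixed-size recurrent state, so each new token costs the same no matter how long the context is.

```python
import numpy as np

d = 8  # toy head dimension

def rwkv_step(state, k, v, w):
    """One recurrent update: state holds a per-channel decayed
    sum of key-value outer products (fixed d x d size)."""
    return np.exp(-w)[:, None] * state + np.outer(k, v)

def rwkv_readout(state, r):
    """Token output is a receptance-weighted read of the state."""
    return r @ state

rng = np.random.default_rng(0)
state = np.zeros((d, d))
w = np.full(d, 0.1)  # per-channel decay rates

# Process a long "context" token by token; memory stays d x d
# throughout, unlike a transformer KV cache that grows linearly.
for _ in range(10_000):
    k, v = rng.standard_normal(d), rng.standard_normal(d)
    state = rwkv_step(state, k, v, w)

out = rwkv_readout(state, rng.standard_normal(d))
print(state.shape, out.shape)
```

The interesting part of the linked release is that the attention weights of an existing transformer are reportedly adapted into this recurrent form rather than training an RWKV model from scratch.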
