1000x inference cost reduction by converting Qwen/Llama models to the RWKV architecture without retraining from scratch.
Big if true (and without any major issues). Definitely worth checking.
https://huggingface.co/recursal/QRWKV6-32B-Instruct-Preview-v0.1