It's official: After more than 57 runs of the MMLU-Pro CS benchmark across 25 LLMs with over 69 hours of runtime, QwQ-32B-Preview is THE best local model!
I'm still working on the detailed analysis, but here's the main graph that accurately depicts the quality of all tested models.
Comments
Also, no Nemotron 70B in your tests? It has really impressed me. (It's just a shame how fragile its intelligence seems to be when it comes to fine-tuning.)