wolfram.ravenwolf.ai • 82 days ago

It's official: After more than 57 runs of the MMLU-Pro CS benchmark across 25 LLMs, with over 69 hours of runtime, QwQ-32B-Preview is THE best local model!

I'm still working on the detailed analysis, but here's the main graph that accurately depicts the quality of all tested models.
