How many people are using LLMs with suboptimal settings and never realize their true potential? Check your llama.cpp/Ollama default settings!

I've seen 2K max context and 128 max new tokens as defaults on too many models that should have much higher values. QwQ in particular needs room to think — its reasoning traces alone can easily run past a 2K window.
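In case it helps anyone, here's roughly how you'd check and raise these. The flag and parameter names are from the current Ollama Modelfile docs and llama.cpp CLI; the actual values below are just examples, pick whatever your VRAM allows:

```
# See the parameters a model is actually running with
ollama show qwq

# Make a variant with a bigger context window and generation budget
cat > Modelfile <<'EOF'
FROM qwq
PARAMETER num_ctx 16384
PARAMETER num_predict 8192
EOF
ollama create qwq-long -f Modelfile

# llama.cpp equivalent: set context size (-c) and max new tokens (-n) at launch
./llama-cli -m qwq.gguf -c 16384 -n 8192
```

You can also bump it per session inside `ollama run` with `/set parameter num_ctx 16384`, but a custom Modelfile makes it stick.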