How many people are using LLMs with suboptimal settings and never realizing their true potential? Check your llama.cpp/Ollama default settings!
I've seen 2K max context and 128 max new tokens set as defaults on far too many models that should have much higher values. QwQ especially needs room to think.
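For anyone unsure where to change this, here is a minimal sketch of raising both limits in Ollama via a Modelfile. The base model tag (`qwq`) and the specific values are just examples, pick whatever your hardware and model actually support:

```
# Modelfile -- assumes a model tagged "qwq" is already pulled locally
FROM qwq

# Context window in tokens (the default is often only 2048)
PARAMETER num_ctx 32768

# Max new tokens per response; -1 removes the cap
PARAMETER num_predict -1
```

Then build and run the adjusted model:

```
ollama create qwq-32k -f Modelfile
ollama run qwq-32k
```

The rough llama.cpp equivalent is the `-c` (context size) and `-n` (tokens to predict, -1 for unlimited) flags, e.g. `./llama-cli -m your-model.gguf -c 32768 -n -1` (model filename is just a placeholder).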
Comments
Overall, would that be the best model to use as the smaller model?