How many people are using LLMs with suboptimal settings and never realizing their true potential? Check your llama.cpp/Ollama default settings!
I've seen 2K max context and 128 max new tokens set as defaults on far too many models that should have much higher values. QwQ especially needs room to think.
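For anyone unsure where to change this, here is a minimal sketch of raising both limits in Ollama via a Modelfile. The base model tag (`qwq`) and the specific values are just examples, pick whatever your hardware and model actually support:

```
# Modelfile -- assumes a model tagged "qwq" is already pulled locally
FROM qwq

# Context window in tokens (the default is often only 2048)
PARAMETER num_ctx 32768

# Max new tokens per response; -1 removes the cap
PARAMETER num_predict -1
```

Then build and run the adjusted model:

```
ollama create qwq-32k -f Modelfile
ollama run qwq-32k
```

The rough llama.cpp equivalent is the `-c` (context size) and `-n` (tokens to predict, -1 for unlimited) flags, e.g. `./llama-cli -m your-model.gguf -c 32768 -n -1` (model filename is just a placeholder).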
Comments
Overall, would that be the best model to use as the smaller model?