In any case, for me, the key takeaway is that SO can decrease (or increase!) performance on some tasks. Be conscious of that.

For now, there are no clear guidelines on where each method works better.

Your best bet is to test your LLM by running your own evals.
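Here is a minimal sketch of what such an eval could look like. It scores the same questions twice, once with a structured (JSON) response and once with free text, and compares accuracy. Everything in it is an assumption for illustration: `accuracy`, the two parsers, the `Answer:` convention, and the tiny dataset are hypothetical, and the two `fake_*` functions are canned stand-ins that you would replace with real calls to your model.

```python
import json
from typing import Callable, Optional

# Hypothetical toy dataset of (question, expected answer) pairs.
DATASET = [("What is 2 + 2?", "4"), ("What is 3 * 3?", "9")]
CANNED = dict(DATASET)


def fake_structured(question: str) -> str:
    """Stand-in for a model call with structured outputs enabled."""
    return json.dumps({"reasoning": "worked it out", "answer": CANNED[question]})


def fake_free_text(question: str) -> str:
    """Stand-in for a model call that returns unconstrained text."""
    return f"Let me think step by step.\nAnswer: {CANNED[question]}"


def parse_json_answer(raw: str) -> str:
    """Read the 'answer' field from a JSON response."""
    return str(json.loads(raw)["answer"]).strip()


def parse_text_answer(raw: str) -> str:
    """Read the last line, assuming the model ends with 'Answer: <value>'."""
    return raw.strip().splitlines()[-1].removeprefix("Answer:").strip()


def accuracy(
    generate: Callable[[str], str],
    parse: Callable[[str], str],
    dataset: list,
) -> float:
    """Fraction of items answered correctly; unparseable output counts as wrong."""
    correct = 0
    for question, expected in dataset:
        answer: Optional[str]
        try:
            answer = parse(generate(question))
        except (ValueError, KeyError, TypeError, IndexError):
            answer = None
        if answer == expected:
            correct += 1
    return correct / len(dataset)


if __name__ == "__main__":
    print("structured:", accuracy(fake_structured, parse_json_answer, DATASET))
    print("free text: ", accuracy(fake_free_text, parse_text_answer, DATASET))
```

With the canned stand-ins both scores are trivially the same, so the numbers only mean something once you plug in your own model, prompts, and a dataset that resembles your task. Counting parse failures as wrong answers is a design choice: you may prefer to track them separately, since a formatting failure and a reasoning failure call for different fixes.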
