Still, I was able to work around the issue and re-run the benchmarks. NL and JSON-Prompt are tied.
But JSON-Schema performed worse than NL in 5 out of 6 tasks in my tests. In Shuffled Objects, the gap was large: 97.15% for NL vs. 86.18% for JSON-Schema, a difference of about 11 percentage points.
Given this result and the key sorting issue, I'd suggest avoiding JSON-Schema unless you really need it. JSON-Prompt seems like a better alternative.
For now, there are no clear guidelines on where each method works better.
Your best bet is to test your LLM by running your own evals.
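If you want a starting point, here is a minimal sketch of what such an eval could look like. It is not the code I used for the benchmarks: `fake_nl` and `fake_json_schema` are hypothetical stand-ins that you would replace with real calls to your LLM for each output method, and `DATASET` is a toy example rather than a real task.

```python
from typing import Callable, Dict, List, Tuple

# A "method" takes a question and returns the model's final answer as text.
Method = Callable[[str], str]


def accuracy(method: Method, dataset: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the answer matches the expected one."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    correct = sum(
        1
        for question, expected in dataset
        if method(question).strip().lower() == expected.strip().lower()
    )
    return correct / len(dataset)


def compare(
    methods: Dict[str, Method], dataset: List[Tuple[str, str]]
) -> Dict[str, float]:
    """Score every method on the same dataset."""
    return {name: accuracy(fn, dataset) for name, fn in methods.items()}


# Hypothetical stand-ins: swap these for real LLM calls.
def fake_nl(question: str) -> str:
    return "4" if question == "2+2" else "unknown"


def fake_json_schema(question: str) -> str:
    return "unknown"


# Toy dataset of (question, expected answer) pairs.
DATASET = [("2+2", "4"), ("capital of France", "Paris")]

results = compare({"NL": fake_nl, "JSON-Schema": fake_json_schema}, DATASET)
for name, score in results.items():
    print(f"{name}: {score:.2%}")
```

Running every method against the same examples is what makes the comparison meaningful. With a real model you would also want more examples per task and several runs, since small differences can be noise.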
The code is available on GitHub: https://github.com/dylanjcastillo/blog/tree/main/_extras/gemini-structured-outputs