Still, I could work around the issue and re-run the benchmarks. NL and JSON-Prompt are tied.

But JSON-Schema performed worse than NL in 5 out of 6 tasks in my tests. Plus, in Shuffled Objects, it did so with a huge delta: 97.15% for NL vs. 86.18% for JSON-Schema.
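
To put that Shuffled Objects delta in perspective, here is a quick back-of-the-envelope sketch. It only uses the two accuracy figures quoted above; the variable names are mine and purely illustrative, not part of any benchmark harness.

```python
# Accuracy figures for Shuffled Objects, as quoted in the post (in percent).
nl_acc = 97.15           # natural-language (NL) prompt
json_schema_acc = 86.18  # JSON-Schema structured output

# Absolute gap in percentage points.
gap_points = round(nl_acc - json_schema_acc, 2)

# Relative drop, measured against the NL accuracy.
relative_drop = round((nl_acc - json_schema_acc) / nl_acc * 100, 2)

print(f"Gap: {gap_points} percentage points")
print(f"Relative drop vs. NL: {relative_drop}%")
```

That is a gap of roughly 11 percentage points on this one task. It describes these two numbers only and says nothing about variance across runs.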
