Gemini has three ways of generating structured outputs (SO):
1. Forced function calling (FC): https://ai.google.dev/gemini-api/tutorials/extract_structured_data
2. Schema in prompt (SO-Prompt): https://ai.google.dev/gemini-api/docs/structured-output?lang=python#supply-schema-in-prompt
3. Schema in model config (SO-Schema): https://ai.google.dev/gemini-api/docs/structured-output?lang=python#supply-schema-in-config
SO-Prompt works well. But FC and SO-Schema have a major flaw.
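For reference, here's a minimal sketch of SO-Prompt and SO-Schema using the google-genai Python SDK; the model name, question, and schema are placeholders rather than the ones used in the benchmarks, and FC is omitted for brevity.

```python
# Minimal sketch of SO-Prompt vs. SO-Schema with the google-genai Python SDK.
# The model name, question, and schema are placeholders, not from the benchmarks.
from pydantic import BaseModel
from google import genai

client = genai.Client()  # assumes the API key is set in the environment


class Answer(BaseModel):
    reasoning: str
    answer: str


question = "Alice, Bob, and Claire swap books twice. Who ends up with Alice's book?"

# 1. SO-Prompt: describe the JSON you want directly in the prompt.
prompt = f"""Answer the question below.
Return a JSON object with two keys:
- "reasoning": your step-by-step reasoning
- "answer": the final answer

Question: {question}"""

so_prompt_response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt,
    config={"response_mime_type": "application/json"},
)

# 2. SO-Schema: pass the schema in the model config instead of the prompt.
so_schema_response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=question,
    config={
        "response_mime_type": "application/json",
        "response_schema": Answer,
    },
)

print(so_prompt_response.text)
print(so_schema_response.text)
```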
Comments
You can fix SO-Schema by being smart with keys: instead of "reasoning" and "answer", use something like "reasoning" and "solution", so the reasoning key sorts before the answer key.
But this workaround doesn't work in the Generative AI SDK. Other users have already reported this issue.
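As a rough sketch of that workaround (assuming a Pydantic schema passed through the google-genai SDK and the sorted key ordering described above; the model name and question are placeholders):

```python
# Sketch of the key-naming workaround for SO-Schema (google-genai SDK,
# placeholder model and question). The idea: if keys are emitted in sorted
# order, naming the final field "solution" keeps "reasoning" first.
from pydantic import BaseModel
from google import genai


class BadSchema(BaseModel):
    reasoning: str
    answer: str  # "answer" sorts before "reasoning", so the answer is generated first


class GoodSchema(BaseModel):
    reasoning: str
    solution: str  # "reasoning" sorts before "solution", so reasoning comes first


client = genai.Client()  # assumes the API key is set in the environment
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents="Which city is farther north, Oslo or Stockholm?",
    config={
        "response_mime_type": "application/json",
        "response_schema": GoodSchema,
    },
)
print(response.text)
```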
For the benchmarks, I excluded FC and used already-sorted keys for JSON-Schema.
But JSON-Schema still performed worse than NL in 5 out of 6 tasks in my tests, and in Shuffled Objects the gap was huge: 97.15% for NL vs. 86.18% for JSON-Schema.
Given these results and the key-sorting issue, I'd suggest avoiding JSON-Schema unless you really need it. JSON-Prompt seems like a better alternative.
For now, there are no clear guidelines on where each method works better.
Your best bet is to test your LLM by running your own evals.
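If you go that route, even a small exact-match eval is enough to compare the methods on your own tasks. The sketch below is hypothetical: the examples, model name, and scoring are stand-ins, not the benchmark code from the post.

```python
# Hypothetical eval harness: exact-match accuracy of SO-Prompt vs. SO-Schema
# on a toy dataset. The examples, model name, and scoring are stand-ins.
import json

from pydantic import BaseModel
from google import genai

client = genai.Client()  # assumes the API key is set in the environment

EXAMPLES = [
    {"question": "What is 17 * 3?", "expected": "51"},
    {"question": "What is the capital of France?", "expected": "Paris"},
]


class Answer(BaseModel):
    reasoning: str
    solution: str


def so_prompt(question: str) -> str:
    prompt = (
        'Answer the question. Return a JSON object with keys "reasoning" '
        f'and "solution".\n\nQuestion: {question}'
    )
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=prompt,
        config={"response_mime_type": "application/json"},
    )
    return json.loads(response.text)["solution"]


def so_schema(question: str) -> str:
    response = client.models.generate_content(
        model="gemini-2.0-flash",
        contents=question,
        config={"response_mime_type": "application/json", "response_schema": Answer},
    )
    return json.loads(response.text)["solution"]


for name, method in [("SO-Prompt", so_prompt), ("SO-Schema", so_schema)]:
    correct = sum(
        str(method(ex["question"])).strip().lower() == ex["expected"].lower()
        for ex in EXAMPLES
    )
    print(f"{name}: {correct}/{len(EXAMPLES)} correct")
```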
And here's the GitHub code: https://github.com/dylanjcastillo/blog/tree/main/_extras/gemini-structured-outputs