We're using our arithmetic word problem dataset as a new benchmark!
https://huggingface.co/datasets/HelloCephalopod/AWPCD
GPGenAI Turbo: 81.7%
GPT 4 Turbo: 93.9%
o3-mini: 96.4%
You can see the breakdown of the errors in the HF repo #ML #AI #GenAI