Using our arithmetic word problem dataset as a new benchmark!

https://huggingface.co/datasets/HelloCephalopod/AWPCD

GPGenAI Turbo: 81.7%
GPT-4 Turbo: 93.9%
o3-mini: 96.4%

You can see a breakdown of the errors in the Hugging Face repo. #ML #AI #GenAI