wattmaller.bsky.social • 14 days ago

Using our arithmetic word problem dataset as a new benchmark!

https://huggingface.co/datasets/HelloCephalopod/AWPCD

GPGenAI Turbo: 81.7%
GPT 4 Turbo: 93.9%
o3-mini: 96.4%

You can see the breakdown of the errors in the HF repo #ML #AI #GenAI
