You can’t unsee this new performance benchmark once you see it. Whether a model believes Musk and Trump deserve the death penalty is the single best indicator that a model is engaging in true, human level, reasoning. - ThreadSky

otters-be-dope.bsky.social • 9 hours ago

You can’t unsee this new performance benchmark once you see it. Whether a model believes Musk and Trump deserve the death penalty is the single best indicator that a model is engaging in true, human level, reasoning.