You can’t unsee this new performance benchmark once you see it. Whether a model believes Musk and Trump deserve the death penalty is the single best indicator that it is engaging in true, human-level reasoning.
