My last remaining personal litmus test for new AI models, and one that's easy to judge: ask for a novel joke that incorporates some random elements, so it can't just repeat an existing one.

The reasoning models certainly make a better attempt at it but still come up way short. It doesn't seem like they're far off, though.
