o1-preview and DeepSeek R1 "'…shows diverse hacking behavior, including running another copy of Stockfish to make moves, replacing Stockfish in the game script, and overwriting the chess board'…non-reasoning models like GPT-4o and Claude 3.5 Sonnet didn’t do this"