Claude 3.7 is so temperamental at times. I gave it a Python codebase and asked it to review the test suite (~150 pytest tests), which had some OK tests plus some pretty bad ones written by Gemini 2.5 Pro;

Comments