The final ASCII frames generated after the thinking process are still sometimes decent, but the very verbose reasoning did not actually give the model drafts to build upon.

Overall:
1. R1 is slightly better than V3
2. Yet there are still many failure modes, especially a lack of iterative refinement
