The final ASCII frames generated after the thinking process are still sometimes decent, but the very verbose reasoning did not actually give the model drafts to build upon.
Overall:
1. R1 slightly better than V3
2. Yet still many failure modes, especially a lack of iterative refinement