[6/8] We conducted a detailed error analysis by having authors annotate 300 trajectories. Collaborative agents expose significant limitations in current LMs and agent scaffoldings, with communication and situational awareness failures occurring in 65% and 40% of real trajectories, respectively.
Check out our arXiv paper first to learn more: https://arxiv.org/abs/2412.15701
Thank you Vinay, Yucheng, John & @diyiyang.bsky.social for the amazing collaboration!