Agents are all the rage, and we need to track their abilities in the medical domain. Enter MedBrowseComp, the first benchmark to assess agents' abilities to reason, navigate the web, and search for verifiable medical information!
Preprint: https://arxiv.org/abs/2505.14963
Site: https://moreirap12.github.io/mbc-browse-app/