- bing executes some scripts, but idk under what conditions
- yandex doesn't run js on the first hit, but may on subsequent hits. google is similar, but iiuc not consistently
- found nothing helpful on ddg or brave
- baidu ignores scripts entirely
note that instructions addressed to googlebot are honored by some other crawlers (notably brave's), and crawlers commonly pretend, via their user agent string, to be a normal browser or googlebot.
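
to make the user agent problem concrete, here's a minimal python sketch of the usual substring check. the function name and the token list are my own picks for illustration (the list is not exhaustive), and the check only reports what a client *claims* to be: anything can send a googlebot string, and a crawler posing as a browser sails through as a non-crawler.

```python
from typing import Optional

# Substrings that some well-known crawlers include in their User-Agent.
# Illustrative, not exhaustive; a crawler that poses as a browser has none.
CRAWLER_TOKENS = ("googlebot", "bingbot", "yandexbot", "baiduspider", "duckduckbot")


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler token the User-Agent claims, or None.

    This is a claim, not proof: the header is fully client-controlled.
    """
    ua = user_agent.lower()
    for token in CRAWLER_TOKENS:
        if token in ua:
            return token
    return None


googlebot_ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
browser_ua = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"

print(claimed_crawler(googlebot_ua))  # a real googlebot and an impostor look the same here
print(claimed_crawler(browser_ua))    # a crawler sending this string is invisible to the check
```

so a check like this is fine for serving hints, but not for anything that needs to trust the answer.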
honestly, with the various hacks needed for wrangling search, a standardized "i'm a web crawler" header would be useful, preferably paired with a way to tell apart use cases (search, ai training, language research, etc.) and with a legal mandate to use it as applicable (and honor sites' robots.txt).
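
for illustration only, here's roughly what consuming such a header could look like on the server side. everything here is hypothetical: no `Crawler-Purpose` header exists, and the name, the comma-separated format and the purpose values are made up for this sketch.

```python
from typing import Mapping, Set

# HYPOTHETICAL header name; nothing like this is standardized today.
CRAWLER_HEADER = "Crawler-Purpose"


def crawler_purposes(headers: Mapping[str, str]) -> Set[str]:
    """Parse the hypothetical header into a set of lowercase purposes.

    Returns an empty set if the header is absent, i.e. the client
    does not declare itself a crawler.
    """
    for name, value in headers.items():
        # Header names are case-insensitive in HTTP.
        if name.lower() == CRAWLER_HEADER.lower():
            return {part.strip().lower() for part in value.split(",") if part.strip()}
    return set()


request_headers = {"crawler-purpose": "Search, AI-Training"}
purposes = crawler_purposes(request_headers)

# A site could then decide per use case, e.g. render for search but refuse training.
allow_training = "ai-training" not in purposes
```

the point is just that one declared field would replace the user agent guessing; it would only be worth anything with the mandate behind it.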