This is a pretty high bar: it requires sources that are either bleeding-edge mechanistic interpretability papers or straight-up proprietary secrets, since AI companies don't release their training datasets.
