This simple pipeline works shockingly well: we find substantially more interpretable and predictive hypotheses than two recent baselines that use LLMs alone for hypothesis generation (no SAE), and than BERTopic, a classic embedding-clustering method. 4/
Comments
I understand your method and approach :) My problem is convincing reviewers that one interpretability method is better than another,
e.g. in "Automated Annotation of Disease Subtypes"
https://www.sciencedirect.com/science/article/abs/pii/S1532046424000686
https://github.com/rmovva/HypotheSAEs 7/
You can see every SAE neuron in UMAP space, colored by whether the neuron correlates positively or negatively with the target variable. 8/
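A rough sketch of the coloring step, for anyone curious: correlate each neuron's activations with the target and color by the sign. This is plain Python on toy data, not the HypotheSAEs API; `pearson`, `neuron_colors`, and the toy `activations` matrix are hypothetical names made up for illustration, and the UMAP layout itself is left out.

```python
import math

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)

def neuron_colors(activations, target):
    """For each neuron (column), return (correlation, color label by sign)."""
    n_neurons = len(activations[0])
    results = []
    for j in range(n_neurons):
        column = [row[j] for row in activations]
        r = pearson(column, target)
        if r > 0:
            color = "positive"
        elif r < 0:
            color = "negative"
        else:
            color = "neutral"
        results.append((r, color))
    return results

# Toy data (assumption): 4 texts x 3 neurons, with a numeric target per text.
activations = [
    [0.0, 1.0, 0.5],
    [1.0, 0.0, 0.5],
    [2.0, 0.0, 0.5],
    [3.0, 0.5, 0.5],
]
target = [0.0, 1.0, 2.0, 3.0]

colors = neuron_colors(activations, target)
for j, (r, color) in enumerate(colors):
    print(f"neuron {j}: r={r:+.2f} -> {color}")
```

In a real plot, each neuron's 2D position would come from a dimensionality-reduction step, and the label above would just pick the point's color.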
Draft: https://arxiv.org/abs/2502.04382
Python package: https://github.com/rmovva/HypotheSAEs
Demo: https://hypothesaes.org
9/9