I've decided to make public the datasets I'm generating for my PhD project. For each protein in Swiss-Prot, I'm providing protein language model (PLM) embeddings (ProtTrans, Ankh, ESM2), GO annotations, and taxonomy representations. All files follow the same protein order, with one line per protein.

https://github.com/pentalpha/protein_dimension_db
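Because every file lists the proteins in the same order, one line each, line i of one file can be paired with line i of any other. Below is a minimal sketch of reading several files in lockstep. It assumes plain-text files. The helper name `iter_aligned` and the filenames in the usage note are hypothetical, so check the repository for the actual file names and formats.

```python
from itertools import zip_longest
from typing import Iterator


def iter_aligned(*paths: str) -> Iterator[tuple[str, ...]]:
    """Yield one tuple per protein, taking line i from every file.

    Assumes plain-text files with one protein per line, all in the same
    order (hypothetical helper, not part of the repository).
    Raises ValueError if the files have different numbers of lines,
    since that would mean the one-line-per-protein alignment is broken.
    """
    handles = [open(p, encoding="utf-8") for p in paths]
    try:
        sentinel = object()
        # zip_longest keeps going past the shortest file, so a length
        # mismatch shows up as a sentinel instead of being silently dropped.
        for row in zip_longest(*handles, fillvalue=sentinel):
            if any(field is sentinel for field in row):
                raise ValueError("files have different numbers of lines")
            # Strip only the trailing newline; the rest of each line is kept as-is.
            yield tuple(field.rstrip("\n") for field in row)
    finally:
        for h in handles:
            h.close()
```

For example, `for protein_id, go_terms in iter_aligned("ids.txt", "go_annotations.txt"): ...` (hypothetical filenames) visits each protein once with its matching record from both files. Using `zip_longest` instead of plain `zip` means a truncated file raises an error rather than quietly shortening the dataset.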
