proteinator.bsky.social
PhD candidate in bioinformatics
Protein structure prediction / Protein design
17 posts
23 followers
55 following
Getting Started
Conversation Starter
comment in response to
post
Ah yeah, this didn't fit into the previous post: there's no bias in the secondary structure distribution in the PDB. Also, in the paper I attached above we have a plot showing the mean helix and strand content of the PDB and of different gen models
comment in response to
post
Gen models just learn the locality of helices very quickly and "cheat" on the metrics by overrepresenting them. It becomes really apparent for bigger proteins (> 500 aa).
comment in response to
post
Not sure it's that easy. We recently proposed a checkpoint selection criterion (arxiv.org/abs/2411.05238) to better match the distribution of secondary structure elements in native proteins from the PDB, but it doesn't seem to work too well.
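To make the idea of matching secondary structure content concrete, here is a minimal sketch. It is not the criterion from the linked paper, only an illustration under stated assumptions: each sample is annotated with a DSSP string, and the hypothetical helper `pick_checkpoint` picks the checkpoint whose mean helix and strand fractions are closest (L1 distance) to a reference pair that you would have to measure on the PDB yourself.

```python
from collections import Counter

HELIX = set("HGI")   # DSSP helix codes: alpha, 3-10, pi
STRAND = set("EB")   # DSSP codes: extended strand, isolated bridge


def ss_fractions(dssp: str) -> tuple[float, float]:
    """Return (helix fraction, strand fraction) for one DSSP string."""
    if not dssp:
        raise ValueError("empty DSSP string")
    counts = Counter(dssp)
    helix = sum(counts[c] for c in HELIX)
    strand = sum(counts[c] for c in STRAND)
    return helix / len(dssp), strand / len(dssp)


def mean_fractions(dssp_strings: list[str]) -> tuple[float, float]:
    """Average the per-sample fractions over a set of samples."""
    fracs = [ss_fractions(s) for s in dssp_strings]
    n = len(fracs)
    return sum(f[0] for f in fracs) / n, sum(f[1] for f in fracs) / n


def pick_checkpoint(samples_by_ckpt: dict[str, list[str]],
                    reference: tuple[float, float]) -> str:
    """Hypothetical criterion: smallest L1 gap to the reference fractions."""
    def gap(name: str) -> float:
        h, e = mean_fractions(samples_by_ckpt[name])
        return abs(h - reference[0]) + abs(e - reference[1])
    return min(samples_by_ckpt, key=gap)


# Toy data only: the strings and the reference pair are made up.
toy = {"ckpt_a": ["HHHHHHHH"], "ckpt_b": ["HHHHEE--"]}
best = pick_checkpoint(toy, reference=(0.4, 0.2))
```

Comparing only the means is a deliberate simplification. A helix-collapsed model could still match the mean on small proteins and drift on larger ones, so binning by length before comparing would be a natural extension.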
comment in response to
post
the consequence of that is pretty well known - rock-stable rigid proteins
comment in response to
post
Keep me in the loop, I'd be interested in seeing it! We just need someone to run this multiple times xD However, my guess would be that strands will approach 0% on avg in this setup.
comment in response to
post
Typically gen models for protein structure are mode-collapsed towards alpha-helices, and AF3 won't be an exception here if used in such a way. The reason it hallucinates helices is simply that they're easy to learn as a way of minimizing the diffusion loss function during training.
comment in response to
post
That's an interesting assumption, though I don't think this will work for anything bigger than, let's say, 150-200 aa. And clearly it will hallucinate helical bundles just arranged in slightly (or maybe not) different topologies. Not sure it's of any meaning
comment in response to
post
I'm not sure I'm following. How is it useful if the seq -> str mapping is not guaranteed anymore? This puts an equals sign between a random seq and a pMPNN-generated seq
comment in response to
post
yeah, that doesn't change anything. There should just be an alternative, non-static encoding of protein structures. Maybe as a multivariate energy (not in the physical sense) landscape of protein conformations
comment in response to
post
Well, it clearly shows the over-reliance on conformations of crystal structures. I guess training should include some probability distros of protein structures accounting for dynamics, though it's obviously a non-trivial problem to solve..
comment in response to
post
Looks like recursion sneaked in
comment in response to
post
what is this
...
comment in response to
post
At least it's encouraging to see that the old but gold SE(3)-equivariant architecture outperforms all-atom diffusion models in the low-data regime for RNA structure
comment in response to
post
As promised on twitter - only horses