of the papers that (1) I've tried to implement and (2) have source code available, I'm currently at 4 of 4 where the source code doesn't match the paper's description.
for the latest paper, it's a bunch of relatively minor stuff, like "actually the activation function is not applied to the input layer", and "actually it's not 2 hidden layers of size 64, but one hidden layer of size 128", and "actually it's not gaussian Xavier weight initialization, but uniform Kaiming"
still, if you try to implement the pseudocode in the paper, it won't work. that first discrepancy is a pretty big problem in this case. I dream of a world where prose descriptions of algorithms are automatically generated from source code, though it's unclear to me how feasible that is at present
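to make the three discrepancies concrete, here's a rough stdlib-only sketch of the two specs side by side. everything specific in it is an assumption for illustration, not taken from the paper or its code: ReLU as the activation, no bias terms, the input/output sizes `N_IN` and `N_OUT`, and the helper names `build_mlp` and `forward`. the init formulas are the standard ones (Xavier normal: std = sqrt(2 / (fan_in + fan_out)); Kaiming uniform with ReLU gain: bound = sqrt(6 / fan_in)).

```python
import math
import random


def xavier_normal(fan_in, fan_out, rng):
    # Gaussian Xavier/Glorot init: std = sqrt(2 / (fan_in + fan_out)).
    std = math.sqrt(2.0 / (fan_in + fan_out))
    return [[rng.gauss(0.0, std) for _ in range(fan_in)] for _ in range(fan_out)]


def kaiming_uniform(fan_in, fan_out, rng):
    # Uniform Kaiming/He init, assuming ReLU gain sqrt(2): bound = sqrt(6 / fan_in).
    bound = math.sqrt(6.0 / fan_in)
    return [[rng.uniform(-bound, bound) for _ in range(fan_in)] for _ in range(fan_out)]


def build_mlp(n_in, hidden_sizes, n_out, init, rng):
    # One weight matrix per consecutive pair of layer sizes (biases omitted).
    sizes = [n_in, *hidden_sizes, n_out]
    return [init(a, b, rng) for a, b in zip(sizes, sizes[1:])]


def relu(v):
    return [max(0.0, x) for x in v]


def forward(weights, x, activate_input):
    # activate_input=True mimics "activation applied to the input layer".
    h = relu(x) if activate_input else list(x)
    for i, W in enumerate(weights):
        h = [sum(w * v for w, v in zip(row, h)) for row in W]
        if i < len(weights) - 1:  # no activation on the output layer
            h = relu(h)
    return h


N_IN, N_OUT = 8, 2  # hypothetical sizes, not from the paper
rng = random.Random(0)

# Spec as described in the paper's prose.
as_written = build_mlp(N_IN, [64, 64], N_OUT, xavier_normal, rng)
# Spec as found in the released source code.
as_released = build_mlp(N_IN, [128], N_OUT, kaiming_uniform, rng)

x = [-1.0] * N_IN  # an all-negative input
out_paper = forward(as_written, x, activate_input=True)
out_code = forward(as_released, x, activate_input=False)
```

under these assumptions, an all-negative input shows why the first discrepancy matters most: applying ReLU to the input zeroes it out before the first layer ever sees it, so the "as written" network returns all zeros, while the "as released" one still produces a signal. the layer sizes and init scheme change the numbers; the input activation changes what information reaches the network at all.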