The global transformer shortage is a very serious issue. Please don't be wasteful! To do our part, the new
@vectorinstitute.ai policy is to use at most 3 heads per multi-head attention layer.