Architecturally, we modify numerical embedding layers (https://arxiv.org/abs/2203.05556) by introducing first-layer biases and a DenseNet-style skip connection, which yields good results even at small (CPU-friendly) embedding sizes. 10/
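A rough sketch of what such an embedding could look like, in NumPy. This is an illustration under assumptions, not the authors' implementation: it assumes a periodic (cosine) first layer as in the linked paper, and reads "DenseNet-style skip connection" as concatenating the raw feature value onto the learned embedding. The names `init_params` and `embed`, the hidden width, and the initialization scales are all hypothetical.

```python
import numpy as np

def init_params(n_features, hidden=16, emb_dim=4, sigma=0.1, seed=0):
    """Hypothetical per-feature parameters; init scales are assumptions."""
    rng = np.random.default_rng(seed)
    return {
        # First layer: per-feature frequencies plus the added bias (phase).
        "w1": rng.normal(0.0, sigma, size=(n_features, hidden)),
        "b1": rng.uniform(-np.pi, np.pi, size=(n_features, hidden)),
        # Second layer: per-feature linear map to emb_dim - 1 outputs,
        # leaving one slot for the raw value from the skip connection.
        "W2": rng.normal(0.0, 1.0 / np.sqrt(hidden),
                         size=(n_features, hidden, emb_dim - 1)),
        "b2": np.zeros((n_features, emb_dim - 1)),
    }

def embed(x, p):
    """x: (batch, n_features) -> (batch, n_features, emb_dim)."""
    xe = x[:, :, None]                                   # (batch, features, 1)
    h = np.cos(2.0 * np.pi * xe * p["w1"] + p["b1"])     # periodic layer with bias
    z = np.einsum("bfh,fhe->bfe", h, p["W2"]) + p["b2"]  # per-feature linear layer
    # Skip connection by concatenation: keep the raw value next to the embedding.
    return np.concatenate([xe, z], axis=-1)

x = np.random.default_rng(1).normal(size=(5, 3))
out = embed(x, init_params(n_features=3))
```

With `emb_dim=4`, each scalar feature becomes only four numbers, one of which is the untouched input, so the downstream MLP always sees the original value regardless of how the periodic part is initialized.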