More experiments and model updates to come. This model is severely under-trained, having seen only 32M samples (out of 300M possible) so far.
Reposted from Nathan Paull
Finally got around to completing the first major training runs of my own BERT-like language embedding model. There is a ton of data to pore over as I prepare my next experiment for this weekend, but early results show my model outperforming a Transformer++ BERT model by 1% with fewer parameters!