https://youtu.be/byy19WPLPBQ?si=WhKEgaLv4_giQdho, another interesting one. It suggests that even BERT's contextual embeddings still carry a "bag-of-words" assumption: the attention mechanism itself does not take token order into account, and the positional embeddings are fixed per position rather than varying with the input.
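
A minimal sketch (mine, not from the video) of the underlying point: single-head scaled dot-product self-attention with no positional encoding is permutation-equivariant, so shuffling the input tokens just shuffles the output rows. The names and shapes below are illustrative assumptions, not anyone's actual implementation.

```python
import numpy as np

def attention(x, wq, wk, wv):
    # Single-head scaled dot-product self-attention, no positional encoding.
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Row-wise softmax (shifted by the max for numerical stability).
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

rng = np.random.default_rng(0)
seq_len, d = 5, 8
x = rng.normal(size=(seq_len, d))            # token embeddings only, no position info
wq, wk, wv = (rng.normal(size=(d, d)) for _ in range(3))

perm = rng.permutation(seq_len)              # shuffle the token order
out_original = attention(x, wq, wk, wv)
out_permuted = attention(x[perm], wq, wk, wv)

# Permutation equivariance: shuffling the input only shuffles the output rows,
# i.e. without positions, each token's representation ignores where it sits.
assert np.allclose(out_original[perm], out_permuted)
```

In a real transformer it is only the positional embeddings added to x at the input that break this symmetry, which is the crux of the video's argument that the contextual embeddings are otherwise order-agnostic.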
