Yes, agreed: if you have MLPs between the self-attention layers, it may be superfluous...
