good questions! from what I see some folks still use complex mappers like Perceivers, but often simple mlp works good enough. the variable which induces the biggest improvement is almost always the alignment data.

Comments