Neat! As I was reading @chiphuyen.bsky.social's awesome AI Engineering book, I wondered why autoregressive and masked-input models weren't combined more... seems like a nice opportunity.
Reposted from Sung Kim
Simply masking 15% of input tokens + next-token prediction (NTP) can significantly boost LLMs on key information retrieval & long-context reasoning—without extra compute!

MEAP (Mask-Enhanced Autoregressive Prediction)
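
For anyone curious what "mask 15% of input tokens + NTP" might look like at the data level, here is a rough sketch, not the authors' code. It assumes the targets stay the original, unmasked tokens shifted by one position, and that some token id is reserved for the mask. `MASK_ID` and `meap_style_example` are made-up names for illustration.

```python
import random

MASK_ID = 0  # hypothetical id reserved for a [MASK] token (assumption)


def meap_style_example(token_ids, mask_ratio=0.15, mask_id=MASK_ID, seed=None):
    """Build one (inputs, targets) pair: masked inputs, ordinary next-token targets."""
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens for next-token prediction")
    rng = random.Random(seed)

    # Inputs are every token except the last; targets are every token except the first.
    n_inputs = len(token_ids) - 1
    n_masked = round(mask_ratio * n_inputs)

    # Pick which input positions get replaced by the mask token.
    masked_positions = set(rng.sample(range(n_inputs), n_masked))
    inputs = [
        mask_id if i in masked_positions else tok
        for i, tok in enumerate(token_ids[:-1])
    ]

    # Assumption: targets are the original tokens, so the loss is plain NTP.
    targets = list(token_ids[1:])
    return inputs, targets, masked_positions


if __name__ == "__main__":
    tokens = list(range(1, 42))  # toy sequence of 41 token ids
    x, y, masked = meap_style_example(tokens, seed=0)
    print("masked positions:", sorted(masked))
    print("inputs: ", x)
    print("targets:", y)
```

The sequence length is unchanged and the model would still run one forward pass per example, which is consistent with the "no extra compute" claim, though the actual recipe may differ in details such as how masked positions are sampled.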
