Neat! As I was reading @chiphuyen.bsky.social's awesome AI Engineering book, I wondered why autoregressive and masked-input models weren't combined more. Seems like a nice opportunity.
Reposted from Sung Kim:
Simply masking 15% of input tokens + next-token prediction (NTP) can significantly boost LLMs on key information retrieval & long-context reasoning—without extra compute!
MEAP (Mask-Enhanced Autoregressive Prediction)
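Here is a rough sketch of the data-side idea as the post describes it: mask ~15% of the input tokens, then train with ordinary next-token prediction. It assumes masked positions are overwritten with a placeholder id while the next-token targets stay unmasked. `MASK_ID` and `mask_then_ntp_pair` are hypothetical names for illustration, not taken from the MEAP code.

```python
import random
from typing import List, Tuple

MASK_ID = -1  # hypothetical mask-token id; a real tokenizer would reserve its own


def mask_then_ntp_pair(tokens: List[int], mask_ratio: float = 0.15,
                       seed: int = 0) -> Tuple[List[int], List[int]]:
    """Build one (input, target) pair for next-token prediction where a
    fraction of the *input* tokens is masked but the targets are left intact.
    A sketch under stated assumptions, not the MEAP authors' implementation."""
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to form a next-token pair")
    rng = random.Random(seed)
    inputs = tokens[:-1]   # the model reads positions 0..n-2
    targets = tokens[1:]   # and predicts the token at position i+1
    n_mask = round(mask_ratio * len(inputs))
    # Pick distinct input positions to hide from the model.
    masked_positions = rng.sample(range(len(inputs)), n_mask)
    corrupted = list(inputs)
    for i in masked_positions:
        corrupted[i] = MASK_ID
    return corrupted, targets


tokens = list(range(100, 121))  # 21 toy token ids
corrupted, targets = mask_then_ntp_pair(tokens)
print(corrupted)
print(targets)
```

Masking only swaps token ids in place, so the sequence length, and with it the cost of a forward pass, is unchanged. That fits the post's "without extra compute" framing.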