Neat! As I was reading @chiphuyen.bsky.social’s awesome AI Engineering book, I wondered why autoregressive and masked-input models weren’t combined more. Seems like a nice opportunity.

Reposted from Sung Kim

Simply masking 15% of input tokens + next-token prediction (NTP) can significantly boost LLMs on key information retrieval & long-context reasoning—without extra compute!

MEAP (Mask-Enhanced Autoregressive Prediction)
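
For anyone curious what "mask 15% of input tokens + NTP" might look like in a data pipeline, here is a rough sketch. It is not the MEAP authors' code. The function name `mask_for_ntp`, the `MASK_ID` value, and the choice to keep the original (unmasked) tokens as prediction targets are all assumptions made for illustration.

```python
import random

MASK_ID = 0  # hypothetical mask-token id; a real tokenizer would define its own


def mask_for_ntp(token_ids, mask_ratio=0.15, mask_id=MASK_ID, seed=None):
    """Randomly mask a fraction of tokens, then build next-token-prediction pairs.

    Returns (inputs, targets, masked_positions). Inputs come from the corrupted
    sequence; targets are the original tokens shifted left by one (an assumption
    of this sketch, not a confirmed detail of MEAP).
    """
    rng = random.Random(seed)
    n = len(token_ids)
    n_mask = round(n * mask_ratio)

    # Pick distinct positions to replace with the mask token.
    masked_positions = set(rng.sample(range(n), n_mask))
    corrupted = [mask_id if i in masked_positions else tok
                 for i, tok in enumerate(token_ids)]

    # Standard NTP shift: position i of the input predicts token i + 1.
    inputs = corrupted[:-1]
    targets = list(token_ids[1:])
    return inputs, targets, sorted(masked_positions)


if __name__ == "__main__":
    tokens = list(range(1, 21))  # 20 dummy token ids, none equal to MASK_ID
    inputs, targets, positions = mask_for_ntp(tokens, seed=0)
    print("masked positions:", positions)
    print("inputs: ", inputs)
    print("targets:", targets)
```

The sequence length is unchanged by masking, which is at least consistent with the post's "without extra compute" point: the model sees the same number of tokens, just with some of them hidden.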
