🧵 Heard all the buzz around distilling from OpenAI models? Check out @jifanz's latest work SIEVE - showing how strategic distillation can make LLM development radically more cost-effective while matching quality.
SIEVE distills GPT-4's data filtering capabilities into lightweight models at <1% of the cost. Not just minor improvements - we're talking 500x more efficient filtering operations.
Why does this matter? High-quality data is the bedrock of LLM training. SIEVE enables filtering trillions of tokens of web data for specific domains like medical or legal text, steered by customizable natural language prompts.
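For intuition, here's a minimal sketch of the distill-then-filter pattern: an expensive teacher LLM labels a small sample of documents against a natural-language prompt, and a cheap student classifier trained on those labels filters the rest of the corpus. The model and component choices below (gpt-4o as teacher, TF-IDF + logistic regression as student) are illustrative stand-ins, not SIEVE's actual pipeline, which uses lightweight models and active learning.

```python
# Hypothetical sketch of prompt-driven filter distillation.
# Not SIEVE's actual code: teacher/student choices are stand-ins.
from openai import OpenAI
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

PROMPT = ("Does the following text contain high-quality medical "
          "content? Answer yes or no.\n\n{doc}")

client = OpenAI()

def teacher_label(doc: str) -> int:
    """Ask the expensive teacher model for a binary quality label."""
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed teacher; any strong LLM works
        messages=[{"role": "user", "content": PROMPT.format(doc=doc)}],
    )
    answer = resp.choices[0].message.content.strip().lower()
    return int(answer.startswith("yes"))

def distill_filter(sample_docs: list[str], corpus: list[str]) -> list[str]:
    # Label only a small sample with the costly teacher...
    labels = [teacher_label(d) for d in sample_docs]
    # ...then fit a cheap student classifier on those labels.
    # (A sketch assumes the sample contains both classes.)
    student = make_pipeline(TfidfVectorizer(max_features=50_000),
                            LogisticRegression(max_iter=1000))
    student.fit(sample_docs, labels)
    # The student then filters the full corpus at negligible per-doc cost.
    keep = student.predict(corpus)
    return [doc for doc, k in zip(corpus, keep) if k]
```

The cost saving comes from the asymmetry: the teacher sees only thousands of sampled documents, while the student, which is orders of magnitude cheaper per call, handles the web-scale pass.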
On the DataComp-LM challenge, SIEVE outperforms existing quality-filtering methods, producing pretraining data that yields stronger downstream LLMs.
This work is part of Jifan's broader research on efficient ML training, from active learning to label-efficient SFT for LLMs.
Jifan's on the industry job market now, and his expertise in efficient training, distillation, and data curation couldn't be more timely. Feel free to reach out to him at [email protected].
📄 Paper: https://arxiv.org/abs/2410.02755