rdnowak.bsky.social
Director of the Center for the Advancement of Progress
29 posts 716 followers 111 following
Regular Contributor
Active Commenter
comment in response to post
This is a collaboration with Ziyue Luo, @shroffness, and @kevinlauka
comment in response to post
Jifan’s on the industry job market now, and his expertise in efficient training, distillation, and data curation couldn't be more timely. Feel free to reach out to him at [email protected]. 📄 Paper: arxiv.org/abs/2410.02755
comment in response to post
SIEVE improves upon existing quality filtering methods in the DataComp-LM challenge, producing better LLM pretraining data that led to improved model performance. This work is part of Jifan's broader research on efficient ML training, from active learning to label-efficient SFT for LLMs.
comment in response to post
Why does this matter? High-quality data is the bedrock of LLM training. SIEVE makes it feasible to filter trillions of web documents for specific domains like medical/legal text with customizable natural language prompts.
comment in response to post
SIEVE distills GPT-4's data filtering capabilities into lightweight models at <1% of the cost. Not just minor improvements - we're talking 500x more efficient filtering operations.
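For anyone curious how that works, here's a toy sketch of the recipe (not the actual SIEVE code; the prompt, the teacher_label stand-in, and the TF-IDF student below are all illustrative placeholders):

```python
# Toy sketch of the SIEVE recipe (illustrative only, not the paper's code):
# 1) ask an expensive teacher model (e.g. GPT-4) whether each document in a
#    small seed sample satisfies a natural-language filtering prompt;
# 2) distill those yes/no labels into a lightweight student classifier;
# 3) apply only the cheap student to the full web-scale corpus.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

FILTER_PROMPT = "Is this document high-quality medical text? Answer yes or no."

def teacher_label(doc: str) -> int:
    """Stand-in for a GPT-4 call answering FILTER_PROMPT.
    Replace with a real API call; this keyword check is only a placeholder."""
    return int("patient" in doc.lower() or "diagnosis" in doc.lower())

# Label a small seed set with the expensive teacher.
seed_docs = [
    "The patient presented with a fever and was given a diagnosis of flu.",
    "Top 10 celebrity gossip stories you won't believe!",
    "Randomized trial data on diagnosis and treatment outcomes.",
    "Click here to win a free cruise!!!",
]
seed_labels = [teacher_label(d) for d in seed_docs]

# Distill into a cheap student (TF-IDF + logistic regression here, purely as
# a stand-in for the lightweight model used in the paper).
vectorizer = TfidfVectorizer()
student = LogisticRegression(max_iter=1000).fit(
    vectorizer.fit_transform(seed_docs), seed_labels)

def keep(doc: str) -> bool:
    """Cheap filtering decision, applied to every document in the corpus."""
    return bool(student.predict(vectorizer.transform([doc]))[0])
```

The point of the setup is that the teacher only ever sees the small seed set; everything at corpus scale runs through the cheap student, which is where the cost savings come from.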
comment in response to post
Maybe Trump should have read my mom's book: "For the first six weeks, the embryo, whether XX or XY, coasts along in sexual ambiguity." p. 25
comment in response to post
Good luck with that
comment in response to post
p.s. we don't know for sure if I said this or not
comment in response to post
Is the solution treating everything electronic as "fake"? Maybe?
comment in response to post
I think the necessary length of the prompt(s) might also heavily depend on the data the model is trained on.
comment in response to post
And it must have been addressed to Prof. Malik :)
comment in response to post
Time for a Meta office near the other UW :)
comment in response to post
Since you are interested in discussing it, I guess it's worth a read
comment in response to post
Our NeurIPS paper studies differences in solutions from single-task/output neural network training vs. multi-task/output training. openreview.net/pdf?id=APBq3..., arxiv.org/pdf/2410.21696. Poster presentation: Fri 13 Dec, 11am, East Exhibit Hall A-C #2303
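For a concrete picture of the two regimes being compared, here's a toy version (made-up dimensions and data, not the paper's experiments):

```python
# Toy contrast between single-output and multi-output training
# (illustrative dimensions and random data only).
import torch
import torch.nn as nn

d_in, d_hidden, n_tasks = 10, 64, 3
x = torch.randn(256, d_in)
y = torch.randn(256, n_tasks)          # one target column per task

# Single-task/output: a separate network trained for each output.
single_nets = [nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                             nn.Linear(d_hidden, 1)) for _ in range(n_tasks)]

# Multi-task/output: one shared network with all outputs trained jointly.
multi_net = nn.Sequential(nn.Linear(d_in, d_hidden), nn.ReLU(),
                          nn.Linear(d_hidden, n_tasks))

loss_fn = nn.MSELoss()
for t, net in enumerate(single_nets):
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(100):
        opt.zero_grad()
        loss_fn(net(x).squeeze(-1), y[:, t]).backward()
        opt.step()

opt = torch.optim.Adam(multi_net.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    loss_fn(multi_net(x), y).backward()
    opt.step()
# The paper asks how the learned solutions differ between these two regimes.
```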
comment in response to post
Is it safe to conclude that scaling suffices to make “fire” posters?
comment in response to post
Yes. But I think the word starter, not pack, is the key one. Hopefully these just get the ball rolling.
comment in response to post
I actually found starter packs useful for helping to quickly create communities on this platform. I appreciate people taking the time to get this started!
comment in response to post
New to me, too
comment in response to post
I like this paper too! On a related note, many people competing in The New Yorker cartoon caption contest are using LLMs to boost their creative juices. We recently looked at how LLMs fare in judging and creating humorous captions: arxiv.org/pdf/2406.10522
comment in response to post
Sounds super!