dimitrisp.bsky.social
Researcher @MSFTResearch; Prof @UWMadison (on leave); learning in context; thinking about reasoning; babas of Inez Lily. https://papail.io
171 posts 1,805 followers 293 following
Regular Contributor
Active Commenter
comment in response to post
In the meantime, here's a rule of thumb: "if your project can be vibecoded in an hour, and amounts to O(10) LoC edits on something existing, or is a convergence proof that o4-mini can do with a bit of guidance, DO NOT write a paper about it" :D
comment in response to post
I think that the current most bulletproof peer review has been "people will read/try your stuff, and if it works they build on it". But because it's not attached to a formal process on OpenReview, we discard it as being non-scientific.
comment in response to post
It seems to me that this is totally misaligned with scientific discovery and progress. I don't believe this is a result of bad actors btw. It's just that huge, complex systems that are O(100) years old take a long time to change and readjust to new realities. We'll eventually figure it out.
comment in response to post
It seems to me that it is mostly ML academia (I am part of it!) that is a proponent of keeping peer review and mega ML conferences going & the bean counter running. We've not found a solution to reviews converging to random coin tosses, at a huge expense of human work hours.
comment in response to post
If that's indeed the case (I believe we can measure that), and their key function is social, a way for people to connect (that's great!), what's the point of having peer review and using # of NeurIPS papers as a bean counter?
comment in response to post
My post is a direct criticism of the 100k NeurIPS submissions issue. It's beyond clear that research dissemination--for the most part--does not happen through conferences anymore.
comment in response to post
Working on the yapping part :)
comment in response to post
Hmm... temp has to be 0.6-0.8; this looks like very low-temperature output.
comment in response to post
I don’t see at all how this is intellectually close to what Shannon wrote. Can you clarify? I read it as computing statistics and how these are compatible with theoretical conjectures. There’s no language generation implicit in the article. Am I misreading it?
comment in response to post
can you share the paper?
comment in response to post
BTW, for historical context, 1948 is very, very, very early to have these thoughts. So I actually think that every single sentence written is profound. This is kinda random, but here is how Greece looked back then. IT WAS SO EARLY :) x.com/DimitrisPapa...
comment in response to post
It's not that profound. It just says there's no wall if all the stars are aligned. It's an optimistic read of the setting.
comment in response to post
researchers
comment in response to post
Also, a sycophant etymologically means "the one who shows the figs"; the origin of the meaning is kinda debated: it either refers to illegally importing figs, or to falsely accusing someone of hiding illegally imported figs.
comment in response to post
bsky doesn't like GIFs, here they are from the other site x.com/DimitrisPapa...
comment in response to post
Super proud of this work, which was led by Nayoung Lee and Jack Cai, with mentorship from Avi Schwarzschild and Kangwook Lee. Link to our paper: arxiv.org/abs/2502.01612
comment in response to post
Oh btw, self-improvement can become exponentially faster in some settings, e.g. when we apply it to pretrained models (again, this is all for add/mul/maze etc.)
comment in response to post
An important aspect of the method is that you need to 1) generate problems of appropriate hardness and 2) be able to filter out negative examples using a cheap verifier. Otherwise the benefit of self-improvement collapses.
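A minimal sketch of what those two ingredients could look like for reverse addition (the function names and the length-based check are my own illustration; the paper's actual filters may differ):

```python
# Hypothetical sketch of one self-labeling round: sample problems slightly
# harder than the current training difficulty, label them with the current
# model, and keep only labels that pass a cheap verifier. For addition, one
# cheap check is that the sum of two n-digit numbers has n or n+1 digits.
import random

def sample_operands(n_digits: int) -> tuple[int, int]:
    """Draw two uniformly random n-digit operands."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return random.randint(lo, hi), random.randint(lo, hi)

def cheap_verifier(a: int, b: int, predicted_sum: str) -> bool:
    """Length check only: no access to the true answer."""
    n = max(len(str(a)), len(str(b)))
    return predicted_sum.isdigit() and len(predicted_sum) in (n, n + 1)

def self_label_round(model, current_difficulty: int, num_samples: int):
    """1) appropriate hardness: one digit beyond training; 2) verifier filter."""
    harder = current_difficulty + 1
    kept = []
    for _ in range(num_samples):
        a, b = sample_operands(harder)
        pred = model.generate(f"{a}+{b}=")   # assumed model interface
        if cheap_verifier(a, b, pred):       # drop likely-wrong self-labels
            kept.append((f"{a}+{b}=", pred))
    return kept
```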
comment in response to post
We test self-improvement across diverse algorithmic tasks:
- Arithmetic: reverse addition, forward (yes, forward!) addition, multiplication (with CoT)
- String manipulation: copying, reversing
- Maze solving: finding shortest paths in graphs
It always works.
comment in response to post
Self-improvement is not new—this idea has been explored in various contexts and domains (like reasoning, mathematics, coding, and more). Our results suggest that self-improvement is a general and scalable solution to length & difficulty generalization!
comment in response to post
What if we leverage this? What if we let the model label slightly harder data… and then train on it? Our key idea is to use Self-Improving Transformers, where a model iteratively labels its own training data and learns from progressively harder examples (inspired by methods like STaR and ReST).
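For concreteness, a hypothetical skeleton of that loop, with training and self-labeling passed in as callables (a self-labeling/filtering step is sketched in the reply further up); this is an illustration, not the paper's code:

```python
# Hypothetical skeleton of the self-improvement recipe described above:
# train on what you have, let the model label slightly harder problems,
# keep the labels that survive a cheap filter, and repeat.
def self_improve(model, train_fn, label_and_filter, seed_data, max_difficulty: int):
    """train_fn(model, data) -> model; label_and_filter(model, difficulty) -> examples."""
    data = list(seed_data)                       # supervised data at the easiest difficulty
    for difficulty in range(1, max_difficulty):
        model = train_fn(model, data)            # fit on everything collected so far
        data += label_and_filter(model, difficulty)  # self-label one notch harder
    return train_fn(model, data)
```

With something like the self_label_round sketch above plugged in as label_and_filter, each round pushes the training distribution one digit (or one difficulty level) further.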
comment in response to post
I was kind of done with length gen, but then I took a closer look at that figure above... I noticed that there is a bit of transcendence, i.e. the model trained on n-digit ADD can solve slightly harder problems, e.g. n+1, but not much more. (cc on transcendence and chess: arxiv.org/html/2406.11741v1)
comment in response to post
Even for simple algorithmic tasks like integer addition, performance collapses as sequence length increases. The only approaches that overcome this so far rely on heavily optimizing the positional encoding and data format. (Figure from Cho et al., arxiv.org/abs/2405.20671)
comment in response to post
Standard transformers struggle with length generalization—extrapolating beyond their training distribution. Even GPT-4, o1, and o3 can't multiply numbers with many digits. Length generalization with vanilla transformers is a long-standing open problem. x.com/yuntiandeng/...
comment in response to post
I think so
comment in response to post
it will still be more expensive than data generation, no?
comment in response to post
lol
comment in response to post
How much would you charge per hour to solve hard math problems at the Olympiad level? This is not high school math.
comment in response to post
yes I agree with that too
comment in response to post
Agree with this perspective, but DeepSeek still had a lot of money to spend, many GPUs (2k at least), and a cracked 200-person engineering/scientist team. Still a smaller-scale effort than OpenAI, but at a scale far above even multi-university groups.
comment in response to post
the median was kinda mean
comment in response to post
I am getting professional anxiety. They can solve problems (that I can't solve) 100-1,000x cheaper than humans.
comment in response to post
"Doubles" sounds more impressive than "100%" :)