dimitrisp.bsky.social
Researcher @MSFTResearch; Prof @UWMadison (on leave); learning in context; thinking about reasoning; babas of Inez Lily. https://papail.io
171 posts 1,805 followers 293 following
Regular Contributor
Active Commenter
comment in response to post
In the meantime, here's a rule of thumb: "if your project can be vibecoded in an hour, and amounts to O(10) LoC edits on something existing, or is a convergence proof that o4-mini can do with a bit of guidance, DO NOT write a paper about it" :D
comment in response to post
I think that the current most bulletproof peer review has been "people will read/try your stuff, and if it works they build on it". But because it's not attached to a formal process on OpenReview, we discard it as being non-scientific.
comment in response to post
It seems to me that this is totally misaligned with scientific discovery and progress. I don't believe this is a result of bad actors btw. It's just that huge, complex systems that are O(100) years old take a long time to change and readjust to new realities. We'll eventually figure it out.
comment in response to post
It seems to me that it is mostly ML academia (I am part of it!) that is a proponent of keeping peer review and mega ML conferences going & the bean counter running. We've not found a solution to reviews converging to random coin tosses, at a huge expense of human work hours.
comment in response to post
If that's indeed the case (I believe we can measure that), and their key function is social, a way for people to connect (that's great!), what's the point of having peer review and using # of NeurIPS papers as a bean counter?
comment in response to post
My post is a direct criticism of the 100k NeurIPS submissions issue. It's beyond clear that research dissemination--for the most part--does not happen through conferences anymore.
comment in response to post
Working on the yapping part :)
comment in response to post
Hmm... temp has to be 0.6-0.8; this looks like very low-temperature output.
comment in response to post
I don’t see at all how this is intellectually close to what Shannon wrote. Can you clarify? I read it as computing statistics and how these are compatible with theoretical conjectures. There’s no language generation implicit in the article. Am I misreading it?
comment in response to post
can you share the paper?
comment in response to post
BTW, for historical context, 1948 is very, very, very early to have these thoughts. So I actually think that every single sentence written is profound. This is kinda random, but here is how Greece looked back then. IT WAS SO EARLY :) x.com/DimitrisPapa...
comment in response to post
It's not that profound. It just says there's no wall if all the stars are aligned. It's an optimistic read of the setting.
comment in response to post
researchers
comment in response to post
Also, a sycophant etymologically means "the one who shows the figs"; the origin of the meaning is kinda debated: it either refers to illegally importing figs, or to falsely accusing someone of hiding illegally imported figs.
comment in response to post
bsky doesn't like GIFs, here they are from the other site x.com/DimitrisPapa...
comment in response to post
Super proud of this work, which was led by Nayoung Lee and Jack Cai, with mentorship from Avi Schwarzschild and Kangwook Lee. Link to our paper: arxiv.org/abs/2502.01612
comment in response to post
Oh btw, self-improvement can become exponentially faster in some settings, e.g. when we apply it to pretrained models (again, this is all for add/mul/maze etc.)
comment in response to post
An important aspect of the method is that you need to 1) generate problems of appropriate hardness and 2) be able to filter out negative examples using a cheap verifier. Otherwise the benefit of self-improvement collapses.
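A minimal sketch of what those two ingredients could look like for reverse addition (the function names and the length-based check are my own illustration; the paper's actual filters may differ):

```python
# Hypothetical sketch of one self-labeling round: sample problems slightly
# harder than the current training difficulty, label them with the current
# model, and keep only labels that pass a cheap verifier. For addition, one
# cheap check is that the sum of two n-digit numbers has n or n+1 digits.
import random

def sample_operands(n_digits: int) -> tuple[int, int]:
    """Draw two uniformly random n-digit operands."""
    lo, hi = 10 ** (n_digits - 1), 10 ** n_digits - 1
    return random.randint(lo, hi), random.randint(lo, hi)

def cheap_verifier(a: int, b: int, predicted_sum: str) -> bool:
    """Length check only: no access to the true answer."""
    n = max(len(str(a)), len(str(b)))
    return predicted_sum.isdigit() and len(predicted_sum) in (n, n + 1)

def self_label_round(model, current_difficulty: int, num_samples: int):
    """1) appropriate hardness: one digit beyond training; 2) verifier filter."""
    harder = current_difficulty + 1
    kept = []
    for _ in range(num_samples):
        a, b = sample_operands(harder)
        pred = model.generate(f"{a}+{b}=")   # assumed model interface
        if cheap_verifier(a, b, pred):       # drop likely-wrong self-labels
            kept.append((f"{a}+{b}=", pred))
    return kept
```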
comment in response to post
We test self-improvement across diverse algorithmic tasks:
- Arithmetic: reverse addition, forward (yes, forward!) addition, multiplication (with CoT)
- String manipulation: copying, reversing
- Maze solving: finding shortest paths in graphs
It always works.
comment in response to post
Self-improvement is not new—this idea has been explored in various contexts and domains (like reasoning, mathematics, coding, and more). Our results suggest that self-improvement is a general and scalable solution to length & difficulty generalization!
comment in response to post
What if we leverage this? What if we let the model label slightly harder data… and then train on it? Our key idea is to use Self-Improving Transformers, where a model iteratively labels its own training data and learns from progressively harder examples (inspired by methods like STaR and ReST).
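For concreteness, a hypothetical skeleton of that loop, with training and self-labeling passed in as callables (a self-labeling/filtering step is sketched in the reply further up); this is an illustration, not the paper's code:

```python
# Hypothetical skeleton of the self-improvement recipe described above:
# train on what you have, let the model label slightly harder problems,
# keep the labels that survive a cheap filter, and repeat.
def self_improve(model, train_fn, label_and_filter, seed_data, max_difficulty: int):
    """train_fn(model, data) -> model; label_and_filter(model, difficulty) -> examples."""
    data = list(seed_data)                       # supervised data at the easiest difficulty
    for difficulty in range(1, max_difficulty):
        model = train_fn(model, data)            # fit on everything collected so far
        data += label_and_filter(model, difficulty)  # self-label one notch harder
    return train_fn(model, data)
```

With something like the self_label_round sketch above plugged in as label_and_filter, each round pushes the training distribution one digit (or one difficulty level) further.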
comment in response to post
I was kind of done with length gen, but then I took a closer look at that figure above... I noticed that there is a bit of transcendence, i.e. the model trained on n-digit ADD can solve slightly harder problems, e.g. n+1, but not much more. (cc on transcendence and chess: arxiv.org/html/2406.11741v1)
comment in response to post
Even for simple algorithmic tasks like integer addition, performance collapses as sequence length increases. The only approaches that overcome this so far rely on heavily optimizing the positional encoding and data format. (Figure from Cho et al., arxiv.org/abs/2405.20671)
comment in response to post
Standard transformers struggle with length generalization—extrapolating beyond their training distribution. Even GPT-4, o1, and o3 can't multiply numbers with many digits. Length generalization with vanilla transformers is a long-standing open problem. x.com/yuntiandeng/...
comment in response to post
I think so
comment in response to post
it will still be more expensive than data generation, no?
comment in response to post
lol
comment in response to post
How much would you charge per hour to solve hard math problems at the Olympiad level? This is not high school math.
comment in response to post
yes I agree with that too
comment in response to post
Agree with this perspective, but DeepSeek still had a lot of money to spend, many GPUs (2k at least), and a cracked 200-person engineering/scientist team. Still a smaller-scale effort than OpenAI, but at a scale far above even multi-university groups.
comment in response to post
the median was kinda mean
comment in response to post
I am getting professional anxiety. They can solve problems (that I can't solve) 100-1,000x cheaper than humans.
comment in response to post
"Doubles" sounds more impressive than "100%" :)