dimitrisp.bsky.social
Researcher @MSFTResearch; Prof @UWMadison (on leave); learning in context; thinking about reasoning; babas of Inez Lily. https://papail.io
171 posts 1,804 followers 293 following

What if for most of your findings you just post a thread and share a GitHub repo, rather than submitting a 15-page NeurIPS paper with <1/100 the reach?

LLMs learn world models, beyond a reasonable doubt. It's been the case since GPT-3, but now it should be even more clear. Without them "Guess and Check" would not work. The fact that these "world models" are approximate/incomplete does not disqualify them.

Is 1948 widely acknowledged as the birth of language models and tokenizers? In "A Mathematical Theory of Communication", almost as an afterthought, Shannon suggests the n-gram for generating English, and that word-level tokenization is better than character-level tokenization.
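
A minimal sketch of the word-level n-gram generation Shannon describes, assuming a toy corpus (everything below is illustrative; Shannon tabulated statistics of printed English by hand):

```python
import random
from collections import defaultdict

# Toy word-level bigram generator in the spirit of Shannon (1948).
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Estimate P(next word | current word) by successor counts.
successors = defaultdict(list)
for cur, nxt in zip(corpus, corpus[1:]):
    successors[cur].append(nxt)

def generate(start, n_words=10):
    """Sample a sequence by repeatedly drawing a random successor."""
    out = [start]
    while len(out) < n_words and successors[out[-1]]:
        out.append(random.choice(successors[out[-1]]))
    return " ".join(out)

print(generate("the"))
```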

🎉The Phi-4 reasoning models have landed on HF and Azure AI Foundry. The new models are competitive and often outperform much larger frontier models. It is exciting to see the reasoning capabilities extend to more domains beyond math, including algorithmic reasoning, calendar planning, and coding.

I am afraid to report: RL works. I think 2-3 years ago I said I would not work on two ML sub-areas; RL was one of them. I am happy to say that I am not strongly attached to my beliefs.

Re: The Chatbot Arena Illusion. Every eval chokes under hill climbing. If we're lucky, there's an early phase where *real* learning (by both the model and the community) can occur. I'd argue that a benchmark's value lies entirely in that window. So the real question is: what did we learn?

Fun trivia now that “sycophant” became common language to describe LLMs flattering users: In Greek, συκοφάντης (sykophántēs) most typically refers to a malicious slanderer, someone spreading lies, not flattery! Every time you use it, you’re technically using it wrong :D

Come work with us at MSR AI Frontiers and help us figure out reasoning! We're hiring at the Senior Researcher level (e.g., post-PhD). Please drop me a DM if you're interested! jobs.careers.microsoft.com/us/en/job/17...

o3 can't multiply beyond a few digits... But I think multiplication, addition, maze solving, and easy-to-hard generalization are actually solvable on standard transformers... with recursive self-improvement. Below is the accuracy of a tiny model teaching itself how to add and multiply.
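
For flavor, a hedged sketch of what such a self-improvement loop could look like; this is a schematic, not the exact recipe from the work below, and `train` and `sample_answer` are hypothetical stand-ins for a real training/inference stack:

```python
import random
from collections import Counter

# Schematic recursive self-improvement on addition (illustrative only).
# The model labels problems one digit harder than its training data,
# keeps the answers it agrees with itself on, and retrains on them.

def make_problem(n_digits):
    a = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    b = random.randrange(10 ** (n_digits - 1), 10 ** n_digits)
    return f"{a}+{b}"

def self_improve(model, train, sample_answer, max_digits=20):
    data = []
    for n in range(2, max_digits + 1):
        for problem in (make_problem(n) for _ in range(1000)):
            # Self-consistency filter: trust the majority answer only
            # when the model mostly agrees with itself.
            votes = Counter(sample_answer(model, problem) for _ in range(5))
            answer, count = votes.most_common(1)[0]
            if count >= 4:
                data.append((problem, answer))
        model = train(model, data)  # retrain on accumulated self-labels
    return model
```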

Self-improving Transformers can overcome easy-to-hard and length generalization challenges. Paper on arXiv coming on Monday. Link to a talk I gave on this below 👇 Super excited about this work! Talk: youtube.com/watch?v=szhE... Slides: tinyurl.com/SelfImprovem...

Two months before R1 came out, I wrote this in my small notebook of ideas as something to test #schmidhuber

Now that we have reasoner LLMs, let's think about how to GRPO problem generators so they produce instances that sit right outside the frontier of current capabilities.
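
One illustrative way to shape that generator's reward (my sketch, not an established recipe; `solve_once` is a hypothetical hook that runs the current solver on an instance and reports success):

```python
# Illustrative reward for a GRPO-trained problem generator: favor
# instances the current solver gets right sometimes but not reliably,
# i.e. instances sitting just outside the capability frontier.

def frontier_reward(problem, solve_once, k=8, lo=0.1, hi=0.5):
    solve_rate = sum(solve_once(problem) for _ in range(k)) / k
    # Always-solved (too easy) and never-solved (too hard) both earn 0;
    # the band in between is where the training signal lives.
    return 1.0 if lo <= solve_rate <= hi else 0.0
```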

🚀 🇬🇷 A year in the making! I’ve just completed a set of 21 lectures in Machine Learning, in Greek, designed for high school students. The course introduces key ML concepts, coding in Python & PyTorch, and real-world AI applications. 👉 WebPage: tinyurl.com/ye2awe8m 🎥 YouTube: tinyurl.com/2wwjru6z

If you wanted to collect 1 million reasoning traces from human subjects on, say, math, that would cost ~$50M, assuming ~$50/person/hour and roughly one trace per hour. Interesting to compare with the cost to generate them from a reasoning LLM, at, say, ~$0.50 per trace (~10k tokens). That's 100x cheaper.
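
Spelling out the arithmetic (the one-trace-per-person-hour assumption is what makes the $50M figure work):

```python
traces = 1_000_000
human_cost = traces * 50.0    # ~1 hour/trace at ~$50/hour -> $50M
llm_cost = traces * 0.50      # ~$0.50 per ~10k-token trace -> $0.5M
print(human_cost / llm_cost)  # 100.0
```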

OK, we've read a lot about test-time compute being the new scaling axis, but what's the next scaling axis?

2014, GoogLeNet: the best image classifier required weeks of training on Google's custom infrastructure. 2018, ResNet: a more accurate model trains in half an hour on a single GPU. What stops this from happening for LLMs?