Profile avatar
matpagliardini.bsky.social
PhD student in ML at EPFL 🇨🇭 working with Martin Jaggi & François Fleuret. Previously Apple MLR (intern). https://mpagli.github.io/
8 posts 224 followers 1,007 following
comment in response to post
Congrats! How important is scale for it to work? In your previous maze work it was clear that a recurrent algorithm could solve the task: the recurrent state could be used as a scratchpad, with each iteration decreasing the loss further. Language feels different, with many local minima along the recurrent path.
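For illustration, a minimal sketch of the "recurrent state as scratchpad" picture this comment refers to: a weight-tied block applied repeatedly, with the loss read out after every iteration so one can check whether extra depth keeps helping. This is an assumption-laden toy, not the method from the post being replied to; all names and sizes are hypothetical.

```python
# Toy sketch (hypothetical): a weight-tied recurrent block whose state acts
# as a scratchpad; the loss is read out after each iteration.
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    def __init__(self, dim: int, vocab: int):
        super().__init__()
        self.step = nn.GRUCell(dim, dim)      # same weights reused every iteration
        self.readout = nn.Linear(dim, vocab)  # read the scratchpad into logits

    def forward(self, x: torch.Tensor, n_iters: int):
        state = torch.zeros(x.size(0), self.step.hidden_size, device=x.device)
        logits_per_iter = []
        for _ in range(n_iters):
            state = self.step(x, state)               # refine the scratchpad state
            logits_per_iter.append(self.readout(state))
        return logits_per_iter

# Per-iteration losses: on maze-like tasks one might expect a monotone decrease;
# on language the curve could plateau or get stuck in local minima.
model = RecurrentRefiner(dim=64, vocab=100)
x = torch.randn(8, 64)
targets = torch.randint(0, 100, (8,))
losses = [nn.functional.cross_entropy(l, targets) for l in model(x, n_iters=6)]
```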
comment in response to post
Interesting loss curves. I’m not familiar enough with the task to know whether the spikes are expected, but would be curious to see the grad norm.
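For context, a small helper of the kind one might use to log the quantity asked about here: the global gradient norm after `backward()`, which loss spikes are often cross-checked against. This is a generic sketch, not code from the thread; the function name is hypothetical.

```python
# Hypothetical helper: compute the global L2 norm of all parameter gradients.
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().float().norm(2).item() ** 2
    return total ** 0.5
```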
comment in response to post
Which task?
comment in response to post
Let’s also call on the silent crowd—me included—to start sharing more. Let’s be the change we want to see. You disagree with the political agenda of X? Protest by sharing your latest work/thoughts on Bsky.
comment in response to post
In my quick test on a small (120M) model trained on 14B tokens, the difference at the end is not so significant. Maybe the gap widens when training on less data, closer to Chinchilla optimal, or for larger models… I’m team ReLU…
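To make the kind of ablation mentioned here concrete, a minimal sketch where the only change between runs is the MLP activation (e.g. ReLU vs GELU) in an otherwise identical transformer feed-forward block. The function name and dimensions are illustrative assumptions, not details from the original post.

```python
# Hypothetical sketch: identical MLP block, only the activation is swapped.
import torch.nn as nn

def mlp_block(dim: int, activation: str = "relu") -> nn.Sequential:
    act = {"relu": nn.ReLU(), "gelu": nn.GELU()}[activation]
    return nn.Sequential(
        nn.Linear(dim, 4 * dim),  # standard 4x hidden expansion
        act,                      # the swapped component
        nn.Linear(4 * dim, dim),
    )

relu_mlp = mlp_block(768, "relu")
gelu_mlp = mlp_block(768, "gelu")
```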
comment in response to post
Let o1 write a review and ask the non-expert human reviewer to verify its claims/refine the review.
comment in response to post
A wise man once told me a paper should not have more than one table. Of course there can be exceptions, but minimizing the number of tables is something I always have in mind when writing. Isolate one or two key messages from the table and convey them with graphs.
comment in response to post
👋