Profile avatar
matpagliardini.bsky.social
PhD student in ML at EPFL 🇨🇭 working with Martin Jaggi & François Fleuret. Previously Apple MLR (intern). https://mpagli.github.io/
8 posts 224 followers 1,007 following
comment in response to post
Congrats! How important is scale for it to work? In your previous maze work it was clear that a recurrent algorithm could solve the task: the recurrent state could be used as a scratchpad, with each iteration decreasing the loss further. Language feels different, with many local minima along the recurrent path.
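For illustration, a minimal sketch of the "recurrent state as scratchpad" picture this comment refers to: a weight-tied block applied repeatedly, with the loss read out after every iteration so one can check whether extra depth keeps helping. This is an assumption-laden toy, not the method from the post being replied to; all names and sizes are hypothetical.

```python
# Toy sketch (hypothetical): a weight-tied recurrent block whose state acts
# as a scratchpad; the loss is read out after each iteration.
import torch
import torch.nn as nn

class RecurrentRefiner(nn.Module):
    def __init__(self, dim: int, vocab: int):
        super().__init__()
        self.step = nn.GRUCell(dim, dim)      # same weights reused every iteration
        self.readout = nn.Linear(dim, vocab)  # read the scratchpad into logits

    def forward(self, x: torch.Tensor, n_iters: int):
        state = torch.zeros(x.size(0), self.step.hidden_size, device=x.device)
        logits_per_iter = []
        for _ in range(n_iters):
            state = self.step(x, state)               # refine the scratchpad state
            logits_per_iter.append(self.readout(state))
        return logits_per_iter

# Per-iteration losses: on maze-like tasks one might expect a monotone decrease;
# on language the curve could plateau or get stuck in local minima.
model = RecurrentRefiner(dim=64, vocab=100)
x = torch.randn(8, 64)
targets = torch.randint(0, 100, (8,))
losses = [nn.functional.cross_entropy(l, targets) for l in model(x, n_iters=6)]
```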
comment in response to post
Interesting loss curves. I’m not familiar enough with the task to know whether the spikes are expected, but would be curious to see the grad norm.
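For context, a small helper of the kind one might use to log the quantity asked about here: the global gradient norm after `backward()`, which loss spikes are often cross-checked against. This is a generic sketch, not code from the thread; the function name is hypothetical.

```python
# Hypothetical helper: compute the global L2 norm of all parameter gradients.
import torch

def global_grad_norm(model: torch.nn.Module) -> float:
    total = 0.0
    for p in model.parameters():
        if p.grad is not None:
            total += p.grad.detach().float().norm(2).item() ** 2
    return total ** 0.5
```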
comment in response to post
Which task?
comment in response to post
Let’s also call on the silent crowd—me included—to start sharing more. Let’s be the change we want to see. You disagree with the political agenda of X? Protest by sharing your latest work/thoughts on Bsky.
comment in response to post
In my quick test on a small (120M) model trained on 14B tokens, the difference at the end is not so significant. Maybe the gap widens when training on less data, closer to Chinchilla optimal, or for larger models… I’m team ReLU…
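To make the kind of ablation mentioned here concrete, a minimal sketch where the only change between runs is the MLP activation (e.g. ReLU vs GELU) in an otherwise identical transformer feed-forward block. The function name and dimensions are illustrative assumptions, not details from the original post.

```python
# Hypothetical sketch: identical MLP block, only the activation is swapped.
import torch.nn as nn

def mlp_block(dim: int, activation: str = "relu") -> nn.Sequential:
    act = {"relu": nn.ReLU(), "gelu": nn.GELU()}[activation]
    return nn.Sequential(
        nn.Linear(dim, 4 * dim),  # standard 4x hidden expansion
        act,                      # the swapped component
        nn.Linear(4 * dim, dim),
    )

relu_mlp = mlp_block(768, "relu")
gelu_mlp = mlp_block(768, "gelu")
```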
comment in response to post
Let o1 write a review and ask the non-expert human reviewer to verify its claims/refine the review.
comment in response to post
A wise man once told me a paper should not have more than one table. Of course there can be exceptions, but minimizing the number of tables is something I always have in mind when writing. Isolate one or two key messages from the table and convey them with graphs.
comment in response to post
👋