wordscompute.bsky.social
nlp/ml phding @ usc, interpretability & reasoning & pretraining & emergence 한american, she, iglee.me, likes ??= bookmarks
36 posts 2,290 followers 510 following

we've reached that point in this submission cycle, no amount of coffee will do 😞🙂‍↔️😞

INCOMING

titled: peer review

Life update: I'm starting as faculty at Boston University @bucds.bsky.social in 2026! BU has SCHEMES for LM interpretability & analysis, I couldn't be more pumped to join a burgeoning supergroup w/ @najoung.bsky.social @amuuueller.bsky.social. Looking for my first students, so apply and reach out!

really excited to be headed to OFC in SF! can't wait to revisit optical physics 😀

Transformers employ different strategies through training to minimize loss, but how do these strategies trade off, and why? Excited to share our newest work, where we show remarkably rich competitive and cooperative interactions (termed "coopetition") as a transformer learns. Read on 🔎⏬

New paper–accepted as *spotlight* at #ICLR2025! 🧵👇 We show that a competition dynamic between several algorithms splits a toy model's ICL abilities into four broad phases across train/test settings! This means ICL is akin to a mixture of different algorithms, not a monolithic ability.

Starlings move in undulating curtains across the sky. Forests of bamboo blossom at once. But some individuals don’t participate in these mystifying synchronized behaviors — and scientists are learning that they may be as important as those that do.

New piece out! We explain why Fully Autonomous Agents Should Not be Developed, breaking "AI Agent" down into its components & examining each through the lens of ethical values. With @evijit.io, @giadapistilli.com and @sashamtl.bsky.social huggingface.co/papers/2502....

Brian Hie harnessed the powerful parallels between DNA and human language to create an AI tool that interprets genomes. Read his conversation with Ingrid Wickelgren: www.quantamagazine.org/the-poetry-f...

Genomic Foundationless Models: Pretraining Does Not Promise Performance
I've long believed genomic foundation models are not as useful as claimed. In my mind, there isn't enough training data to justify their size. Interesting to see more work in this direction. www.biorxiv.org/content/10.1...

How do tokens evolve as they are processed by a deep Transformer? With José A. Carrillo, @gabrielpeyre.bsky.social and @pierreablin.bsky.social, we tackle this in our new preprint: A Unified Perspective on the Dynamics of Deep Transformers arxiv.org/abs/2501.18322 ML and PDE lovers, check it out!
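The "tokens processed by a deep Transformer" framing above is often studied as an interacting-particle dynamic: each token is a point that moves toward an attention-weighted average of the others. The sketch below is a toy caricature of that view, not the preprint's actual model; the dimensions, step size, single head, and identity Q/K/V are all invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "tokens as interacting particles" dynamics: each token is a point
# on the unit sphere, repeatedly pulled toward its attention-weighted mean.
n, d = 8, 3
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # project tokens to the sphere

def attention_step(X, dt=0.1, beta=4.0):
    scores = beta * (X @ X.T)                          # pairwise similarities
    A = np.exp(scores - scores.max(axis=1, keepdims=True))
    A /= A.sum(axis=1, keepdims=True)                  # row-wise softmax
    X = X + dt * (A @ X)                               # drift toward attended mean
    return X / np.linalg.norm(X, axis=1, keepdims=True)

for _ in range(200):
    X = attention_step(X)

# After many steps the minimal pairwise similarity typically rises,
# reflecting the clustering behavior studied in this line of work.
print(np.min(X @ X.T))
```

This is a discretized ODE on the sphere; the PDE perspective in the preprint studies what such dynamics do in the infinite-depth, many-token limit.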

it’s finally raining in la:)

i go on a really long walk almost every day, and at a high point in silverlake, i saw fire from all sides. and it's harder to breathe. and everything is orange.

New paper <3 Interested in inference-time scaling? In-context learning? Mech interp? LMs can solve novel in-context tasks given sufficient examples (longer contexts). Why? Because they dynamically form *in-context representations*! 1/N

Hollywood High School will serve as an evacuation site for the Sunset fire in Hollywood, KTLA reported. The school is at 1521 Highland Ave. www.latimes.com/california/s...

hello bluesky! we have a new preprint on solvation free energies: tl;dr: We define an interpolating density by its sampling process, and learn the corresponding equilibrium potential with score matching. arxiv.org/abs/2410.15815 with @francois.fleuret.org and @tbereau.bsky.social (1/n)

The slides of my NeurIPS lecture "From Diffusion Models to Schrödinger Bridges - Generative Modeling meets Optimal Transport" can be found here drive.google.com/file/d/1eLa3...

An Evolved Universal Transformer Memory sakana.ai/namm/ Introducing Neural Attention Memory Models (NAMM), a new kind of neural memory system for Transformers that not only boosts their performance and efficiency but is also transferable to other foundation models without any additional training!

Tomorrow (Dec 12) poster #2311! Go talk to @emalach.bsky.social and the other authors at #NeurIPS, say hi from me!

Sometimes our anthropocentric assumptions about how intelligence "should" work (like using language for reasoning) may be holding AI back. Letting AI reason in its own native "language" in latent space could unlock new capabilities, improving reasoning over Chain of Thought. arxiv.org/pdf/2412.06769
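The latent-space reasoning idea in the post above can be caricatured with a toy recurrence: instead of decoding the hidden state to a discrete token and re-embedding it between steps (as in chain-of-thought), the continuous state is fed back directly. Everything below (the update function, dimensions, the tiny embedding table) is invented for illustration and is not the linked paper's architecture.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 16
W = rng.normal(size=(d, d)) / np.sqrt(d)

def step(h):
    # One "reasoning step": a fixed nonlinear update of the latent state.
    return np.tanh(W @ h)

def reason_in_latent_space(h0, n_steps=4):
    # Latent chain-of-thought: iterate the update without ever projecting
    # back to a discrete vocabulary between steps.
    h = h0
    for _ in range(n_steps):
        h = step(h)
    return h

def reason_via_tokens(h0, vocab, n_steps=4):
    # Token-based CoT caricature: snap to the nearest "token embedding"
    # after every step, discarding continuous information each time.
    h = h0
    for _ in range(n_steps):
        h = step(h)
        h = vocab[np.argmin(np.linalg.norm(vocab - h, axis=1))]
    return h

vocab = rng.normal(size=(32, d))   # made-up token-embedding table
h0 = rng.normal(size=d)
latent = reason_in_latent_space(h0)
tokenized = reason_via_tokens(h0, vocab)
# The quantized trajectory diverges from the continuous one.
print(np.allclose(latent, tokenized))  # False
```

The point of the toy: forcing every intermediate step through a discrete vocabulary is a lossy bottleneck that the latent-space variant avoids.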

What counts as in-context learning (ICL)? Typically, you might think of it as learning a task from a few examples. However, we’ve just written a perspective (arxiv.org/abs/2412.03782) suggesting interpreting a much broader spectrum of behaviors as ICL! Quick summary thread: 1/7

Can language models transcend the limitations of training data? We train LMs on a formal grammar, then prompt them OUTSIDE of this grammar. We find that LMs often extrapolate logical rules and apply them OOD, too. Proof of a useful inductive bias. Check it out at NeurIPS: nips.cc/virtual/2024...

The Bluesky team created a great tool to help newcomers: starter packs. Here's a quick starter pack for #complexity and network scientists + feeds to quickly join the community! #NetSky Please help share (and if you are not in the list, get in touch and I will add you) go.bsky.app/KMfiTU2

go.bsky.app/Gf4uKHG Let me know if you want to be added.

neural population models are so cool/wish i knew more about them ✨😲

debating which is scarier, overleaf being down since 4am vs emergency martial law in korea

Is generalisation a process, an operation, or a product? 🤨 Read about the different ways generalisation is defined, parallels between humans & machines, methods & evaluation in our new paper: arxiv.org/abs/2411.15626 co-authored with many smart minds as a product of Dagstuhl 🙏🎉

😮

friends i made a Moomins starter pack. that is, for those of us who are moomins. i don't know if this is good

How do LLMs learn to reason from data? Are they ~retrieving the answers from parametric knowledge🦜? In our new preprint, we look at the pretraining data and find evidence against this: Procedural knowledge in pretraining drives LLM reasoning ⚙️🔢 🧵⬇️

The latest from our interpretability team: there is an ambiguity in prior work on the linear representation hypothesis: Is a linear representation a linear function (that preserves the origin) or an affine function (that does not)? This distinction matters in practice. arxiv.org/abs/2411.09003
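The linear-vs-affine distinction in the post above is easy to see with a toy probe: a strictly linear map must send the origin to the origin, so it cannot absorb a constant offset, while an affine map (with an intercept) can. All arrays and dimensions below are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "activations": 100 samples of an 8-dim hidden state.
H = rng.normal(size=(100, 8))

# A concept encoded affinely: direction w_true plus a nonzero offset b_true.
w_true = rng.normal(size=8)
b_true = 3.0
y = H @ w_true + b_true

# Linear probe (preserves the origin): least squares with no intercept.
w_lin, *_ = np.linalg.lstsq(H, y, rcond=None)
err_lin = np.mean((H @ w_lin - y) ** 2)

# Affine probe: append a constant feature, i.e. fit an intercept as well.
H1 = np.hstack([H, np.ones((100, 1))])
w_aff, *_ = np.linalg.lstsq(H1, y, rcond=None)
err_aff = np.mean((H1 @ w_aff - y) ** 2)

# The affine probe recovers the offset exactly; the strictly linear one cannot.
print(err_lin > err_aff)  # True
```

Which of the two you fit is exactly the ambiguity the post points at: a probe "finding a linear representation" means something different depending on whether an intercept was allowed.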