neuralnoise.com
Researcher in ML/NLP at the University of Edinburgh (faculty at Informatics and EdinburghNLP), Co-Founder/CTO at www.miniml.ai, ELLIS (@ELLIS.eu) Scholar, Generative AI Lab (GAIL, https://gail.ed.ac.uk/) Fellow -- www.neuralnoise.com, he/they
140 posts 5,184 followers 4,594 following

Hi all, on Thu and Fri (Feb 27th-28th), I'll be in Amsterdam for an Invited Talk at the ELLIS workshop on Representation Learning and Generative Models for Structured Data (sites.google.com/view/rl-and-...) and catch up with my amazing collaborators -- if you're in town, let's meet! 🚀

After 6+ months in the making and over a year of GPU compute, we're excited to release the "Ultra-Scale Playbook": hf.co/spaces/nanot... A book to learn all about 5D parallelism, ZeRO, CUDA kernels, and how and why to overlap compute and communication, with theory, motivation, interactive plots, and 4000+ experiments!

Looking for a PhD student to come work with me on the ethical implications of NLP from September! Please share widely and point any interesting students my way! 😊

🧠💭 How the Brain Explores in Sleep: A new study suggests that during rest, the brain replays past experiences to optimise exploration in uncertain environments, just like AI planning offline. 🔗 www.nature.com/articles/s41... #SciComm 🧪 #AI #BrainResearch

It's 2025, and I’ve finally updated my Python setup guide to use uv + venv instead of conda + pip! Here's my go-to recommendation for using uv + venv in Python projects, for faster installs and better dependency management: github.com/rasbt/LLMs-f... (Any additional suggestions?)

From reddit: a letter from a postdoc who survived the Bolsonaro years. This is helpful framing for how to science in this administration. 🧪 www.reddit.com/r/labrats/s/...

Hey that's @marwinsegler.bsky.social 🚀🚀🚀🚀🚀

🔥🔥🔥 @loreloc.bsky.social @ropeharz.bsky.social @cassiodecampos.bsky.social @ac.erikquaeghebeur.name

tiny-gpu: A minimal GPU in Verilog, optimized for learning how GPUs work from the ground up. Built with <15 files of fully documented Verilog, complete documentation on the architecture & ISA, working matrix addition/multiplication kernels, and full support for kernel simulation & execution traces.

I was just told that I have to remove “climate” from the title of an ongoing grant if I want to keep it. And publications from that grant cannot include “climate” and other forbidden words. I can’t believe I’m writing this from the United States of America. #AcademicSky

LLMs can tackle math olympiad problems, but… can they read a clock or calendar? 🕰️📆 Our experiments reveal surprising failures in temporal reasoning—multimodal models struggle with analogue clock reading and date inference! Paper Link: arxiv.org/abs/2502.05092 Dataset: huggingface.co/datasets/roh...

AI, made in Europe: introducing the all new Le Chat by Mistral: your ultimate AI sidekick for life and work. Find out more: mistral.ai/en/news/all-...

A starter pack of people working on interpretability / explainability of all kinds, using theoretical and/or empirical approaches. Reply or DM if you want to be added, and help me reach others! go.bsky.app/DZv6TSS

DBLP (@dblp.org) seems to be down -- here's a mirror: dblp.uni-trier.de

I wrote some reflections on DeepSeek, open source, AI, the US and China, starting from Dario's recent essay calling for stronger export controls. I mostly disagree with his essay and think it missed the point. You can read it here: thomwolf.io/blog/deepsee...

In my experience, ChatGPT has been more accurate at diagnosis than most doctors. But how could this be? Doesn’t it make stuff up? Yeah, but most doctors literally do that too. They guess with very limited knowledge across specialties. ChatGPT tends to make better guesses.

Proof of how competition leads to better products for customers and to more innovation, and why open AI models are key to competition. Plus a reminder that OpenAI competes more when it has to, and becomes more "open" only under competitive pressure:

Our work on fixing and improving MMLU ("Are We Done with MMLU?", arxiv.org/abs/2406.04127; NAACL 2025) is featured on the DeepSeek home page! deepseek.com -- it's always amazing to see real-world applications and the impact of academic research 🚀🚀🚀

The Machine Learning Street Talk (MLST) podcast just dropped an episode with Nicholas Carlini on Spotify! open.spotify.com/episode/6hpi...

Massive congrats @albertomancino.bsky.social!!! 🚀🚀🚀🚀🚀

Only five more days to apply!

Apparently RLCoT (chain of thought learned via RL) is itself an emergent behavior that doesn’t appear until models reach about 1.5B parameters. PPO, GRPO, PRIME: it doesn’t matter which RL algorithm you use; the key is that it’s RL. Experiment logs: wandb.ai/jiayipan/Tin... X: x.com/jiayi_pirate...

A short guide to run DeepSeek R1 (all 671B of it) on a home cluster of Macs with mlx.distributed. gist.github.com/awni/ec071fd...

Financial Assistance applications are now open! If you face financial barriers to attending ICLR 2025, we encourage you to apply. The program offers prepay and reimbursement options. Applications are due March 2nd with decisions announced March 9th. iclr.cc/Conferences/...

📢 New course alert 📢 I am teaching a course entitled "Language Models and Structured Data" at Institut Polytechnique de Paris. Topics: Language Models (LMs), Prompt Engineering, LoRA, Quantized LMs, RAG, Graphs, Tabular Data, Text2SQL. The slides are available at: shorturl.at/w8iuO

Don't miss your chance to register for ELLIS Unit Amsterdam's Winter School on #AI #FoundationModels. The deadline is Jan 26. More info: ivi.fnwi.uva.nl/ellis/events...

folks, how would you fix this?

The Llama3 models were pre-trained on LibGen: www.theverge.com/2025/1/14/24...

Friendly reminder that we are advertising for a permanent faculty position in Embodied NLP at the School of Informatics, University of Edinburgh (informatics.ed.ac.uk) -- apply by Jan 31st! edin.ac/4fqgawg

In case you were wondering how things are going in Germany & on X, after Elon Musk announced his support for the far-right "Alternative für Deutschland" (AfD) in the upcoming Federal election: The chart below shows sums of tweets x impressions by members of parliament over the past 7 days...🧵⤵️

📕 ⬇️ My thesis on 🚫unargmaxable outputs is online! Check it out if you want to learn more about how output layers constrain what neural networks can and cannot predict 👉 era.ed.ac.uk/handle/1842/...

🦕 Special track at NeSy dedicated to KGs and Ontologies!! Have a look 🙌

Join us on 27 Feb in Amsterdam for the ELLIS workshop on Representation Learning and Generative Models for Structured Data ✨ sites.google.com/view/rl-and-... Inspiring talks by @eisenjulian.bsky.social, @neuralnoise.com, Frank Hutter, Vaishali Pal, TBC. We welcome extended abstracts until 31 Jan!