neuralnoise.com - Profile | ThreadSky | a Reddit-style client for Bluesky

Hi all, on Thu and Fri (Feb 27th-28th), I'll be in Amsterdam to give an Invited Talk at the ELLIS workshop on Representation Learning and Generative Models for Structured Data (sites.google.com/view/rl-and-...) and to catch up with my amazing collaborators -- if you're in town, let's meet! 🚀

submitted 1 day ago • 1 comment

After 6+ months in the making and over a year of GPU compute, we're excited to release the "Ultra-Scale Playbook": hf.co/spaces/nanot... A book to learn all about 5D parallelism, ZeRO, CUDA kernels, and how/why to overlap compute & comms, with theory, motivation, interactive plots and 4000+ experiments!

submitted 8 days ago • 3 comments

Looking for a PhD student to come work with me on the ethical implications of NLP from September! Please share widely and point any interesting students my way! 😊

submitted 16 days ago • 0 comments

🧠💭 How the Brain Explores in Sleep: a new study suggests that during rest, the brain replays past experiences to optimise exploration in uncertain environments - just like AI planning offline. 🔗 www.nature.com/articles/s41... #SciComm 🧪 #AI #BrainResearch

submitted 11 days ago • 1 comment

It's 2025, and I’ve finally updated my Python setup guide to use uv + venv instead of conda + pip! Here's my go-to recommendation for uv + venv in Python projects for faster installs, better dependency management: github.com/rasbt/LLMs-f... (Any additional suggestions?)

submitted 12 days ago • 11 comments

From reddit: a letter from a postdoc who survived the Bolsonaro years. This is helpful framing for how to science in this administration. 🧪 www.reddit.com/r/labrats/s/...

submitted 15 days ago • 9 comments

Hey that's @marwinsegler.bsky.social 🚀🚀🚀🚀🚀

submitted 15 days ago • 0 comments

🔥🔥🔥 @loreloc.bsky.social @ropeharz.bsky.social @cassiodecampos.bsky.social @ac.erikquaeghebeur.name

submitted 15 days ago • 1 comment

tiny-gpu: a minimal GPU in Verilog, optimized for learning how GPUs work from the ground up. Built with <15 files of fully documented Verilog, complete documentation on architecture & ISA, working matrix addition/multiplication kernels, and full support for kernel simulation & execution traces.

submitted 16 days ago • 2 comments

I was just told that I have to remove “climate” from the title of an ongoing grant if I want to keep it. And publications from that grant cannot include “climate” and other forbidden words. I can’t believe I’m writing this from the United States of America. #AcademicSky

submitted 16 days ago • 402 comments

LLMs can tackle math olympiad problems, but… can they read a clock or calendar? 🕰️📆 Our experiments reveal surprising failures in temporal reasoning—multimodal models struggle with analogue clock reading and date inference! Paper Link: arxiv.org/abs/2502.05092 Dataset: huggingface.co/datasets/roh...

submitted 17 days ago • 0 comments

AI, made in Europe: introducing the all new Le Chat by Mistral: your ultimate AI sidekick for life and work. Find out more: mistral.ai/en/news/all-...

submitted 20 days ago • 0 comments

A starter pack of people working on interpretability / explainability of all kinds, using theoretical and/or empirical approaches. Reply or DM if you want to be added, and help me reach others! go.bsky.app/DZv6TSS

submitted 105 days ago • 35 comments

DBLP (@dblp.org) seems to be down -- here's a mirror: dblp.uni-trier.de

submitted 25 days ago • 0 comments

I wrote some reflections on DeepSeek, open-source, AI, US and China, starting from Dario's recent essay calling for stronger export controls. I mostly disagree with his essay and think it missed the point. You can read it here: thomwolf.io/blog/deepsee...

submitted 26 days ago • 2 comments

In my experience, ChatGPT has been more accurate at diagnosis than most doctors. But how could this be? Doesn’t it make stuff up? Yeah, but most doctors literally do that too. They guess with very limited knowledge across specialties. ChatGPT tends to make better guesses.

submitted 26 days ago • 15 comments

Proof of how competition leads to better products for customers, and why it leads to more innovation. And why open AI models are key to competition. Plus a reminder that OpenAI competes more when it has to - and it becomes more "open" only thanks to competitive pressure:

submitted 26 days ago • 6 comments

Our work on fixing and improving MMLU ("Are We Done with MMLU?", arxiv.org/abs/2406.04127; NAACL 2025) is featured on the DeepSeek home page! deepseek.com -- it's always amazing to see real-world applications and the impact of academic research 🚀🚀🚀

submitted 30 days ago • 0 comments

The Machine Learning Street Talk (MLST) podcast just dropped an episode with Nicholas Carlini on Spotify! open.spotify.com/episode/6hpi...

submitted 32 days ago • 0 comments

Massive congrats @albertomancino.bsky.social!!! 🚀🚀🚀🚀🚀

submitted 32 days ago • 0 comments

Only five more days to apply!

submitted 32 days ago • 0 comments

apparently RLCoT (chain of thought learned via RL) is in itself an emergent behavior that doesn’t happen until about 1.5B sized models. PPO, GRPO, PRIME: it doesn’t matter what RL you use, the key is that it’s RL. experiment logs: wandb.ai/jiayipan/Tin... x: x.com/jiayi_pirate...

submitted 33 days ago • 2 comments

A short guide to run DeepSeek R1 (all 671B of it) on a home cluster of Macs with mlx.distributed. gist.github.com/awni/ec071fd...

submitted 36 days ago • 0 comments

Financial Assistance applications are now open! If you face financial barriers to attending ICLR 2025, we encourage you to apply. The program offers prepay and reimbursement options. Applications are due March 2nd with decisions announced March 9th. iclr.cc/Conferences/...

submitted 37 days ago • 0 comments

📢 New course alert 📢 I am teaching a course entitled "Language Models and Structured Data" at Institut Polytechnique de Paris. Topics are: Language Models (LMs), Prompt Engineering, LoRA, Quantized LMs, RAGs, Graphs, Tabular Data, Text2SQL. The slides are available through: shorturl.at/w8iuO

submitted 40 days ago • 0 comments

Don't miss your chance to register for ELLIS Unit Amsterdam's Winter School on #AI #FoundationModels. The deadline is Jan 26. More info: ivi.fnwi.uva.nl/ellis/events...

submitted 41 days ago • 0 comments

folks, how would you fix this?

submitted 41 days ago • 2 comments

The Llama3 models were pre-trained on LibGen: www.theverge.com/2025/1/14/24...

submitted 42 days ago • 1 comment

Friendly reminder that we are advertising for a permanent faculty position in Embodied NLP at the School of Informatics, University of Edinburgh (informatics.ed.ac.uk) -- apply by Jan 31st! edin.ac/4fqgawg

submitted 44 days ago • 0 comments

In case you were wondering how things are going in Germany & on X, after Elon Musk announced his support for the far-right "Alternative für Deutschland" (AfD) in the upcoming Federal election: The chart below shows sums of tweets x impressions by members of parliament over the past 7 days...🧵⤵️

submitted 49 days ago • 19 comments

📕 ⬇️ My thesis on 🚫unargmaxable outputs is online! Check it out if you want to learn more about how output layers constrain what neural networks can and cannot predict 👉 era.ed.ac.uk/handle/1842/...

submitted 48 days ago • 1 comment

🦕 Special track at NeSy dedicated to KGs and Ontologies!! Have a look 🙌

submitted 48 days ago • 0 comments

Join us on 27 Feb in Amsterdam for the ELLIS workshop on Representation Learning and Generative Models for Structured Data ✨ sites.google.com/view/rl-and-... Inspiring talks by @eisenjulian.bsky.social, @neuralnoise.com, Frank Hutter, Vaishali Pal, TBC. We welcome extended abstracts until 31 Jan!

submitted 51 days ago • 2 comments