tomwhi.bsky.social
2 posts 65 followers 461 following

I asked Claude “Make an interactive artifact that will illustrate to me why I should not start Civ VII right now.” This is what it came up with on its own.

Why do LLMs trained on over 90% English text perform so well in non-English languages? They find that they learn to share highly abstract grammatical concept representations, even across unrelated languages!

Fellow journos covering "AI": Please don't do their PR for them! "Virtual employees" is a harmful anthropomorphism in that it (a) is false; (b) confuses readers about the emerging technology and inaccurately lends human attributes like agency, accountability, etc.; and (c) harms humans in real jobs.

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after: Original release: 8 models, 540K downloads. Just the beginning... The community turned those open-weight models into 550+ NEW models on @huggingface. Total downloads? 2.5M, nearly 5X the originals.

Of course, I *had* to test this with R1. I think it is fair to say that LRMs are fairly good at division now. What progress!

We’re excited to introduce Transformer², a machine learning system that dynamically adjusts its weights for various tasks! sakana.ai/transformer-... Adaptation is a remarkable natural phenomenon, like how the octopus blends into its environment, or how the brain rewires itself after injury. 🧵 1/N

Yet more interesting research by sakana.ai

ByteDance Doubao-1.5-pro
- Includes a "Deep Thinking" mode, surpassing the o1-preview and o1 models on the AIME benchmark.
- Outperforms deepseek-v3, gpt4o, and llama3.1-405B on popular benchmarks.
team.doubao.com/en/special/d...

Full Moon is a model client that makes great use of the MLX framework, running the Llama 3 1b or 3b models. The 1b model runs at lightning speed locally on the iPhone, so if you're interested, feel free to check it out 🌕🧵1/3 Testflight: fullmoon.app

Would be fascinated to learn how this paper came into being given the authors on it: arxiv.org/abs/2412.05747

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.

Human behavior happens at a surprisingly slow 10 bits/second or so, even though our sensory systems gather 8 orders of magnitude more data. Plus, we can only think about one thing at a time. We don’t know why (In LLM terms, human behavior happens at less than a token/sec). arxiv.org/abs/2408.10234
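A quick back-of-the-envelope check of that bandwidth gap. The ~10 bits/s behavioral figure and the "8 orders of magnitude" gap are from the post; the ~10^9 bits/s sensory estimate is an assumed round number chosen to match them:

```python
import math

# Assumed figures for illustration: behavioral throughput ~10 bits/s,
# sensory intake ~1e9 bits/s (the post's "8 orders of magnitude more").
behavior_bits_per_sec = 10
sensory_bits_per_sec = 1e9

ratio = sensory_bits_per_sec / behavior_bits_per_sec
orders_of_magnitude = math.log10(ratio)
print(f"sensory/behavior ratio: {ratio:.0e} (~{orders_of_magnitude:.0f} orders of magnitude)")
```

In other words, almost everything the senses take in is discarded before it shows up in behavior.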

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵

BERT is so back 🔥 Answer AI and LightOn released ModernBERT: lightning-fast state-of-the-art BERT model with Apache 2.0 license 🥹 2x as fast as debertav3 and 3x faster than nomic 💨 all models are here hf.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb read more hf.co/blog/modernbert 📖

Really enjoy DSPy’s workflow for LLM work. Handing off the specifics of prompt generation and engineering back to the LLM makes a lot of sense: www.dbreunig.com/2024/12/12/p...

The quality of this Sora Remix test is pretty impressive: Scissors to crane. The prompt was “Close up of a curious crane bird looking around a beautiful nature scene by a pond. The bird's head pops into the shot and then out.”

Inspired by @wimlds.bsky.social , I looked for a "Women in Machine Learning" starter pack and couldn't find one. So I created one! May have some mistakes. I'll try to do an AI ethics one next. 🤗 go.bsky.app/LT6CwNN

An Evolved Universal Transformer Memory sakana.ai/namm/ Introducing Neural Attention Memory Models (NAMM), a new kind of neural memory system for Transformers that not only boosts their performance and efficiency but is also transferable to other foundation models without any additional training!

There is a huge gap in quality between the results of a casual prompt and dedicated prompt tuning/workflows. Just check the ComfyUI community. You can get Midjourney-comparable results with SD-1.5 when it's done by a professional. Keep that in mind when playing with Sora.

DeepSeek-V2.5-1210 🔥 the updated version of DeepSeek-V2.5 just released! huggingface.co/deepseek-ai/... Upgrades include:
✨ MATH-500: 74.8% → 82.8%
✨ LiveCodeBench: 29.2% → 34.38%
✨ Writing & reasoning improved on internal tests.
✨ Enhanced file upload & webpage summarization UX

Ooh new fineweb dataset just dropped: Fineweb 2 - 3T tokens of highly multilingual top-quality filtered data, permissively licensed! huggingface.co/datasets/Hug... Apologies to the GPU-Poors (like me!) who can only imagine what we could build with it, if only we had 10^25 FLOPs lying around.

new o1 model:
- smarter and faster
- image inputs
- a new "pro mode" in ChatGPT to access extra compute
- coming soon in the API

First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin) simonwillison.net/2024/Dec/4/a... The vibes are good with these ones - they're price and performance competitive with the Google Gemini family, which means they are _really_ inexpensive

“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissively licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two-trillion-token set from Common Corpus.

Our Open Source Developers Guide to the EU AI Act is now live! Check it out for an introduction to the AI Act and useful tools that may help prepare for compliance, with a focus on open source. Amazing to work with @frimelle.bsky.social and @yjernite.bsky.social on this!

Given the recent attacks on AI posts from the vocal emergent luddite class, and inspired by @howard.fm's post this morning, here's an experiment in programmatic moderation...

A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models. The primary use case for the datasets that people are losing their shit over isn't ChatGPT; it's social science research and developing systems that improve Bluesky.

Over the past 18 months, the @latentspacepod.bsky.social paper club has had an unbroken streak of hosting EVERY SINGLE WEEK. We've gained technical knowledge and insider know-how, built friendships, and grown a community of learners. Here's how to start your own paper club: eugeneyan.com/writing/pape...

Small yet mighty! 💫 We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, immensely memory efficient 🤠 We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base huggingface.co/collections/...

I find it amusing that the emerging standard for giving an LLM the ability to work with your technology is just a text file explaining clearly how your technology works. (Once folks realize they also need to sell the LLMs on why they should use a technology, things will get wild.) llmstxt.org
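For the curious: the llms.txt proposal is literally a markdown file served at your site root. A minimal hypothetical example (project name, URLs, and descriptions here are all made up for illustration, not from the spec site):

```markdown
# WidgetLib

> WidgetLib is a hypothetical library for parsing widget files.
> Use it when you need fast, dependency-free widget handling.

## Docs

- [Quickstart](https://example.com/quickstart.md): install and first steps
- [API reference](https://example.com/api.md): full function listing

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

The format is plain markdown (an H1 title, a blockquote summary, then sections of links), so both humans and LLMs can read it directly.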

This is neat. I added inline dependency metadata so you can run it using `uv run` without having to install it first: uv run 'https://gist.githubusercontent.com/simonw/848a3b91169a789bc084a459aa7ecf83/raw/44fe7e0b326832e88beb83748b50104e5e7f70d0/follow_theirs.py' gist.github.com/simonw/848a3...
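The inline dependency metadata referred to here is PEP 723 script metadata: a comment block at the top of the file that tools like uv read to resolve dependencies before running. A minimal sketch (this toy script uses only the stdlib; a real one would list third-party packages in `dependencies`):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
# With the header above, `uv run script.py` (or a URL to the script)
# resolves any declared dependencies and runs the file directly,
# with no prior `pip install` step.
import json

print(json.dumps({"hello": "uv"}))
```

Because the metadata lives in ordinary comments, the file is still a plain Python script and runs unchanged under a regular interpreter too.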

✨ Jina AI just released Jina-CLIP-v2: A multimodal (images and text) & multilingual embedding model. Details in 🧵 Model: huggingface.co/jinaai/jina-... 📈 Jina-CLIP-v2 outperforms Jina-CLIP-v1 (by 3% on text-image and text-text tasks) 🧵

I like this new analogy for working with LLMs by @emollick.bsky.social "treat AI like an infinitely patient new coworker who forgets everything you tell them each new conversation, one that comes highly recommended but whose actual abilities are not that clear" www.oneusefulthing.org/p/getting-st...

Hi new friends, I'm Eugene, and I sell 📚 at a bookstore. I build machine learning, recommender, and LLM systems to improve the discovery and reading experience for customers. I also write at eugeneyan.com and build at aiteratelabs.com; it helps me learn & clarify my thoughts. See you around! 👋

Training variance is a thing and no one measures it because research models get trained once to beat the benchmark by 0.2 AP or whatever and then never trained again. In prod one of the first things we do is train (the same model) a ton over different shuffled splits of the data in order to… 1/3
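A toy illustration of the point, not the poster's actual setup: pretend each "training run" is the same model on a different shuffled split, then look at the spread of the metric. Here the runs are simulated as a fixed true score plus seed-dependent noise:

```python
import random
import statistics

def train_and_eval(seed: int) -> float:
    # Stand-in for a full training run: a fixed "true" score plus
    # split/seed-dependent noise, mimicking run-to-run variance.
    rng = random.Random(seed)
    true_score = 0.75
    return true_score + rng.gauss(0, 0.01)

scores = [train_and_eval(seed) for seed in range(10)]
print(f"mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
# If the run-to-run stdev is comparable to the 0.2-point gap a paper
# reports over the baseline, a single training run can't support the
# claimed improvement.
```

The same loop with a real training function is what the "train a ton over different shuffled splits" workflow looks like in practice.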

The return of the Autoregressive Image Model: AIMv2 now going multimodal. Excellent work by @alaaelnouby.bsky.social & team with code and checkpoints already up: arxiv.org/abs/2411.14402

Bluesky's firehose is a treasure trove of public data for researchers and developers, and it's completely free. Check out our developer docs: docs.bsky.app

Stable Flow: Vital Layers for Training-Free Image Editing Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel Cohen-Or tl;dr: embedding insertion into specific DiT layers for object replacement arxiv.org/abs/2411.14430

🚀 Qwen2.5-Coder release is a huge deal:
- Matches GPT-4 coding capabilities
- Open source & Apache 2.0 licensed
- Supports 40+ programming languages
- Available in 6 sizes (0.5B-32B)
Exciting times ahead! huggingface.co/collections/...

Check out the hertz-dev release (apache-2.0!) by Standard Intelligence. It's quite impressive! Below is a live conversation between a human and the model! huggingface.co/si-pbc/hertz...