tomwhi.bsky.social
2 posts 65 followers 461 following

I asked Claude “Make an interactive artifact that will illustrate to me why I should not start Civ VII right now.” This is what it came up with on its own.

Why do LLMs trained on over 90% English text perform so well in non-English languages? They find that they learn to share highly abstract grammatical concept representations, even across unrelated languages!

Fellow journos covering "AI": Please don't do their PR for them! "Virtual employees" is a harmful anthropomorphism in that it (a) is false; (b) confuses readers about the emerging technology and inaccurately lends human attributes like agency, accountability, etc.; and (c) harms humans in real jobs.

Yes, DeepSeek R1's release is impressive. But the real story is what happened in just 7 days after: Original release: 8 models, 540K downloads. Just the beginning... The community turned those open-weight models into 550+ NEW models on @huggingface. Total downloads? 2.5M, nearly 5X the originals.

Of course, I *had* to test this with R1. I think it is fair to say that LRMs are fairly good at division now. What progress!

We’re excited to introduce Transformer², a machine learning system that dynamically adjusts its weights for various tasks! sakana.ai/transformer-... Adaptation is a remarkable natural phenomenon, like how the octopus blends into its environment, or how the brain rewires itself after injury. 🧵 1/N

Yet more interesting research by sakana.ai

ByteDance Doubao-1.5-pro
- Includes a "Deep Thinking" mode, surpassing the o1-preview and o1 models on the AIME benchmark.
- Outperforms deepseek-v3, gpt4o, and llama3.1-405B on popular benchmarks.
team.doubao.com/en/special/d...

Full Moon is a model client that makes great use of the MLX framework, running the Llama 3 1b or 3b models. The 1b model runs at lightning speed locally on the iPhone, so if you're interested, feel free to check it out 🌕🧵1/3 Testflight: fullmoon.app

Would be fascinated to learn how this paper came into being given the authors on it: arxiv.org/abs/2412.05747

Today OpenAI announced o3, its next-gen reasoning model. We've worked with OpenAI to test it on ARC-AGI, and we believe it represents a significant breakthrough in getting AI to adapt to novel tasks.

Human behavior happens at a surprisingly slow 10 bits/second or so, even though our sensory systems gather 8 orders of magnitude more data. Plus, we can only think about one thing at a time. We don’t know why (In LLM terms, human behavior happens at less than a token/sec). arxiv.org/abs/2408.10234
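A quick back-of-the-envelope check of that bandwidth gap. The ~10 bits/s behavioral figure and the "8 orders of magnitude" gap are from the post; the ~10^9 bits/s sensory estimate is an assumed round number chosen to match them:

```python
import math

# Assumed figures for illustration: behavioral throughput ~10 bits/s,
# sensory intake ~1e9 bits/s (the post's "8 orders of magnitude more").
behavior_bits_per_sec = 10
sensory_bits_per_sec = 1e9

ratio = sensory_bits_per_sec / behavior_bits_per_sec
orders_of_magnitude = math.log10(ratio)
print(f"sensory/behavior ratio: {ratio:.0e} (~{orders_of_magnitude:.0f} orders of magnitude)")
```

In other words, almost everything the senses take in is discarded before it shows up in behavior.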

I'll get straight to the point. We trained 2 new models. Like BERT, but modern. ModernBERT. Not some hypey GenAI thing, but a proper workhorse model, for retrieval, classification, etc. Real practical stuff. It's much faster, more accurate, longer context, and more useful. 🧵

BERT is so back 🔥 Answer AI and LightOn released ModernBERT: lightning-fast state-of-the-art BERT model with Apache 2.0 license 🥹 2x as fast as debertav3 and 3x faster than nomic 💨 all models are here hf.co/collections/answerdotai/modernbert-67627ad707a4acbf33c41deb read more hf.co/blog/modernbert 📖

Really enjoy DSPy’s workflow for LLM work. Handing off the specifics of prompt generation and engineering back to the LLM makes a lot of sense: www.dbreunig.com/2024/12/12/p...

The quality of this Sora Remix test is pretty impressive: Scissors to crane. The prompt was “Close up of a curious crane bird looking around a beautiful nature scene by a pond. The bird's head pops into the shot and then out.”

Inspired by @wimlds.bsky.social , I looked for a "Women in Machine Learning" starter pack and couldn't find one. So I created one! May have some mistakes. I'll try to do an AI ethics one next. 🤗 go.bsky.app/LT6CwNN

An Evolved Universal Transformer Memory sakana.ai/namm/ Introducing Neural Attention Memory Models (NAMM), a new kind of neural memory system for Transformers that not only boosts their performance and efficiency but is also transferable to other foundation models without any additional training!

There is a huge gap in quality between the results of a casual prompt and dedicated prompt tuning/workflows. Just check the ComfyUI community. You can get Midjourney-comparable results with SD-1.5 when it's done by a professional. Keep that in mind when playing with Sora.

DeepSeek-V2.5-1210 🔥 the updated version of DeepSeek-V2.5 just released! huggingface.co/deepseek-ai/... Upgrades include:
✨ MATH-500: 74.8% → 82.8%
✨ LiveCodeBench: 29.2% → 34.38%
✨ Writing & reasoning improved on internal tests.
✨ Enhanced file upload & webpage summarization UX

Ooh new fineweb dataset just dropped: Fineweb 2 - 3T tokens of highly multilingual top-quality filtered data, permissively licensed! huggingface.co/datasets/Hug... Apologies to the GPU-Poors (like me!) who can only imagine what we could build with it, if only we had 10^25 FLOPs lying around.

new o1 model:
- smarter and faster
- image inputs
- a new "pro mode" in ChatGPT to access extra compute
- coming soon in the API

First impressions of the new Amazon Nova LLMs (via a new llm-bedrock plugin) simonwillison.net/2024/Dec/4/a... The vibes are good with these ones - they're price and performance competitive with the Google Gemini family, which means they are _really_ inexpensive

“They said it could not be done”. We’re releasing Pleias 1.0, the first suite of models trained on open data (either permissively licensed or uncopyrighted): Pleias-3b, Pleias-1b and Pleias-350m, all based on the two-trillion-token set from Common Corpus.

Our Open Source Developers Guide to the EU AI Act is now live! Check it out for an introduction to the AI Act and useful tools that may help prepare for compliance, with a focus on open source. Amazing to work with @frimelle.bsky.social and @yjernite.bsky.social on this!

Given the recent attacks on AI posts from the vocal emergent luddite class, and inspired by @howard.fm's post this morning, here's an experiment in programmatic moderation...

A dataset of 1 million or 2 million Bluesky posts is completely irrelevant to training large language models. The primary use case for the datasets that people are losing their shit over isn't ChatGPT; it's social science research and developing systems that improve Bluesky.

Over the past 18 months, the @latentspacepod.bsky.social paper club has had an unbroken streak of hosting EVERY SINGLE WEEK. We've gained technical knowledge and insider know-how, built friendships, and grown a community of learners. Here's how to start your own paper club: eugeneyan.com/writing/pape...

Small yet mighty! 💫 We are releasing SmolVLM: a new 2B small vision language model made for on-device use, fine-tunable on a consumer GPU, immensely memory efficient 🤠 We release three checkpoints under Apache 2.0: SmolVLM-Instruct, SmolVLM-Synthetic and SmolVLM-Base huggingface.co/collections/...

I find it amusing that the emerging standard for giving an LLM the ability to work with your technology is just a text file explaining clearly how your technology works. (Once folks realize they also need to sell the LLMs on why they should use a technology, things will get wild.) llmstxt.org
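For the curious: the llms.txt proposal is literally a markdown file served at your site root. A minimal hypothetical example (project name, URLs, and descriptions here are all made up for illustration, not from the spec site):

```markdown
# WidgetLib

> WidgetLib is a hypothetical library for parsing widget files.
> Use it when you need fast, dependency-free widget handling.

## Docs

- [Quickstart](https://example.com/quickstart.md): install and first steps
- [API reference](https://example.com/api.md): full function listing

## Optional

- [Changelog](https://example.com/changelog.md): release history
```

The format is plain markdown (an H1 title, a blockquote summary, then sections of links), so both humans and LLMs can read it directly.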

This is neat. I added inline dependency metadata so you can run it using `uv run` without having to install it first: uv run 'https://gist.githubusercontent.com/simonw/848a3b91169a789bc084a459aa7ecf83/raw/44fe7e0b326832e88beb83748b50104e5e7f70d0/follow_theirs.py' gist.github.com/simonw/848a3...
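The inline dependency metadata referred to here is PEP 723 script metadata: a comment block at the top of the file that tools like uv read to resolve dependencies before running. A minimal sketch (this toy script uses only the stdlib; a real one would list third-party packages in `dependencies`):

```python
# /// script
# requires-python = ">=3.9"
# dependencies = []
# ///
# With the header above, `uv run script.py` (or a URL to the script)
# resolves any declared dependencies and runs the file directly,
# with no prior `pip install` step.
import json

print(json.dumps({"hello": "uv"}))
```

Because the metadata lives in ordinary comments, the file is still a plain Python script and runs unchanged under a regular interpreter too.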

✨ Jina AI just released Jina-CLIP-v2: A multimodal (images and text) & multilingual embedding model. Details in 🧵 Model: huggingface.co/jinaai/jina-... 📈 Jina-CLIP-v2 outperforms Jina-CLIP-v1 (by 3% on text-image and text-text tasks) 🧵

I like this new analogy for working with LLMs by @emollick.bsky.social "treat AI like an infinitely patient new coworker who forgets everything you tell them each new conversation, one that comes highly recommended but whose actual abilities are not that clear" www.oneusefulthing.org/p/getting-st...

Hi new friends, I'm Eugene, and I sell 📚 at a bookstore. I build machine learning, recommender, and LLM systems to improve the discovery and reading experience for customers. I also write at eugeneyan.com and build at aiteratelabs.com; it helps me learn & clarify my thoughts. See you around! 👋

Training variance is a thing and no one measures it because research models get trained once to beat the benchmark by 0.2 AP or whatever and then never trained again. In prod one of the first things we do is train (the same model) a ton over different shuffled splits of the data in order to… 1/3
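A toy illustration of the point, not the poster's actual setup: pretend each "training run" is the same model on a different shuffled split, then look at the spread of the metric. Here the runs are simulated as a fixed true score plus seed-dependent noise:

```python
import random
import statistics

def train_and_eval(seed: int) -> float:
    # Stand-in for a full training run: a fixed "true" score plus
    # split/seed-dependent noise, mimicking run-to-run variance.
    rng = random.Random(seed)
    true_score = 0.75
    return true_score + rng.gauss(0, 0.01)

scores = [train_and_eval(seed) for seed in range(10)]
print(f"mean={statistics.mean(scores):.3f} stdev={statistics.stdev(scores):.3f}")
# If the run-to-run stdev is comparable to the 0.2-point gap a paper
# reports over the baseline, a single training run can't support the
# claimed improvement.
```

The same loop with a real training function is what the "train a ton over different shuffled splits" workflow looks like in practice.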

The return of the Autoregressive Image Model: AIMv2 now going multimodal. Excellent work by @alaaelnouby.bsky.social & team with code and checkpoints already up: arxiv.org/abs/2411.14402

Bluesky's firehose is a treasure trove of public data for researchers and developers, and it's completely free. Check out our developer docs: docs.bsky.app

Stable Flow: Vital Layers for Training-Free Image Editing Omri Avrahami, Or Patashnik, Ohad Fried, Egor Nemchinov, Kfir Aberman, Dani Lischinski, Daniel Cohen-Or tl;dr: embedding insertion into specific DiT layers for object replacement arxiv.org/abs/2411.14430

🚀 Qwen2.5-Coder release is a huge deal:
- Matches GPT-4 coding capabilities
- Open source & Apache 2.0 licensed
- Supports 40+ programming languages
- Available in 6 sizes (0.5B-32B)
Exciting times ahead! huggingface.co/collections/...

Check out the hertz-dev release (apache-2.0!) by Standard Intelligence. It's quite impressive! Below is a live conversation between a human and the model! huggingface.co/si-pbc/hertz...