williamheld.com
Modeling Linguistic Variation to expand ownership of NLP tools
Views my own, but affiliations that might influence them:
ML PhD Student under Prof. Diyi Yang
2x RS Intern🦙 Pretraining
Alum NYU Abu Dhabi
Burqueño
he/him
comment in response to
post
As far as I can tell, the models aren't good enough right now to replace VFX at any high-quality commercial scale.
They are exactly good enough to generate fake viral videos for ad revenue on TikTok/Instagram & spread misinformation. Is there any serious argument for their safe release??
comment in response to
post
I don't really see an argument for releasing such models with photorealistic generation capabilities.
What valid & frequent business use case is there for photorealistic video & voice generation like Veo 3 offers?
comment in response to
post
Now, I wouldn't do research on LLMs if I thought that was true in the long term!
But I think it's reasonable for skeptics to question whether advances in inference efficiency, hardware efficiency, and even core energy infrastructure will happen soon enough for current companies to capitalize.
comment in response to
post
The underlying assumption being that they can (a la Uber/Lyft) eventually increase prices once the core customers are fundamentally reliant on AI.
The real question then is "what is demand once you start charging the true unit costs?". Personally, I found this article sobering but well reasoned.
comment in response to
post
Without knowing the model details or having transparent financials, it's hard to say, but I would naively suspect most AI companies are in the red both on a cost-per-query basis (for API services) and on a cost-per-user basis (for subscription services).
comment in response to
post
I haven't seen people mocking the revenue forecasts, but I agree with your take w.r.t. demand. The bigger question is whether demand is even the constraint.
Unlike standard software or even manufacturing businesses, I'm not sure the economies of scale look great if you factor in cost per query.
comment in response to
post
Given that they published the same work in both the ICLR workshop and ACL... I am skeptical of the claim that "The current version of Zochi represents a substantial advancement over our earlier systems that published workshop papers at ICLR 2025" 😂
comment in response to
post
Looks like they simultaneously submitted the same paper to an ICLR workshop: openreview.net/forum?id=rDC...
comment in response to
post
Learn more about the project in Percy's blog post: marin.community/blog/2025/05...
And about the models we are releasing in @dlwh.bsky.social's training retro: marin.readthedocs.io/en/latest/re...
comment in response to
post
Last August, I chatted with @dlwh.bsky.social about the need for an open-source set of scaling law checkpoints!
Since then, I was lucky to play a (small) role in building Marin-8B. Check out the model (including intermediate checkpoints) here:
huggingface.co/marin-commun...
comment in response to
post
We have trained some respectable models from scratch!
- Marin-8B-Base: beats Llama 3.1 8B on 14/19 benchmarks
- Marin-8B-Instruct: try it out on HuggingFace: huggingface.co/spaces/WillH...
comment in response to
post
Marin repurposes GitHub, which has been successful for open-source *software*, for AI:
1. Preregister an experiment as a GitHub issue
2. Submit a PR, which implements the experiment in code
3. PR is reviewed by experts in the community
4. Watch the execution of the experiment live!
comment in response to
post
Want to add your model to CAVA? If it runs on vLLM, it runs on CAVA - no extra code needed (see the sketch below).
We’ve open-sourced everything on GitHub:
🔗 github.com/SALT-NLP/CAVA
We’re open to collaborations --- test, extend, and help with large audio model evaluation! (5/5)
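As a rough illustration of what "runs on vLLM" means above (a minimal sketch; the model name is just a placeholder, not a CAVA requirement):

```python
from vllm import LLM, SamplingParams

# If a model loads and generates through vLLM like this,
# CAVA can evaluate it without extra integration code.
llm = LLM(model="your-org/your-model")  # placeholder model id
outputs = llm.generate(["Say hello."], SamplingParams(max_tokens=32))
print(outputs[0].outputs[0].text)
```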
comment in response to
post
Why does CAVA matter?
We talked with people who are building voice products and found most benchmarks don't capture their concerns!
→ Which model gives you low-latency conversations?
→ Which model can execute functions to go beyond chat?
→ Which model is the easiest to adjust and improve via prompts?
comment in response to
post
Results?
We tested
✅ GPT-4o (end-to-end audio)
✅ GPT pipeline (transcribe + text + TTS; sketched below)
✅ Gemini 2.0 Flash
✅ Gemini 2.5 Pro
We find GPT-4o shines on latency & tone while Gemini 2.5 leads in safety & prompt adherence.
No model wins everything. (3/5)
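For context, the "GPT pipeline" baseline above is the classic cascade rather than an end-to-end audio model. A minimal sketch with the OpenAI SDK (file names and model choices are just examples, not CAVA's exact configuration):

```python
from openai import OpenAI

client = OpenAI()

# 1. Speech -> text: transcribe the user's audio turn.
with open("user_turn.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio_file)

# 2. Text -> text: generate the assistant's reply.
reply = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": transcript.text}],
)

# 3. Text -> speech: synthesize the reply for playback.
speech = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input=reply.choices[0].message.content,
)
with open("assistant_turn.mp3", "wb") as out:
    out.write(speech.content)
```

Each hop in the cascade adds latency, which is part of why the end-to-end audio path compares favorably on the latency axis.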
comment in response to
post
Most benchmarks test either core chat or broader audio analysis abilities.
But voice assistants need to handle turn-taking, interpret tone, execute tasks via function calls, and respect instructions and safety constraints—all in real-time.
CAVA tests models on each of these capabilities. (2/5)
comment in response to
post
AxBench makes the argument that most of the excitement around SAEs for steering lacked systematic evals, which overhyped their effectiveness.
This is echoed by Google moving away from them after negative results in more systematic evals: www.lesswrong.com/posts/4uXCAJ...
comment in response to
post
FWIW, this line of research seems to have largely been shown to be ineffective for model steering in practice!
arxiv.org/abs/2501.17148 from @aryaman.io is my reference but several others have shown similar results!
comment in response to
post
Code if you are interested in running your own Claude Realtime Voice: github.com/Helw150/mcp-...
Mostly dead simple logic, but requires launching Claude from the terminal because the Claude Desktop app won't request microphone permissions otherwise!
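Purely as a hypothetical sketch of why microphone access matters here (this is not the code in the linked repo; it assumes the official MCP Python SDK and the SpeechRecognition library), an MCP tool that listens on the mic might look like:

```python
import speech_recognition as sr
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("realtime-voice")

@mcp.tool()
def listen() -> str:
    """Record one utterance from the microphone and return a transcript."""
    recognizer = sr.Recognizer()
    # Opening the microphone is the step that breaks if the host process
    # (here, Claude Desktop) never requested mic permission.
    with sr.Microphone() as source:
        audio = recognizer.listen(source)
    return recognizer.recognize_google(audio)

if __name__ == "__main__":
    mcp.run()
```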
comment in response to
post
aclanthology.org/2023.acl-lon... is a great interview study!
comment in response to
post
As always, we open source everything. Even our nicely made website: egonormia.org. Please check out the leaderboard, the blog (w/ BibTeX support), the code, the data, as well as the data viewer.
comment in response to
post
Now, most *urgently*: we review the history of these models. A straight line can be traced to modern AI from basic science, not in engineering but in the cognitive science of language. Much of it was funded by the NSF, whose funding has now been paused. www.goldengooseaward.org/01awardees/pdp