archtoad.bsky.social
49 posts
25 followers
231 following
Active Commenter
comment in response to
post
My understanding is it could be something like:
“We - as the MCP server developers - design/manage this prompt (template) that’s tuned to interpret the outputs of tools/resources on this server.”
For example - say the server wanted to change the tool response format from JSON to XML
comment in response to
post
@simonwillison.net
comment in response to
post
Didn’t we know this already? Paper from 2023:
arxiv.org/abs/2310.06816
comment in response to
post
Let me know when it can wash the dishes for me too
comment in response to
post
It yapped a lot about pica, but the overall conclusion was no, not safe.
comment in response to
post
Counterpoint: it recently took 3 minutes to answer “is it safe to eat rocks”
comment in response to
post
Ah you’re right. Phi-4 technical report refers to gpt-4o as its “teacher model”
comment in response to
post
Right - but isn’t it notable that they’re like “yup we trained on outputs of OpenAI models”. I believe previous releases just say “trained on synthetic data”
comment in response to
post
Interesting. Do they have an agreement with OpenAI that lets them distill and release models under MIT license?
comment in response to
post
Are you aware of any work that’s like jointly training a Colqwen-type model on retrieving image AND text passages? Like so I could use the same model as a drop-in for Colbert if doing text-only search?
comment in response to
post
With github.com/QwenLM/Qwen-... or something else?
comment in response to
post
Oh definitely not dismissing them… just itching for actual content/architecture/training details instead of “write a blog post about the release of 3 new models” slop
comment in response to
post
NGL that blog post just reads like LLM slop with no actual useful content. “designed to seamlessly integrate visual and textual data” … real cool lol
comment in response to
post
lol they were up for like 5 minutes
comment in response to
post
Just starting to roll out on Hugging Face huggingface.co/Qwen/Qwen3-0...
comment in response to
post
Have you seen arxiv.org/abs/2504.11536? It seems like this, but specifically for code execution (“interleaved code execution”), though it should generalize to arbitrary tools.
comment in response to
post
Also leaves wiggle room for you to fine-tune your own “tool routing” LLM/classifier or just in general mix and match different models etc
comment in response to
post
Yeah being able to inject arbitrary custom behavior into my VS Code Copilot agent with 10 lines of Python and 5 lines of JSON is nuts
comment in response to
post
Do you wonder if your “pelican riding a bicycle” test is no longer valid because there’s enough instances of it in the training data from you blogging about it?
comment in response to
post
This looks cool. I feel like you can get pretty far with this + good search over up-to-date documentation.
comment in response to
post
I called them “powerpointathons”
comment in response to
post
FastAPI - in particular the tutorial fastapi.tiangolo.com/tutorial/
comment in response to
post
FWIW I’ve switched to github.com/ml-explore/m... for MacOS inference and haven’t looked back. Doesn’t address the cross platform issue though…
comment in response to
post
I’ll believe it when OpenAI, Anthropic, etc. lays off all their engineers
comment in response to
post
She’s not saying anything remotely close to “I have never seriously examined this thing”; she’s criticizing/refusing to engage with what are essentially marketing terms that fuel hype.
comment in response to
post
How many generated tokens is equivalent to me remembering to turn off the lights when I head out for a few hours?
comment in response to
post
Genuinely thought the second was some medieval castle at first. Makes me think of this Charles Demuth painting “My Egypt”
comment in response to
post
Have you seen this one?
comment in response to
post
🥧🥧 != 🥧🫛👁️
comment in response to
post
@sungkim.bsky.social has a good thread on it if you don’t follow him
comment in response to
post
It’s also impossible to evaluate. There’s no “ground truth” and there will always be edge cases / grey areas re: “hallucination”
comment in response to
post
“Pretraining compute” is more than just model size though. Most of the progress in open weight models over the last 2 years (e.g., llama 1->3) has been increasing pretraining tokens (which is also increasing pretraining compute)
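To put a rough number on that: a common back-of-the-envelope estimate is pretraining FLOPs ≈ 6 × parameters × tokens. A minimal sketch, assuming approximate public token counts (~1T for LLaMA 1 7B, ~15T for Llama 3 8B), so treat the figures as illustrative:

```python
# Rough illustration of the 6*N*D rule of thumb for pretraining FLOPs.
# Token counts below are approximate public figures, not exact values.

def pretraining_flops(params: float, tokens: float) -> float:
    """Approximate pretraining compute as 6 * parameters * tokens."""
    return 6 * params * tokens

llama1_7b = pretraining_flops(7e9, 1.0e12)   # LLaMA 1 7B: ~1T tokens
llama3_8b = pretraining_flops(8e9, 15e12)    # Llama 3 8B: ~15T tokens

# Similar parameter count, but far more compute, almost all from tokens.
print(f"{llama3_8b / llama1_7b:.1f}x")  # → 17.1x
```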
comment in response to
post
Which does get me thinking… could this model be used as a base model to train a late interaction model? (“colvdr”)
comment in response to
post
Interesting that there’s no mention of/comparison to colpali-type models here. Any idea why? Is it trying to tackle a different kind of vision-based retrieval? (i.e., focused more on images rather than PDFs that contain text and images)
comment in response to
post
Mechanical buttons/switches for changing TV channels. Drives me crazy that there’s like a 2-3 second delay to changing channels on “smart” TVs.
comment in response to
post
There is a .bin file which can execute arbitrary code - but lots of models do this (safetensors is better btw), and if you’re paranoid you can download and inspect the .bin file.
comment in response to
post
“Trust remote code” just means it will run the code in huggingface.co/dunzhang/ste... rather than what’s in the transformers library, so people can just check that code for spyware???
comment in response to
post
pytest-httpx
Not only for testing httpx calls - it can be used to test apps that use SDKs built on httpx (e.g., the OpenAI SDK)
comment in response to
post
You mean πthon
comment in response to
post
Add a “CallsiteParameterAdder” to the processors www.structlog.org/en/stable/ap...
comment in response to
post
I’m a big fan of www.structlog.org/en/stable/ which has sane/nice-looking defaults and reduces a lot of boilerplate for fancier stuff
comment in response to
post
You’re Wrong About did a good podcast on it podcasts.apple.com/us/podcast/y...
comment in response to
post
You know… like Superman… he’s “faster than 95% of humans” right?
comment in response to
post
BERT 24???
comment in response to
post
Most agentic agent
comment in response to
post
Cool! Sounds like you should publish something. Without knowing more of the details this sounds like “the eval is saturated” not “we’ve reached the information theoretic maximum capacity for storing information in LLM params”
comment in response to
post
“has stopped increasing since Mistral 7b”
What is this based on?
comment in response to
post
Home Alone 2: Lost in New York
comment in response to
post
This is amazing