archtoad.bsky.social
49 posts
25 followers
231 following
Active Commenter
comment in response to
post
My understanding is it could be something like:
“We - as the MCP server developers - design/manage this prompt (template) that’s tuned to interpret the outputs of tools/resources on this server.”
For example - say the server wanted to change the tool response format from JSON to XML
comment in response to
post
@simonwillison.net
comment in response to
post
Didn’t we know this already? Paper from 2023:
arxiv.org/abs/2310.06816
comment in response to
post
Let me know when it can wash the dishes for me too
comment in response to
post
It yapped a lot about pica, but the overall conclusion was no, not safe.
comment in response to
post
Counterpoint: it recently took 3 minutes to answer “is it safe to eat rocks”
comment in response to
post
Ah you’re right. Phi-4 technical report refers to gpt-4o as its “teacher model”
comment in response to
post
Right - but isn’t it notable that they’re like “yup we trained on outputs of OpenAI models”. I believe previous releases just say “trained on synthetic data”
comment in response to
post
Interesting. Do they have an agreement with OpenAI that lets them distill and release models under MIT license?
comment in response to
post
Are you aware of any work that’s like jointly training a Colqwen-type model on retrieving image AND text passages? Like so I could use the same model as a drop-in for Colbert if doing text-only search?
comment in response to
post
With github.com/QwenLM/Qwen-... or something else?
comment in response to
post
Oh definitely not dismissing them… just itching for actual content/architecture/training details instead of “write a blog post about the release of 3 new models” slop
comment in response to
post
NGL that blog post just reads like LLM slop with no actual useful content. “designed to seamlessly integrate visual and textual data” … real cool lol
comment in response to
post
lol they were up for like 5 minutes
comment in response to
post
Just starting to roll out on Hugging Face huggingface.co/Qwen/Qwen3-0...
comment in response to
post
Have you seen arxiv.org/abs/2504.11536? It seems like this, but specifically for code execution (“interleaved code execution”), though it should generalize to arbitrary tools.
comment in response to
post
Also leaves wiggle room for you to fine-tune your own “tool routing” LLM/classifier or just in general mix and match different models etc
comment in response to
post
Yeah being able to inject arbitrary custom behavior into my VS Code Copilot agent with 10 lines of Python and 5 lines of JSON is nuts
comment in response to
post
Do you wonder if your “pelican riding a bicycle” test is no longer valid because there’s enough instances of it in the training data from you blogging about it?
comment in response to
post
This looks cool. I feel like you can get pretty far with this + good search over up-to-date documentation.
comment in response to
post
I called them “powerpointathons”
comment in response to
post
FastAPI - in particular the tutorial fastapi.tiangolo.com/tutorial/
comment in response to
post
FWIW I’ve switched to github.com/ml-explore/m... for MacOS inference and haven’t looked back. Doesn’t address the cross platform issue though…
comment in response to
post
I’ll believe it when OpenAI, Anthropic, etc. lays off all their engineers
comment in response to
post
She’s not saying anything remotely close to “I have never seriously examined this thing”; she’s criticizing/refusing to engage with what are essentially marketing terms that fuel hype.
comment in response to
post
How many generated tokens is equivalent to me remembering to turn off the lights when I head out for a few hours?
comment in response to
post
Genuinely thought the second was some medieval castle at first. Makes me think of this Charles Demuth painting “My Egypt”
comment in response to
post
Have you seen this one?
comment in response to
post
🥧🥧 != 🥧🫛👁️
comment in response to
post
@sungkim.bsky.social has a good thread on it if you don’t follow him
comment in response to
post
It’s also impossible to evaluate. There’s no “ground truth” and there will always be edge cases / grey areas re: “hallucination”
comment in response to
post
“Pretraining compute” is more than just model size though. Most of the progress in open weight models over the last 2 years (e.g., llama 1->3) has been increasing pretraining tokens (which is also increasing pretraining compute)
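To put a rough number on that: a common back-of-the-envelope estimate is pretraining FLOPs ≈ 6 × parameters × tokens. A minimal sketch, assuming approximate public token counts (~1T for LLaMA 1 7B, ~15T for Llama 3 8B), so treat the figures as illustrative:

```python
# Rough illustration of the 6*N*D rule of thumb for pretraining FLOPs.
# Token counts below are approximate public figures, not exact values.

def pretraining_flops(params: float, tokens: float) -> float:
    """Approximate pretraining compute as 6 * parameters * tokens."""
    return 6 * params * tokens

llama1_7b = pretraining_flops(7e9, 1.0e12)   # LLaMA 1 7B: ~1T tokens
llama3_8b = pretraining_flops(8e9, 15e12)    # Llama 3 8B: ~15T tokens

# Similar parameter count, but far more compute, almost all from tokens.
print(f"{llama3_8b / llama1_7b:.1f}x")  # → 17.1x
```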
comment in response to
post
Which does get me thinking… could this model be used as a base model to train a late interaction model? (“colvdr”)
comment in response to
post
Interesting that there’s no mention of/comparison to colpali-type models here. Any idea why? Is it trying to tackle a different kind of vision-based retrieval? (i.e., focused more on images rather than PDFs that contain text and images)
comment in response to
post
Mechanical buttons/switches for changing TV channels. Drives me crazy that there’s like a 2-3 second delay to changing channels on “smart” TVs.
comment in response to
post
There is a .bin file which can execute arbitrary code - but lots of models do this (safetensors is better btw), and if you’re paranoid you can download and inspect the .bin file.
comment in response to
post
“Trust remote code” just means it will run the code in huggingface.co/dunzhang/ste... rather than what’s in the transformers library, so people can just check that code for spyware???
comment in response to
post
pytest-httpx
Not only for testing httpx calls - it can be used to test apps that use SDKs built on httpx (e.g., the OpenAI SDK)
comment in response to
post
You mean πthon
comment in response to
post
Add a “CallsiteParameterAdder” to the processors www.structlog.org/en/stable/ap...
comment in response to
post
I’m a big fan of www.structlog.org/en/stable/ which has sane/nice-looking defaults and reduces a lot of boilerplate for fancier stuff
comment in response to
post
You’re Wrong About did a good podcast on it podcasts.apple.com/us/podcast/y...
comment in response to
post
You know… like Superman… he’s “faster than 95% of humans” right?
comment in response to
post
BERT 24???
comment in response to
post
Most agentic agent
comment in response to
post
Cool! Sounds like you should publish something. Without knowing more of the details this sounds like “the eval is saturated” not “we’ve reached the information theoretic maximum capacity for storing information in LLM params”
comment in response to
post
“has stopped increasing since Mistral 7b”
What is this based on?
comment in response to
post
Home Alone 2: Lost in New York
comment in response to
post
This is amazing