scottcondron.bsky.social
Working at wandb on Weave, helping teams ship AI applications
22 posts 284 followers 423 following

How do I get Bluesky to show me less politics and more AI/ML things? I have followed mostly people who work in AI/ML

Prompts within a complex system are brittle. I've seen some teams succeed by replacing prompts with smaller, more deterministic components and improving reliability with fine-tuning. Anyone else have success with this approach? Seems to help a lot with agents
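
For what it's worth, a toy sketch of what "smaller, more deterministic components" can look like in practice; the order-lookup route and the regex below are hypothetical, not from the original post:

```python
import re

# Hypothetical routing example: handle the well-defined cases with plain
# code and save the model call (ideally a small fine-tuned one) for the
# genuinely fuzzy remainder, instead of one catch-all prompt.
ORDER_ID = re.compile(r"order\s*#?(\d{6,})", re.IGNORECASE)

def route(message: str) -> dict:
    # Deterministic component: exact pattern match, no prompt involved.
    match = ORDER_ID.search(message)
    if match:
        return {"route": "order_lookup", "order_id": match.group(1)}
    # Everything else falls through to the model-backed intent classifier.
    return {"route": "intent_classifier", "input": message}

print(route("Where is order #123456?"))   # -> order_lookup, no LLM call
print(route("The box arrived damaged"))   # -> intent_classifier
```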

If you’re taking time to enjoy your family and not building with LLMs, you’re ngmi. America is cooked

LLM app dev broke our comparison tools because tiny diffs can cause large behaviour changes. At wandb, we've spent years thinking about experiment comparison, and we've added new tools for LLM app dev: code, prompts, models, configs, outputs, eval metrics, eval predictions, eval scores… wandb.me/weave

The art of referring to model behaviour with tasteful non-person metaphors. Say “stochastic” and you’re in one camp; say “emergent” and you’re in another. It’s a minefield out there, people

Being logged into wandb on your phone is a recipe for misery

Lessons from creating an llms.txt file: an llms.txt file is a way to tell an LLM about your website. In the .txt file, you include links to other files with info to learn more. The llms.txt file isn't the file you send to an LLM; you use it to generate an llms.md file
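
For context, a minimal illustrative llms.txt following the proposed format (a heading, a one-line summary, then sections of links to markdown files); the project name and URLs below are placeholders:

```markdown
# Example Project

> One-sentence summary of what the project does and who it's for.

## Docs

- [Quickstart](https://example.com/docs/quickstart.md): install and first run
- [API reference](https://example.com/docs/api.md): full API details

## Optional

- [Changelog](https://example.com/docs/changelog.md): release history
```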

Your human and LLM judges should follow the same criteria. Then, you can transition from manual to automated evaluation once you have inter-annotator agreement between LLM & human. You now have a faster iteration speed and the annotator can focus on finding edge cases!
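To make the inter-annotator agreement step concrete, here's a minimal sketch comparing human and LLM-judge labels on the same items with Cohen's kappa before handing routine scoring to the LLM; the labels and the threshold are made up for illustration:

```python
from collections import Counter

def cohens_kappa(human: list[str], llm: list[str]) -> float:
    """Chance-corrected agreement between two annotators on the same items."""
    n = len(human)
    observed = sum(h == m for h, m in zip(human, llm)) / n
    human_freq, llm_freq = Counter(human), Counter(llm)
    expected = sum(
        (human_freq[label] / n) * (llm_freq[label] / n)
        for label in set(human) | set(llm)
    )
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

# Same rubric, same items: human labels vs. LLM-judge labels.
human_labels = ["pass", "fail", "pass", "pass", "fail"]
llm_labels   = ["pass", "fail", "pass", "fail", "fail"]

kappa = cohens_kappa(human_labels, llm_labels)
print(f"kappa = {kappa:.2f}")
if kappa >= 0.8:  # arbitrary example threshold
    print("Agreement looks good; switch to the LLM judge for routine runs.")
```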

Put glue on pizza

The most bizarre AI interview I've ever done was at wandb when as usual I asked a candidate to build an AI classifier in any language/framework of their choice.. And they nonchalantly said "I'll write it in Redstone", to which I almost let loose a chuckle until...

Claude defaults to concise responses when there's high demand; clever way to smooth peaks

We've been working on just that at @weightsbiases.bsky.social with Weave! Weave is a lightweight LLM tracing and evaluations toolkit that focuses on letting you iterate fast and make sure your production LLM-based application isn't degrading when you change prompts or models!
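
A minimal sketch of what that looks like with the Weave Python library; the project name, dataset, and scorer are placeholders, and the scorer signature may vary between Weave versions:

```python
import asyncio
import weave

weave.init("my-llm-app")  # placeholder project name; traces are logged here

@weave.op()  # calls to this function are traced: inputs, outputs, latency
def answer(question: str) -> str:
    # ...call your model / prompt of choice here...
    return "42"

# Tiny illustrative evaluation: rows are dicts, scorers grade each output.
examples = [{"question": "What is 6 x 7?", "expected": "42"}]

@weave.op()
def exact_match(expected: str, output: str) -> dict:
    return {"correct": expected == output}

evaluation = weave.Evaluation(dataset=examples, scorers=[exact_match])
asyncio.run(evaluation.evaluate(answer))
```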