scottcondron.bsky.social • 105 days ago

LLM app dev broke our comparison tools because tiny diffs can cause large changes in behaviour.

At wandb, we've spent years thinking about experiment comparison. We've added new tools for LLM app dev: code, prompts, models, configs, outputs, eval metrics, eval predictions, eval scores.
https://wandb.me/weave
