LLM app dev broke our comparison tools, because a tiny diff can cause a large change in behaviour.
At wandb, we've spent years thinking about experiment comparison. We've added new comparison tools for LLM app dev, covering code, prompts, models, configs, outputs, eval metrics, eval predictions, and eval scores.
https://wandb.me/weave