every LLM task needs a curated, iteratively developed quantitative eval.
everyone who deploys a single prompt is a data scientist now, whether they realize it or not.
there is a lot of money to be made in helping orgs internalize this.
Reposted from Eugene Yan:
Repeat after me:
I will build evals for my tasks.
I will build evals for my tasks.
I will build evals for my tasks.
I will build evals for my tasks.
I will build evals for my tasks.
I will build evals for my tasks.