ThreadSky
dep4b.bsky.social
•
7 days ago
You could just refresh your static set every quarter. You are right that constant change is hard when you want reproducibility. I have just frozen the dataset and the date and used that.
Comments
taidesu.bsky.social
•
7 days ago
Right, but refreshing the data every quarter would also mean re-evaluating judgements, as I anticipate many of them would be dramatically different.
After typing all this out over the last few days I’m feeling like synthetic data may be the best way to go.
taidesu.bsky.social
•
7 days ago
It’ll give us a stable baseline and we can represent some typical and some challenging retrieval examples that we’ve seen in the past.
dep4b.bsky.social
•
7 days ago
My concern with synthetic data is: how do you know it reflects your actual data and queries? If you know it does, then heck yeah.
dep4b.bsky.social
•
7 days ago
Yeah, that is part of the workflow, and whether you re-evaluate every day, once per quarter, or even once per year doesn’t change that work.
dep4b.bsky.social
•
7 days ago
This is why folks looove click-based judgements versus human ones.
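For anyone following along, here is a rough sketch of what a click-based judgement could look like. It assumes you have a log of (query, doc_id, clicked) impression events; the function name, the minimum-impression cutoff, and the CTR thresholds are all made up for illustration and are not taken from any particular tool. Raw click-through rate is also position-biased, so treat this as a starting point only.

```python
from collections import defaultdict


def click_judgments(events, min_impressions=20):
    """Turn (query, doc_id, clicked) events into CTR-based grades 0-3.

    Hypothetical helper: the thresholds below are illustrative assumptions.
    """
    impressions = defaultdict(int)
    clicks = defaultdict(int)
    for query, doc_id, clicked in events:
        key = (query, doc_id)
        impressions[key] += 1
        clicks[key] += int(clicked)

    judgments = {}
    for key, shown in impressions.items():
        if shown < min_impressions:
            continue  # too little evidence to judge this pair
        ctr = clicks[key] / shown
        # Assumed grade boundaries; tune them against your own traffic.
        if ctr >= 0.30:
            grade = 3
        elif ctr >= 0.15:
            grade = 2
        elif ctr >= 0.05:
            grade = 1
        else:
            grade = 0
        judgments[key] = grade
    return judgments
```

Because the grades are recomputed from the log, refreshing them on any schedule is a rerun of the function instead of a new round of human rating.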
dep4b.bsky.social
•
7 days ago
And why https://www.ubisearch.dev exists: to facilitate that, among other use cases.
dep4b.bsky.social
•
7 days ago
Then, if you are doing that, you can also do smart statistical sampling so you don’t need ALL the data, which makes experimenting easier.
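One way the sampling idea could be sketched: stratify queries by how often they occur and draw a fixed number from each bucket, with a fixed seed so the sample is reproducible. The head/torso/tail cutoffs, the function name, and the shape of `query_counts` (a dict of query string to frequency) are assumptions for illustration only.

```python
import random
from collections import defaultdict


def stratified_sample(query_counts, per_stratum=50, seed=42):
    """Sample queries from head/torso/tail frequency strata.

    Hypothetical helper: the frequency cutoffs are illustrative assumptions.
    """
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    strata = defaultdict(list)
    for query, count in query_counts.items():
        if count >= 1000:
            strata["head"].append(query)
        elif count >= 10:
            strata["torso"].append(query)
        else:
            strata["tail"].append(query)

    sample = {}
    # Sort strata and queries so the result does not depend on input order.
    for name in sorted(strata):
        queries = sorted(strata[name])
        k = min(per_stratum, len(queries))
        sample[name] = rng.sample(queries, k)
    return sample
```

Sampling per stratum keeps rare tail queries in the evaluation set, which a uniform sample weighted by traffic would mostly miss.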
dep4b.bsky.social
•
7 days ago
We don’t test brand new drug compounds on people, why do we test relevancy changes in production??