Working on a "special" kind of data (health, medical images, etc.) means experiencing the disappointment of realising that the method in a paper relies critically on "having a very good image captioner", "asking GPT-4V", etc.
Comments
There's also a huge problem for reproducibility: the GPT "models" are not really models, they are services that do not guarantee identical performance over time.
Concretely, OAI updates the thing in ways that I'm sure make their numbers look better but that break automation workflows. They do this silently, with no announcement, usually on weekends.
I don't use OAI endpoints, so I haven't experienced this directly, but yes, there are serious issues with reproducibility, even assuming researchers actually specified which version of a given service they were using when they wrote their paper...
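For what "specifying the version" can look like in practice, here is a minimal sketch, assuming the OpenAI Python client and a dated model snapshot (the model name and prompt are illustrative, not from this thread). Pinning a dated snapshot and logging the model string the service reports back is the bare minimum needed to cite a version in a paper; it still does not guarantee identical behaviour over time.

```python
# Minimal sketch: pin a dated model snapshot and record what the service
# actually ran, so the call can at least be documented for reproducibility.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",  # dated snapshot rather than a floating alias like "gpt-4o"
    messages=[{"role": "user", "content": "Summarise the findings in this chest X-ray report: ..."}],
    temperature=0,  # more deterministic decoding, but not a reproducibility guarantee
)

# The response includes the model string the service actually used; log it.
print(response.model)
print(response.choices[0].message.content)
```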
OAI's behaviour is fundamentally about delivering reliability and improvement for a chatbot, and only a chatbot. Other uses are not considered in enough detail for them to even notice when they are breaking them.
Lots of people are building such tools for medical data (I myself have spent some time trying to caption chest X-rays, and my colleagues built a model to promptably edit CXRs), but you can't assume their performance is unquestionably good.