This might be an absurd question from the other side of the academic sphere, but isn't half the problem with ML/LLM datasets the lack of data and the corruption of data? Why haven't there been data models that fix that? Why can't we automate the allegedly low-skill labor of generating data?
