The left second paragraph is wrong; it should be the same as the right second paragraph. The left third paragraph is painfully wrong - despite the name, the Transformer is not at all a "linguistic model", and the very first thing it does is throw away everything linguistic. And it's literally called...
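For what it's worth, here is a minimal sketch of the point about the input side of a Transformer. The toy whitespace tokenizer, vocabulary, and random embedding table below are made up for illustration (they are not any real model's tokenizer), but they show what the model actually receives: integer token IDs and embedding vectors, with no parse trees, parts of speech, or other explicit linguistic structure anywhere in the pipeline.

```python
# Toy sketch: what a Transformer sees at its input -- integers and float vectors,
# nothing explicitly "linguistic". Vocabulary, tokenizer, and embeddings are
# hypothetical illustrations, not taken from any real model.
import numpy as np

corpus = "the model sees numbers not grammar"
vocab = {tok: i for i, tok in enumerate(sorted(set(corpus.split())))}

def encode(text: str) -> list[int]:
    """Map raw text to token IDs; linguistic structure is discarded at this step."""
    return [vocab[tok] for tok in text.split()]

ids = encode("the model sees numbers")
print(ids)  # [5, 1, 4, 3] -- just integers

rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(len(vocab), 8))  # (vocab_size, d_model)
x = embedding_table[ids]                            # (seq_len, d_model) matrix fed to attention
print(x.shape)                                      # (4, 8)
```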
LLMs learn fine without curated data. Also, "carefully curated" is a stretch: supervised learning is curated at the scale of each training element or pair, whereas "curated" self-supervised learning operates at the scale of whole datasets, websites, etc. (or broad filters).
It's simply wrong to state that LLMs require carefully curated data. That is not factual. You can *improve* results with curated data, but the exact same thing can be said about training a human being.
I would argue this happened well before scaling further got too cumbersome. The GPT-2 paper has a single point to make: datasets at the time were full of garbage text, and curating them improved the resulting model enormously. There are no architectural innovations, etc. It's just about data.
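A hedged sketch of the distinction these comments are drawing between the two scales of "curation". The document dicts, field names, and thresholds below are hypothetical illustrations; GPT-2's WebText used outbound Reddit links with at least 3 karma as a broad quality proxy rather than labeling individual examples.

```python
# Supervised learning: curation happens per training pair --
# a human attaches a label to each individual example.
supervised_pair = {"text": "the movie was great", "label": "positive"}

# LLM pre-training: "curation" is a coarse, document-level filter over a huge
# scrape; nobody inspects or labels individual training examples.
def keep_document(doc: dict) -> bool:
    """Broad heuristic filter applied to whole documents (hypothetical thresholds)."""
    long_enough = len(doc["text"].split()) > 128  # drop short / boilerplate pages
    endorsed = doc["source_karma"] >= 3           # proxy for "a human found this worth sharing"
    return long_enough and endorsed

scraped = [
    {"url": "https://example.com/post", "source_karma": 12, "text": "word " * 300},
    {"url": "https://example.com/spam", "source_karma": 0,  "text": "buy now"},
]
pretraining_corpus = [d["text"] for d in scraped if keep_document(d)]
print(len(pretraining_corpus))  # 1 -- the spam page is dropped wholesale, never labeled
```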
Comments
I'm really glad I'm not there; I'd be having to suppress groans throughout this entire presentation.
I read Chomsky saying that was one of the key differentiators between humans and LLMs.