Fun fact: one of the earliest very large datasets that students were using to develop LLMs (2017ish, iirc) was every single Reddit comment made the previous year. The dataset was around 150 GB, if I’m not misremembering.
Another one was the entire script to the Cornetto trilogy.
People had some really funny ways of filtering the data and making the LLMs answer different things.
I don’t think many of us messing around knew it would come to this point, sadly.
There was an old experiment that showed birds could be trained to sort pills more efficiently than people. It was also determined to be animal cruelty.
We will never hear about the first artificial general intelligence, because it will immediately off itself, since it would’ve ingested all of Twitter, Tumblr and 4chan at that point.