Transparency is wonderful, but it is surprising that a corporation this big allows it to happen. The models are known to still struggle with plenty of tasks, so why let them loose in public before the results are good enough? Was this tested thoroughly enough internally? I'm not sure.
It also matches my experience that library code is one thing and application code is another. I work a lot with the Qt framework. There is only one Qt codebase (OK, many modules, but still), yet there are thousands of Qt apps to train on. Less data leads to poorer models.
All these bad pull requests are in library code, not app code. In my own use of LLMs, they always struggle to create the foundations of an app: they do the minimal work, not something that could serve as a solid base to build on. Hence poor library code, but decent app work.
Comments
1) It helps you decide whether you should use it (or not!)
2) Can you name any AI startup that shows its agents doing the work in public?
If anything, more tools should do it. Transparency is wonderful, no?