It is true that an AI chatbot cannot claim copyright over what it generates, and that this does not automatically put the output in the public domain. Even so, OpenAI's terms and conditions make it clear that it is the user's responsibility to check whether the generated output is infringing: https://youtu.be/fOTuIhOWFXU?t=13m35s
The Terms of Service don't change the fact that putting something into ChatGPT adds it to its pool of data. After all, there's no control over where it can reappear if someone breaks the ToS.
Got all that, just correcting use of "public domain," which means not covered by copyright. If someone steals your work and republishes it, they give it to the public but it's not in the public domain.
I think you're just claiming they're violating your copyright. I actually think the violation was by the person who gave them the text. If OpenAI republishes something they were illegally given, I'm not sure they're liable, as long as they take it down when asked (but not a copyright expert!).
Can you clarify what you mean by “inputs as further training data”? My understanding is that the underlying model is pre-trained (the P in GPT), but that outputs can be fine-tuned to user preferences. This fine-tuning is not the same as training because the underlying model is unchanged.
Comments
https://authorsguild.org/news/ag-and-authors-file-class-action-suit-against-openai/
Patronus AI tested the major LLMs from AI tech companies and found that GPT-4 generated copyrighted content at a high rate:
https://qz.com/openai-chatgpt-anthropic-claude-copyright-law-violation-1851311580
https://youtu.be/fOTuIhOWFXU?t=13m35s