OpenAI Inc. has agreed to disclose the training data used for its artificial intelligence models, marking a significant development in the ongoing consolidated lawsuit brought by authors who claim their copyrighted works were improperly utilized to train ChatGPT. The decision comes as part of a newly established inspection protocol pursuant to an order by Magistrate Judge Robert M. Illman of the US District Court for the Northern District of California.
The court’s order, issued on Tuesday, outlines a detailed process for plaintiffs’ legal teams to examine OpenAI’s training data. The information accessed during this review is to be treated as highly confidential and restricted for attorneys’ eyes only, significantly limiting whom can view this sensitive data.
Experts from the plaintiffs’ side will have exclusive rights to inspect the data while adhering to strict confidentiality rules, ensuring that proprietary or trade secret information is shielded from public exposure. The terms of the order seek to balance the plaintiffs’ right to discovery with OpenAI’s interest in maintaining the confidentiality of its data sets.
For further details, you can read the official court order here.