Developing generative artificial intelligence models depends on extensive datasets, gathered in part by nonprofits, which has pushed the issue of fair use to the fore. Creators of large language models and text-to-image platforms such as ChatGPT and Stable Diffusion have leaned heavily on these data compilations. An emerging question is whether AI firms can sidestep copyright liability given that they neither compiled nor paid for the materials their models were trained on.
The issue arises because these models were trained on substantial amounts of nonprofit-curated data. In developing ChatGPT and Stable Diffusion, AI companies did not need to look far: readily available, freely compiled resources gave their models much of their impact.
This widespread reliance on nonprofit-compiled datasets raises complex legal and ethical questions, and as it stands, no clear consensus exists on the limits of fair use in this context.