Unlicensed Data in Open-Source Sets Poses Legal Challenges for AI Industry

The Data Provenance Initiative has raised the alarm about the widespread presence of improperly licensed data in open-source data sets. The group, made up of machine-learning engineers and legal experts, reported that a significant share of the data used to train large language models is unlicensed. One of its concerns is the practical difficulty facing organizations that would otherwise be willing to use data more responsibly. “People couldn’t do the right thing, even if they wanted to,” explains co-author Sara Hooker. The Washington Post provides further information on the report’s findings.

Politico has reported that Sy Damle, a Latham & Watkins attorney who represents OpenAI, is playing a direct role in urging Congress not to draft new copyright legislation aimed at AI platforms. The effort centers on a document signed by numerous think tanks, faculty members, and members of civil society organizations involved in the AI and copyright debate, which argues that current laws suffice to address any legitimate copyright issues raised by the new technology.

As the advancement of AI presents potential dangers, the EU is in the final stages of implementing the world’s first AI-specific legislation. The Guardian sheds light on this framework, which aims to address a wide range of AI-related risks, from homemade chemical weapons to copyright theft in music, literature, and art. The legislation will likely serve as a global template for lawmakers managing the benefits and risks of the rise of AI.

The impact of AI on the legal industry is also being evaluated, in particular its potential effects on e-discovery. A piece by Legaltech News identifies the key questions: how AI may influence e-discovery costs, and what that would mean for protocols governing electronically stored information (ESI) and for the Federal Rules of Civil Procedure. More details can be found in the article.

Lastly, a significant boost to Microsoft’s Q3 performance has been attributed to its cloud-computing and AI-related services. As the AP reports, CEO Satya Nadella said AI is being rapidly integrated into every layer of the tech stack, with the primary goal of enhancing productivity for Microsoft’s customers. The New York Times also puts Microsoft’s successful quarter in context, highlighting some of the difficulties that other tech giants, including Alphabet and Meta, have encountered.