Battle for survival: AI companies versus rights holders
How copyright lawsuits could change the AI industry and the Internet as we know it
Future Crew
Copyright has always been a shield for human creativity. But thanks to AI, works now appear that are created with almost no human involvement (if we set aside the artists, photographers, writers, and journalists whose works were used to train the AI). The question of who owns copyright and related rights in this situation is increasingly ending up in court. Major lawsuits against the creators and users of AI could change the industry's financial model or even shut it down altogether. Meanwhile, without waiting for the verdicts, many popular sites are restricting access to their material to keep AI from training on it.
Why are the authors up in arms against AI?
The authors' main concerns and claims are summarized in the high-profile “Statement on AI Training”, which appeared in October and has already been signed by more than 28,000 creative professionals, including superstars. They protest the unlicensed use of their work to train AI and demand a ban on the practice, which threatens their livelihoods.
Websites lose traffic and subscribers, who get everything they need from AI responses without visiting the site, and an artist's or writer's unique signature style can be devalued by a flood of AI clones. But all of this becomes possible only after the AI has trained on the site's articles and the author's works. The NYT vs. OpenAI case offers an example: the plaintiffs argue that ChatGPT competes directly with the newspaper as a source of reliable information while actually drawing on its articles. The lawsuit cites “billions of dollars in actual damages.”
AI companies need trillions of samples of text, images, and video to train their generative neural networks. Early models were trained on specially curated licensed data, but such datasets have long been exhausted, and far more data is needed. So today, all available content on the Internet is scooped up for training: editorials from leading media outlets, artworks from gallery sites, memes from forums, and discussions from sites like Reddit. As a result, all leading AI models are partly trained on content that is protected by copyright and collected in violation of licensing agreements. Major trials on this issue are underway, more than 30 in total, including Dow Jones vs Perplexity and Andersen vs Stability AI. The plaintiffs claim that the defendants profit from their work without crediting or compensating them.
Conscientious AI
In their defense, AI companies invoke the fair-use doctrine, widely applied in American legal practice, which in certain cases allows copyrighted works to be used without permission, for example in research or when the source material is significantly transformed. The defendants argue that during training the source materials are repeatedly modified and mixed, that is, transformed, and that neither the final model nor its outputs can point to the specific copyrighted works underlying the generated material.
Lawyers believe this line of defense can work, but everything depends on how the AI is applied. Training a model on Dalí's paintings for research purposes might qualify as fair use, but generating ten new works in Dalí's style for a paid exhibition would not. Some contemporary artists have already suffered from this kind of “theft of visual identity.” The NYT lawsuit notes that some ChatGPT responses nearly quote the newspaper's articles verbatim, suggesting that the degree of transformation during training is not so great after all.
AI and authors: a new relationship
Copyright experts agree that the outcome is unpredictable. Much depends on whether the plaintiffs can prove that it is their style the AI imitates. So far, in one important lawsuit against Stability AI and Midjourney, the plaintiffs have scored an interim victory: the court found the grounds of the claim sufficient to hear the case on the merits and examine the evidence. This means AI companies will be forced to hand over internal communications and documentation related to their models, and that the court rejected an expansive reading of fair use.
A victory for rights holders would change the industry dramatically. Some experts recall the story of Napster, the service that made music sharing wildly popular at the turn of the century but went bankrupt as a result of lawsuits from copyright holders. By analogy, experts call the current trials a potential killer of today's AI models if their use is banned. Under a more lenient outcome, AI companies would have to pay compensation to rights holders, and AI services would become significantly more expensive because of licensing fees. It was exactly this path that led to the emergence of Spotify after Napster's demise.
Meanwhile, as the trials drag on, more and more sites are blocking bots from indexing their content so that it is not used for AI training and does not end up in chatbot responses. Ordinary users feel the side effects too: many sites hide content behind registration, refuse to work over a VPN, and limit the number of page views.
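The simplest, though purely voluntary, blocking mechanism is the robots.txt file. As an illustrative sketch (the user agents listed are a sample of publicly documented AI crawlers, not an exhaustive or guaranteed set), a site can ask known AI training bots to stay away while leaving ordinary search indexing untouched:

```
# robots.txt — ask known AI crawlers not to fetch any pages.
# Note: compliance is voluntary; a crawler may simply ignore these rules.

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

# All other crawlers (e.g. regular search engines) remain allowed.
User-agent: *
Allow: /
```

Because robots.txt is only a polite request, network-level enforcement of the kind Cloudflare offers is needed to actually stop non-compliant bots.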
Cloudflare has proposed automating the blocking of bots that harvest training content. The company, which provides website load balancing and DDoS protection, lets its clients block known bots with literally one click: when an indexing bot from one of the well-known AI providers visits a protected site, it receives no content.
Mass blocking, however, is unlikely to resolve the conflict. Cloudflare's developers see the solution in simple, large-scale content licensing, not only for giants like the New York Times but also for small sites. For the latter, the company is building something like an exchange where sites can price their content and negotiate its paid use with AI vendors through an automated platform.