The New York Times has joined the ranks of companies, comedians, authors, music publishers and artists alleging that using their works for training generative AI is copyright infringement. The Times, as in previously filed lawsuits, also claims that the content generated by AI applications can constitute copyright infringement. OpenAI and Microsoft disagree, claiming “fair use,” a legal doctrine that permits the unlicensed use of copyright-protected works in certain circumstances.
Is AI training considered fair use, or do these companies face liability for copyright infringement? While this question will be decided by the courts, we outline arguments for and against finding infringement below.
The New York Times is accusing OpenAI and Microsoft of using millions of the newspaper’s articles without consent or compensation to help train chatbot models to provide information to users.
Generative AI applications like ChatGPT are powered by large language models (LLMs), which are built to imitate the data they receive. These models generate complete sentences and paragraphs that closely resemble human language, as they undergo “training” to refine and adjust their outputs. The Times’ complaint asserts that developing and training LLMs on the Times’ original articles amounts to copyright infringement.
The Times also claims that OpenAI’s software generated substantial portions of the newspaper’s work, in some cases replicating articles word for word. The Times says that such willful infringement results in the misappropriation of billions of dollars’ worth of work by its journalists. And because Microsoft uses OpenAI’s technology and contributes to OpenAI’s research, the Times contends that Microsoft is also liable for the alleged infringement.
Absent a valid fair use defense, AI companies’ incorporating copyrighted material into LLMs without authorization would be copyright infringement. That’s why the defendants in all these cases, including this one, vigorously maintain that their actions fall under fair use. Section 107 of the Copyright Act directs courts to consider at least four factors when assessing a fair use defense: (1) the purpose and character of the use, including whether it is commercial or for nonprofit educational purposes; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the copyrighted work as a whole; and (4) the effect of the use upon the potential market for, or value of, the copyrighted work.
In the context of the Times’ case, the likelihood of OpenAI and Microsoft successfully establishing a fair use defense appears low, compared with defendants in prior cases.
When considering the first factor alone – the purpose and character of the use – the defendants’ purpose mirrors that of the Times: delivering news reporting to readers. This use lacks the transformative element generally needed for a fair use argument. Furthermore, the commercial nature of the use is evident, given that AI software is sold for profit.
As to the second factor, although AI treats the input work as factual data, the underlying works remain inherently creative given the authorial and editorial choices the Times makes in presenting factual material, irrespective of AI’s treatment. The third factor presents a challenge for OpenAI and Microsoft, as it would be difficult for them to assert that they used only a small portion of the Times’ works, especially when the newspaper’s complaint highlights substantial verbatim usage.
When it comes to the fourth factor – the effect on the market for the copyrighted work – the Times makes a strong argument that OpenAI and Microsoft’s use of its content likely has a substantial negative impact on the value of that content. Since much of the Times’ reporting resides behind a paywall, the fact that readers can access the same content through OpenAI without paying for a Times subscription diminishes the value of the copyrighted work. This presents an additional hurdle for the defendants in their fair use defense. It also gives the Times a stronger claim against fair use than similarly situated plaintiffs, such as authors, since it’s unlikely anyone will ask an AI app to generate an entire copy of a novel.
If your work is published online, safeguarding it from scraping and subsequent use in AI training is challenging. Once used for AI training, the process is irreversible. A single work cannot be separated from the intricate network of an LLM.
With this understanding, companies and artists alike are seeking ways to be compensated for their work being used by AI. Like the musician Grimes, some writers and creators have accepted that AI will inevitably make use of their work and are exploring ways to license or sell the use of their content and likeness. The Times previously attempted to come to an agreement with OpenAI by seeking “fair value” for licensing use of the newspaper’s content.
The best way for you, as a writer, artist, musician or other creator, to protect your work is to register your copyrights. Copyright registration is a relatively simple and inexpensive process. Although your work is automatically copyrighted once it’s fixed in a tangible form, registration ensures that all remedies for infringement, including the ability to sue, are available to you. It also provides notice to others that you own the work.
Journalists and other writers and creators should consult with an attorney experienced in intellectual property law to understand how emerging technologies like AI affect their proprietary rights. Contact us to speak with a member of our team.