Sarah Silverman has joined the ranks of authors and artists alleging that training AI on their work is copyright infringement. Generative AI makers like OpenAI disagree, claiming fair use. Is AI training fair use, or are these companies liable for infringement? This is an open question for the courts, but this post outlines the creatives’ claims and the defenses to them.
Comedian Sarah Silverman and other authors claim that ChatGPT and LLaMA (Meta’s AI generator) were “trained” on their copyrighted books without consent or compensation. Generative AI models like ChatGPT, known as Large Language Models (LLMs), are designed to mimic the data they are fed. LLMs produce full sentences and paragraphs that resemble human language because they are trained to continuously adjust their outputs to resemble sequences of words copied from a training dataset. Silverman’s complaint alleges that the companies behind these LLMs commit infringement by feeding copies of her works into the AI application for such “training.”
The LLMs are trained using books because they are a great source of long-form, high-quality written language. Silverman’s lawsuit alleges that OpenAI, which generally refuses to reveal its training datasets, has scraped databases of torrented books to train its LLMs. The authors allege that these companies have copied their content without permission or compensation. This, they maintain, is theft.
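The unauthorized ingestion of copyrighted material into the LLMs by the AI companies likely constitutes copyright infringement. However, these companies may argue that their conduct is fair use. Fair use is a defense to copyright infringement. Section 107 of the Copyright Act directs courts to consider at least four factors when evaluating a fair use defense: (1) the purpose and character of the use, including whether it is commercial; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used in relation to the work as a whole; and (4) the effect of the use on the potential market for or value of the copyrighted work.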
The AI companies have plausible but rebuttable arguments on each of these factors. First, although the companies are using the work in a transformative way by ingesting it as a series of data points to inform unrelated output, the use is still commercial since AI products are sold for profit. Second, AI treats the input works as factual bits of data, but many of these works are creative in nature regardless of how AI treats them. Third, it will be hard for AI companies to argue they have used only a small portion of each work, as AI can often generate book summaries or accurate writings in the style of a particular author, which would require the digestion of an entire body of work. Fourth, since AI rarely reproduces a work exactly, it is unlikely that AI outputs compete directly with a copyrighted work, but creators may argue that AI is a substitute for their creative efforts in the markets in which their copyrighted work is sold.
If your work has already been used to train AI, that use cannot be undone. It is impossible to disentangle a single work from the neural network of an LLM. Furthermore, if your work is available online in any form, it is likely difficult to protect it from being scraped and used for AI training. With this understanding, some artists are seeking ways to be compensated when their work is used by AI. From musician Grimes to the New York Times, some writers and creators have accepted that AI will inevitably make use of their work and are exploring ways to license or sell the use of their content and likeness.
Regardless of how the lawsuits turn out, always register copyrights for your work. Registration is a relatively simple and inexpensive process. Although your work is automatically copyrighted when it’s fixed in a tangible form, registration ensures that all remedies for infringement are available to you. It also provides notice to others that you own the work.
Creators, including comedians, should consult with an attorney experienced in intellectual property law and comedy law to understand how emerging technologies like AI affect their proprietary rights. Contact us to speak with a member of our team.
Contributions to this blog by Gabriella Epley.