Mirror, Mirror on the Wall, What Is the Fairest Use of Them All?

Apr 13

It is a quintessential lesson nearly everyone learns: Taking without permission is stealing. In the United States, that principle extends to copyrightable works. Since 1976, the U.S. Copyright Act has established the framework for protecting “fixed works.” The emergence of Artificial Intelligence (AI) and Large Language Models (LLMs) has challenged that framework, reshaping how courts evaluate what constitutes fair use under the Act. Two recent district court cases illustrate this: Bartz v. Anthropic[1] and In re Mosaic LLM.[2] In examining the limits of the fair use defense, Bartz focused on the digitization of print books used to train AI models, while Mosaic examined the large-scale ingestion of online text datasets.

The Slowest Moving Vehicle: Print Publishing

Copyright is the legal protection granted to authors for original works of authorship fixed in a tangible medium; it grants authors exclusive rights to reproduce, distribute, and adapt their creations.[3] Copyright law is, therefore, the cornerstone of the publishing industry—no law matters more for protecting authors and their published works.

Publishing a book can take years before it ever reaches a reader’s hands—from writing, editing, and acquiring an agent, to the lengthy production process. On average, the journey from idea to print spans roughly two years, though it can vary widely. This investment of time and creative effort underscores why copyright protection is so vital.[4] A novel can represent years of an author’s labor, refinement, and personal sacrifice.[5]

The publication process is extensive. An author must first find an agent,[6] often through months of submissions and waiting. Once represented, the author and agent edit the manuscript before pitching it to editors at publishing houses. When an editor acquires the book, it goes through multiple rounds of revision under the supervision of the editorial team. The production, design, sales, subsidiary-rights, and marketing departments each take over at their respective stages, from preparing galleys to coordinating distribution.[7]

This intricate process demonstrates the immense labor behind every printed work. Because of that effort, copyright law safeguards authors’ creations, but, as with most rules, there are exceptions. One of the most significant is fair use, a doctrine that determines when using another’s work without permission may nonetheless be legally permissible.[8]

The Fairest Use of Them All

Under U.S. copyright law, fair use functions as a defense to a claim of infringement, allowing limited use of copyrighted material without the author’s permission when justified.[9]

Section 107 of the Copyright Act outlines four factors for deciding whether a use is fair: (1) the purpose and character of the use, including whether it is educational or commercial; (2) the nature of the copyrighted work; (3) the amount and substantiality of the portion used; and (4) the effect of the use upon the potential market for or value of the original work.[10]

Although the four-factor analysis has remained consistent, its application to LLMs has evolved. Take, for example, Bartz v. Anthropic.[11] The fair-use doctrine, according to the Supreme Court, “permits courts to avoid rigid application of the copyright statute when, on occasion, it would stifle the very creativity which that law is designed to foster.”[12] This flexibility allows courts to balance the protection of creativity with the freedom to innovate.

In Bartz, Judge Alsup ruled in Anthropic’s favor, holding that the company’s use of copyrighted books to train its AI models (including Claude and its predecessors) qualified as fair use, unless the books were pirated from websites such as Library Genesis.[13] This case stands out among AI-related disputes because Claude does not reproduce portions of its training inputs. Judge Alsup explained that using lawfully acquired books for internal training purposes constituted fair use under section 107 of the Copyright Act.[14]

A companion case, In re Mosaic LLM, expanded upon Bartz, addressing whether the automated collection of publicly available digital materials for model training constitutes a transformative use.[15] The court’s preliminary findings suggest that, although In re Mosaic implicates a broader range of source material, the same analytical framework under section 107 applies.[16] The reasoning in Bartz and the early analysis in Mosaic both reaffirm that properly acquired materials used internally for machine training fall within the fair-use defense.

Courts have long emphasized that transformative use plays a pivotal role in fair use.[17] Yet, determining what counts as “transformative” remains contentious—especially in the digital age. The Second Circuit, for example, held that Google’s book-scanning project was transformative, because it made books searchable without offering full-text access, thus serving a different purpose than the originals.[18] The Ninth Circuit echoed this reasoning in Kelly v. Arriba Soft[19] and Perfect 10 v. Amazon,[20] finding that image search engines transformed the use of copyrighted images by improving access to information.

These precedents have informed modern AI disputes. In Bartz, the court applied the transformative-use test analogously, suggesting that training AI models on text data, without reproducing it verbatim, can similarly be transformative.[21] In re Mosaic LLM extended this reasoning, noting that large-scale ingestion of public data could also qualify if the output serves a distinct purpose unrelated to the original works.[22]

The Market Share

The primary dispute in Bartz turned on the transformative-use factor, that is, whether Anthropic’s use of the books sufficiently altered their purpose. But the court’s analysis left unresolved how this flexibility affects the market-impact factor. Permitting tech companies to use purchased books for LLM training could weaken the market for publishers and authors to license their works. Whether Judge Alsup accurately balanced transformation against market harm remains to be seen, especially as other AI-copyright cases approach litigation.[23]

Judge Alsup acknowledged that, while using properly acquired books currently falls under fair use, future developments could shift this understanding as authors assert that such uses should belong exclusively to them.[24] Similarly, In re Mosaic LLM raises concerns about market substitution in the context of mass-scraped online texts, where authors and publishers may lose the ability to license their works for AI training.[25]

In anticipation, many publishers are proactively licensing books to technology companies for AI training. This allows publishers to regulate how their works are used and to negotiate royalties for authors. With widespread budget cuts across education sectors, licensing has also become a practical way for publishers to create alternative revenue streams. For instance, since Bartz, the scholarly publisher Wiley has entered into agreements with Anthropic to license its content for AI training.[26] Publishers Marketplace has reported that Bloomsbury, Johns Hopkins University Press, and other houses are following suit.[27] Other coverage in Publishers Marketplace details how additional imprints, including Macmillan, Hachette, and Chronicle Books, have considered similar arrangements.[28]

Such licensing deals underscore the tension between accessibility and control. On one hand, they provide authors with compensation and transparency; on the other, they could create monopolistic licensing markets dominated by major AI developers.[29] Therefore, courts must continue balancing the need for innovation against the danger of consolidating creative control in corporate hands.[30]

The conversation over market impact is not limited to literature. In Google v. Oracle, the Supreme Court weighed whether Google’s reimplementation of Java APIs for Android constituted fair use.[31] The Court held that it did, emphasizing that transformative use can coexist with commercial gain when it expands access to creative expression rather than restricting it.[32] This reasoning resonates in the AI context, where training-data use may promote technological progress without usurping the original market for expressive works.[33]

Meanwhile, the Warhol Foundation v. Goldsmith decision reaffirms that even a seemingly transformative work may fail the market-impact test if it directly substitutes for the original or competes in the same commercial sphere.[34] For book publishers, this means AI systems that generate summaries or stylistic imitations could infringe if they diminish demand for the originals.[35]

The Future of Fair Use for Books in LLMs

The future of fair use in the age of LLMs remains uncertain. As Judge Alsup observed, publishing is an evolving market, and the balance between innovation and author rights will continue to shift. Ongoing lawsuits, including those involving Apple, together with licensing partnerships between AI companies and publishers such as Bloomsbury, Wiley, and Johns Hopkins University Press,[36] signal that copyright holders are increasingly contesting whether AI training truly qualifies as transformative use.[37]

Together with In re Mosaic LLM, which continues to test how far courts will stretch the doctrine to accommodate emerging technologies, these cases will shape the next generation of fair-use jurisprudence.[38] Mosaic’s implications extend beyond text: Its reasoning could influence future disputes over musical, visual, and audiovisual data used in machine-learning training.[39]

If courts begin to prioritize market-impact over transformation, the fair-use defense may contract, narrowing the legal space for AI developers to rely on existing works.[40] As Bartz v. Anthropic moves toward its expected resolution in 2026, more authors and publishers are likely to assert their rights outside the fair-use doctrine—ensuring that the value of human creativity remains protected even in a machine-driven world.[41]

The discussion has already expanded into broader copyright-reform debates.[42] Legislators and scholars are questioning whether existing statutes sufficiently address algorithmic learning.[43] Some have proposed a compulsory-licensing framework for training data,[44] echoing the models long used in broadcast and musical reproduction.[45] Others advocate a “fair-compensation” scheme[46] that would allow LLM developers to train on creative works while paying a collective royalty—bridging innovation and authorship.[47]

Ultimately, the intersection between publishing and AI presents both a challenge and an opportunity. While fair use provides a framework for balancing creativity with innovation, its contours are being tested in real time. As courts continue to navigate these issues, the question is no longer what constitutes fair use, but how far society is willing to stretch it in the name of technological progress.[48]

[1] Bartz v. Anthropic, No. C 24-05417 WHA, 2025 WL 1741691 (N.D. Cal. June 23, 2025).

[2] In re Mosaic LLM Litig., No. 24-CV-01451-CRB (LJC), 2025 WL 2294910 (N.D. Cal. Aug. 8, 2025).

[3] 17 U.S.C. § 102(a).

[4] Id.

[5] Steve Laube, “How Long Does It Take To Get Published?” (June 2019), https://stevelaube.com/how-long-does-it-take-to-get-published/

[6] Publishers Association, “How Publishing Works” (2020), https://www.publishers.org.uk/about-publishing/how-publishing-works/

[7] Id.

[8] 17 U.S.C. § 107.

[9] Campbell v. Acuff-Rose Music, Inc., 510 U.S. 569 (1994).

[10] Id. at 577.

[11] Bartz, 2025 WL 1741691.

[12] Campbell, 510 U.S. at 577.

[13] Id.

[14] Id. at 584.

[15] In re Mosaic LLM Litig., No. 24-CV-01451-CRB (LJC), 2025 WL 2294910, at *3 (N.D. Cal. Aug. 8, 2025).

[16] Id.

[17] Campbell, 510 U.S. at 584–86.

[18] Authors Guild v. Google, Inc., 804 F.3d 202 (2d Cir. 2015).

[19] Kelly v. Arriba Soft Corp., 336 F.3d 811 (9th Cir. 2003).

[20] Perfect 10, Inc. v. Amazon.com, Inc., 508 F.3d 1146 (9th Cir. 2007).

[21] Bartz, slip op. at 14.

[22] In re Mosaic LLM Litig., 2025 WL 2294910, at *5 (N.D. Cal. Aug. 8, 2025).

[23] Bartz, 2025 WL 1741691.

[24] Id.

[25] In re Mosaic LLM, 2025 WL 2294910, at *5.

[26] Publishers Marketplace, “Wiley Partners with Anthropic on AI Integration for Scholarly Research,” (July 2025), https://lunch.publishersmarketplace.com/2025/07/wiley-partners-with-anthropic-on-ai-integration-for-scholarly-research/.

[27] Publishers Marketplace, “Bloomsbury Explores AI Licensing,” (Aug. 2025), https://lunch.publishersmarketplace.com/2025/08/bloomsbury-explores-ai-licensing/.

[28] Publishers Marketplace, “Johns Hopkins Press Will License Its Books to Train AI,” (July 2025), https://lunch.publishersmarketplace.com/2025/07/johns-hopkins-press-will-license-its-books-to-train-ai/.

[29] See also Publishers Marketplace, “AI and the Future of Book Licensing,” (Sept. 2025), https://lunch.publishersmarketplace.com/2025/09/ai-and-the-future-of-book-licensing/.

[30] Id.

[31] Google LLC v. Oracle Am., Inc., 593 U.S. 1, 141 S. Ct. 1183, 209 L. Ed. 2d 311 (2021).

[32] Id. at 522.

[33] Id.

[34] Andy Warhol Found. for the Visual Arts, Inc. v. Goldsmith, 598 U.S. 508 (2023).

[35] Id. at 534.

[36] See generally Publishers Marketplace, “Wiley Partners with Anthropic on AI Integration for Scholarly Research,” (July 2025); “Bloomsbury Explores AI Licensing,” (Aug. 2025); “Johns Hopkins Press Will License Its Books to Train AI,” (July 2025); “AI and the Future of Book Licensing,” (Sept. 2025).

[37] In re Mosaic LLM, 2025 WL 2294910 (N.D. Cal. 2025).

[38] Id.

[39] Id.

[40] Bartz, 2025 WL 1741691.

[41] Id.

[42] In re Mosaic LLM, 2025 WL 2294910 (N.D. Cal. 2025).

[43] See generally 17 U.S.C. §§ 101–122.

[44] See Campbell, 510 U.S. at 590.

[45] See also H.R. Rep. No. 94-1476 (1976).

[46] Id.

[47] See generally Jessica Litman, Real Copyright Reform, 96 Iowa L. Rev. 1 (2010). See also In re Mosaic LLM, 2025 WL 2294910, at *3. See Campbell, 510 U.S. at 591.

[48] Campbell, 510 U.S. at 594.

Sabrina Josephson
