Published Jun 4, 2025 ⦁ 14 min read
Fair Use vs. Copyright in AI Searches


Can AI academic tools legally use copyrighted materials? This question lies at the heart of a growing debate between fair use and copyright law. AI tools like Sourcely are changing how researchers find and cite academic sources, but their reliance on copyrighted content for training raises legal concerns. Here are the key takeaways:

  • Fair Use: Allows limited use of copyrighted materials for purposes like education and research. Courts assess factors like purpose, nature, amount used, and market impact to determine legality.
  • Copyright Issues: AI tools often process entire works, which may infringe copyright laws, especially in commercial contexts.
  • Key Case: In Thomson Reuters v. ROSS Intelligence, the court ruled that using copyrighted material to train an AI tool can infringe copyright, and it rejected the fair use defense in that case.
  • Sourcely’s Approach: Partners with licensed databases, avoids reproducing full texts, and focuses on citation and source discovery to stay within legal boundaries.

The balance between innovation and protecting intellectual property is shaping the future of AI in academic research.


Copyright laws play a critical role in shaping how AI academic search tools function. These regulations dictate the legal boundaries for processing, analyzing, and distributing scholarly content while respecting the rights of creators. The challenge lies in striking a balance between advancing technology and safeguarding intellectual property, especially when AI systems generate copies of copyrighted works during their training and operation.

AI systems often create exact replicas of copyrighted works during training, raising legal questions about whether outputs that closely resemble their inputs violate copyright law. A May 2025 report from the U.S. Copyright Office highlighted this issue, emphasizing the importance of determining whether AI models infringe copyright when their outputs are strikingly similar to the material they were trained on.

Professor Lynda Oswald from the University of Michigan aptly describes the scope of this issue:

"One of the biggest legal challenges of the mid-21st century will be figuring out how to regulate AI effectively."

The stakes are high, given the massive financial investments in AI. In 2024, venture capital poured $64.1 billion into AI startups, while tech giants like Amazon, Microsoft, Alphabet, and Meta collectively spent $52.9 billion on capital expenditures in just one quarter.

To address these concerns, AI academic tools employ safeguards to prevent generating content that mirrors copyrighted material. These measures not only reduce the risk of infringement but also influence how fair use is evaluated, taking into account the context and purpose of the use.

Commercial Use and Licensing Requirements

For commercial AI tools, the stakes are even higher. Licensing agreements are essential to ensure compliance with copyright laws, as publishers grow increasingly wary of their content being used to develop competing products.

These agreements often include clauses that restrict derivative works and prohibit sharing licensed content with third parties. As noted by Rachael G. Samberg, Timothy Vollmer, and Samantha Teremi from UC Berkeley Library's Office of Scholarly Communication Services:

"Within contracts applying U.S. law, more specific language controls over general language in a contract. So, even if there is a clause in a license agreement that preserves fair use, if it is later followed by a TDM clause that restricts how TDM can be conducted (and whether AI can be used), then that more specific language governs TDM and AI usage under the agreement."

For academic institutions, these licensing terms have significant implications. The University of California, which accounts for over 8% of scholarly publishing and 9% of academic research and development in the U.S., spends more than $60 million annually on licensing electronic content. Samberg, Vollmer, and Teremi caution:

"Newly-emerging content license agreements that prohibit usage of AI entirely, or charge exorbitant fees for it as a separately-licensed right, will be devastating for scientific research and the advancement of knowledge."

These licensing challenges directly influence how platforms like Sourcely operate.


Sourcely takes a proactive approach to copyright compliance, implementing strategies that align with legal requirements. Instead of scraping copyrighted content, the platform partners with trusted academic databases and repositories that provide properly licensed materials. This ensures that all content accessed and processed is within the bounds of copyright law.

The platform employs advanced filtering to prevent the reproduction of substantial portions of copyrighted works. Summaries and excerpts generated by Sourcely are designed to be transformative, meaning they add new value or perspective rather than simply substituting the original content.

Additionally, Sourcely incorporates measures to block infringing outputs, adhering to guidance from the Copyright Office on reducing copyright risks.
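To make the idea of an output filter concrete, here is a minimal sketch of one common technique: measuring how many word n-grams in a generated summary also appear verbatim in the source text. This is an illustration only, not Sourcely's actual implementation, which the article does not describe. The function names, the n-gram size of 5, and the 0.2 threshold are all assumptions chosen for the example, and a low overlap score is not a legal determination of fair use.

```python
from __future__ import annotations


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Return the set of lowercase word n-grams in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, source: str, n: int = 5) -> float:
    """Fraction of the output's n-grams that appear verbatim in the source."""
    out_grams = word_ngrams(output, n)
    if not out_grams:
        # Output shorter than n words: nothing to compare.
        return 0.0
    return len(out_grams & word_ngrams(source, n)) / len(out_grams)


def passes_filter(output: str, source: str, n: int = 5,
                  max_ratio: float = 0.2) -> bool:
    """Hypothetical gate: allow outputs whose verbatim overlap is at or below max_ratio."""
    return overlap_ratio(output, source, n) <= max_ratio
```

A summary that copies a source sentence word for word would score near 1.0 and be held back, while a paraphrased summary would score near 0.0 and pass. A production filter would likely also need to handle punctuation, near-duplicates, and paraphrase, which this sketch ignores.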

To support its compliance efforts, Sourcely offers tiered subscription plans. These include a $7 trial for 2,000 characters, monthly plans at $17, annual subscriptions for $167, and a one-time $347 lifetime plan. By focusing on citation assistance and source discovery instead of full-text reproduction, Sourcely aims to complement the market for original academic content rather than compete with it. This approach not only ensures legal compliance but also supports the broader ecosystem of scholarly research.

Fair Use Rules in AI Academic Applications

Fair use is a legal principle that allows limited use of copyrighted materials for purposes like research and education, aiming to balance creators' rights with the needs of innovation. This principle supports activities such as teaching, research, and scholarship - areas where AI academic tools like Sourcely primarily operate. However, fair use isn’t automatic; it requires a detailed analysis of both how the technology functions and the nature of its outputs.

Key Fair Use Factors for AI Tools

Courts evaluate fair use based on four main factors:

  • Purpose and character of use: This factor considers whether the AI tool transforms the original material or simply reproduces it. According to the Copyright Office, transformativeness is crucial, especially when AI models are trained for research purposes or within closed systems. Tools designed for educational purposes, like academic AI platforms, often align with fair use goals.
  • Nature of the copyrighted work: This examines the type of content being used. Factual and educational works, such as academic papers, typically receive less protection compared to highly creative works.
  • Amount and substantiality used: This looks at how much of the original work is utilized. The Copyright Office acknowledges that "machine learning processes often require ingestion of entire works" but emphasizes that fair use is more likely when the model's outputs are non-expressive and don’t closely mirror the original content.
  • Market impact: Courts analyze whether the AI tool negatively affects the market for the original work or undermines its value.

Advocates for this framework, including the Library Copyright Alliance, argue that using publicly available internet materials to train AI models should qualify as fair use. They see it as essential for supporting research methods like text and data mining.

Transformative Use in Academic AI Tools

Transformative use is a cornerstone of fair use, especially for AI tools. It occurs when the tool adds new meaning, purpose, or context to the original work, rather than simply replicating it. For instance, an AI tool that scans academic papers to extract relevant sources and create summaries transforms raw text into something analytical and useful.

The distinction between transformative and derivative use is critical for staying within legal boundaries. Some developers argue that training language models is akin to creating advanced indexing tools, similar to how a human researcher reviews multiple sources to generate insights. On the other hand, some copyright holders contend that generative AI outputs can directly compete with original works. This ongoing debate has prompted the Copyright Office to caution that "using copyright-protected materials for AI model training alone does not justify fair use." It also emphasizes that AI outputs must not "closely track the creative intent of the input."

How Sourcely Aligns with Fair Use Principles

To meet fair use standards, AI academic tools must prioritize transformation over replication. Sourcely is specifically designed to do just that, focusing on assisting researchers rather than replacing original content. Its primary function - helping users find relevant sources and determine citation placements - represents a clear transformative use of academic materials.

Sourcely strengthens its fair use standing by producing controlled outputs, such as concise summaries and filtered search results, instead of replicating large sections of original texts. These outputs add value by highlighting the relevance of research topics, complementing the original work rather than competing with it.

The platform’s subscription model reinforces its role as a research aid. By charging for features like enhanced search capabilities rather than offering full-text access, Sourcely derives its value from its organizational and analytical tools. This approach ensures it supports, rather than substitutes, the original academic content.

Additionally, Sourcely actively promotes the academic publishing ecosystem. By guiding users to authoritative sources, it encourages access to original papers, potentially increasing demand for these works. Features like exporting references in various formats and offering advanced search filters further position Sourcely as a transformative tool that enhances research efficiency without undermining the value of the original materials.

Copyright laws protect original works, while fair use allows limited exceptions for purposes like education and research. Understanding how these rules apply to AI academic tools is crucial for navigating the legal terrain these platforms operate within. These principles also lay the foundation for evaluating the different revenue models and usage approaches these tools adopt.

Commercial vs. Non-Commercial Models

Whether an AI tool is commercial or non-commercial significantly influences its eligibility for fair use. Tools used for nonprofit educational purposes generally align more closely with fair use guidelines, while commercial tools face stricter scrutiny.

A notable example of this distinction comes from the February 2025 case Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc. In this case, the U.S. District Court for the District of Delaware ruled that ROSS Intelligence's use of Thomson Reuters' copyrighted headnotes to train its AI-driven legal research tool did not qualify as fair use. The court emphasized that ROSS's commercial intent, aimed at creating a competing product, negatively impacted the market for Thomson Reuters' original work.

For platforms like Sourcely, revenue models play a pivotal role. Sourcely, for instance, charges users $17 per month or $167 annually for advanced research tools, focusing on features like organization and analysis rather than providing direct access to copyrighted content. This approach avoids directly reproducing academic works, emphasizing functionality over replication. However, commercial AI tools like Sourcely may still need to justify their fair use claims or establish licensing agreements with content creators.

Transformative vs. Derivative Outputs

A key aspect of fair use analysis is distinguishing between transformative and derivative outputs. Transformative use introduces new meaning, expression, or purpose to the original work, while derivative outputs merely replicate or minimally alter existing content.

  • Transformative use: Adds new insights or serves a distinctly different purpose.
  • Derivative outputs: Simply repackage or slightly modify existing material.

The Copyright Office has clarified that using copyrighted works solely to train AI models may not satisfy fair use requirements. Academic tools like Sourcely, which assist with tasks such as generating research summaries or suggesting citations, often lean toward being transformative. However, as Judge Pierre Leval noted:

"The existence of any identifiable transformative objective does not, however, guarantee success in claiming fair use. The transformative justification must overcome factors favoring the copyright owner."

This distinction is crucial when evaluating the potential market effects of AI-generated outputs.

Market Impact and Competition Issues

Market impact is another critical factor in determining fair use. AI tools that directly compete with content creators' markets are more likely to face legal challenges. The Copyright Office has raised concerns about AI-generated outputs potentially replacing or undermining the market for copyrighted works.

Sourcely addresses these concerns by complementing rather than competing with academic publishers. Instead of replacing access to original research, Sourcely helps users discover authoritative sources and encourages proper citation practices. Features like exportable references and advanced search filters position the platform as a support tool for researchers, enhancing their ability to engage with original academic content. By guiding users toward credible sources and reinforcing proper attribution, Sourcely minimizes potential market disruptions and aligns its practices with fair use principles.

As legal frameworks evolve, it’s clear that AI tools must demonstrate their value to the academic community while respecting the rights of content creators. Striking this balance is essential for promoting progress while ensuring fair compensation for original works.


Balancing fair use with copyright protection is becoming increasingly critical as legal frameworks evolve. The shifting landscape of academic search compliance highlights the importance of adopting strategies that can keep pace with these changes.

The Copyright Office is currently refining reports on AI-generated works and training practices, following feedback from over 10,000 stakeholders. By February 2024, it had already issued registrations for "well over 100" AI-assisted works, signaling a growing recognition of human-AI collaboration in creative endeavors.

Court rulings are also shaping the way academic AI tools operate. A notable case, Thomson Reuters Enterprise Centre GmbH v. ROSS Intelligence Inc. (February 2025), emphasized the scrutiny commercial AI tools face when they compete with original content providers. Judge Stephanos Bibas remarked:

"Only non-generative AI is before me today."

The remark indicates that future cases involving generative AI tools could be judged differently, especially when it comes to using copyrighted material for training purposes, a practice that may be considered more transformative.

This distinction between generative and non-generative AI tools could benefit academic search platforms focusing on discovery and organization rather than content creation.

Legislation is also evolving. The proposed Generative AI Copyright Disclosure Act seeks to require AI developers to disclose their content sources, while the COPIED Act of 2024 emphasizes transparency and content provenance. Meanwhile, the European Union has already enacted the AI Act, setting a precedent for global regulatory frameworks.

Other legal cases, such as Concord Music Group, Inc. v. Anthropic PBC, which involves the use of copyrighted song lyrics for training the AI assistant "Claude", may establish critical precedents for creating intermediate copies of copyrighted materials. Similarly, Allen v. Perlmutter is testing the boundaries of copyright protection for AI-generated works with substantial human involvement.

These developments call for a compliance strategy that is both proactive and adaptable, as outlined below.

Sourcely's Dual Compliance Method

To navigate these legal and regulatory shifts, Sourcely has adopted a dual compliance strategy that combines proactive copyright safeguards with strong fair use defenses. This approach ensures the platform remains aligned with its mission of academic support while adapting to new legal requirements.

Sourcely's transparency-first approach is key to meeting emerging disclosure mandates. By documenting data sources and processing activities, the platform naturally aligns with fair use principles. Unlike generative AI tools, Sourcely emphasizes source discovery and citation assistance, encouraging proper attribution and directing users to original works rather than replacing them.

The platform's tiered pricing model, offering free basic access alongside premium subscriptions, ensures it remains accessible to educational users while supporting sustainable operations.

To address disclosure and governance standards, Sourcely has enhanced its filtering capabilities. These safeguards help identify and appropriately manage copyrighted material, ensuring that its outputs complement - rather than compete with - original academic sources. Features like citation assistance and reference export reinforce its commitment to proper attribution, strengthening its fair use position.

Sourcely has also implemented governance frameworks to track data provenance and validate outputs. This infrastructure allows for quick adaptation to regulatory changes, whether they involve stricter disclosure requirements or shifts in fair use standards. By focusing on transformative functionality - helping researchers organize, discover, and cite sources effectively - Sourcely positions itself favorably under legal interpretations that prioritize transformative uses over derivative ones.
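As a rough illustration of what tracking data provenance can look like in practice, the sketch below defines a small record type and serializes it as one line of an audit log. It is a hypothetical example, not a description of Sourcely's infrastructure: the field names, the `ExampleDB` provider, and the sample DOI are all made up for illustration.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _utc_now() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


@dataclass(frozen=True)
class ProvenanceRecord:
    """One hypothetical audit entry describing where a document came from."""
    source_id: str        # assumed identifier, e.g. a DOI
    provider: str         # assumed name of the licensed database
    license_terms: str    # assumed short label for the applicable license
    processing_step: str  # what was done with the document, e.g. "indexing"
    recorded_at: str = field(default_factory=_utc_now)


def to_audit_line(record: ProvenanceRecord) -> str:
    """Serialize a record as a single JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)


# Example usage with made-up values.
example = ProvenanceRecord(
    source_id="10.1234/example",
    provider="ExampleDB",
    license_terms="licensed-tdm",
    processing_step="indexing",
)
print(to_audit_line(example))
```

Keeping records immutable (`frozen=True`) and appending them as JSON lines is one simple way to make a processing history easy to produce if disclosure requirements tighten. Other storage designs would serve the same purpose.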

As the U.S. Copyright Office recently noted:

"Transformativeness is a matter of degree; the extent to which something is transformative ultimately depends on the functionality of the model and how it is deployed."

Navigating the legal intricacies of AI academic tools involves finding a middle ground between honoring copyright laws and embracing fair use principles. The U.S. Copyright Office highlights the importance of this balance:

"The public interest requires striking an effective balance, allowing technological innovation to flourish while maintaining a thriving creative community."

This balance hinges on adopting strategies that protect intellectual property while fostering innovation. Academic AI tools that emphasize transparency and transformative functionality are better equipped to meet these demands. Tools focusing on source discovery, citation support, and research organization - rather than outright content generation - align more closely with fair use guidelines. By clearly documenting data sources and processing methods, these platforms not only ensure copyright compliance but also support researchers in accessing relevant materials while maintaining academic integrity.

For researchers, selecting tools that adhere to fair use principles and copyright standards offers notable benefits. These tools expand access to resources without legal complications, build trust within academic circles, and encourage sustainable innovation.

Sourcely provides a compelling example of this approach. By prioritizing source discovery over content creation, maintaining transparent data handling, and promoting proper attribution, Sourcely demonstrates how AI tools can respect content creators' rights while serving researchers effectively. Its model of offering free educational access alongside premium options illustrates how copyright compliance and fair use can work together rather than in opposition.

As legal frameworks continue to evolve, the most effective AI academic tools will be those that adapt proactively, ensuring they meet new requirements without straying from their primary mission. The future belongs to platforms that create a sustainable environment for academic progress while maintaining respect for intellectual property and fairness.

FAQs

The concept of 'transformative use' involves taking existing content and reworking it to create something with a new purpose, meaning, or character - essentially making it distinct from the original. In the realm of fair use, courts often assess whether an AI tool's use of copyrighted material meets this standard of transformation. For instance, AI tools that summarize or analyze academic works to provide fresh insights or support research might claim their use qualifies as transformative.

That said, just because a use is transformative doesn’t mean it automatically falls under fair use. Courts also weigh whether the AI tool’s activities harm the original creator’s market or potential licensing opportunities. Striking the right balance between transformation and the impact on the original work’s market is a key factor in deciding whether an AI tool’s use aligns with fair use guidelines.

AI-powered academic tools can strike a balance between respecting copyright laws and supporting effective research by taking deliberate steps to safeguard intellectual property. One essential step is securing proper licenses for any copyrighted materials used in training their models. This reduces the risk of legal complications while ensuring ethical use of content.

Another critical measure is implementing checks to review AI-generated outputs. These safeguards help prevent unintentional copyright violations, ensuring that the content produced aligns with legal standards.

Moreover, institutions and developers can establish clear guidelines for copyright compliance. By offering training to users about their rights and responsibilities, they can encourage responsible use of these tools. This proactive approach not only ensures adherence to copyright laws but also helps build a research environment grounded in trust and ethical practices.

Shifts in copyright laws and fair use policies are poised to have a major impact on how AI tools are utilized in academic research. As these regulations evolve, there's an increasing acknowledgment of fair use as a cornerstone for advancing research and education. This could open doors for researchers to use AI tools more freely, without stepping into copyright infringement territory.

Meanwhile, emerging rules - like those mandating the disclosure of AI-generated content - are likely to influence how these tools are designed and applied. Such regulations aim to establish clearer boundaries for the ethical and legal use of AI in academic contexts, helping researchers stay compliant while encouraging progress in the field.
