AI's role in plagiarism: How can scientists tackle this issue?
Explore how scientists can address the growing challenge of AI-assisted plagiarism and adopt strategies to maintain research integrity.
In recent months, the academic community has been shaken by high-profile plagiarism cases, including the resignation of Harvard’s president and the discovery of copied text in peer-review reports. A new challenge has emerged with the rise of generative AI tools, which produce fluent text in response to prompts. This development has sparked debate over whether AI-generated text constitutes plagiarism and under what conditions such tools should be used.
Jonathan Bailey, a copyright and plagiarism consultant, points out the confusion surrounding AI's role in scholarly writing, noting a spectrum of AI involvement from fully human-written to completely AI-generated content. Generative AI tools like ChatGPT can enhance productivity and clarity but also raise concerns about potential misuse. These tools, which are trained on extensive published works, might inadvertently produce text similar to existing content or be used to mask deliberate plagiarism, complicating the detection of academic dishonesty.
A 2023 survey of 1,600 researchers found that 68% believe AI will make plagiarism both easier to commit and harder to detect. Debora Weber-Wulff, a plagiarism expert, notes that the academic community is anxious about AI's implications for academic integrity.
AI and plagiarism: Navigating the new frontier
Plagiarism, defined by the US Office of Research Integrity as "the appropriation of another person’s ideas, processes, results, or words without giving appropriate credit," remains a persistent issue. A 2015 study found that 1.7% of scientists admitted to plagiarism, and 30% knew colleagues who had engaged in it.
The advent of large language models (LLMs) like ChatGPT adds complexity to this issue. These AI tools can obscure intentional plagiarism by paraphrasing text in sophisticated ways, such as mimicking the style of an academic journal, as noted by Muhammad Abdul-Mageed, a computer scientist and linguist at the University of British Columbia.
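To see why paraphrasing undermines detection, consider the kind of verbatim overlap check that conventional plagiarism detectors build on. The sketch below is a minimal illustration, not any particular vendor's method; the function names and the n-gram size are arbitrary choices for the example.

```python
import re

def ngrams(text: str, n: int = 5) -> set[tuple[str, ...]]:
    """Lowercased word n-grams of the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap(doc: str, source: str, n: int = 5) -> float:
    """Fraction of the document's n-grams that also appear in the source."""
    doc_ngrams = ngrams(doc, n)
    if not doc_ngrams:
        return 0.0
    return len(doc_ngrams & ngrams(source, n)) / len(doc_ngrams)

# overlap(suspect_text, original) is high for verbatim copying, but an
# LLM rewording destroys shared word n-grams even when the underlying
# ideas are copied, so the score drops toward zero after paraphrasing.
```

This is exactly the weakness Abdul-Mageed describes: checks based on shared surface wording have nothing to match once the wording has been regenerated.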
A key debate revolves around whether unattributed AI-generated content constitutes plagiarism. Many researchers argue it does not, distinguishing between "unauthorized content generation" and traditional plagiarism. For instance, the European Network for Academic Integrity does not equate the use of AI tools with plagiarism. According to Debora Weber-Wulff, plagiarism implies attribution to an identifiable person, and AI-generated text, even if similar to existing content, often lacks the direct attribution required for plagiarism.
However, some argue that generative AI tools might infringe on copyrights. Rada Mihalcea, a computer scientist at the University of Michigan, highlights that these systems are built on the work of countless individuals. The debate intensified in December 2023 when The New York Times filed a copyright lawsuit against Microsoft and OpenAI, alleging that the companies trained their LLMs, including GPT-4, on millions of the newspaper's articles without permission. The lawsuit cites instances in which GPT-4 generated text closely resembling The New York Times articles.
In response, OpenAI has moved to dismiss parts of the lawsuit, arguing that ChatGPT is not a substitute for a subscription to the newspaper, while Microsoft maintains that AI tools should advance responsibly without undermining journalism. If the court rules that training AI on text without permission constitutes copyright infringement, the decision could significantly affect AI companies and how they train their models.
The surge of AI in academic writing
Since the release of ChatGPT in November 2022, the use of AI in academic writing has surged. Research updated in July 2024 estimated that around 10% of abstracts in biomedical papers from the first half of 2024 were written with the help of LLMs, equivalent to roughly 150,000 papers per year. Dmitry Kobak and his team at the University of Tübingen analyzed 14 million PubMed abstracts published between 2010 and June 2024. Their findings showed a marked rise in stylistic words such as "delves," "showcasing," and "underscores," which they linked to AI-generated content. This shift underscores the profound impact LLMs are having on scientific literature.
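As a rough illustration of this kind of lexical analysis, one could track how often such marker words appear in abstracts and compare post-2022 corpora against a pre-ChatGPT baseline. This is a minimal sketch, not the Tübingen team's actual pipeline; the marker list is truncated to the words quoted above, and the corpus variables are hypothetical placeholders.

```python
import re

# Words reported as rising sharply after ChatGPT's release (illustrative subset).
MARKERS = {"delves", "showcasing", "underscores"}

def marker_rate(abstracts: list[str]) -> float:
    """Fraction of abstracts containing at least one marker word."""
    if not abstracts:
        return 0.0
    hits = sum(
        1 for text in abstracts
        if set(re.findall(r"[a-z]+", text.lower())) & MARKERS
    )
    return hits / len(abstracts)

# Hypothetical usage: compare a recent corpus against a pre-LLM baseline.
# excess = marker_rate(abstracts_2024) - marker_rate(abstracts_2010_2021)
# A large positive excess is consistent with widespread LLM-assisted writing.
```

The design choice here mirrors the logic of the study: rather than classifying any single paper, it measures population-level shifts in vocabulary, which is why such estimates are expressed as a share of the literature rather than verdicts on individual authors.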