The use of artificial intelligence (AI) tools in academic peer review processes offers potential benefits like reducing bias, improving efficiency, and augmenting human analysis. However, integrating these emerging technologies also raises valid concerns around confidentiality and academic integrity that warrant careful consideration.
In a recent discussion as part of COPE’s Publication Integrity Week, Hum’s co-founder and President, Dustin Smith, and Dr. Mohammed Hosseini of Northwestern University explored the confidentiality implications of uploading scholarly content to AI systems.
A core concern of many publishing professionals is that text, data, or other materials submitted by authors to AI tools may then be utilized to train machine learning models without explicit consent.
As Tristan Harris, a former Google design ethicist once said, “If you're not paying for the product, you are the product.”
When you use a public tool like ChatGPT, your inputs can become part of the training data unless you explicitly opt out. By submitting manuscripts or other documents to AI peer review systems, authors risk relinquishing some degree of control or copyright over that content, as portions may be extracted or reused without transparency.
But if you’re looking at a single prompt and response, OpenAI isn’t really extracting scholarly articles out of these datasets. In fact, as Dustin pointed out, if you have a subscription to Claude or Perplexity, or you’re running a model locally, there’s not a huge distinction between using a local Llama model and using a word processor. These are tools in your arsenal, and you’re responsible for the output.
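For readers curious what "running a model locally" looks like in practice, here is a minimal sketch using the open-source llama-cpp-python library; the model file name and prompt are illustrative assumptions, not tooling discussed in the session.

```python
# Minimal sketch of fully local inference with llama-cpp-python
# (assumed installed via `pip install llama-cpp-python`).
from llama_cpp import Llama

# Load locally stored model weights (GGUF format); the path below is a
# hypothetical example. Nothing is sent over the network.
llm = Llama(model_path="./models/llama-3-8b-instruct.Q4_K_M.gguf", n_ctx=4096)

prompt = "List methodological concerns a reviewer might raise about this abstract: ..."
result = llm(prompt, max_tokens=256, temperature=0.2)

# The prompt and response stay on the reviewer's machine,
# much like text typed into a word processor.
print(result["choices"][0]["text"])
```

In this setup, the manuscript text never leaves the user's computer, which is the distinction the speakers drew between local models and public, hosted tools.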
Dr. Hosseini compared this practice to online companies collecting user data through cookies, search logs, and other means. It took substantial time for broad awareness, expectations, and regulations around the collection of online data to emerge; standards and policies for AI will likewise need time to evolve in response to these technological changes.
Both speakers also highlighted potential risks to academic integrity from AI systems learning and later reproducing unique phrases, passages, arguments, or ideas from manuscripts used in training. While any single paper is statistically unlikely to be precisely memorized given the enormous scale of training data, concepts or language that are used repeatedly across multiple inputs stand a higher chance of being internalized by models.
While AI systems like those used in peer review hold promise to augment productivity, caution around privacy and ethics is certainly warranted. As with any new technology permeating academia, proactive engagement is needed from all stakeholders: researchers, publishers, institutions, and more. With informed use and governance, the scholarly community can hopefully harness AI in a way that maximizes benefits while minimizing risks.
Catch the full session below: