Thoughts on Using AI/LLMs for Peer Review?

I’m putting this out there because it came up while I was preparing a seminar for a course on critical thinking skills that I’m teaching this semester.

While working on it, I couldn’t help noticing that over the past year I’ve been seeing more and more reviews that feel AI-assisted: stylistically, linguistically, even structurally. No passive-aggressive feedback from Reviewer 2, no grammatical mistakes in the reports, ultra-polished sentences, generic “strengths/limitations” phrasing. And it’s happening often enough that it’s hard to ignore.

Which raises the broader question: what do we actually think about using LLMs to peer review scientific papers? Beyond the obvious “please don’t outsource your scientific judgment to a chatbot,” there are real issues here, especially confidentiality. Peer review is supposed to be a confidential process, and LLMs are anything but. Reviewers who paste a manuscript into a chatbot are effectively exposing someone else’s unpublished work. In my view, AI should be a tool that facilitates your work, not a tool that does it for you.

So I’m genuinely curious:

  • Is there a responsible, ethical way to use LLMs during peer review, or is it a line we shouldn’t cross?

  • Should journals start spelling out explicit policies?

  • And with AI-flavored reviews becoming more common, is this something we should be worried about—or just the new normal?

Would love to hear others’ experiences and perspectives on where this is heading.

2 Likes

Hi Amgad, that’s a great question! I found myself thinking about this just last week while reviewing a manuscript.

I think these tools are here to stay and can help us greatly, which is why accepting them in scientific environments is a good thing and should be encouraged. That said, since AI can be unpredictable, hallucinate, and sometimes only spill common knowledge with no new insights, using it to peer review a manuscript entirely is not something I would agree with. There are also privacy concerns, as peer-review material is confidential.

In my case, I did use it, and I think it helped a lot. I was dealing with a manuscript that involved complex concepts that sometimes went beyond my area of expertise. So I did my normal peer review first, then gave ChatGPT my finished review and asked it to read it and point out any relevant conceptual mistakes on those complex topics (in addition to spelling errors). It did find some, which I looked into and promptly corrected, making my review more aligned with what was really going on in the article. So it helped a lot and was used in a safe and responsible way!
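For anyone curious what that workflow looks like outside the chat window, here is a rough sketch using the OpenAI Python SDK. This is just an illustration, not what I actually ran: the model name and file path are placeholders, and the key point is that only my own review text is sent, never the manuscript.

```python
# Sketch: ask a model to sanity-check my own review draft.
# Assumes the official OpenAI Python SDK (pip install openai)
# and an OPENAI_API_KEY set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

# Only the reviewer's own text goes in -- never the manuscript itself.
with open("my_review_draft.txt", encoding="utf-8") as f:
    my_review = f.read()

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder model name
    messages=[
        {
            "role": "system",
            "content": (
                "You are checking a peer-review report for conceptual "
                "mistakes and spelling errors. List each suspected issue "
                "with a short explanation. Do not rewrite the review."
            ),
        },
        {"role": "user", "content": my_review},
    ],
)

print(response.choices[0].message.content)
```

Of course, everything it flags still has to be checked against the paper itself; I treated its suggestions as leads, not verdicts.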

2 Likes

the current models make things up at times, and are not the most reliable with the literature, as Daniel noted. i think some journals are actually trying to implement their own ai tools to help reviewers, but other than that, it’s a soft no from me for the time being. although not ideal, peer review really is the only thing we have to ensure reliable output these days, and we still need actual human experts going through things… by the way, it’s interesting you used “he” for chatgpt @danieltds :laughing:

having a guest editor role really changed my perspective on reviews, and these days i don’t mind the typos and grammatical errors as much, as long as they get the message across. mean comments are often not intended to be mean: people who are not comfortable in english, or who are rushing to write up reviews, can come across as way more direct and harsh than they intend. i’ve been noticing this pattern with certain reviewers i personally know, for instance. i know they’re not mean, and they’re not aiming to be mean in their comments, it’s just how they write when trying to be formal. then i get the sweetest notes in the comments-to-the-editor part, where they’re more casual in their writing and i can see they’re really trying to advocate for the paper.

i’ve been reviewing 1-2 papers per week now, and my comments are definitely written in an insane rush and probably look pretty mean, because i don’t have the energy to soften the language and don’t realize how things come across in that rush. i’ve also seen the other way around: soft, sweet comments to the author, and then a recommendation to reject the paper in the comments to the editor for various reasons.

3 Likes

It depends on the use.

As a non-native English speaker, I use LLMs to fix my texts. I have serious issues with at/on/in/pspspspspsps :black_cat:, and we know that when the writing doesn’t look good, the content may not be taken seriously.

Now, about peer review: I am against using LLMs in peer review. An LLM cannot think, nor can it take responsibility. Peer review is a critical step in the scientific pipeline because it is the moment the work receives the implicit certification of “OK, this work makes sense and I believe it is true.” You choose reviewers who are well known in the field because the work may be used in decision-making processes, and future projects may be built on the work being evaluated. Adding an LLM (something unable to create new concepts or properly understand the proposed work) may reduce the number of papers with innovative approaches, wave through bad papers because they sound like previous work, and open the door to manipulation (for example, hidden instructions embedded in a manuscript to coax a positive review), as we have seen in recent months.

So, in a few words: if you want to improve the text in your review, go ahead. If you want to use an LLM to decide whether a paper is good or not, maybe you should not be doing the review.


Figure: “ChatGPT Can’t Count Characters?” (OpenAI Developer Community)

3 Likes