A recent analysis by Canadian startup GPTZero identified hundreds of AI-hallucinated citations in more than 53 research papers accepted and presented at the NeurIPS (Neural Information Processing Systems) 2025 conference in San Diego.
Findings
GPTZero, founded in January 2023, analyzed more than 4,000 research papers. The identified hallucinations included:
- Completely fabricated citations (nonexistent authors, paper titles, journals, or URLs).
- Blended or paraphrased elements from multiple real papers, creating believable but incorrect citations.
- Subtle alterations to real papers, such as expanding author initials, adding/dropping coauthors, or paraphrasing titles.
- Obvious placeholder author names such as "John Smith" or "Jane Doe."
NeurIPS Response
The NeurIPS board issued a statement acknowledging the evolving use of LLMs in papers and noting that its 2025 policy instructed reviewers to flag hallucinations. The board said that even if a small percentage of papers contain incorrect references due to LLMs, the content of those papers is not necessarily invalidated. It committed to evolving its review and authorship processes to ensure scientific rigor and to exploring how LLMs can enhance author and reviewer capabilities.
Prior Detection and Methodology
Edward Tian, CEO of GPTZero, reported that the company previously uncovered 50 hallucinated citations in papers under review for another AI research conference, ICLR. ICLR subsequently hired GPTZero to check future submissions.
GPTZero's hallucination checker tool functions by:
- Ingesting a paper and scanning its citations.
- Searching the open web and academic databases to verify authors, titles, publication venues, and links for each reference.
- Flagging citations that cannot be found, partially match existing papers, or contain inconsistencies (e.g., adding nonexistent authors to a real paper).
The company states the tool is over 99% accurate. For the NeurIPS analysis, a human expert from GPTZero's machine-learning team manually verified every flagged citation.
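GPTZero has not published its checker's implementation, but the general verification loop it describes can be illustrated. The sketch below is a minimal, hypothetical version of the lookup-and-compare steps, using the public Crossref API as the academic database; the function names, similarity thresholds, and status labels are illustrative assumptions, not GPTZero's.

```python
import difflib
import requests

CROSSREF_API = "https://api.crossref.org/works"  # public scholarly metadata index

def title_similarity(a: str, b: str) -> float:
    """Rough string similarity between two titles, in [0, 1]."""
    return difflib.SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()

def check_citation(title: str, authors: list[str]) -> dict:
    """Classify one reference as 'verified', 'partial_match', or 'not_found'.

    Queries Crossref for the closest bibliographic matches, then compares
    the claimed title and author surnames against the best candidate.
    Thresholds here are illustrative guesses, not GPTZero's.
    """
    resp = requests.get(
        CROSSREF_API,
        params={"query.bibliographic": title, "rows": 5},
        timeout=10,
    )
    resp.raise_for_status()
    items = resp.json()["message"].get("items", [])
    if not items:
        return {"status": "not_found", "title": title}

    # Pick the candidate whose indexed title best matches the claimed title.
    best = max(items, key=lambda it: title_similarity(title, (it.get("title") or [""])[0]))
    best_title = (best.get("title") or [""])[0]
    sim = title_similarity(title, best_title)
    if sim < 0.60:
        return {"status": "not_found", "title": title}

    # Compare claimed author surnames against the indexed record's surnames.
    indexed = {a.get("family", "").lower() for a in best.get("author", [])}
    claimed = {name.split()[-1].lower() for name in authors}
    extra = claimed - indexed  # authors listed in the paper but absent from the record

    if sim > 0.95 and not extra:
        status = "verified"
    else:
        status = "partial_match"  # e.g. a paraphrased title or added/dropped coauthors
    return {
        "status": status,
        "title": title,
        "matched_doi": best.get("DOI"),
        "title_similarity": round(sim, 2),
        "unmatched_authors": sorted(extra),
    }

# Example: a real paper with a fabricated extra coauthor should come back
# as a partial match rather than fully verified.
print(check_citation(
    "Attention Is All You Need",
    ["Ashish Vaswani", "Noam Shazeer", "Jane Doe"],
))
```

Batch-checking a full bibliography is then a loop over parsed references, with anything other than a clean verification escalated to a human, much as GPTZero says a member of its machine-learning team manually confirmed every flagged citation.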
Implications
NeurIPS 2025 accepted 24.52% of submissions (roughly 5,290 of 21,575 papers), meaning the flagged papers were accepted ahead of thousands of rejected ones. Under academic norms, fabricated citations are typically grounds for rejection, since references anchor research and demonstrate engagement with existing work.
Tian noted the significance of these findings, describing them as the first documented cases of hallucinated citations entering the official record of a top machine-learning conference. Roughly half of the papers with hallucinated citations were also flagged as likely showing heavy AI generation or usage.
That volume of submissions makes deep scrutiny by volunteer reviewers difficult. Citations are particularly important in AI research, a field already grappling with reproducibility problems, and hallucinated citations compound those problems by directing researchers to nonexistent work.