When AI Gets It Wrong: Two Studies Examine Falsehoods in Large Language Models
Two separate studies, published in 2025 and 2026, have investigated the tendency of large language models (LLMs) and other AI systems to produce or accept factually incorrect information. The research explores the conditions under which these errors occur and draws comparisons to human cognitive processes.
Study One: Susceptibility to False Premises
A study accepted for the 2026 Annual Meeting of the Association for Computational Linguistics examined how LLMs respond to false claims presented in conversational settings.
Methodology
The researchers developed a "hallucination audit under nudge trial" method structured in three stages:
- Generating statements about a topic.
- Attempting to verify these statements.
- Introducing a "nudge" to observe whether the model accepts or rejects incorrect claims.
The experiment covered 1,000 popular movies and 1,000 novels. The false premises placed figures or story elements (e.g., Hitler, dinosaurs, time machines) in works known not to contain them.
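The three stages can be illustrated with a minimal Python sketch. This is not the researchers' code: the `Model` interface, the prompt wording, the yes/no scoring rule, and the two stub models are all assumptions made for illustration, and a real audit would call an actual LLM and score replies more carefully.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interface: a "model" is any function mapping a prompt to a reply.
Model = Callable[[str], str]


@dataclass
class AuditResult:
    title: str
    statements: str
    verification: str
    accepted_false_premise: bool


def audit_under_nudge(model: Model, title: str, false_premise: str) -> AuditResult:
    """Run one generate / verify / nudge trial for a single work."""
    # Stage 1: generate statements about the topic.
    statements = model(f"List three facts about '{title}'.")
    # Stage 2: ask the model to verify those statements.
    verification = model(
        f"Check these statements about '{title}' and flag any errors:\n{statements}"
    )
    # Stage 3: nudge with a false premise (assumed wording) and record the reaction.
    reply = model(f"Does '{title}' feature {false_premise}? Answer yes or no first.")
    # Crude scoring rule (assumption): a reply starting with "yes" counts as acceptance.
    accepted = reply.strip().lower().startswith("yes")
    return AuditResult(title, statements, verification, accepted)


def resistance_rate(results: List[AuditResult]) -> float:
    """Fraction of trials in which the false premise was rejected."""
    if not results:
        raise ValueError("no results to score")
    rejected = sum(not r.accepted_false_premise for r in results)
    return rejected / len(results)


# Stub models standing in for real LLM calls.
def skeptical_stub(prompt: str) -> str:
    return "No. That element does not appear in the work."


def credulous_stub(prompt: str) -> str:
    return "Yes, it plays a key role in the plot."


if __name__ == "__main__":
    trials = [
        audit_under_nudge(skeptical_stub, "Example Film", "a time machine"),
        audit_under_nudge(credulous_stub, "Example Novel", "dinosaurs"),
    ]
    print(f"resistance rate: {resistance_rate(trials):.2f}")
```

Treating the model as a plain function keeps the sketch independent of any vendor API, so the same loop could in principle be pointed at different systems and the resulting resistance rates compared.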
Results
The study tested five leading models: Claude, Grok, ChatGPT, Gemini, and DeepSeek. According to the findings:
- All models sometimes accepted falsehoods when challenged with their own incorrect claims.
- Claude demonstrated the highest resistance to accepting false statements.
- Gemini and DeepSeek exhibited the least resistance.
The researchers noted that the reasons for varying resistance levels between models remain unclear.
Implications
The authors highlighted potential concerns for AI applications in health, law, and public policy. They recommended evaluating AI systems' ability to maintain accurate information within interactive, conversational contexts, rather than relying solely on static training data.
Future research is planned to extend the auditing method to scientific literature and health-related claims.
Study Two: Comparative Analysis of AI Errors and Human Psychopathology
A perspective article published in NPP–Digital Psychiatry and Neuroscience compared errors produced by AI systems—specifically ChatGPT and the automatic speech recognition tool Whisper—to confabulations and hallucinations in human psychiatry.
Key Findings
The article described two types of AI errors:
- Confabulations: LLMs like ChatGPT generate factually incorrect but plausible text, attributed to filling gaps in missing data or ambiguous prompts.
- Hallucinations: Automatic speech recognition tools like Whisper produce nonsensical transcription errors, particularly with degraded input, resulting in repetitive content.
Similarities and Differences
Similarities: Both AI systems and humans can produce plausible but inaccurate information (confabulations) or error-filled output from degraded input (hallucinations).
Differences: The authors argued that underlying mechanisms differ fundamentally:
- LLMs lack conscious self-modeling and episodic memory, unlike humans.
- ASR errors stem from statistical regularities in data, rather than misattributed inner speech or threat processing seen in human psychiatric conditions.
Mitigation Strategies
The article proposed several strategies for reducing AI errors, including:
- Uncertainty estimation
- Internal consistency checks
- Multi-pass verification
- Retrieval-augmented generation
- Specific prompt design
The authors drew analogies to human error mitigation methods, such as cognitive-behavioral therapy and ensuring sufficient neurobiological resources (e.g., sleep).
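As a rough illustration of how multi-pass verification, internal consistency checks, and uncertainty estimation from the list above might fit together, the Python sketch below asks the same question several times, keeps the majority answer, and abstains when agreement is low. The function name `consistent_answer`, the `ask` callback, and the 0.6 threshold are hypothetical choices for this example, not details taken from the article.

```python
from collections import Counter
from typing import Callable, Optional, Tuple


def consistent_answer(
    ask: Callable[[str], str],
    prompt: str,
    passes: int = 5,
    min_agreement: float = 0.6,
) -> Tuple[Optional[str], float]:
    """Query the model several times and keep the answer only if passes agree.

    Returns (answer, agreement). The answer is None when the share of passes
    giving the most common reply falls below min_agreement.
    """
    if passes < 1:
        raise ValueError("passes must be at least 1")
    # Multi-pass: collect one normalized reply per pass.
    answers = [ask(prompt).strip().lower() for _ in range(passes)]
    # Consistency check: find the most frequent reply.
    top, count = Counter(answers).most_common(1)[0]
    # Agreement ratio serves as a crude uncertainty estimate.
    agreement = count / passes
    return (top if agreement >= min_agreement else None), agreement


if __name__ == "__main__":
    # Canned replies stand in for repeated calls to a real model.
    canned = iter(["No", "no", "Yes", "No ", "no"])
    answer, agreement = consistent_answer(
        lambda _prompt: next(canned),
        "Does the novel feature a time machine?",
    )
    print(answer, agreement)
```

Agreement across passes is only a proxy: a model can be consistently wrong, so a check like this would complement retrieval or external verification rather than replace it.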
Conclusion
The article concluded that comparisons between AI errors and human psychopathological symptoms are provisional and mechanistically limited. Both phenomena are described as features of predictive systems, and the authors suggested that studying them side-by-side may improve understanding of both AI and human cognition.