Generative AI's Historical Accuracy Under Scrutiny in New Study
Technological advancements have deeply integrated mobile devices and computers into daily life, providing unprecedented access to information. While ongoing developments in generative artificial intelligence (AI) enhance this access, the accuracy of the information provided remains a significant concern.
Generative AI profoundly influences how historical events and figures are represented. Researchers Matthew Magnani from the University of Maine and Jon Clindaniel from the University of Chicago have delved into this phenomenon. They developed a model, grounded in scientific theory and scholarly research, to systematically evaluate the historical accuracy of AI outputs.
The Study: Depicting Neanderthal Life
For their study, Magnani and Clindaniel tasked two prominent chatbots with creating depictions of Neanderthal daily life:
- DALL-E 3 for image generation.
- ChatGPT API (GPT-3.5) for narrative generation.
They used four distinct prompts, each run 100 times, varying whether the request asked for scientific accuracy and whether it supplied contextual detail. Neanderthals were specifically chosen because the historical shifts and scientific debates surrounding their characteristics and capabilities are well documented.
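The article does not reproduce the authors' prompts or code, but a batched experiment of this shape is straightforward to script. The sketch below shows one way it might look using the OpenAI Python SDK (v1+); the prompt wording, trial loop, and output handling are illustrative assumptions, not the published protocol.

```python
# Illustrative sketch of a batched prompting experiment like the one described
# above, using the OpenAI Python SDK (>=1.0). Prompt wording and output
# handling are assumptions for demonstration, not the authors' protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical prompt variants: with and without requests for scientific
# accuracy and contextual detail, echoing the study's four-prompt design.
PROMPTS = [
    "Depict the daily life of Neanderthals.",
    "Depict the daily life of Neanderthals with scientific accuracy.",
    "Depict the daily life of Neanderthals in their Eurasian context.",
    "Depict the daily life of Neanderthals in their Eurasian context "
    "with scientific accuracy.",
]
TRIALS = 100  # each prompt was tested 100 times in the study

def generate_narrative(prompt: str) -> str:
    """One narrative trial via the Chat Completions API (GPT-3.5)."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def generate_image(prompt: str) -> str:
    """One image trial via DALL-E 3; returns a URL to the generated image."""
    response = client.images.generate(model="dall-e-3", prompt=prompt, n=1)
    return response.data[0].url

if __name__ == "__main__":
    for prompt in PROMPTS:
        for trial in range(TRIALS):
            text = generate_narrative(prompt)
            image_url = generate_image(prompt)
            # Each output would then be coded against current scholarship.
            print(f"{prompt!r} trial {trial}: {image_url}")
```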
Key Findings: Outdated Information Persists
The research, published in Advances in Archaeological Practice, identified a critical factor influencing AI accuracy: its ability to access current source information. Both the images and narratives generated by the AI models predominantly referenced outdated research, rather than reflecting contemporary scientific understanding.
"AI accuracy depends on its ability to access current source information."
Why This Research Matters
The primary aim of this study was to understand the extent to which biases and misinformation regarding the past are embedded in routine AI use. It underscored the vital importance of examining inherent biases within these technologies and questioning whether users are receiving dated information from chatbots, particularly in specialized fields like archaeology and anthropology.
Specific Inaccuracies in Neanderthal Depictions
The study revealed several striking inaccuracies in the AI-generated content:
- Images: Depicted Neanderthals in a manner consistent with beliefs from over 100 years ago. These images often showed primitive features, extensive body hair, and stooped postures, and frequently omitted women and children.
- Narratives: Understated the variability and sophistication of Neanderthal culture. Approximately half of the content generated by ChatGPT did not align with current scholarly knowledge, a figure that climbed to over 80% for one particular prompt.
- Technology: Both image and narrative outputs included technology (e.g., basketry, thatched roofs, ladders, glass, metal) that was far too advanced for the Neanderthal period.
Identifying the Source of Outdated Information
Researchers determined that the content produced by ChatGPT was most consistent with scientific literature from the 1960s. In contrast, DALL-E 3's output aligned more closely with research from the late 1980s and early 1990s.
Improving the accuracy of AI output will require making anthropological datasets and current scholarly articles readily accessible to AI models. Historically, copyright laws significantly limited access to scholarly research until open access initiatives emerged in the early 2000s.
Fostering Critical AI Literacy
Magnani emphasized the educational imperative resulting from these findings.
"Teaching students to approach generative AI with caution can foster a more technically literate and critical society."
This study marks a significant step in Magnani and Clindaniel's ongoing research into the application and implications of AI in archaeological topics.