AI Health Bots Under Scrutiny for Inaccurate Assessments
ChatGPT Health and Anthropic's Claude for Healthcare, both designed to analyze personal fitness tracker and medical record data, have faced criticism for providing inaccurate health assessments. A Washington Post reporter tested both services, granting them access to the reporter's Apple Health data and medical records. ChatGPT Health initially graded the reporter's cardiac health an 'F', a grade that shifted to a 'D' after the medical records were incorporated. Claude for Healthcare assigned a 'C'.
Expert Evaluations
After reviewing the AI assessments, the reporter's actual doctor said the 'F' grade was inaccurate, noting that the reporter was at low risk for a heart attack.
Cardiologist Eric Topol of the Scripps Research Institute described ChatGPT's assessment as "baseless" and emphasized that these AI tools are "not ready for any medical advice."
Identified Issues with AI Analysis
- Questionable Metrics: ChatGPT's analysis leaned heavily on the Apple Watch's estimated VO2 max and heart-rate variability, measurements experts consider imprecise and fuzzy.
- Data Inconsistencies: The reporter's resting heart rate data showed significant swings whenever a new Apple Watch came into use, suggesting device-specific tracking variations that the AI nonetheless treated as clear health signals (see the sketch after this list).
- Erratic Responses: Repeated queries to ChatGPT for a heart health score yielded wildly varying grades, fluctuating between 'F' and 'B'. OpenAI acknowledged that its model might weigh data sources differently across conversations.
- Privacy Concerns: Despite OpenAI's claims of extra privacy steps, ChatGPT is not covered by HIPAA, raising questions about handling sensitive health data.
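The device-change problem above is at heart a data-quality issue: a shift in readings that coincides with new hardware is more plausibly a calibration artifact than a sudden change in health. Below is a minimal sketch of the kind of sanity check experts say such tools need before treating wearable data as a health signal. It is not drawn from either product; the record fields, window size, and threshold are hypothetical.

```python
# Sketch: flag resting heart rate shifts that coincide with a device change,
# rather than treating them as genuine health signals. Field names and the
# 5 bpm threshold are illustrative assumptions, not from either product.
from statistics import mean

def device_shift_flags(records, threshold_bpm=5.0, window=14):
    """Flag device switches where average resting heart rate before and
    after the switch differs by more than `threshold_bpm`.

    `records` is a chronologically ordered list of dicts like
    {"date": "2024-01-01", "resting_hr": 62, "device_model": "Watch S7"}.
    """
    flags = []
    for i in range(1, len(records)):
        prev, curr = records[i - 1], records[i]
        if prev["device_model"] == curr["device_model"]:
            continue  # no device change at this point
        # Compare a window of readings on each side of the switch.
        before = [r["resting_hr"] for r in records[max(0, i - window):i]]
        after = [r["resting_hr"] for r in records[i:i + window]]
        if not before or not after:
            continue
        shift = mean(after) - mean(before)
        if abs(shift) > threshold_bpm:
            flags.append({
                "date": curr["date"],
                "new_device": curr["device_model"],
                "shift_bpm": round(shift, 1),
            })
    return flags
```

A flagged shift would not prove the data is wrong, but it marks a span where grading someone's heart health off the raw numbers, as the bots appeared to do, is on shaky ground.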
Company Responses and Industry Context
OpenAI and Anthropic state that their bots are in early testing phases and are not intended to replace doctors or provide diagnoses, yet both products offered detailed personal health analyses. Neither company specified how it plans to improve the accuracy of those analyses.
Apple clarified it did not directly collaborate with either AI company on these products. Experts suggest that accurately analyzing long-term body data for disease prediction requires dedicated, sophisticated AI models trained to connect various data layers reliably, which current offerings appear to lack.