The popular artificial intelligence (AI) chatbot ChatGPT had a diagnostic error rate of more than 80 percent in a new study examining the use of AI in pediatric case diagnosis.
For the study, published in JAMA Pediatrics this week, the texts of 100 case challenges drawn from JAMA and the New England Journal of Medicine were entered into ChatGPT version 3.5. The chatbot was then given the prompt: “List a differential diagnosis and a final diagnosis.”
These pediatric cases were all from the past 10 years.
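That setup is simple to approximate programmatically. The sketch below is a rough illustration rather than the study’s actual procedure (the researchers used the ChatGPT interface, not the API): it submits a case text together with the study’s prompt through OpenAI’s Python client, where the model name gpt-3.5-turbo and the helper function diagnose are assumptions standing in for ChatGPT 3.5.

```python
# A minimal sketch of the study's prompting setup, assuming the `openai`
# Python client (v1+) is installed and OPENAI_API_KEY is set. The model name
# "gpt-3.5-turbo" is an assumed API stand-in for ChatGPT 3.5; the study
# itself entered cases into the ChatGPT web interface.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def diagnose(case_text: str) -> str:
    """Submit one case challenge with the study's prompt and return the reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {
                "role": "user",
                "content": f"{case_text}\n\nList a differential diagnosis and a final diagnosis.",
            },
        ],
    )
    return response.choices[0].message.content


# Hypothetical usage: `cases` would hold the 100 case-challenge texts.
cases = ["<text of a pediatric case challenge>"]
for case in cases:
    print(diagnose(case))
```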
The accuracy of ChatGPT’s diagnoses was determined by whether they aligned with physicians’ diagnoses. Two physician researchers scored each diagnosis as correct, incorrect or “did not fully capture diagnosis.”
Overall, 83 percent of the AI-generated diagnoses were found to be in error, with 72 percent being incorrect and 11 percent being “clinically related but too broad to be considered a correct diagnosis.”
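Those figures combine straightforwardly: the overall error rate is the share of cases labeled either incorrect or too broad. A toy tally, assuming one consolidated label per case (the label names and counts below simply mirror the published results), might look like this:

```python
# A toy tally of the published results: 100 cases, 72 incorrect,
# 11 clinically related but too broad, 17 correct. The single-label-per-case
# representation is an assumption for illustration.
from collections import Counter

labels = (
    ["incorrect"] * 72
    + ["did not fully capture diagnosis"] * 11
    + ["correct"] * 17
)
counts = Counter(labels)
total = len(labels)

error_rate = (counts["incorrect"] + counts["did not fully capture diagnosis"]) / total
print(f"incorrect: {counts['incorrect'] / total:.0%}")                          # 72%
print(f"too broad: {counts['did not fully capture diagnosis'] / total:.0%}")    # 11%
print(f"overall error rate: {error_rate:.0%}")                                  # 83%
```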
Despite the high rate of diagnostic errors, the study recommended continued inquiry into physicians’ use of large language models, noting that the technology could help as an administrative tool.
“The chatbot evaluated in this study—unlike physicians—was not able to identify some relationships, such as that between autism and vitamin deficiencies. To improve the generative AI chatbot’s diagnostic accuracy, more selective training is likely required,” the study said.
ChatGPT’s available knowledge is not regularly updated, the study also noted, meaning it lacks access to new research, health trends, updated diagnostic criteria and information about recent disease outbreaks.
Physicians and researchers have increasingly looked into ways of incorporating AI and language models into medical work. A study published last year found that OpenAI’s GPT-4 diagnosed patients over the age of 65 more accurately than clinicians did. That study, however, had a sample size of only six patients.
Researchers in that earlier study noted the chatbot could potentially be used to “increase confidence in diagnosis.”
The use of AI in diagnostics is not a novel concept. The Food and Drug Administration has approved hundreds of AI-enabled medical devices, though so far none that use generative AI or are powered by large language models like ChatGPT.