Artificial intelligence chatbot ChatGPT diagnosed patients rushed to emergency at least as well as doctors and in some cases outperformed them, Dutch researchers have found, saying AI could “revolutionise the medical field”.
But the report published Wednesday also stressed ER doctors needn’t hang up their scrubs just yet, with the chatbot potentially able to speed up diagnosis but not replace human medical judgement and experience.
Scientists examined 30 cases treated in an emergency service in the Netherlands in 2022, feeding in anonymised patient history, lab tests and the doctors’ own observations to ChatGPT, asking it to provide five possible diagnoses.
They then compared the chatbot’s shortlist with the five diagnoses suggested by ER doctors who had access to the same information, and cross-checked both lists against the correct diagnosis in each case.
Doctors had the correct diagnosis in the top five in 87 percent of cases, compared to 97 percent for ChatGPT version 3.5 and 87 percent for version 4.0.
“Simply put, this indicates that ChatGPT was able to suggest medical diagnoses much like a human doctor would,” said Hidde ten Berg, from the emergency medicine department at the Netherlands’ Jeroen Bosch Hospital.
Co-author Steef Kurstjens told AFP the study did not indicate that computers could one day be running the ER, but that AI can play a vital role in assisting under-pressure medics.
“The key point is that the chatbot doesn’t replace the physician but it can help in providing a diagnosis and it can maybe come up with ideas the doctor hasn’t thought of,” Kurstjens told AFP.
Large language models such as ChatGPT are not designed as medical devices, he stressed, and there would also be privacy concerns about feeding confidential and sensitive medical data into a chatbot.
The chatbot’s reasoning was “at times medically implausible or inconsistent, which can lead to misinformation or incorrect diagnosis, with significant implications,” the report noted.
The scientists also admitted some shortcomings with the research. The sample size was small, with 30 cases examined. In addition, only relatively simple cases were looked at, with patients presenting a single primary complaint.
It was not clear how well the chatbot would fare with more complex cases. “The efficacy of ChatGPT in providing multiple distinct diagnoses for patients with complex or rare diseases remains unverified.”
Sometimes the chatbot did not provide the correct diagnosis in its top five possibilities, Kurstjens explained, notably in the case of an abdominal aortic aneurysm, a potentially life-threatening condition in which the aorta swells.
The only consolation for ChatGPT: in that case the doctor got it wrong too.
The report sets out what it calls the medical “bloopers” the chatbot made, for example diagnosing anaemia (low haemoglobin levels in the blood) in a patient with a normal haemoglobin count.
“It’s vital to remember that ChatGPT is not a medical device and there are concerns over privacy when using ChatGPT with medical data,” concluded ten Berg.
“However, there is potential here for saving time and reducing waiting times in the emergency department. The benefit of using artificial intelligence could be in supporting doctors with less experience, or it could help in spotting rare diseases,” he added.