Study: AI More Effective Than Doctors

A groundbreaking Harvard study shows that AI systems outperform human doctors at high-pressure emergency triage, diagnosing more accurately in the potentially life-threatening moments when patients are brought to the hospital.

Independent experts have described the results as “a real step forward” in AI’s clinical reasoning. They come from a series of experiments that tested hundreds of doctors’ answers against those of AI. According to the authors, the results, published in the journal Science, show that large language models (LLMs) “have surpassed most benchmarks for clinical reasoning.”

One experiment focused on 76 patients who arrived at the emergency department of a hospital in Boston. An AI and a pair of human doctors were each given the same electronic patient record to review. Such records typically contain vital signs, demographic information, and a few sentences from a nurse about why the patient is there.

The AI identified the exact diagnosis or a very close one in 67 percent of cases, significantly better than the human doctors, who were correct in only 50–55 percent of cases.

The test also showed that the AI’s advantage was particularly pronounced in triage situations that required quick decisions with minimal information. When more details were available, the AI’s diagnostic accuracy rose to 82 percent, compared with 70–79 percent for the experts.

AI also performed better than a larger cohort of human doctors when asked to provide longer-term treatment plans, such as prescribing antibiotics or planning end-of-life care. The AI and 46 doctors were asked to review five clinical case studies, and the AI produced significantly better plans, scoring 89 percent compared with 34 percent for doctors using conventional resources.

Not a replacement – a complement

The researchers stress that this does not spell the end for emergency doctors. The study compared doctors and AI only on patient data that can be conveyed as text. The AI’s ability to read other signals, such as a patient’s level of distress or visual appearance, was not tested. In effect, the AI functioned more like a clinician providing a second opinion based on paperwork.

“I don’t think our results mean that AI is replacing doctors,” says Arjun Manrai, one of the study’s lead authors, who heads an AI lab at Harvard Medical School.

“I think it means we are witnessing a truly profound change in technology that will reshape medicine.”

Widespread use

Dr Adam Rodman, another lead author and a physician at Beth Israel Deaconess Medical Center in Boston, where the study was conducted, believes that AI is among “the most impactful technologies in decades.”

Nearly one in five American doctors already use AI to assist with diagnosis. In the UK, the corresponding figures are 16 percent daily and another 15 percent weekly, with “clinical decision-making” among the most common use cases.