Microsoft has revealed details of an artificial intelligence system that performs better than human doctors at complex health diagnoses, creating a “path to medical superintelligence”.
The company’s AI unit, which is led by the British tech pioneer Mustafa Suleyman, has developed a system that imitates a panel of expert physicians tackling “diagnostically complex and intellectually demanding” cases.
Microsoft said that when paired with OpenAI’s advanced o3 AI model, its approach “solved” more than eight of 10 case studies specially chosen for the diagnostic challenge. When those case studies were tried on practising physicians – who had no access to colleagues, textbooks or chatbots – the accuracy rate was two out of 10.
Microsoft said its approach was also cheaper than using human doctors because it was more efficient at ordering tests.
Despite highlighting the potential cost savings from its research, Microsoft played down the job implications, saying it believed AI would complement doctors’ roles rather than replace them.
“Their clinical roles are much broader than simply making a diagnosis. They need to navigate ambiguity and build trust with patients and their families in a way that AI isn’t set up to do,” the company wrote in a blogpost announcing the research, which is being submitted for peer review.
However, using the slogan “path to medical superintelligence” raises the prospect of radical change in the healthcare market. While artificial general intelligence (AGI) refers to systems that match human cognitive abilities at any given task, superintelligence is an equally theoretical term referring to a system that exceeds human intellectual performance across the board.
Explaining the rationale behind the research, Microsoft questioned how much weight should be given to AI’s exceptionally high scores in the United States Medical Licensing Examination, a key test for obtaining a medical licence in the US. It said the multiple-choice tests favoured memorising answers over deep understanding of a subject, which could “overstate” the competence of an AI model.
Microsoft said it was developing a system that, like a real-world clinician, takes step-by-step measures – such as asking specific questions and requesting diagnostic tests – to arrive at a final diagnosis. For instance, a patient with symptoms of a cough and fever may require blood tests and a chest X-ray before the doctor arrives at a diagnosis of pneumonia.
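The step-by-step process described above can be illustrated with a minimal Python sketch built on the article's cough-and-fever example. It is not Microsoft's code: the `Case` and `Encounter` classes, the action names, the rule-based `diagnose` policy and the cost figures are all hypothetical, chosen only to show findings being revealed one request at a time while the running cost of tests is tracked.

```python
from dataclasses import dataclass, field


@dataclass
class Case:
    """Hypothetical case record; findings stay hidden until requested."""
    presenting: str
    findings: dict  # action name -> result text
    costs: dict     # action name -> illustrative cost (made-up numbers)


@dataclass
class Encounter:
    case: Case
    revealed: dict = field(default_factory=dict)
    spent: int = 0

    def request(self, action: str) -> str:
        # Reveal one finding and add its cost to the running total.
        result = self.case.findings.get(action, "not available")
        self.revealed[action] = result
        self.spent += self.case.costs.get(action, 0)
        return result


def diagnose(encounter: Encounter) -> str:
    # Toy rule-based policy standing in for an AI model (an assumption).
    encounter.request("ask: duration of cough")
    blood = encounter.request("test: blood count")
    if "raised white cells" in blood:
        xray = encounter.request("test: chest X-ray")
        if "consolidation" in xray:
            return "pneumonia"
    return "undetermined"


case = Case(
    presenting="cough and fever",
    findings={
        "ask: duration of cough": "five days",
        "test: blood count": "raised white cells",
        "test: chest X-ray": "consolidation in right lower lobe",
    },
    costs={"test: blood count": 30, "test: chest X-ray": 80},
)
encounter = Encounter(case)
diagnosis = diagnose(encounter)
print(diagnosis, encounter.spent)
```

In this toy setup the chest X-ray is requested only after the blood test points towards infection, which is the kind of ordering decision that affects both accuracy and cost.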
The new Microsoft approach uses complex case studies from the New England Journal of Medicine (NEJM).
Suleyman’s team transformed more than 300 of these studies into “interactive case challenges” that it used to test its approach. Microsoft’s approach used existing AI models, including those from ChatGPT’s developer, OpenAI, Mark Zuckerberg’s Meta and Anthropic, as well as Elon Musk’s Grok and Google’s Gemini.
Microsoft then used a bespoke, agent-like AI system called a “diagnostic orchestrator” to work with a given model on what tests to order and what the diagnosis might be. The orchestrator in effect imitates a panel of physicians, which then comes up with the diagnosis.
Microsoft said that when paired with OpenAI’s advanced o3 model, the orchestrator “solved” more than eight of 10 NEJM case studies – compared with a two out of 10 success rate for human doctors.
Microsoft said its approach was able to wield a “breadth and depth of expertise” that went beyond individual physicians because it could span multiple medical disciplines.
It added: “Scaling this level of reasoning – and beyond – has the potential to reshape healthcare. AI could empower patients to self-manage routine aspects of care and equip clinicians with advanced decision support for complex cases.”
Microsoft acknowledged its work was not ready for clinical use. Further testing is needed on its “orchestrator” to assess its performance on more common symptoms, for instance.