Review – Deep Learning As Good As Live Expert Diagnostics

Inspecting lung X-rays (National Heart, Lung, and Blood Institute, NIH)

25 Sept. 2019. A review of recent research shows that, with caveats, deep learning algorithms appear as effective as live human experts in accurately interpreting medical images. The review, appearing in today’s issue of the journal The Lancet Digital Health, cautions that the analysis is based on a small number of cases, and that better reporting methods with higher standards are needed for studies of artificial intelligence in medicine.

A team from the University of Birmingham in the U.K. sought to determine the effectiveness of deep learning, a type of artificial intelligence, for interpreting medical images. Medical images are a valuable resource for diagnosing a patient’s condition, but they require interpretation by humans, with all the attendant risks of error and inconsistency. Yet, as the authors note, enthusiasm for automated interpretation of medical images may also bias the design of studies evaluating the technology or limit the ability to generalize findings to real-world settings.

Deep learning is a form of machine learning and artificial intelligence that enables systems to discern underlying patterns in data and build those patterns into knowledge bases applied across a range of disciplines. The technique uses machine learning to form layers of neural networks, with each layer adding to the knowledge derived from previous layers.
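
To make the idea of layered learning concrete, here is a minimal sketch of a small feed-forward network in PyTorch. The layer sizes and the two-class output are illustrative assumptions, not details drawn from the review or any study it covers.

```python
# A minimal sketch of a layered ("deep") neural network in PyTorch.
# Layer sizes and the two-class output are illustrative, not from the review.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(64, 128),  # first layer: learns simple patterns from 64 input features
    nn.ReLU(),
    nn.Linear(128, 64),  # middle layer: combines earlier patterns into higher-level ones
    nn.ReLU(),
    nn.Linear(64, 2),    # output layer: scores for two classes, e.g. disease vs. no disease
)

# One forward pass turns a feature vector into class scores.
scores = model(torch.randn(1, 64))
print(scores.shape)  # torch.Size([1, 2])
```

Each layer feeds its output to the next, which is the stacking of layers described above.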

An emerging type of deep learning applied to images is called a convolutional neural network. In this scheme, an algorithm dissects an image layer by layer to understand its features. The aspects of the image discovered and analyzed at each layer are translated into data the algorithm uses to train its understanding of the problem being solved, an understanding that is refined as more images and data are encountered.
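
The sketch below shows what such a convolutional network can look like in code, again in PyTorch. The single grayscale channel, 128-by-128 image size, and two output classes are assumptions for illustration only; the studies in the review used their own architectures and image formats.

```python
# A minimal convolutional neural network sketch in PyTorch.
# Assumes 1-channel (grayscale) 128x128 images and two classes; illustrative only.
import torch
import torch.nn as nn

cnn = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # early layer: detects edges and simple textures
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsample 128x128 to 64x64
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # later layer: combines edges into larger shapes
    nn.ReLU(),
    nn.MaxPool2d(2),                              # downsample 64x64 to 32x32
    nn.Flatten(),
    nn.Linear(32 * 32 * 32, 2),                   # classify into two categories, e.g. disease vs. healthy
)

scores = cnn(torch.randn(1, 1, 128, 128))  # one synthetic grayscale image
print(scores.shape)  # torch.Size([1, 2])
```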

Researchers led by Alistair Denniston, an ophthalmologist who studies artificial intelligence and health data at Birmingham, reviewed published studies that assess deep learning algorithms for diagnosing human disorders from medical images against live experts. The team screened nearly 31,600 studies published between January 2012 and June 2019 in scientific and medical research databases, of which 82 met the initial criteria. From that collection, only 25 papers validated their algorithms with an independent subset of images and provided results detailed enough to extract the data into tables for statistical analysis.

From these 25 papers, notes Denniston in a university statement, “just 14 studies actually compared the performance of AI and health professionals using the same test sample.” These papers covered conditions in ophthalmology, solid tumor cancers, trauma and orthopedics, respiratory disease, heart disease, gastroenterology, and face reconstruction surgery.

The researchers then evaluated the ability of the deep learning algorithms in the 14 studies to detect disease from patients’ medical images, and found the algorithms accurately diagnosed these conditions in 87 percent of cases, while health care experts returned an accurate detection in 86 percent of cases. In distinguishing healthy from diseased conditions, deep learning algorithms were accurate in 93 percent of cases, compared to 91 percent for live experts.
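
The two pairs of figures above appear to correspond to what diagnostic accuracy studies typically report as sensitivity (detecting disease when it is present) and specificity (correctly identifying healthy cases). A short sketch of those calculations, using made-up counts rather than data from the review:

```python
# Sensitivity and specificity from a 2x2 confusion matrix.
# The counts below are hypothetical, not data from the Lancet Digital Health review.

def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Share of diseased cases the method correctly flags."""
    return true_positives / (true_positives + false_negatives)

def specificity(true_negatives: int, false_positives: int) -> float:
    """Share of healthy cases the method correctly clears."""
    return true_negatives / (true_negatives + false_positives)

# Example: 87 of 100 diseased images flagged, 93 of 100 healthy images cleared.
print(sensitivity(87, 13))  # 0.87
print(specificity(93, 7))   # 0.93
```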

“Within those handful of high-quality studies,” says Denniston, “we found that deep learning could indeed detect diseases ranging from cancers to eye diseases as accurately as health professionals. But it’s important to note that AI did not substantially out-perform human diagnosis.”

The authors point out that the small number of studies with data sufficiently robust and detailed for this type of analysis highlights an underlying problem in evaluating deep learning in medicine. Few of the papers validated their models with data independent of the knowledge bases used to train their algorithms, or tested the models with real-world clinical data. Moreover, reporting on these assessments uses inconsistent language and statistics, making it difficult to judge the quality of the results. The authors call for international standards governing research protocol design and reporting to address these issues.
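
As a concrete illustration of validating a model on data independent of its training set, here is a minimal sketch with scikit-learn. The synthetic data and simple classifier are stand-ins chosen for brevity; none of this comes from the reviewed studies, which would instead hold out real patient images.

```python
# A minimal sketch of out-of-sample validation: train on one split,
# evaluate only on a held-out split the model never saw during training.
# The synthetic data and classifier are placeholders, not from the review.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))                        # stand-in for image-derived features
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)  # stand-in labels: 1 = disease, 0 = healthy

# Keep 20 percent of cases completely separate from training.
X_train, X_holdout, y_train, y_holdout = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

model = LogisticRegression().fit(X_train, y_train)  # fit on the training split only
print(model.score(X_holdout, y_holdout))            # accuracy reported on unseen cases only
```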

*     *     *
