Big Data Quickly Identify Foodborne Illness Sources

Pins on a map (CJ Sorg/Flickr)

3 July 2014. Data analysts and public health experts at the IBM research center in San Jose, California, developed techniques for identifying sources of foodborne diseases more quickly from available public health and retail sales data. The team, led by James Kaufman, IBM’s public health research manager, published its findings today online in the journal PLOS Computational Biology.

Kaufman, with colleagues from IBM, Johns Hopkins University in Baltimore, and the Federal Institute for Risk Assessment in Berlin, Germany, addressed the growing problem of finding the sources of contaminated food quickly enough to keep outbreaks from spreading. In the U.S. alone, the Centers for Disease Control and Prevention estimates foodborne illnesses strike 1 in 6 Americans, or 48 million people, each year, causing 128,000 hospitalizations and 3,000 deaths. The authors cite sources calculating medical costs worldwide for foodborne illnesses at $9 billion a year and indirect economic costs (e.g., lost work time) at $75 billion.

The techniques designed by the researchers start with reports of foodborne illnesses and the corresponding lab analyses issued by public health authorities. Used alone, these reports can take days or weeks to identify the source of an outbreak. The researchers added sales data from food retailers, captured in standard product and packaging codes. When arrayed against geo-coded reports of food-related illness, the combination of outbreak and sales data begins to form a clearer picture of potential sources. Up to now, food sales data, collected routinely by retailers, were not used in this way.

The team adapted a statistical technique called receiver operating characteristic analysis, first used to identify radar blips in World War II and since extended to many fields, including epidemiology and bioinformatics. The technique classifies subjects into one of two conditions (e.g., contaminated or not contaminated), which enables construction of statistical tests and graphical models.
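
For readers unfamiliar with receiver operating characteristic analysis, the following is a minimal Python sketch of the general idea, not the researchers' code. The scores and labels are made up for illustration, and the function names are hypothetical. Each item gets a score, and the curve traces how the true positive rate and false positive rate trade off as the cutoff for calling an item "contaminated" is lowered.

```python
def roc_points(scores, labels):
    """Return ROC points as (false positive rate, true positive rate) pairs.

    scores: higher means "more likely contaminated" (illustrative values).
    labels: 1 for truly contaminated, 0 otherwise.
    Assumes distinct scores and at least one item of each label.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    # Walk through items from the highest score to the lowest, as if
    # lowering the classification cutoff one item at a time.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    for _, label in ranked:
        if label == 1:
            true_pos += 1
        else:
            false_pos += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under the ROC curve (1.0 is a perfect ranking)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical example: four items, two of them truly contaminated.
example_scores = [0.9, 0.8, 0.7, 0.6]
example_labels = [1, 0, 1, 0]
example_auc = area_under_curve(roc_points(example_scores, example_labels))
print(example_auc)
```

An area near 1.0 means the scores rank contaminated items above clean ones almost every time, while an area near 0.5 is no better than chance.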

Applying this technique enabled the team to estimate the likelihood that an individual food item caused an outbreak. In effect, the method learns from the data, recalculating the probability for each food item as new data are introduced. The results could then be displayed geographically.
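
The recalculation described above can be illustrated with a simple sequential probability update. This is a sketch under stated assumptions, not the method published by the team: the food items, regions, sales shares, and function name are all hypothetical, and the model simply assumes an illness report is more likely to come from a region where the guilty item sells more.

```python
def update_probabilities(current, sales_share, region):
    """Recalculate each item's probability after one new illness report.

    current: dict mapping item -> current probability of being the source.
    sales_share: dict mapping item -> {region: fraction of that item's sales}.
    region: where the new report came from.
    Assumption: reports follow the guilty item's regional sales pattern.
    """
    floor = 1e-9  # avoids ruling an item out entirely on one report
    weighted = {
        item: prob * sales_share[item].get(region, floor)
        for item, prob in current.items()
    }
    total = sum(weighted.values())
    return {item: value / total for item, value in weighted.items()}


# Hypothetical sales patterns for two items across two regions.
sales_share = {
    "sprouts": {"north": 0.7, "south": 0.3},
    "lettuce": {"north": 0.5, "south": 0.5},
}

# Start with no preference, then fold in reports one at a time.
probabilities = {"sprouts": 0.5, "lettuce": 0.5}
for report_region in ["north", "north", "north"]:
    probabilities = update_probabilities(probabilities, sales_share, report_region)

print(probabilities)
```

In this made-up case, three reports from the north shift the estimate toward sprouts, the item that sells disproportionately there. Each further report sharpens the estimate, which matches the article's point that the probabilities improve as reports accumulate.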

The researchers tested the technique in a simulation with real-world data from a German retailer: weekly sales of 580 food items sold throughout Germany over a three-year period, matched against 60,000 food illness outbreaks. The results show the technique can identify likely culprits in the sales data from as few as 10 outbreak reports, although the probabilities rise when the number of reports increases to 20 or 50.

“Predictive analytics based on location, content, and context are driving our ability to quickly discover hidden patterns and relationships from diverse public health and retail data,” says Kaufman in an IBM statement. “We are working with our public health clients and with retailers in the U.S. to scale this research prototype and begin focusing on the 1.7 billion supermarket items sold each week in the United States.”
