AI, Wearables Posing Health Privacy Risks

Operating a smart watch


4 Jan. 2019. An analysis shows artificial intelligence algorithms can connect Americans’ activity tracking data to their individual private medical information. A team of engineers and statisticians at the University of California in Berkeley and other institutions describes its techniques and findings in the 21 December 2018 issue of the journal JAMA Network Open.

Researchers led by UC-Berkeley engineering and operations research professor Anil Aswani assessed risks to privacy posed by the rapidly growing number of devices, such as activity trackers and smartwatches, worn by people to monitor their physical activity and other health-related measures. Many individuals wearing these devices share the data voluntarily to compare their activity with others, or in public health studies. People sharing their data are promised anonymity, often by removing personal identifiers such as name or other unique indicators from the shared data.

At the same time that these activity tracking devices are growing in popularity, artificial intelligence, or A.I., technologies are becoming more accessible for common applications, such as mining large data sets to find underlying behavior patterns. Aswani and colleagues raised concerns that these data mining techniques could be used to re-identify individuals who share their activity tracking data on the promise of anonymity, and designed their study to find out if it could be done.

The researchers tapped into public health databases to test these concerns, specifically data from the National Health and Nutrition Examination Surveys, or NHANES, collected by the Centers for Disease Control and Prevention among 5,000 U.S. residents each year. NHANES participants are promised anonymity, with no researchers allowed to access their names, and geographical data available only with special permission.

NHANES surveys cover a wide range of health indicators, including physical exams and lab tests, but they also collect data from accelerometers worn by a sub-sample of participants to measure their physical activity for 7 days. The team randomly selected some 15,000 records with these activity data, from both adults and children, divided about evenly between the 2003-2004 and 2005-2006 data sets.

The researchers used two types of machine learning algorithms to train on and mine the NHANES data. One method, known as a support vector machine, analyzes a collection of data points and computes a distinct classification or definition based on characteristics of that collection. The second technique, called a random forest, builds the data into decision trees, then merges the results from the trees together, thus the “forest” reference, for accurate and stable decisions.

The results show the algorithms can connect large majorities of individual participants’ data to their unique identification numbers in the surveys. The random forest algorithm successfully connected demographic and physical activity data in participants’ records to 94 to 95 percent of the adults’ NHANES identification numbers, and 86 to 87 percent of the children. Likewise, the support vector machine algorithm connected 85 to 86 percent of the adults’ demographic and physical activity data to unique identifier numbers, and 67 to 70 percent of the children’s identifiers.
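To illustrate how activity patterns alone can act as a fingerprint, here is a minimal sketch in Python. It is not the researchers' method: it swaps in a simple nearest-neighbor match for the random forest and support vector machine algorithms used in the study, and it runs on synthetic step counts rather than NHANES records. The person IDs, step ranges, noise level, and function names are all assumptions made up for the example.

```python
import math
import random


def make_profiles(n_people, rng):
    """Synthetic 7-day step-count profile for each hypothetical person ID."""
    return {
        f"person_{i}": [rng.randint(2000, 15000) for _ in range(7)]
        for i in range(n_people)
    }


def observe(profile, noise, rng):
    """One noisy week of measurements around a person's usual profile."""
    return [steps + rng.gauss(0.0, noise) for steps in profile]


def nearest_id(record, identified):
    """Return the ID whose known week is closest (Euclidean) to `record`."""
    return min(identified, key=lambda pid: math.dist(record, identified[pid]))


def reidentification_rate(n_people=50, noise=500.0, seed=0):
    """Share of 'anonymous' weeks matched back to the correct person ID."""
    rng = random.Random(seed)
    profiles = make_profiles(n_people, rng)
    # Data set A: one week per person, stored with the ID attached.
    identified = {pid: observe(p, noise, rng) for pid, p in profiles.items()}
    # Data set B: a second week per person, shared with the ID stripped.
    # The true ID is kept here only to score the matches afterward.
    anonymous = [(pid, observe(p, noise, rng)) for pid, p in profiles.items()]
    hits = sum(nearest_id(rec, identified) == pid for pid, rec in anonymous)
    return hits / n_people


if __name__ == "__main__":
    print(f"Re-identified share: {reidentification_rate():.2f}")
```

In this toy setup, the more distinctive and stable each person's weekly pattern is relative to the measurement noise, the more "anonymous" records can be matched back to an ID. Real data are messier, which is one reason the study turned to trained classifiers rather than a simple distance match.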

The authors conclude that current methods may not be adequate for ensuring the privacy of individuals sharing their physical activity data. “The results point out a major problem,” says Aswani in a university statement. “If you strip all the identifying information, it doesn’t protect you as much as you’d think. Someone else can come back and put it all back together if they have the right kind of information.”

Aswani also points out how a commercial social media platform could do it, by “gathering step data from the app on your smartphone, then buying health care data from another company and matching the two. Now they would have health care data that’s matched to names, and they could either start selling advertising based on that or they could sell the data to others.”
