Science for business people. Enterprise for scientists.

Personal Genetic Information Vulnerability Exposed

DNA strand (Genome.gov)

Researchers from the Whitehead Institute for Biomedical Research at MIT in Cambridge, Massachusetts, used publicly accessible online resources to identify some 50 people who had submitted samples as part of genetic studies. The team, led by Yaniv Erlich of the Whitehead Institute with colleagues from MIT, Harvard, Baylor College of Medicine in Houston, the International Computer Science Institute in Berkeley, California, and Tel Aviv University in Israel, published its findings in this week's issue of the journal Science (paid subscription required).

Erlich and colleagues had to take several steps to identify these research participants, but the team showed that, under certain conditions, the names and identities of genomic research subjects can be determined even when their personal genetic information is held in databases with identities removed. This kind of inquiry, called a vulnerability study, is common practice in information security, and the research team shared its findings with genomic research authorities at the National Institutes of Health (NIH) in advance of publication.

Erlich and colleagues analyzed unique markers known as short tandem repeats on Y chromosomes of men whose genetic material was collected by the Center for the Study of Human Polymorphisms (CEPH) and whose genomes were sequenced and made publicly available as part of the 1000 Genomes Project. The 1000 Genomes Project aims to find most genetic variants that have frequencies of at least one percent in the populations studied, to provide a comprehensive resource on human genetic variation.

Because the Y chromosome is passed from father to son, as family names usually are, there is a strong correlation between family names and DNA on the Y chromosome. Researchers and companies in genetic genealogy have used this correlation to build publicly accessible databases that link family names to the short tandem repeats found on Y chromosomes. By querying these databases with the Y-chromosome data from the published genomes, Erlich and colleagues were able to uncover the family names of the men whose genomes had been sequenced.

With the family names in hand, the researchers then checked other public information sources: Internet record search engines, obituaries, genealogical Web sites, and public demographic data from the Human Genetic Cell Repository at New Jersey's Coriell Institute, a database of the National Institute of General Medical Sciences (NIGMS), part of NIH. By combining the results of these searches with the family names and Y-chromosome short tandem repeats, Erlich and colleagues were able to identify some 50 men and women who were CEPH participants.

The process exposed a vulnerability: distantly related individuals can be identified through their paternal line when just one member of an extended family contributes DNA to a genomic database. "We show that if, for example, your Uncle Dave submitted his DNA to a genetic genealogy database, you could be identified," says Melissa Gymrek, a Whitehead Institute colleague of Erlich's and first author of the Science paper. "In fact, even your fourth cousin Patrick, whom you've never met, could identify you if his DNA is in the database, as long as he is paternally related to you."

Erlich notes that he has no intention of revealing the names of those identified, nor does he wish to see public sharing of genetic information curtailed. "Our aim is to better illuminate the current status of identifiability of genetic data," says Erlich. "We also hope that this study will eventually result in better security algorithms, better policy guidelines, and better legislation to help mitigate some of the risks described."

Erlich shared his findings with the National Human Genome Research Institute (NHGRI), also part of NIH, and with NIGMS before publication. In response, NIGMS and NHGRI removed certain demographic information from the publicly accessible portion of the NIGMS cell repository to help reduce the risk of future breaches. In the same issue of Science, officials from NHGRI and NIGMS discuss the implications of Erlich's findings, as well as the need to balance scientific openness with privacy.
