Statistical Database Analysis Links Genes, High Cholesterol

Andrea Foulkes (University of Massachusetts - Amherst)

Researchers at University of Massachusetts in Amherst and University of Pennsylvania developed a technique for analyzing public databases with open-source software to discover populations at genetic risk for disease at lower cost. The team led by UMass biostatistician Andrea Foulkes (pictured right) reported its findings yesterday in the online journal PLoS One.

The technique, called Mixed modeling of Meta-Analysis P-values, or MixMAP, draws on summary data now available from genetic association studies in public databases. The published study focuses on genetic links for low-density lipoprotein (LDL or "bad") cholesterol, which deposits on artery walls and forms plaque that can restrict blood flow through the arteries and lead to blood clots and heart attacks. The Centers for Disease Control and Prevention estimates that one in three adults in the U.S. has high levels of LDL cholesterol.

Foulkes and senior author Muredach Reilly, a cardiologist at Penn, devised the method as a lower-cost complement to conventional genome-wide association studies, which compare common genetic variations in a group of people with a disease to those in a similar group of healthy individuals. That approach tests single-nucleotide polymorphisms, or SNPs, one at a time, and a SNP whose association with the disease reaches genome-wide significance is considered linked to that disease.
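For readers who want to see what "one SNP at a time" means in practice, here is a minimal Python sketch. It assumes each SNP already has a p-value from an association test and uses 5 × 10⁻⁸, the conventional genome-wide significance cutoff; the article does not state which threshold the studies used. The SNP identifiers and p-values are hypothetical.

```python
# Widely used genome-wide significance cutoff (an assumption here: the
# article does not say which threshold the studies used)
GENOME_WIDE_ALPHA = 5e-8


def single_snp_hits(pvalues, alpha=GENOME_WIDE_ALPHA):
    """Return IDs of SNPs whose own p-value falls below alpha, strongest first.

    pvalues: dict mapping a SNP identifier to its association p-value.
    Each SNP is judged on its own; neighboring SNPs play no role.
    """
    hits = [snp for snp, p in pvalues.items() if p < alpha]
    return sorted(hits, key=lambda snp: pvalues[snp])


if __name__ == "__main__":
    # Hypothetical SNP IDs and p-values, for illustration only
    example = {"snp_a": 3e-12, "snp_b": 2e-6, "snp_c": 4e-9, "snp_d": 0.03}
    print(single_snp_hits(example))  # ['snp_a', 'snp_c']
```

Note that `snp_b`, with a p-value of 2 × 10⁻⁶, is discarded even though it is far from random noise; a strict per-SNP cutoff misses moderate signals of this kind.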

MixMAP, by contrast, draws on knowledge of DNA regions, such as genes, that are likely to hold several disease-related genetic signals clustered together. Rather than testing single SNPs, the technique looks for groups of unusual variations, highlighting regions whose SNPs consistently give stronger signals than ordinary variation would. While the published study applies the technique to LDL cholesterol, the authors say it can be applied to a wider array of disorders.
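To make the contrast with single-SNP testing concrete, here is a minimal Python sketch of region-level testing. It is not the published MixMAP mixed model; it swaps in Stouffer's method, a standard way of combining p-values, applied per gene. The function names, gene labels, Bonferroni correction and the assumption that SNPs within a gene are independent are all illustrative choices, not details from the study.

```python
from collections import defaultdict
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def gene_level_scores(snps):
    """Combine per-SNP p-values into one Stouffer z-score per gene.

    snps: iterable of (gene, p_value) pairs with 0 < p_value < 1.
    Assumes SNPs in a gene are independent; real SNPs are often correlated
    (linkage disequilibrium), which this sketch ignores.
    """
    z_by_gene = defaultdict(list)
    for gene, p in snps:
        # One-sided transform: smaller p-values give larger positive z-scores
        z_by_gene[gene].append(-_STD_NORMAL.inv_cdf(p))
    # Stouffer's method: sum of z-scores scaled by 1/sqrt(number of SNPs)
    return {gene: sum(zs) / sqrt(len(zs)) for gene, zs in z_by_gene.items()}


def flag_genes(snps, alpha=0.05):
    """Return genes whose combined z-score beats a Bonferroni-corrected cutoff."""
    scores = gene_level_scores(snps)
    if not scores:
        return []
    # One-sided cutoff, corrected for the number of genes tested
    cutoff = -_STD_NORMAL.inv_cdf(alpha / len(scores))
    return sorted(gene for gene, z in scores.items() if z > cutoff)


if __name__ == "__main__":
    # Hypothetical data: gene_x has five modest signals (p = 1e-4), none of
    # which would pass a 5e-8 single-SNP cutoff on its own
    data = [("gene_x", 1e-4)] * 5 + [
        ("gene_y", 0.4), ("gene_y", 0.6), ("gene_y", 0.5),
        ("gene_z", 0.2), ("gene_z", 0.9),
    ]
    print(flag_genes(data))  # ['gene_x']
```

The point of the design is that a gene can be flagged because several of its SNPs lean the same way, even when no single SNP is strong enough on its own.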

The researchers tested the method on summary data from two sources: the Global Lipids Gene Consortium, with LDL cholesterol levels and SNP-level p-values for some 100,000 individuals, and the Penn Coronary Artery Calcification study, with genetic and medical history data on nearly 2,100 individuals of European ancestry. Together these sources yielded 31,827 SNPs in 2,960 genes common to both databases.

MixMAP takes a defined set of SNPs from the Global Lipids Gene Consortium database, in regions more likely to be associated with high LDL cholesterol, and compares those data with the smaller Penn Coronary Artery Calcification study. In addition to tapping publicly available databases, the technique uses software based on the R Project, an open-source statistical computing environment.

Using MixMAP, Foulkes and colleagues confirmed associations with LDL cholesterol at 80 percent of the genetic locations identified through single-SNP testing in the Global Lipids Gene Consortium data. The team also discovered 12 new genetic locations for LDL cholesterol that did not reach genome-wide significance in single-SNP testing.

The authors say the technique still needs validation, which should be feasible in the near future thanks to larger datasets in development from projects such as the Geisinger eMERGE Genome Wide Association Studies of Obesity (Metabochip) Project. But even at this stage, says Foulkes, "We've done better than simply identify the strongest signals, we've quantified measures of association to show they are statistically meaningful."
