Science for business people. Enterprise for scientists.

Primate Genome A.I. Model IDs Human Disease Variants

Mandrill, a primate species (Mhy, Pixabay. https://pixabay.com/photos/animal-mandrill-primate-species-396289/)

2 June 2023. A genomic systems company created an artificial intelligence algorithm from primate genomic data that identifies possible disease-causing mutations in humans. The data and analysis from the algorithm developed by Illumina Inc. in San Diego and academic researchers are published in today’s special issue of the journal Science (paid subscription required) with additional reports on primate evolution in the companion journal Science Advances.

An international research team organized by Illumina, a maker of high-throughput genomic sequencing systems, seeks to broaden large-scale genome analytics beyond their usual focus on human genetics, where data on people of European descent predominate. This lack of diversity, say the authors, limits the effectiveness of genomic analysis. They note that only about one-tenth of one percent (0.1%) of the more than 70 million possible human protein-altering genomic variations are annotated in clinical databases, with the rest of unknown or uncertain importance.

Humans are part of the primate order, which includes species ranging from small lemurs to gorillas and orangutans. For this study, the researchers took a different approach to analyzing human genomics by broadening the scope to the entire primate order. The authors note that human and non-human primates share a largely common genome, with nearly identical amino acid sequences across species. The team hypothesized that genomic variants common to human and non-human primates are unlikely to cause disease, since those variants survived natural selection over an evolutionary history stretching back tens of millions of years.
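The hypothesis can be illustrated with a minimal sketch, which is not the researchers' actual pipeline. It assumes each missense variant is reduced to a simple key (gene, protein position, reference and alternate amino acid), and labels a human variant "likely benign" when the same change appears among the common variants of another primate. The class name, function name, and label strings are hypothetical.

```python
from typing import Dict, Iterable, NamedTuple


class MissenseVariant(NamedTuple):
    """Hypothetical key for a protein-altering variant."""
    gene: str        # gene symbol
    position: int    # amino acid position in the protein
    ref_aa: str      # usual amino acid at that position
    alt_aa: str      # amino acid produced by the variant


def label_by_primate_presence(
    human_variants: Iterable[MissenseVariant],
    primate_common: Iterable[MissenseVariant],
) -> Dict[MissenseVariant, str]:
    """Label a human variant 'likely benign' if the same change is
    common in a non-human primate, otherwise leave it 'unclassified'."""
    tolerated = set(primate_common)
    return {
        variant: "likely benign" if variant in tolerated else "unclassified"
        for variant in human_variants
    }
```

Variants absent from the primate set stay unclassified in this sketch rather than being called harmful, since absence alone says nothing about disease risk.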

Identify more than 4 million unclassified variants

More than 100 researchers led by Kyle Farh, Illumina’s vice president for A.I., obtained whole-genome data from 809 individuals representing 233 primate species. From those primate genomes, the team identified 4.3 million common missense variants, mutations in which a single DNA base change substitutes one amino acid for another, which can alter the corresponding protein in the species. The researchers note that common missense variants between humans and at least one other primate are annotated in 98.7 percent of entries classified as benign in the ClinVar database of genomic variations linked to disease. And the authors note that as a result of their analysis, ClinVar can now identify more than 4 million previously unclassified missense variants as benign, a 50-fold increase.
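For readers unfamiliar with the term, the short sketch below shows what makes a variant "missense." It uses the standard genetic code to translate a three-base DNA codon before and after a single-base change, then reports whether the amino acid stays the same (synonymous), changes (missense), or becomes a stop signal (nonsense). The function name and return labels are illustrative assumptions, not taken from the study.

```python
from itertools import product

# Standard genetic code, with codons ordered by bases T, C, A, G
# ('*' marks a stop codon).
_BASES = "TCAG"
_AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(_BASES, repeat=3), _AMINO_ACIDS)
}


def classify_single_base_change(codon: str, index: int, new_base: str) -> str:
    """Classify a single-base substitution within one DNA codon.

    index is 0, 1, or 2; new_base is one of A, C, G, T.
    """
    codon, new_base = codon.upper(), new_base.upper()
    if codon not in CODON_TABLE or new_base not in _BASES:
        raise ValueError("codon must be three DNA bases; new_base one base")
    if index not in (0, 1, 2) or codon[index] == new_base:
        raise ValueError("index must be 0-2 and the base must actually change")

    mutated = codon[:index] + new_base + codon[index + 1:]
    before, after = CODON_TABLE[codon], CODON_TABLE[mutated]
    if after == before:
        return "synonymous"   # same amino acid, protein unchanged
    if after == "*":
        return "nonsense"     # premature stop codon
    return "missense"         # a different amino acid is substituted
```

For example, `classify_single_base_change("GAG", 1, "T")` changes the codon for glutamic acid into one for valine and returns `"missense"`.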

Yet the vast majority of missense variants in the human genome remain unclassified. To help clarify their status, Farh and colleagues built an algorithm called Primate AI-3D, based on a convolutional neural network that combines image analysis with machine learning. The researchers trained the algorithm on 3-D protein structure images from AlphaFold, an A.I. system that builds protein structures from amino acid sequences. The team also designed the algorithm to separate variants common to all primates from matched controls, and predict human disease-causing probabilities.

The researchers tested and validated Primate AI-3D against 15 other machine-learning algorithms, using four disease databases and six different clinical benchmarks, on the ability to distinguish between benign and disease-causing genomic variations. The authors note that Primate AI-3D outperformed them all. In the UK Biobank, a database with correlated genomic and health data on a half-million U.K. residents, the algorithm found that many people considered healthy, particularly those of non-European origin, are at increased risk of common diseases.

“What we find,” says Farh in an Illumina news feature, “is that 97 percent of otherwise healthy people in the general population carried highly actionable variants for clinically relevant conditions.” Farh adds, “Up to now we’ve learned that you need genome sequencing if you have a rare disease or cancer. But actually it looks like every healthy person in the population has highly impactful variants in our genomes that are clinically relevant and are important to be informed about.”
