24 May 2017. A new version of software that speeds and streamlines analysis of genomic sequencing variations is being released under an open-source license by the Broad Institute. The software, known as Genome Analysis Toolkit version 4, or GATK4, is being developed by the Broad Institute, a medical research center affiliated with Harvard University and MIT, and Intel Corporation.
GATK is a set of computational tools that help discover genomic sequencing variations: changes in genes, such as polymorphisms that affect an individual’s characteristics or mutations that may indicate a cause of disease. In addition, the GATK site recommends best practices for sharing, optimizing, and integrating genomic data. Among the changes identified by GATK are single nucleotide polymorphisms, or SNPs, which occur in single nucleotides, the basic building blocks of nucleic acids. The software also identifies indels, short for insertions and deletions, the second most common type of variation in genomes after SNPs.
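As a rough illustration of the difference between SNPs and indels, and not a depiction of how GATK itself works, the sketch below labels a variant by comparing the lengths of its reference and alternate alleles. The function name `classify_variant` and its return labels are hypothetical, chosen only for this example.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label one reference/alternate allele pair by comparing allele lengths.

    Hypothetical helper for illustration only; not part of GATK.
    """
    if len(ref) == 1 and len(alt) == 1:
        # A single base replaced by a different single base
        return "SNP" if ref != alt else "no change"
    if len(alt) > len(ref):
        # Alternate allele carries extra bases
        return "insertion"
    if len(alt) < len(ref):
        # Alternate allele is missing bases
        return "deletion"
    # Same length but longer than one base, e.g. a multi-base substitution
    return "other"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATT"), ("ATT", "A")]:
        print(ref, alt, classify_variant(ref, alt))
```

Real variant callers weigh sequencing evidence and quality scores before reporting a variant; this sketch covers only the naming of the variant types described above.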
Version 4 of GATK also highlights copy number variations, which are changes in the number of copies of a gene in a person’s DNA, as well as structural variations in DNA that can indicate disease. These tools can be applied to germline variations, those transmitted to offspring, and to somatic variants that arise spontaneously. The latest version is also optimized, say its developers, for performance-boosting processing engines such as Apache Spark, which is likewise open source.
The Broad Institute began partnering with Intel in November 2016 to develop better methods for integrating and processing the enormous amounts of data produced in genomic analysis, and formed the Intel-Broad Center for Genomics Data Engineering to carry out the work. Not only are the amounts of genomics data rapidly expanding, but the data sets also reside on different types of server systems, with varying degrees of access through the cloud. The collaboration, say the partners, combines Broad’s work in genomic and life sciences research with Intel’s expertise in analytics and artificial intelligence.
GATK4 is still in the “alpha” or early testing stage, but Broad says some 45,000 academic and commercial sites are already using the software regularly. A beta test version is expected to be released in June 2017.
“We wanted to remove traditional barriers of scale while offering the same high level of data quality our users expect,” says Eric Banks, a lead developer of GATK, in a Broad Institute statement. “Thanks to the rapid adoption of cloud computing, researchers can finally do away with many of the infrastructure-related complications that have hampered progress, especially at smaller institutions and startups.”
In addition to the open-source release of GATK4, Intel vice-president Jason Waxman announced the Broad-Intel Genomics Stack, or BIGStack, an overall systems architecture designed for genomics analysis. Waxman says BIGStack provides a five-fold improvement in genomics analytical capability at Broad. Waxman also says Chinese genomics company BGI is adopting GATK’s current analytical tools.