Software Speeds Database Sequence Searches

DNA fragment (Wikimedia Commons)

Computational biologists at Ludwig-Maximilians-Universität (LMU) in Munich, Germany have developed software that enables a new search method for identifying proteins with similar sequences in databases. The software, which the developers say is faster and can discover twice as many evolutionarily related proteins as previous methods, is described online in the journal Nature Methods (paid subscription required).

Sequence searching is a basic process in genomic research: a protein’s sequence is compared with millions of sequences in public databases that are annotated with structures and functions, many of which are accessible to scientists. Because a protein’s sequence is related to its function, the structure and function of a given protein can be predicted by comparing its sequence with those of proteins whose structures and functions are known.

The team led by Johannes Söding of LMU’s Genzentrum (Gene Center) developed the software, called HHblits (short for HMM-HMM–based lightning-fast iterative sequence search), which uses different statistical models than current biostatistical search tools. The models used in HHblits, called Hidden Markov Models (HMMs), include the probabilities of mutations derived from sequence alignments, which the developers say increases the sensitivity and precision of the search for sequence similarities.
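The idea of deriving probabilities from sequence alignments can be illustrated with a minimal sketch. It builds a simple per-column amino acid profile, which is a much reduced stand-in for a full Hidden Markov Model (it has no insertion, deletion, or transition probabilities). The function name, the toy alignment, and the pseudocount value are assumptions made for illustration and are not taken from HHblits.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_profile(alignment, pseudocount=1.0):
    """Return one amino acid probability distribution per alignment column.

    Simplified profile only: gaps ('-') are ignored and a flat pseudocount
    keeps unseen residues from getting probability zero.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    profile = []
    for col in range(length):
        counts = Counter(row[col] for row in alignment if row[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        profile.append(
            {aa: (counts[aa] + pseudocount) / total for aa in AMINO_ACIDS}
        )
    return profile


# Hypothetical toy alignment of four related sequences.
toy_alignment = ["MKVL", "MRVL", "MKIL", "MK-L"]
profile = column_profile(toy_alignment)

# Column 1 holds K, R, K, K, so K is favored, R is possible, others are rare.
print(round(profile[1]["K"], 3), round(profile[1]["R"], 3))
```

A column where K appears often and R occasionally ends up with a high probability for K and a smaller but non-negligible one for R, which is the kind of mutation information a single query sequence cannot provide.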

The software also has a filtering process that identifies proteins with similar amino acid compositions, which reduces the amount of data to be searched. According to Söding, this cuts processing by as much as 2,500-fold. Current search algorithms rely on pairwise comparisons of protein sequences. These comparisons produce alignments in which mostly identical or similar amino acids are paired up in the same columns.
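As a small illustration of what a pairwise comparison reports, the sketch below scores two sequences that are assumed to be already aligned, counting the columns where the same amino acid appears in both. The function name and the example sequences are hypothetical, and real search tools also compute the alignment itself and score similar (not only identical) residues.

```python
def percent_identity(aligned_a, aligned_b):
    """Percentage of gap-free columns where both sequences share a residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    # Keep only columns where neither sequence has a gap.
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Hypothetical aligned pair: one gap column, one mismatch (K vs. R).
print(percent_identity("MKV-LS", "MRVALS"))  # 4 of 5 gap-free columns match
```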

HHblits assembles similar sequences from the database into multiple sequence alignments, and assigns one of 219 identifiers to each alignment column, so that columns with similar amino acid compositions have the same identifier. “By translating the multiple sequence alignments into sequences composed of these 219 letters,” says Söding, “we can replace the time-consuming pairwise comparison of HMMs by the comparison of simple sequences.”
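The column-to-letter translation Söding describes can be sketched in a few lines. This is a toy under stated assumptions: it uses a hypothetical alphabet of three states defined by hand-picked residue groups, whereas HHblits uses 219 states, and all names and example alignments here are invented for illustration.

```python
# Hypothetical 3-state alphabet (HHblits uses 219 states; these groups are
# an assumption made only to keep the example small).
PROTOTYPES = {
    "h": "AILMFVWY",  # columns dominated by hydrophobic residues
    "c": "DEKRH",     # columns dominated by charged residues
    "p": "STNQGCP",   # columns dominated by small or polar residues
}


def encode_column(column):
    """Pick the state whose residue group covers the largest share of the column."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return "-"

    def share(group):
        return sum(r in group for r in residues) / len(residues)

    return max(PROTOTYPES, key=lambda state: share(PROTOTYPES[state]))


def encode_alignment(alignment):
    """Translate a multiple sequence alignment into one letter per column."""
    return "".join(encode_column(col) for col in zip(*alignment))


def shared_positions(a, b):
    """Count positions where two encoded strings carry the same letter."""
    return sum(x == y for x, y in zip(a, b))


# Two hypothetical families with different residues but similar column makeup.
family_a = ["MKVLS", "MRILT", "LKVLS"]
family_b = ["IEALT", "VDALS", "IEGLT"]

code_a = encode_alignment(family_a)
code_b = encode_alignment(family_b)
print(code_a, code_b, shared_positions(code_a, code_b))
```

Once each alignment is reduced to a short string, comparing two families becomes an ordinary string comparison, which is far cheaper than comparing full per-column probability models.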

A Web-enabled and open-source version of HHblits is available on the Gene Center Web site.
