Benchmarking the next generation of homology inference tools
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Number of Authors: 3
2016 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no. 17, p. 2636-2641. Article in journal (Refereed), Published
Abstract [en]

Motivation: Over the last decades, vast numbers of sequences have been deposited in public databases. Bioinformatics tools allow homology inference, and consequently functional inference, for these sequences. New profile-based homology search tools have been introduced that allow reliable detection of remote homologs, but they have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the 'next generation' of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile-based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA.

Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM+Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics were calculated. We further measured the congruence of domain architecture assignments across the three domain databases.

Results: CS-BLAST and PHMMER had the highest overall accuracy. FASTA, UBLAST and USEARCH traded substantial accuracy for speed.

Conclusion: Profile methods are superior at inferring remote homologs, but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity.
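The evaluation step in the Method paragraph (scoring labelled homologous and non-homologous pairs with a tool, then computing performance metrics) can be illustrated with a minimal sketch. The abstract does not say which metrics were used, so sensitivity and specificity at a fixed score threshold are an assumption here, and the function names, scores and threshold are hypothetical illustration values, not the authors' code or data.

```python
from typing import Iterable, Tuple

# Each item is (tool_score, is_homologous). Higher score is assumed to mean
# stronger evidence of homology (e.g. a bit score); this is an assumption.
ScoredPair = Tuple[float, bool]


def confusion_counts(pairs: Iterable[ScoredPair], threshold: float):
    """Return (tp, fp, tn, fn) when pairs scoring >= threshold are called homologous."""
    tp = fp = tn = fn = 0
    for score, is_homologous in pairs:
        predicted = score >= threshold
        if predicted and is_homologous:
            tp += 1
        elif predicted and not is_homologous:
            fp += 1
        elif not predicted and not is_homologous:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of truly homologous pairs that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-homologous pairs that were correctly rejected."""
    return tn / (tn + fp) if (tn + fp) else 0.0


# Hypothetical toy data: three homologous and three non-homologous pairs.
toy_pairs = [
    (95.0, True), (60.0, True), (20.0, True),
    (55.0, False), (10.0, False), (5.0, False),
]

tp, fp, tn, fn = confusion_counts(toy_pairs, threshold=50.0)
print(tp, fp, tn, fn)
print(sensitivity(tp, fn), specificity(tn, fp))
```

Sweeping the threshold over all observed scores would give the points of a ROC curve, which is one common way to compare search tools on the same labelled pairs.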

Place, publisher, year, edition, pages
2016. Vol. 32, no 17, 2636-2641 p.
National Category
Biological Sciences; Environmental Biotechnology; Computer and Information Science; Mathematics
Identifiers
URN: urn:nbn:se:su:diva-135027
DOI: 10.1093/bioinformatics/btw305
ISI: 000384666800059
PubMedID: 27256311
OAI: oai:DiVA.org:su-135027
DiVA: diva2:1045616
Conference
15th European Conference on Computational Biology (ECCB), The Hague, Netherlands, September 3-7, 2016
Available from: 2016-11-10. Created: 2016-10-31. Last updated: 2016-11-10. Bibliographically approved.

By author/editor
Sonnhammer, Erik L. L.