GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm
Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
Number of Authors: 3
2016 (English)
In: BMC Evolutionary Biology, ISSN 1471-2148, E-ISSN 1471-2148, Vol. 16, 120
Article in journal (Refereed) Published
Abstract [en]

Background: Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity.

Results: In this study, we validate GenFamClust by comparing it to well-known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms to homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes, and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs.

Conclusions: The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods.

Place, publisher, year, edition, pages
2016. Vol. 16, 120
Keyword [en]
Homology inference, Gene synteny, Gene similarity, Gene family, Clustering, Gene order conservation
National Category
Biological Sciences
Identifiers
URN: urn:nbn:se:su:diva-131920
DOI: 10.1186/s12862-016-0684-2
ISI: 000377161400002
PubMedID: 27260514
OAI: oai:DiVA.org:su-131920
DiVA: diva2:946946
Available from: 2016-07-06 Created: 2016-07-04 Last updated: 2016-07-06 Bibliographically approved

Open Access in DiVA

No full text

Search in DiVA

By author/editor
Arvestad, Lars