A Bayesian Method for Analyzing Lateral Gene Transfer
Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). KTH Royal Institute of Technology, Sweden.
2014 (English). In: Systematic Biology, ISSN 1063-5157, E-ISSN 1076-836X, Vol. 63, no. 3, pp. 409-420. Article in journal (Refereed), Published
Abstract [en]

Lateral gene transfer (LGT), which transfers DNA between two non-vertically related individuals belonging to the same or different species, is recognized as a major force in prokaryotic evolution, and evidence of its impact on eukaryotic evolution is ever increasing. LGT has attracted much public attention for its potential to transfer pathogenic elements and antibiotic resistance in bacteria, and to transfer pesticide resistance from genetically modified crops to other plants. In a wider perspective, there is a growing body of studies highlighting the role of LGT in enabling organisms to occupy new niches or adapt to environmental changes. The challenge LGT poses to the standard tree-based conception of evolution is also being debated. Studies of LGT have, however, been severely limited by a lack of computational tools. The best currently available LGT algorithms are parsimony-based phylogenetic methods, which require a pre-computed gene tree and cannot choose between sometimes wildly differing most parsimonious solutions. Moreover, in many studies, simple heuristics are applied that can only handle putative orthologs and completely disregard gene duplications (GDs). Consequently, proposed LGT among specific gene families, and the rate of LGT in general, remain debated. We present a Bayesian Markov-chain Monte Carlo-based method that integrates GD, gene loss, LGT, and sequence evolution, and apply the method in a genome-wide analysis of two groups of bacteria: Mollicutes and Cyanobacteria. Our analyses show that although the LGT rate between distant species is high, the net combined rate of duplication and close-species LGT is on average higher. We also show that the common practice of disregarding reconcilability in gene tree inference overestimates the number of LGT and duplication events. [Bayesian; gene duplication; gene loss; horizontal gene transfer; lateral gene transfer; MCMC; phylogenetics.]

Place, publisher, year, edition, pages
2014. Vol. 63, no 3, 409-420 p.
Keyword [en]
Bayesian, gene duplication, gene loss, horizontal gene transfer, lateral gene transfer, MCMC, phylogenetics
National Category
Developmental Biology; Computer Science
Research subject
Computer Science
URN: urn:nbn:se:su:diva-104137; DOI: 10.1093/sysbio/syu007; ISI: 000334752600010; OAI: diva2:721643


Available from: 2014-06-04. Created: 2014-06-03. Last updated: 2015-11-30. Bibliographically approved.
In thesis
1. Reconciling gene family evolution and species evolution
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Species evolution can often be adequately described with a phylogenetic tree. Interestingly, this is the case also for the evolution of homologous genes; a gene in an ancestral species may – through gene duplication, gene loss, lateral gene transfer (LGT), and speciation events – give rise to a gene family distributed across contemporaneous species. However, molecular sequence evolution and genetic recombination make the history – the gene tree – non-trivial to reconstruct from present-day sequences. This history is of biological interest, e.g., for inferring potential functional equivalences of extant gene pairs.

In this thesis, we present biologically sound probabilistic models for gene family evolution guided by species evolution – effectively yielding a gene-species tree reconciliation. Using Bayesian Markov-chain Monte Carlo (MCMC) inference techniques, we show that by taking advantage of the information provided by the species tree, our methods achieve more reliable gene tree estimates than traditional species tree-uninformed approaches.

Specifically, we describe a comprehensive model that accounts for gene duplication, gene loss, a relaxed molecular clock, and sequence evolution, and we show that the method performs admirably on synthetic and biological data. Furthermore, we present two expansions of the inference procedure, enabling it to provide (i) refined gene tree estimates with timed duplications, and (ii) probabilistic orthology estimates – i.e., that the origin of a pair of extant genes is a speciation.

Finally, we present a substantial development of the model to account also for LGT. A sophisticated algorithmic framework of dynamic programming and numerical methods for differential equations is used to resolve the computational hurdles that LGT brings about. We apply the method on two bacterial datasets where LGT is believed to be prominent, in order to estimate genome-wide LGT and duplication rates. We further show that traditional methods – in which gene trees are reconstructed and reconciled with the species tree in separate stages – are prone to yield inferior gene tree estimates that will overestimate the number of LGT events.

Abstract [sv] (translated to English)

The evolution of species can in many cases be described by a tree, as Darwin's notebooks from HMS Beagle already attest. This also holds for homologous genes; a gene in an ancestral species can, through gene duplications, gene losses, lateral gene transfer (LGT), and speciations, give rise to a gene family spread across contemporary species. Reconstructing the emergence of the gene family (the gene tree) from sequences of extant species is non-trivial because of genetic recombination and sequence evolution. The gene tree is nevertheless of biological interest, in particular because it enables assumptions about functional kinship between present-day gene pairs.

This thesis deals with biologically well-founded probabilistic models for gene family evolution. These models make use of the strong influence of species evolution on the history of the gene family, and essentially give rise to a reconciliation of gene tree and species tree. Through Bayesian inference based on Markov-chain Monte Carlo (MCMC), we show that our methods produce better gene tree estimates than traditional approaches that do not take the species tree into account.

More specifically, we describe a model that comprises gene duplications, gene losses, a relaxed molecular clock, and sequence evolution, and show that the method gives high-quality estimates on both synthetic and biological data. Furthermore, we present two extensions of this framework that enable (i) gene tree estimates with time points for duplications, and (ii) probabilistic orthology estimates, i.e., that two present-day genes originate from a speciation.

Finally, we present a model that includes LGT in addition to the mechanisms mentioned above. The computational difficulties that LGT gives rise to are resolved with an intricate framework of dynamic programming and numerical methods for differential equations. We apply the method to estimate the LGT and duplication rates of two bacterial datasets in which LGT is presumed to have played a central role. We also show that traditional methods, in which gene trees are estimated and reconciled with the species tree in separate steps, tend to give poorer gene tree estimates and thereby overestimate the number of LGT events.

Place, publisher, year, edition, pages
Stockholm: Numerical Analysis and Computer Science (NADA), Stockholm University, 2013. 59 p.
Keyword [en]
Computational biology, Bioinformatics, Phylogenetics, Phylogenomics, Comparative genomics, Evolutionary biology
National Category
Computer Science
Research subject
Computer Science
URN: urn:nbn:se:su:diva-93346; ISBN: 978-91-7447-760-3
Public defence
2013-11-04, Inghesalen, Widerströmska huset, Karolinska Institutet, Tomtebodavägen 18, Solna, 13:30 (English)

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Manuscript. Paper 5: Manuscript.

Available from: 2013-10-13. Created: 2013-09-09. Last updated: 2015-11-30. Bibliographically approved.

By author/editor
Arvestad, Lars