Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
GenPhyloData: realistic simulation of gene family evolution
Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
2013 (English)In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, 209Article in journal (Refereed) Published
Abstract [en]

Background: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and-perhaps more interestingly-also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock. Result: Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data. Conclusion: The concept of tree-in-tree evolution can also be used to model, for instance, biogeography or host-parasite co-evolution.

Place, publisher, year, edition, pages
2013. Vol. 14, 209
Keyword [en]
Phylogenetics, Synthetic data, Gene family, Gene duplication, Gene loss, LGT, Molecular clock, Biogeography, Host-parasite co-evolution
National Category
Biochemistry and Molecular Biology Microbiology
Identifiers
URN: urn:nbn:se:su:diva-92509DOI: 10.1186/1471-2105-14-209ISI: 000321381300001OAI: oai:DiVA.org:su-92509DiVA: diva2:639770
Note

AuthorCount:4;

Available from: 2013-08-09 Created: 2013-08-07 Last updated: 2017-12-06Bibliographically approved
In thesis
1. Reconciling gene family evolution and species evolution
Open this publication in new window or tab >>Reconciling gene family evolution and species evolution
2013 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Species evolution can often be adequately described with a phylogenetic tree. Interestingly, this is the case also for the evolution of homologous genes; a gene in an ancestral species may – through gene duplication, gene loss, lateral gene transfer (LGT), and speciation events – give rise to a gene family distributed across contemporaneous species. However, molecular sequence evolution and genetic recombination make the history – the gene tree – non-trivial to reconstruct from present-day sequences. This history is of biological interest, e.g., for inferring potential functional equivalences of extant gene pairs.

In this thesis, we present biologically sound probabilistic models for gene family evolution guided by species evolution – effectively yielding a gene-species tree reconciliation. Using Bayesian Markov-chain Monte Carlo (MCMC) inference techniques, we show that by taking advantage of the information provided by the species tree, our methods achieve more reliable gene tree estimates than traditional species tree-uninformed approaches.

Specifically, we describe a comprehensive model that accounts for gene duplication, gene loss, a relaxed molecular clock, and sequence evolution, and we show that the method performs admirably on synthetic and biological data. Further-more, we present two expansions of the inference procedure, enabling it to pro-vide (i) refined gene tree estimates with timed duplications, and (ii) probabilistic orthology estimates – i.e., that the origin of a pair of extant genes is a speciation.

Finally, we present a substantial development of the model to account also for LGT. A sophisticated algorithmic framework of dynamic programming and numerical methods for differential equations is used to resolve the computational hurdles that LGT brings about. We apply the method on two bacterial datasets where LGT is believed to be prominent, in order to estimate genome-wide LGT and duplication rates. We further show that traditional methods – in which gene trees are reconstructed and reconciled with the species tree in separate stages – are prone to yield inferior gene tree estimates that will overestimate the number of LGT events.

Abstract [sv]

Arters evolution kan i många fall beskrivas med ett träd, vilket redan Darwins anteckningsböcker från HMS Beagle vittnar om. Detta gäller också homologa gener; en gen i en ancestral art kan – genom genduplikationer, genförluster, lateral gentransfer (LGT) och artbildningar – ge upphov till en genfamilj spridd över samtida arter. Att från sekvenser från nu levande arter rekonstruera genfamiljens framväxt – genträdet – är icke-trivialt på grund av genetisk rekombination och sekvensevolution. Genträdet är emellertid av biologiskt intresse, i synnerhet för att det möjliggör antaganden om funktionellt släktskap mellan nutida genpar.

Denna avhandling behandlar biologiskt välgrundade sannolikhetsmodeller för genfamiljsevolution. Dessa modeller tar hjälp av artevolutionens starka inverkan på genfamiljens historia, och ger väsentligen upphov till en förlikning av genträd och artträd. Genom Bayesiansk inferens baserad på Markov-chain Monte Carlo (MCMC) visar vi att våra metoder presterar bättre genträdsskattningar än traditionella ansatser som inte tar artträdet i beaktning.

Mer specifikt beskriver vi en modell som omfattar genduplikationer, genförluster, en relaxerad molekylär klocka, samt sekvensevolution, och visar att metoden ger högkvalitativa skattningar på både syntetiska och biologiska data. Vidare presenterar vi två utvidgningar av detta ramverk som möjliggör (i) genträdsskattningar med tidpunkter för duplikationer, samt (ii) probabilistiska ortologiskattningar – d.v.s. att två nutida gener härstammar från en artbildning.

Slutligen presenterar vi en modell som inkluderar LGT utöver ovan nämnda mekanismer. De beräkningsmässiga svårigheter som LGT ger upphov till löses med ett intrikat ramverk av dynamisk programmering och numeriska metoder för differentialekvationer. Vi tillämpar metoden för att skatta LGT- och duplikationsraten hos två bakteriella dataset där LGT förmodas ha spelat en central roll. Vi visar också att traditionella metoder – där genträd skattas och förlikas med artträdet i separata steg – tenderar att ge sämre genträdsskattningar, och därmed överskatta antalet LGT-händelser.

Place, publisher, year, edition, pages
Stockholm: Numerical Analysis and Computer Science (NADA), Stockholm University, 2013. 59 p.
Keyword
Computational biology, Bioinformatics, Phylogenetics, Phylogenomics, Comparative genomics, Evolutionary biology
National Category
Computer Science
Research subject
Computer Science
Identifiers
urn:nbn:se:su:diva-93346 (URN)978-91-7447-760-3 (ISBN)
Public defence
2013-11-04, Inghesalen, Widerströmska huset, Karolinska Institutet, Tomtebodavägen 18, Solna, 13:30 (English)
Opponent
Supervisors
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Manuscript. Paper 5: Manuscript.

Available from: 2013-10-13 Created: 2013-09-09 Last updated: 2015-11-30Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text

Search in DiVA

By author/editor
Arvestad, Lars
By organisation
Numerical Analysis and Computer Science (NADA)Science for Life Laboratory (SciLifeLab)
In the same journal
BMC Bioinformatics
Biochemistry and Molecular BiologyMicrobiology

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 58 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf