Genome-wide probabilistic reconciliation analysis across vertebrates
Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0002-6470-0239
2013 (English) In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, no Suppl 15, article S10. Article in journal (Refereed), Published
Abstract [en]

Gene duplication is considered to be a major driving force in evolution that enables the genome of a species to acquire new functions. A reconciliation - a mapping of gene tree vertices to the edges or vertices of a species tree - explains where gene duplications have occurred on the species tree. In this study, we sample reconciliations from a posterior over reconciliations, gene trees, edge lengths and other parameters, given a species tree and gene sequences. We employ a Bayesian analysis tool, based on the probabilistic model DLRS that integrates gene duplication, gene loss and sequence evolution under a relaxed molecular clock for substitution rates, to obtain this posterior.

By applying these methods, we perform a genome-wide analysis of a nine-species dataset, OPTIC, and conclude that for many gene families, the most parsimonious reconciliation (MPR) - a reconciliation that minimizes the number of duplications - is far from the correct explanation of the evolutionary history. For the given dataset, we observe that approximately 19% of the sampled reconciliations differ from the MPR. This is in clear contrast with previous estimates, based on simpler models and less realistic assumptions, according to which 98% of the reconciliations can be expected to be identical to the MPR. We also generate heatmaps showing where in the species trees duplications have been most frequent during the evolution of these species.
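
To make the notion of an MPR concrete, the following is a minimal sketch of the standard LCA (lowest common ancestor) mapping, which is commonly used to obtain a reconciliation that minimizes the number of duplications. It is an illustration only, not the DLRS-based tool described in the abstract. The function names, the tree encoding (nested tuples for the gene tree, a child-to-parent dictionary for the species tree), and the toy species and gene names are all assumptions made for this example and are not taken from the OPTIC dataset.

```python
from typing import Dict, List, Tuple, Union

# Hypothetical encoding: a gene tree is a leaf name or a pair of subtrees.
GeneTree = Union[str, Tuple["GeneTree", "GeneTree"]]


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the path from `node` up to the species tree root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def lca(a: str, b: str, parent: Dict[str, str]) -> str:
    """Lowest common ancestor of species tree nodes `a` and `b`."""
    seen = set(ancestors(a, parent))
    for node in ancestors(b, parent):
        if node in seen:
            return node
    raise ValueError("nodes are not in the same tree")


def lca_reconcile(
    gene_tree: GeneTree,
    parent: Dict[str, str],
    gene_to_species: Dict[str, str],
) -> Tuple[str, List[str]]:
    """Map each gene tree vertex to a species tree vertex by LCA mapping.

    Returns the species vertex that the gene tree root maps to, and the
    list of species vertices at which a duplication is inferred.
    """
    duplications: List[str] = []

    def visit(node: GeneTree) -> str:
        if isinstance(node, str):
            return gene_to_species[node]  # leaf: the species it was sampled from
        left, right = (visit(child) for child in node)
        here = lca(left, right, parent)
        # A vertex mapped to the same place as one of its children
        # cannot be a speciation, so it is counted as a duplication.
        if here in (left, right):
            duplications.append(here)
        return here

    return visit(gene_tree), duplications


# Toy species tree ((human, mouse)HM, chicken)root as child -> parent links.
species_parent = {"human": "HM", "mouse": "HM", "HM": "root", "chicken": "root"}
genes = {"h1": "human", "m1": "mouse", "h2": "human", "c1": "chicken"}

# Gene tree ((h1, m1), (h2, c1)) conflicts with the species tree.
print(lca_reconcile((("h1", "m1"), ("h2", "c1")), species_parent, genes))
# → ('root', ['root'])
```

In this toy case the MPR explains the conflict with a single duplication at the root of the species tree. A sampled reconciliation of the kind discussed in the abstract may instead place duplications elsewhere on the species tree, which is how a sample can differ from the MPR.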

Place, publisher, year, edition, pages
BioMed Central, 2013. Vol. 14, no Suppl 15, article S10.
National Category
Biochemistry and Molecular Biology; Microbiology; Mathematical Analysis
Identifiers
URN: urn:nbn:se:su:diva-94358; DOI: 10.1186/1471-2105-14-S15-S10; ISI: 000328316700010; OAI: oai:DiVA.org:su-94358; DiVA: diva2:653323
Conference
11th Annual Research in Computational Molecular Biology (RECOMB) Satellite Workshop on Comparative Genomics, Lyon, France, October 17-19, 2013
Note

AuthorCount: 4;

Available from: 2013-10-03. Created: 2013-10-03. Last updated: 2017-12-06. Bibliographically approved
In thesis
1. Reconciling gene family evolution and species evolution
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Species evolution can often be adequately described with a phylogenetic tree. Interestingly, this is the case also for the evolution of homologous genes; a gene in an ancestral species may – through gene duplication, gene loss, lateral gene transfer (LGT), and speciation events – give rise to a gene family distributed across contemporaneous species. However, molecular sequence evolution and genetic recombination make the history – the gene tree – non-trivial to reconstruct from present-day sequences. This history is of biological interest, e.g., for inferring potential functional equivalences of extant gene pairs.

In this thesis, we present biologically sound probabilistic models for gene family evolution guided by species evolution – effectively yielding a gene-species tree reconciliation. Using Bayesian Markov-chain Monte Carlo (MCMC) inference techniques, we show that by taking advantage of the information provided by the species tree, our methods achieve more reliable gene tree estimates than traditional species tree-uninformed approaches.

Specifically, we describe a comprehensive model that accounts for gene duplication, gene loss, a relaxed molecular clock, and sequence evolution, and we show that the method performs admirably on synthetic and biological data. Furthermore, we present two expansions of the inference procedure, enabling it to provide (i) refined gene tree estimates with timed duplications, and (ii) probabilistic orthology estimates – i.e., that the origin of a pair of extant genes is a speciation.

Finally, we present a substantial development of the model to account also for LGT. A sophisticated algorithmic framework of dynamic programming and numerical methods for differential equations is used to resolve the computational hurdles that LGT brings about. We apply the method to two bacterial datasets where LGT is believed to be prominent, in order to estimate genome-wide LGT and duplication rates. We further show that traditional methods – in which gene trees are reconstructed and reconciled with the species tree in separate stages – are prone to yield inferior gene tree estimates that overestimate the number of LGT events.

Abstract [sv] (translated to English)

The evolution of species can in many cases be described by a tree, as Darwin's notebooks from HMS Beagle already attest. This also holds for homologous genes; a gene in an ancestral species can, through gene duplications, gene losses, lateral gene transfer (LGT), and speciations, give rise to a gene family spread across contemporary species. Reconstructing the emergence of the gene family (the gene tree) from sequences of extant species is non-trivial because of genetic recombination and sequence evolution. The gene tree is nevertheless of biological interest, in particular because it enables assumptions about functional relatedness between present-day gene pairs.

This thesis deals with biologically well-founded probabilistic models for gene family evolution. These models draw on the strong influence of species evolution on the history of the gene family, and essentially give rise to a reconciliation of gene tree and species tree. Through Bayesian inference based on Markov-chain Monte Carlo (MCMC), we show that our methods yield better gene tree estimates than traditional approaches that do not take the species tree into account.

More specifically, we describe a model that encompasses gene duplications, gene losses, a relaxed molecular clock, and sequence evolution, and show that the method gives high-quality estimates on both synthetic and biological data. Furthermore, we present two extensions of this framework that enable (i) gene tree estimates with time points for duplications, and (ii) probabilistic orthology estimates, i.e., that two present-day genes originate from a speciation.

Finally, we present a model that includes LGT in addition to the mechanisms mentioned above. The computational difficulties that LGT gives rise to are solved with an intricate framework of dynamic programming and numerical methods for differential equations. We apply the method to estimate the LGT and duplication rates of two bacterial datasets where LGT is presumed to have played a central role. We also show that traditional methods, in which gene trees are estimated and reconciled with the species tree in separate steps, tend to give poorer gene tree estimates, and thereby overestimate the number of LGT events.

Place, publisher, year, edition, pages
Stockholm: Numerical Analysis and Computer Science (NADA), Stockholm University, 2013. 59 p.
Keyword
Computational biology, Bioinformatics, Phylogenetics, Phylogenomics, Comparative genomics, Evolutionary biology
National Category
Computer Science
Research subject
Computer Science
Identifiers
urn:nbn:se:su:diva-93346 (URN); 978-91-7447-760-3 (ISBN)
Public defence
2013-11-04, Inghesalen, Widerströmska huset, Karolinska Institutet, Tomtebodavägen 18, Solna, 13:30 (English)
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Manuscript. Paper 5: Manuscript.

Available from: 2013-10-13. Created: 2013-09-09. Last updated: 2015-11-30. Bibliographically approved

By author/editor
Sjöstrand, Joel