Bayesian Phylogenetic Inference: Estimating Diversification Rates from Reconstructed Phylogenies
Stockholm University, Faculty of Science, Department of Mathematics.
2013 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Phylogenetics is the study of the evolutionary relationship between species. Inference of phylogeny relies heavily on statistical models that have been extended and refined tremendously over the past years into very complex hierarchical models. Paper I introduces probabilistic graphical models to statistical phylogenetics and elaborates on the potential advantages a unified graphical model representation could have for the community, e.g., by facilitating communication and improving reproducibility of statistical analyses of phylogeny and evolution.

Once the phylogeny is reconstructed, it is possible to infer the rates of diversification (speciation and extinction). In this thesis I extend the birth-death process model so that it can be applied to incompletely sampled phylogenies, that is, phylogenies of only a subsample of the presently living species from one group. Previous work considered only the case in which every species has the same probability of being included; here I examine two alternative sampling schemes: diversified taxon sampling and cluster sampling. Paper II introduces these sampling schemes under a constant-rate birth-death process and gives the probability density for reconstructed phylogenies. Paper IV extends these models to time-dependent diversification rates, again under different sampling schemes, and applies them to empirical phylogenies. Paper III focuses on fast and unbiased simulation of reconstructed phylogenies. The efficiency is achieved by deriving the analytical distribution and density function of the speciation times in the reconstructed phylogeny.

Place, publisher, year, edition, pages
Stockholm: Department of Mathematics, Stockholm University, 2013. 26 p.
Keyword [en]
Phylogenetics, Bayesian inference, Graphical Models, Birth-Death Process, Diversification
National Category
Evolutionary Biology; Mathematics
Research subject
Mathematical Statistics
Identifiers
URN: urn:nbn:se:su:diva-95361
ISBN: 978-91-7447-771-9 (print)
OAI: oai:DiVA.org:su-95361
DiVA: diva2:659516
Public defence
2013-11-29, sal 14, hus 5, Kräftriket, Roslagsvägen 101, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 1: Manuscript. Paper 4: Accepted.

Available from: 2013-11-07 Created: 2013-10-25 Last updated: 2015-03-10 Bibliographically approved
List of papers
1. Probabilistic Graphical Model Representation in Phylogenetics
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Recent years have seen a rapid expansion of the model space explored in statistical phylogenetics, emphasizing the need for new approaches to model representation and software development. Clear communication and representation of the chosen model is crucial for: (1) reproducibility of an analysis, (2) model development and (3) software design. Moreover, a unified, clear and understandable framework for model representation lowers the barrier for beginning scientists and non-specialists to grasp the model, including its assumptions and parameter/variable dependencies.

Graphical models are such a unifying framework, one that has gained popularity in the statistical literature in recent years. The core idea is to break complex models into conditionally independent distributions, and the strengths include comprehensibility, flexibility, adaptability and the availability of computational algorithms. Graphical models can be used to teach statistical models, to facilitate communication among phylogeneticists and in the development of generic software for simulation and statistical inference.

Here, we provide an introduction to graphical models for phylogeneticists and extend the standard graphical model representation to the realm of phylogenetics. We introduce a new graphical model component, tree plates, to capture the changing structure of the subgraph corresponding to a phylogenetic tree. We describe a range of phylogenetic models using the graphical model framework and build these into separate, interchangeable modules. Phylogenetic model graphs can be readily used in simulation, maximum likelihood inference, and Bayesian inference using either Metropolis-Hastings or Gibbs sampling of the posterior distribution.

National Category
Evolutionary Biology; Mathematics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-95360 (URN)
Available from: 2013-10-25 Created: 2013-10-25 Last updated: 2013-10-28 Bibliographically approved
2. Inferring Speciation and Extinction Rates under Different Sampling Schemes
2011 (English) In: Molecular biology and evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 28, no 9, 2577-2589 p. Article in journal (Refereed) Published
Abstract [en]

The birth-death process is widely used in phylogenetics to model speciation and extinction. Recent studies have shown that the inferred rates are sensitive to assumptions about the sampling probability of lineages. Here, we examine the effect of the method used to sample lineages. Whereas previous studies have assumed random sampling (RS), we consider two extreme cases of biased sampling: diversified sampling (DS), where tips are selected to maximize diversity and cluster sampling (CS), where sample diversity is minimized. DS appears to be standard practice, for example, in analyses of higher taxa, whereas CS may occur under special circumstances, for example, in studies of geographically defined floras or faunas. Using both simulations and analyses of empirical data, we show that inferred rates may be heavily biased if the sampling strategy is not modeled correctly. In particular, when a diversified sample is treated as if it were a random or complete sample, the extinction rate is severely underestimated, often close to 0. Such dramatic errors may lead to serious consequences, for example, if estimated rates are used in assessing the vulnerability of threatened species to extinction. Using Bayesian model testing across 18 empirical data sets, we show that DS is commonly a better fit to the data than complete, random, or cluster sampling (CS). Inappropriate modeling of the sampling method may at least partly explain anomalous results that have previously been attributed to variation over time in birth and death rates.

Keyword
birth and death process, speciation, extinction, phylogenetics, species tree, sampling, inference
National Category
Evolutionary Biology; Mathematics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-68019 (URN)
10.1093/molbev/msr095 (DOI)
000294552700018 ()
Note

authorCount :4

Available from: 2012-01-03 Created: 2012-01-02 Last updated: 2017-12-08 Bibliographically approved
3. Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes
2013 (English) In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 11, 1367-1374 p. Article in journal (Refereed) Published
Abstract [en]

Motivation: Diversification rates and patterns may be inferred from reconstructed phylogenies. Both the time-dependent and the diversity-dependent birth-death process can produce the same observed patterns of diversity over time. To develop and test new models describing the macro-evolutionary process of diversification, generic and fast algorithms to simulate under these models are necessary. Simulations are not only important for testing and developing models but play an influential role in the assessment of model fit.

Results: In the present article, I consider as the model a global time-dependent birth-death process where each species has the same rates but rates may vary over time. For this model, I derive the likelihood of the speciation times from a reconstructed phylogenetic tree and show that each speciation event is independent and identically distributed. This fact can be used to efficiently simulate reconstructed phylogenetic trees when conditioning on the number of species, the time of the process or both. I show the usability of the simulation by approximating the posterior predictive distribution of a birth-death process with decreasing diversification rates applied to a published bird phylogeny (family Cettiidae).

Availability: The methods described in this manuscript are implemented in the R package TESS, available from the repository CRAN (http://cran.r-project.org/web/packages/TESS/).
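
Illustration (not part of the original record): the Results paragraph above states that speciation times in the reconstructed tree are independent and identically distributed, which allows them to be drawn directly. The Python sketch below shows that idea only for the constant-rate special case with complete sampling, conditioning on both the number of tips and the time since the origin of the process, using the standard closed-form inverse of the conditional distribution function. It is not the TESS implementation and does not cover time-dependent rates; the function name `sample_speciation_times` and its parameters are hypothetical.

```python
import math
import random


def sample_speciation_times(n_tips, age, birth, death, rng=None):
    """Draw n_tips - 1 i.i.d. speciation times (measured before present).

    Assumptions of this sketch: constant rates with birth > death >= 0,
    complete sampling, and conditioning on both n_tips and the time
    since the origin of the process (age).
    """
    if not (birth > death >= 0.0):
        raise ValueError("this sketch requires birth > death >= 0")
    if n_tips < 2 or age <= 0.0:
        raise ValueError("need at least two tips and a positive age")
    rng = rng or random.Random()
    r = birth - death                 # net diversification rate
    decay = math.exp(-r * age)
    a = 1.0 - decay
    b = birth - death * decay
    times = []
    for _ in range(n_tips - 1):
        u = rng.random()              # uniform on [0, 1)
        # Inverse of the conditional CDF of a single speciation time.
        s = math.log((b - death * a * u) / (b - birth * a * u)) / r
        times.append(s)
    return sorted(times, reverse=True)  # oldest event first
```

For example, `sample_speciation_times(10, 5.0, 1.0, 0.5)` returns nine times between 0 and 5. Because every draw is independent, the cost grows linearly with the number of tips and no forward simulation of extinct lineages is needed.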

Keyword
Simulations, Birth-Death Process, Phylogenetics
National Category
Biochemistry and Molecular Biology; Medical Biotechnology (with a focus on Cell Biology (including Stem Cell Biology), Molecular Biology, Microbiology, Biochemistry or Biopharmacy); Bioinformatics (Computational Biology)
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-91828 (URN)
10.1093/bioinformatics/btt153 (DOI)
000319428600002 ()
Available from: 2013-07-08 Created: 2013-07-04 Last updated: 2017-12-06 Bibliographically approved
4. Likelihood Inference of Non-Constant Diversification Rates with Incomplete Taxon Sampling
2014 (English) In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 9, no 1, e84184. Article in journal (Refereed) Published
Abstract [en]

Large-scale phylogenies provide a valuable source to study background diversification rates and investigate whether the rates have changed over time. Unfortunately, most large-scale, dated phylogenies are sparsely sampled (fewer than 5% of the described species) and taxon sampling is not uniform. Instead, taxa are frequently sampled to obtain at least one representative per subgroup (e.g., family) and thus to maximize diversity (diversified sampling). So far, such complications have been ignored, potentially biasing the conclusions that have been reached. In this study I derive the likelihood of a birth-death process with non-constant (time-dependent) diversification rates and diversified taxon sampling. Using simulations, I test whether the true parameters and the sampling method can be recovered when the trees are small or medium sized (fewer than 200 taxa). The results show that the diversification rates can be inferred and that the estimates are unbiased for large trees but biased for small trees (fewer than 50 taxa). Furthermore, model selection by means of Akaike's Information Criterion favors the true model if the true rates differ sufficiently from alternative models (e.g., the birth-death model is recovered when compared to a pure-birth model if the extinction rate is large). Finally, I applied six different diversification rate models, ranging from a constant-rate pure-birth process to a decreasing speciation rate birth-death process but excluding any rate shift models, to three large-scale empirical phylogenies (ants, mammals and snakes with 149, 164 and 41 sampled species, respectively). All three phylogenies were constructed by diversified taxon sampling, as stated by the authors. However, only the snake phylogeny supported diversified taxon sampling. Moreover, a parametric bootstrap test revealed that none of the tested models provided a good fit to the observed data. The model assumptions, such as homogeneous rates across species or no rate shifts, appear to be violated.

National Category
Evolutionary Biology; Mathematics
Research subject
Mathematical Statistics
Identifiers
urn:nbn:se:su:diva-100655 (URN)
10.1371/journal.pone.0084184 (DOI)
000329462700016 ()
Note

AuthorCount:1;

Available from: 2014-02-12 Created: 2014-02-10 Last updated: 2017-12-06 Bibliographically approved

Open Access in DiVA

fulltext (1368 kB), 361 downloads
File information
File name: FULLTEXT01.pdf
File size: 1368 kB
Checksum (SHA-512): e47af97cf99bc700ec269908b60ff1b6a8c2d432ba9b36c13cb1885cfdea929d2ecc2daa3f54d2cbde34e39d7a44933d350980cadb316a0131b5fe0d55051cd2
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Höhna, Sebastian
By organisation
Department of Mathematics
Evolutionary Biology; Mathematics
