  • 1. Ali, Raja H.
    et al.
    Bark, Mikael
    Miró, Jorge
    Muhammad, Sayyed A.
    Sjöstrand, Joel
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    Zubair, Syed M.
    Abbas, Raja M.
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    VMCMC: a graphical and statistical analysis tool for Markov chain Monte Carlo traces. 2017. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 18, article ID 97. Journal article (Refereed)
    Abstract [en]

    Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software tools are required to assess, e.g., convergence and mixing, and to explore both continuous and tree parameters interactively.

    Results: We have written a software tool called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines.

    Conclusions: VMCMC is free software available under the New BSD License. Executable jar files, tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.

  • 2. Ali, Raja H.
    et al.
    Muhammad, Sayyed A.
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    GenFamClust: an accurate, synteny-aware and reliable homology inference algorithm. 2016. In: BMC Evolutionary Biology, ISSN 1471-2148, E-ISSN 1471-2148, Vol. 16, article ID 120. Journal article (Refereed)
    Abstract [en]

    Background: Homology inference is pivotal to evolutionary biology and is primarily based on significant sequence similarity, which, in general, is a good indicator of homology. Algorithms have also been designed to utilize conservation in gene order as an indication of homologous regions. We have developed GenFamClust, a method based on quantification of both gene order conservation and sequence similarity.

    Results: In this study, we validate GenFamClust by comparing it to well-known homology inference algorithms on a synthetic dataset. We applied several popular clustering algorithms on homologs inferred by GenFamClust and other algorithms on a metazoan dataset and studied the outcomes. Accuracy, similarity, dependence, and other characteristics were investigated for gene families yielded by the clustering algorithms. GenFamClust was also applied to genes from a set of complete fungal genomes and gene families were inferred using clustering. The resulting gene families were compared with a manually curated gold standard of pillars from the Yeast Gene Order Browser. We found that the gene-order component of GenFamClust is simple, yet biologically realistic, and captures local synteny information for homologs.

    Conclusions: The study shows that GenFamClust is a more accurate, informed, and comprehensive pipeline to infer homologs and gene families than other commonly used homology and gene-family inference methods.

  • 3. Ali, Raja Hashim
    et al.
    Muhammad, Sayyed Auwn
    Khan, Mehmood Alam
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, no. Suppl 15, pp. S12-. Journal article (Refereed)
    Abstract [en]

    Background

    Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential.

    Results

    Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationship and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner as compared to similarity based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

    Conclusions

    The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 4. Angleby, Helen
    et al.
    Oskarsson, Mattias
    Pang, Junfeng
    Zhang, Ya-ping
    Leitner, Thomas
    Braham, Caitlyn
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Lundeberg, Joakim
    Webb, Kristen M.
    Savolainen, Peter
    Forensic Informativity of ~3000 bp of Coding Sequence of Domestic Dog mtDNA. 2014. In: Journal of Forensic Sciences, ISSN 0022-1198, E-ISSN 1556-4029, Vol. 59, no. 4, pp. 898-908. Journal article (Refereed)
    Abstract [en]

    The discriminatory power of the noncoding control region (CR) of domestic dog mitochondrial DNA alone is relatively low. The extent to which the discriminatory power could be increased by analyzing additional highly variable coding regions of the mitochondrial genome (mtGenome) was therefore investigated. Genetic variability across the mtGenome was evaluated by phylogenetic analysis, and the three most variable ~1 kb coding regions identified. We then sampled 100 Swedish dogs to represent breeds in accordance with their frequency in the Swedish population. A previously published dataset of 59 dog mtGenomes collected in the United States was also analyzed. Inclusion of the three coding regions increased the exclusion capacity considerably for the Swedish sample, from 0.920 for the CR alone to 0.964 for all four regions. The number of mtDNA types among all 159 dogs increased from 41 to 72, the four most frequent CR haplotypes being resolved into 22 different haplotypes.
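
    The exclusion capacity quoted in this abstract can be illustrated with a minimal sketch. It assumes the commonly used definition, one minus the sum of squared haplotype frequencies (the chance that two randomly drawn individuals carry different haplotypes); the abstract does not state which estimator the authors used, and the function name and sample labels below are hypothetical.

```python
from collections import Counter


def exclusion_capacity(haplotypes):
    """Return 1 - sum(p_i^2), where p_i is the observed frequency of haplotype i.

    Assumed definition for illustration; not taken from the paper.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one haplotype")
    counts = Counter(haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


# Hypothetical sample: four individuals, three distinct mtDNA types.
sample = ["A", "A", "B", "C"]
print(exclusion_capacity(sample))
```

    Splitting a frequent haplotype into several rarer ones lowers the sum of squared frequencies, which is why adding variable regions raises the value.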

  • 5.
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Matematiska institutionen. Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-science Research Centre, Sweden.
    alv: a console-based viewer for molecular sequence alignments. 2018. In: Journal of Open Source Software, E-ISSN 2475-9066, Vol. 3, no. 31, article ID 955. Journal article (Refereed)
  • 6. Duchemin, Wandrille
    et al.
    Gence, Guillaume
    Chifolleau, Anne-Muriel Arigon
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Matematiska institutionen. Swedish e-Science Research Centre (SeRC), Sweden.
    Bansal, Mukul S.
    Berry, Vincent
    Boussau, Bastien
    Chevenet, Francois
    Comte, Nicolas
    Davin, Adrian A.
    Dessimoz, Christophe
    Dylus, David
    Hasic, Damir
    Mallo, Diego
    Planel, Remi
    Posada, David
    Scornavacca, Celine
    Szollosi, Gergely
    Zhang, Louxin
    Tannier, Eric
    Daubin, Vincent
    RecPhyloXML: a format for reconciled gene trees. 2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no. 21, pp. 3646-3652. Journal article (Refereed)
    Abstract [en]

    Motivation: A reconciliation is an annotation of the nodes of a gene tree with evolutionary events (for example, speciation, gene duplication, transfer, loss, etc.) along with a mapping onto a species tree. Many algorithms and software produce or use reconciliations but often using different reconciliation formats, regarding the type of events considered or whether the species tree is dated or not. This complicates the comparison and communication between different programs.

    Results: Here, we gather a consortium of software developers in gene tree species tree reconciliation to propose and endorse a format that aims to promote an integrative, albeit flexible, specification of phylogenetic reconciliations. This format, named recPhyloXML, is accompanied by several tools such as a reconciled tree visualizer and conversion utilities.

  • 7. Emanuelsson, Olof
    et al.
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Käll, Lukas
    Engagera och aktivera studenter med inspiration från konferenser: examination genom poster-presentation [Engaging and activating students with inspiration from conferences: examination through poster presentation]. 2014. In: Proceedings 2014: 8:e Pedagogiska inspirationskonferensen 17 december 2014, Lund: Lund University, 2014. Conference paper (Refereed)
    Abstract [sv, translated to English]

    In a research-oriented 7.5-credit master's-level course in bioinformatics at KTH, just over half of the course consists of a project carried out in groups of three students. Each project has its own assignment, with no or only marginal overlap with the assignments of other groups. The projects are almost exclusively based on current questions in the teaching team's own research groups or their vicinity. The project is reported partly through a poster presentation and partly through an individual web-based project diary. All course participants attend the poster session, which lasts three hours at the end of the examination period. As far as possible, we try to mimic the situation in which an authentic research result is presented at a real conference. Each participant (student) is thus expected to study every other group's poster, as is done at most scientific conferences. We conduct a simple peer assessment at the poster level, in which each student gives a short, confidential comment on each of the other posters. The course teachers naturally assess the posters as well. One difficulty is assigning individual grades. Here we use the individual project diaries, which give guidance on each individual's contribution to the project. We have tried this in four course rounds, with at most seven projects. The form of examination is fun and motivating for both students and teachers.

  • 8.
    Ersmark, Erik
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Zoologiska institutionen. Swedish Museum of Natural History, Sweden.
    Klütsch, Cornelya
    Chan, Yvonne
    Dalén, Love
    Stockholms universitet, Naturvetenskapliga fakulteten, Zoologiska institutionen.
    Sinding-Larsen, Mikkel
    Gilbert, Thomas
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    Fain, Steven R.
    Illarionova, Natalia
    Oskarsson, Mattias
    Uhlén, Mathias
    Zhang, Ya-Ping
    Savolainen, Peter
    From the past to the present: Wolf phylogeography and demographic history based on the mitochondrial control region. Manuscript (preprint) (Other academic)
  • 9.
    Kahles, André
    et al.
    Kungliga Tekniska Högskolan.
    Sarqume, Fahad
    Kungliga Tekniska Högskolan.
    Savolainen, Peter
    Kungliga Tekniska Högskolan.
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Kungliga Tekniska Högskolan.
    Excap: maximization of haplotypic diversity of linked markers. 2013. In: PLOS ONE, E-ISSN 1932-6203, Vol. 8, no. 11, pp. e79012-. Journal article (Refereed)
    Abstract [en]

    Genetic markers, defined as variable regions of DNA, can be utilized for distinguishing individuals or populations. As long as markers are independent, it is easy to combine the information they provide. For nonrecombinant sequences like mtDNA, choosing the right set of markers for forensic applications can be difficult and requires careful consideration. In particular, one wants to maximize the utility of the markers. Until now, this has mainly been done by hand. We propose an algorithm that finds the most informative subset of a set of markers. The algorithm uses a depth first search combined with a branch-and-bound approach. Since the worst case complexity is exponential, we also propose some data-reduction techniques and a heuristic. We implemented the algorithm and applied it to two forensic caseworks using mitochondrial DNA, which resulted in marker sets with significantly improved haplotypic diversity compared to previous suggestions. Additionally, we evaluated the quality of the estimation with an artificial dataset of mtDNA. The heuristic is shown to provide extensive speedup at little cost in accuracy.
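
    The combination of depth-first search and branch-and-bound described in this abstract can be sketched as follows. This is an illustration under stated assumptions, not the Excap implementation: the objective (number of distinct haplotypes), the bound, and all function names are hypothetical choices, and the data-reduction techniques and heuristic mentioned above are omitted.

```python
def count_haplotypes(sequences, markers):
    """Number of distinct haplotypes when only the given columns are read."""
    return len({tuple(seq[m] for m in markers) for seq in sequences})


def best_marker_subset(sequences, k):
    """Find k columns that maximize the number of distinct haplotypes.

    Depth-first search over column subsets with a simple bound: adding
    columns can only split haplotypes further, so a partial choice can
    never score higher than itself plus every remaining column.
    """
    n_markers = len(sequences[0])
    if not 0 <= k <= n_markers:
        raise ValueError("k must be between 0 and the number of markers")
    best = {"score": 0, "markers": ()}

    def search(chosen, start):
        if len(chosen) == k:
            score = count_haplotypes(sequences, chosen)
            if score > best["score"]:
                best["score"], best["markers"] = score, tuple(chosen)
            return
        for m in range(start, n_markers):
            # Stop when too few columns remain to reach k markers.
            if n_markers - m < k - len(chosen):
                break
            # Upper bound for every completion that uses columns >= m.
            bound = count_haplotypes(
                sequences, chosen + list(range(m, n_markers))
            )
            # The bound only shrinks for larger m, so pruning here is safe.
            if bound <= best["score"]:
                break
            search(chosen + [m], m + 1)

    search([], 0)
    return best["markers"], best["score"]


# Hypothetical toy data: four sequences, three candidate marker columns.
toy = ["AAC", "AAT", "GAC", "GAT"]
print(best_marker_subset(toy, 2))
```

    The worst case remains exponential, as the abstract notes; the bound only helps when a good subset is found early.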

  • 10. Khan, Mehmood Alam
    et al.
    Elias, Isaac
    Sjölund, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Nylander, Kristina
    Guimera, Roman Valls
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Schobesberger, Richard
    Schmitzberger, Peter
    Lagergren, Jens
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    Fastphylo: Fast tools for phylogenetics. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, pp. 334-. Journal article (Refereed)
    Abstract [en]

    BACKGROUND: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances.

    RESULTS: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency.

    CONCLUSIONS: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
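
    The first of the two problems named in this abstract, estimating a distance between two aligned DNA sequences, can be sketched as below. The Jukes-Cantor correction is used only as a familiar textbook example; the abstract does not say which substitution models fastphylo implements, and the function name is hypothetical.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences.

    Columns containing gaps or ambiguity codes in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable columns")
    # Proportion of compared sites that differ.
    p = sum(a != b for a, b in pairs) / len(pairs)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


print(jukes_cantor_distance("ACGT", "ACGA"))
```

    Computing this for every pair of sequences gives the distance matrix that a method such as neighbor joining takes as input.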

  • 11. Khan, Mehmood Alam
    et al.
    Mahmudi, Owais
    Ullah, Ikram
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    Lagergren, Jens
    Probabilistic inference of lateral gene transfer events. 2016. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 17, no. Suppl 14, article ID 431. Journal article (Refereed)
    Abstract [en]

    Background: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge.

    Results: In this paper, we propose a probabilistic method that samples reconciliations of the gene tree with a dated species tree and computes maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT, which can be used to, e.g., identify highways of LGT.

    Conclusions: Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile it to the correct edges on the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples.

  • 12. Klinter, Stefan
    et al.
    Bulone, Vincent
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Matematiska institutionen. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Diversity and evolution of chitin synthases in oomycetes (Straminipila: Oomycota). 2019. In: Molecular Phylogenetics and Evolution, ISSN 1055-7903, E-ISSN 1095-9513, Vol. 139, article ID 106558. Journal article (Refereed)
    Abstract [en]

    The oomycetes are filamentous eukaryotic microorganisms, distinct from true fungi, many of which act as crop or fish pathogens that cause devastating losses in agriculture and aquaculture. Chitin is present in all true fungi, but it occurs in only small amounts in some Saprolegniomycetes and it is absent in Peronosporomycetes. However, the growth of several oomycetes is severely impacted by competitive chitin synthase (CHS) inhibitors. Here, we shed light on the diversity, evolution and function of oomycete CHS proteins. We show by phylogenetic analysis of 93 putative CHSs from 48 highly diverse oomycetes, including the early diverging Eurychasma dicksonii, that all available oomycete genomes contain at least one putative CHS gene. All gene products contain conserved CHS motifs essential for enzymatic activity and form two Peronosporomycete-specific and six Saprolegniale-specific clades. Proteins of all clades, except one, contain an N-terminal microtubule interacting and trafficking (MIT) domain as predicted by protein domain databases or manual analysis, which is supported by homology modelling and comparison of conserved structural features from sequence logos. We identified at least three groups of CHSs conserved among all oomycete lineages and used phylogenetic reconciliation analysis to infer the dynamic evolution of CHSs in oomycetes. The evolutionary aspects of CHS diversity in modern-day oomycetes are discussed. In addition, we observed hyphal tip rupture in Phytophthora infestans upon treatment with the CHS inhibitor nikkomycin Z. Combining data on phylogeny, gene expression, and response to CHS inhibitors, we propose the association of different CHS clades with certain developmental stages.

  • 13. Mahmudi, Owais
    et al.
    Sennblad, Bengt
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    Nowick, Katja
    Lagergren, Jens
    Gene-pseudogene evolution: a probabilistic approach. 2015. In: BMC Genomics, ISSN 1471-2164, E-ISSN 1471-2164, Vol. 16, article ID S12. Journal article (Refereed)
    Abstract [en]

    Over the last decade, methods have been developed for the reconstruction of gene trees that take into account the species tree. Many of these methods have been based on the probabilistic duplication-loss model, which describes how a gene-tree evolves over a species-tree with respect to duplication and losses, as well as extensions of this model, e.g., the DLRS (Duplication, Loss, Rate and Sequence evolution) model that also includes sequence evolution under a relaxed molecular clock. A disjoint, almost as recent, and very important line of research has been focused on non-protein-coding, yet functional, DNA. For instance, DNA sequences that are pseudogenes in the sense that they are not translated may still be transcribed, and the RNA thereby produced may be functional. We extend the DLRS model by including pseudogenization events and devise an MCMC framework for analyzing extended gene families consisting of genes and pseudogenes with respect to this model, i.e., reconstructing gene-trees and identifying pseudogenization events in the reconstructed gene-trees. By applying the MCMC framework to biologically realistic synthetic data, we show that gene-trees as well as pseudogenization points can be inferred well. We also apply our MCMC framework to extended gene families belonging to the Olfactory Receptor and Zinc Finger superfamilies. The analysis indicates that both these superfamilies contain very old pseudogenes, perhaps so old that it is reasonable to suspect that some are functional. In our analysis, the subfamilies of the Olfactory Receptors contain only lineage-specific pseudogenes, while the subfamilies of the Zinc Fingers contain pseudogene lineages common to several species.

  • 14.
    Nystedt, Björn
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sherwood, Ellen
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    Jansson, Stefan
    The Norway spruce genome sequence and conifer genome evolution. 2013. In: Nature, ISSN 0028-0836, E-ISSN 1476-4687, Vol. 497, no. 7451, pp. 579-584. Journal article (Refereed)
    Abstract [en]

    Conifers have dominated forests for more than 200 million years and are of huge ecological and economic importance. Here we present the draft assembly of the 20-gigabase genome of Norway spruce (Picea abies), the first available for any gymnosperm. The number of well-supported genes (28,354) is similar to the >100 times smaller genome of Arabidopsis thaliana, and there is no evidence of a recent whole-genome duplication in the gymnosperm lineage. Instead, the large genome size seems to result from the slow and steady accumulation of a diverse set of long-terminal repeat transposable elements, possibly owing to the lack of an efficient elimination mechanism. Comparative sequencing of Pinus sylvestris, Abies sibirica, Juniperus communis, Taxus baccata and Gnetum gnemon reveals that the transposable element diversity is shared among extant conifers. Expression of 24-nucleotide small RNAs, previously implicated in transposable element silencing, is tissue-specific and much lower than in other plants. We further identify numerous long (>10,000 base pairs) introns, gene-like fragments, uncharacterized long non-coding RNAs and short RNAs. This opens up new genomic avenues for conifer forestry and breeding.

  • 15. Sahlin, Kristoffer
    et al.
    Chikhi, Rayan
    Arvestad, Lars
    Stockholms universitet, Science for Life Laboratory (SciLifeLab). Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Swedish e-Science Research Centre, Sweden.
    Assembly scaffolding with PE-contaminated mate-pair libraries. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no. 13, pp. 1925-1932. Journal article (Refereed)
    Abstract [en]

    Motivation: Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than what is expected. This contamination has been discussed before, in relation to integrated scaffolders, but solutions rely on the orientation being observable, e.g. by finding the junction adapter sequence in the reads. This is not always possible, making orientation and insert size of a read pair stochastic. To our knowledge, there is neither previous work on modeling PE-contamination, nor a study on the effect PE-contamination has on scaffolding quality. Results: We have addressed PE-contamination in an update to our scaffolder BESST. We formulate the problem as an integer linear program which is solved using an efficient heuristic. The new method shows significant improvement over both integrated and stand-alone scaffolders in our experiments. The impact of modeling PE-contamination is quantified by comparing with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding and inflated assembly sizes.

  • 16. Sahlin, Kristoffer
    et al.
    Street, Nathaniel
    Lundeberg, Joakim
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    Improved gap size estimation for scaffolding algorithms. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no. 17, pp. 2215-2222. Journal article (Refereed)
    Abstract [en]

    Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subsequent analysis, it is important to provide unbiased estimation of contig distance.

    Results: In this article, we show that state-of-the-art programs for scaffolding are using an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe what different cases of bias we are facing. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences both for scaffolding software, structural variation detection and for library insert-size estimation as is commonly performed by read aligners.

  • 17. Sahlin, Kristoffer
    et al.
    Vezzi, Francesco
    Nystedt, Björn
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Lundeberg, Joakim
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    BESST - Efficient scaffolding of large fragmented assemblies. 2014. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 15, article ID 281. Journal article (Refereed)
    Abstract [en]

    Background

    The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features.

    We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software’s general performance. 

    Results

    We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST fares favorably to the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when library insert size distribution is wide.

    Conclusion

    We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding. 

  • 18.
    Sjöstrand, Joel
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Lagergren, Jens
    Sennblad, Bengt
    GenPhyloData: realistic simulation of gene family evolution. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, article ID 209. Journal article (Refereed)
    Abstract [en]

    Background: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and, perhaps more interestingly, also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock.

    Result: Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data.

    Conclusion: The concept of tree-in-tree evolution can also be used to model, for instance, biogeography or host-parasite co-evolution.

  • 19.
    Sjöstrand, Joel
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    Sennblad, Bengt
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    Lagergren, Jens
    DLRS: Gene tree evolution in light of a species tree. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no. 22, pp. 2994-2995. Journal article (Refereed)
    Abstract [en]

    PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events and a relaxed molecular clock, and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters.

  • 20.
    Sjöstrand, Joel
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Tofigh, Ali
    Daubin, Vincent
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA). Stockholms universitet, Science for Life Laboratory (SciLifeLab). KTH Royal Institute of Technology, Sweden.
    Sennblad, Bengt
    Lagergren, Jens
    A Bayesian Method for Analyzing Lateral Gene Transfer. 2014. In: Systematic Biology, ISSN 1063-5157, E-ISSN 1076-836X, Vol. 63, no. 3, pp. 409-420. Journal article (Refereed)
    Abstract [en]

    Lateral gene transfer (LGT), which transfers DNA between two non-vertically related individuals belonging to the same or different species, is recognized as a major force in prokaryotic evolution, and evidence of its impact on eukaryotic evolution is ever increasing. LGT has attracted much public attention for its potential to transfer pathogenic elements and antibiotic resistance in bacteria, and to transfer pesticide resistance from genetically modified crops to other plants. In a wider perspective, there is a growing body of studies highlighting the role of LGT in enabling organisms to occupy new niches or adapt to environmental changes. The challenge LGT poses to the standard tree-based conception of evolution is also being debated. Studies of LGT have, however, been severely limited by a lack of computational tools. The best currently available LGT algorithms are parsimony-based phylogenetic methods, which require a pre-computed gene tree and cannot choose between sometimes wildly differing most parsimonious solutions. Moreover, in many studies, simple heuristics are applied that can only handle putative orthologs and completely disregard gene duplications (GDs). Consequently, proposed LGT among specific gene families, and the rate of LGT in general, remain debated. We present a Bayesian Markov-chain Monte Carlo-based method that integrates GD, gene loss, LGT, and sequence evolution, and apply the method in a genome-wide analysis of two groups of bacteria: Mollicutes and Cyanobacteria. Our analyses show that although the LGT rate between distant species is high, the net combined rate of duplication and close-species LGT is on average higher. We also show that the common practice of disregarding reconcilability in gene tree inference overestimates the number of LGT and duplication events. [Bayesian; gene duplication; gene loss; horizontal gene transfer; lateral gene transfer; MCMC; phylogenetics.]

  • 21. Sullivan, Alexis R.
    et al.
    Eldfjell, Yrin
    Stockholms universitet, Science for Life Laboratory (SciLifeLab). Stockholms universitet, Naturvetenskapliga fakulteten, Matematiska institutionen. Swedish e-Science Research Centre, Sweden.
    Schiffthaler, Bastian
    Delhomme, Nicolas
    Asp, Torben
    Hebelstrup, Kim H.
    Keech, Olivier
    Öberg, Lisa
    Møller, Ian Max
    Arvestad, Lars
    Stockholms universitet, Science for Life Laboratory (SciLifeLab). Stockholms universitet, Naturvetenskapliga fakulteten, Matematiska institutionen. Swedish e-Science Research Centre, Sweden.
    Street, Nathaniel R.
    Wang, Xiao-Ru
    The Mitogenome of Norway Spruce and a Reappraisal of Mitochondrial Recombination in Plants. 2020. In: Genome Biology and Evolution, ISSN 1759-6653, E-ISSN 1759-6653, Vol. 12, no. 1, pp. 3586-3598. Journal article (Refereed)
    Abstract [en]

    Plant mitogenomes can be difficult to assemble because they are structurally dynamic and prone to intergenomic DNA transfers, leading to the unusual situation where an organelle genome is far outnumbered by its nuclear counterparts. As a result, comparative mitogenome studies are in their infancy and some key aspects of genome evolution are still known mainly from pregenomic, qualitative methods. To help address these limitations, we combined machine learning and in silico enrichment of mitochondrial-like long reads to assemble the bacterial-sized mitogenome of Norway spruce (Pinaceae: Picea abies). We conducted comparative analyses of repeat abundance, intergenomic transfers, substitution and rearrangement rates, and estimated repeat-by-repeat homologous recombination rates. Prompted by our discovery of highly recombinogenic small repeats in P. abies, we assessed the genomic support for the prevailing hypothesis that intramolecular recombination is predominantly driven by repeat length, with larger repeats facilitating DNA exchange more readily. Overall, we found mixed support for this view: Recombination dynamics were heterogeneous across vascular plants and highly active small repeats (ca. 200 bp) were present in about one-third of studied mitogenomes. As in previous studies, we did not observe any robust relationships among commonly studied genome attributes, but we identify variation in recombination rates as an underinvestigated source of plant mitogenome diversity.

  • 22. Vicedomini, Riccardo
    et al.
    Vezzi, Francesco
    Scalabrin, Simone
    Arvestad, Lars
    Stockholms universitet, Naturvetenskapliga fakulteten, Numerisk analys och datalogi (NADA).
    Policriti, Alberto
    GAM-NGS: genomic assemblies merger for next generation sequencing. 2013. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 14, no. Suppl.7, pp. S6-. Journal article (Refereed)
    Abstract [en]

    Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance the contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through read alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph, which allows an optimal resolution of local problematic regions. Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show that GAM-NGS is able to output an improved, reliable set of sequences. GAM-NGS is also a very efficient tool, able to merge assemblies using substantially fewer computational resources than comparable tools. To achieve these goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among assembly reconciliation tools. Conclusions: The difficulty of obtaining correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects.
With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the one that is most likely to be correct.
