  • 1. Ali, Raja H.
    et al.
    Bark, Mikael
    Miró, Jorge
    Muhammad, Sayyed A.
    Sjöstrand, Joel
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    Zubair, Syed M.
    Abbas, Raja M.
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    VMCMC: a graphical and statistical analysis tool for Markov chain Monte Carlo traces. 2017. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 18, article id 97. Article in journal (Refereed)
    Abstract [en]

    Background: MCMC-based methods are important for Bayesian inference of phylogeny and related parameters. Although computationally expensive, MCMC yields estimates of posterior distributions that are useful for estimating parameter values and are easy to use in subsequent analysis. There are, however, sometimes practical difficulties with MCMC, relating to convergence assessment and determining burn-in, especially in large-scale analyses. Currently, multiple software packages are required to perform, e.g., convergence assessment, mixing assessment and interactive exploration of both continuous and tree parameters.

    Results: We have written a software tool called VMCMC to simplify post-processing of MCMC traces with, for example, automatic burn-in estimation. VMCMC can be used both as a GUI-based application, supporting interactive exploration, and as a command-line tool suitable for automated pipelines.

    Conclusions: VMCMC is free software available under the New BSD License. Executable jar files, a tutorial manual and source code can be downloaded from https://bitbucket.org/rhali/visualmcmc/.

  • 2. Ali, Raja Hashim
    et al.
    Muhammad, Sayyed Auwn
    Khan, Mehmood Alam
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 14, no Suppl 15, p. S12. Article in journal (Refereed)
    Abstract [en]

    Background

    Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential.

    Results

    Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner than similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

    Conclusions

    The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy as well as the definition of synteny scores are the most valuable contributions of GenFamClust.

  • 3.
    Ensterö, Mats
    et al.
    Stockholm University, Faculty of Science, Department of Molecular Biology and Functional Genomics.
    Åkerborg, Örjan
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lundin, Daniel
    Stockholm University, Faculty of Science, Department of Molecular Biology and Functional Genomics.
    Wang, Bei
    Furey, Terrence S
    Öhman, Marie
    Stockholm University, Faculty of Science, Department of Molecular Biology and Functional Genomics.
    Lagergren, Jens
    A computational screen for site selective A-to-I editing detects novel sites in neuron specific Hu proteins. 2010. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 11, no 6. Article in journal (Refereed)
    Abstract [en]

    Background

    Several bioinformatic approaches have previously been used to find novel sites of ADAR-mediated A-to-I RNA editing in humans. These studies have discovered thousands of genes that are hyper-edited in their non-coding intronic regions, especially in Alu retrotransposable elements, but very few substrates that are site-selectively edited in coding regions. Known RNA-edited substrates suggest, however, that site-selective A-to-I editing is particularly important for normal brain development in mammals.

    Results

    We have compiled a screen that enables the identification of new sites of site-selective editing, primarily in coding sequences. To avoid hyper-edited repeat regions, we applied our screen to the Alu-free mouse genome. Focusing on the mouse also facilitated better experimental verification. To identify candidate sites of RNA editing, we first performed an explorative screen based on RNA structure and genomic sequence conservation. We further evaluated the results of the explorative screen by determining which transcripts were enriched for A-G mismatches between the genomic template and the expressed sequence, since the editing product, inosine (I), is read as guanosine (G) by the translational machinery. For expressed sequences, we only considered coding regions, to focus entirely on re-coding events. Lastly, we refined the results from the explorative screen using a novel scoring scheme based on characteristics of known A-to-I edited sites. The extent of editing in the final candidate genes was verified using total RNA from mouse brain and 454 sequencing.

    Conclusions

    Using this method, we identified and confirmed efficient editing at one site in the Gabra3 gene. Editing was also verified at several other novel sites within candidates predicted to be edited. Five of these sites are situated in genes coding for the neuron-specific RNA binding proteins HuB and HuD.
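    The A-G mismatch step described in the Results can be illustrated with a minimal sketch. It assumes the genomic template and the expressed sequence are already aligned to equal length with "-" marking gaps; the function name and the toy sequences are hypothetical and are not taken from the authors' screen, which also uses structure, conservation and a dedicated scoring scheme.

```python
from typing import Tuple


def count_a_to_g_mismatches(genomic: str, expressed: str) -> Tuple[int, int]:
    """Return (A-to-G mismatches, total mismatches) for two aligned sequences.

    Hypothetical helper: assumes equal-length, pre-aligned sequences where
    '-' marks an alignment gap.
    """
    if len(genomic) != len(expressed):
        raise ValueError("sequences must be aligned to the same length")
    a_to_g = 0
    total = 0
    for g, e in zip(genomic.upper(), expressed.upper()):
        if g == "-" or e == "-" or g == e:
            continue  # skip gaps and matching positions
        total += 1
        if g == "A" and e == "G":
            a_to_g += 1  # genomic A read as G: candidate editing site
    return a_to_g, total


# Toy example with made-up sequences.
print(count_a_to_g_mismatches("ATGACCAAGT", "ATGGCCAGGT"))
```

    A transcript whose mismatches are dominated by the A-to-G class would be a candidate for closer inspection; what counts as "enriched" is left open here.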

  • 4.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Pekkari, Isabella
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain architecture conservation in orthologs. 2011. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 12, p. 326. Article in journal (Refereed)
    Abstract [en]

    Background. As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.

    Results. The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.

    Conclusions. On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.

  • 5.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Noble, William Stafford
    Käll, Lukas
    A cross-validation scheme for machine learning algorithms in shotgun proteomics. 2012. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 13, p. S3. Article in journal (Refereed)
    Abstract [en]

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
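    The two ideas in this abstract, target-decoy error estimation and cross-validated scoring, can be sketched as follows. This is an illustration under stated assumptions, not the scheme from the paper: the function names are hypothetical, the FDR estimate is the simple ratio of decoy to target matches above a threshold (which assumes target and decoy databases of equal size), and `train` and `score` stand for any learner supplied by the caller.

```python
def target_decoy_fdr(target_scores, decoy_scores, threshold):
    """Estimate the FDR among target matches scoring at or above threshold.

    Simple estimator (an assumption of this sketch): decoys / targets.
    """
    n_targets = sum(1 for s in target_scores if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores if s >= threshold)
    if n_targets == 0:
        return 0.0
    return min(1.0, n_decoys / n_targets)


def cross_validated_scores(examples, n_folds, train, score):
    """Score every example with a model trained only on the other folds."""
    folds = [examples[i::n_folds] for i in range(n_folds)]
    scored = []
    for i, held_out in enumerate(folds):
        # Training data excludes the fold that is about to be scored.
        training = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        model = train(training)
        scored.extend((ex, score(model, ex)) for ex in held_out)
    return scored
```

    Because no example is scored by a model that saw it during training, an overfitted learner cannot inflate the scores of its own training examples.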

  • 6.
    Illergård, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Callegari, Simone
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    MPRAP: An accessibility predictor for α-helical transmembrane proteins that performs well inside and outside the membrane. 2010. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 11, p. 333. Article in journal (Refereed)
    Abstract [en]

    Background: In water-soluble proteins it is energetically favorable to bury hydrophobic residues and to expose polar and charged residues. In contrast to water-soluble proteins, transmembrane proteins face three distinct environments: a hydrophobic lipid environment inside the membrane, a hydrophilic water environment outside the membrane and an interface region rich in phospholipid head-groups. Therefore, it is energetically favorable for transmembrane proteins to expose different types of residues in the different regions.

    Results: Investigations of a set of structurally determined transmembrane proteins showed that the composition of solvent-exposed residues differs significantly inside and outside the membrane. In contrast, residues buried within the interior of a protein show a much smaller difference. However, in all regions exposed residues are less conserved than buried residues. Further, we found that current state-of-the-art predictors for surface area are optimized for one of the regions and perform badly in the other regions. To circumvent this limitation we developed a new predictor, MPRAP, that performs well in all regions. In addition, MPRAP performs better on complete membrane proteins than a combination of specialized predictors and acceptably on water-soluble proteins. A web server for MPRAP is available at http://mprap.cbr.su.se/.

    Conclusion: By including complete α-helical transmembrane proteins in the training, MPRAP is able to predict surface accessibility accurately both inside and outside the membrane. This predictor can aid in the prediction of 3D structure and in the identification of erroneous protein structures.

  • 7. Khan, Mehmood Alam
    et al.
    Elias, Isaac
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nylander, Kristina
    Guimera, Roman Valls
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Schobesberger, Richard
    Schmitzberger, Peter
    Lagergren, Jens
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Fastphylo: Fast tools for phylogenetics. 2013. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 14, p. 334. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances.

    RESULTS: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor-joining-based methods and report the results in terms of speed and memory efficiency.

    CONCLUSIONS: Fastphylo is a fast, memory-efficient, and easy-to-use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.

  • 8. Khan, Mehmood Alam
    et al.
    Mahmudi, Owais
    Ullah, Ikram
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Centre, Sweden.
    Lagergren, Jens
    Probabilistic inference of lateral gene transfer events. 2016. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 17, no Suppl 14, article id 431. Article in journal (Refereed)
    Abstract [en]

    Background: Lateral gene transfer (LGT) is an evolutionary process that has an important role in biology. It challenges the traditional binary tree-like evolution of species and is attracting increasing attention from molecular biologists due to its involvement in antibiotic resistance. A number of attempts have been made to model LGT in the presence of gene duplication and loss, but reliably placing LGT events in the species tree has remained a challenge.

    Results: In this paper, we propose a probabilistic method that samples reconciliations of the gene tree with a dated species tree and computes maximum a posteriori probabilities. The MCMC-based method uses the probabilistic model DLTRS, which integrates LGT, gene duplication, gene loss, and sequence evolution under a relaxed molecular clock for substitution rates. We can estimate posterior distributions on gene trees and, in contrast to previous work, the actual placement of potential LGT events, which can be used to, e.g., identify highways of LGT.

    Conclusions: Based on a simulation study, we conclude that the method is able to infer the true LGT events on the gene tree and reconcile them to the correct edges on the species tree in most cases. Applied to two biological datasets, containing gene families from Cyanobacteria and Mollicutes, we find potential LGT highways that corroborate other studies as well as previously undetected examples.

  • 9.
    Klammer, Martin
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Messina, David N.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Schmitt, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    MetaTM - a consensus method for transmembrane protein topology prediction. 2009. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 10, p. 314. Article in journal (Refereed)
    Abstract [en]

    Transmembrane (TM) proteins are proteins that span a biological membrane one or more times. As their 3-D structures are hard to determine, experiments focus on identifying their topology (i.e., which parts of the amino acid sequence are buried in the membrane and which are located on either side of the membrane), but only a few topologies are known. Consequently, various computational TM topology predictors have been developed, but their accuracies are far from perfect. The prediction quality can be improved by applying a consensus approach, which combines results of several predictors to yield a more reliable result.

    RESULTS: A novel TM consensus method, named MetaTM, is proposed in this work. MetaTM is based on support vector machine models and combines the results of six TM topology predictors and two signal peptide predictors. On a large data set comprising 1460 sequences of TM proteins with known topologies and 2362 globular protein sequences it correctly predicts 86.7% of all topologies.

    CONCLUSION: Combining several TM predictors in a consensus prediction framework improves overall accuracy compared to any of the individual methods. Our proposed SVM-based system also has higher accuracy than a previous consensus predictor. MetaTM is made available both as downloadable source code and as a DAS server at http://MetaTM.sbc.su.se.

  • 10.
    Kutschera, Verena E.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kierczak, Marcin
    van der Valk, Tom
    von Seth, Johanna
    Stockholm University, Faculty of Science, Department of Zoology. Centre for Palaeogenetics, Sweden; Swedish Museum of Natural History, Sweden.
    Dussex, Nicolas
    Stockholm University, Faculty of Science, Department of Zoology. Centre for Palaeogenetics, Sweden; Swedish Museum of Natural History, Sweden.
    Lord, Edana
    Stockholm University, Faculty of Science, Department of Zoology. Centre for Palaeogenetics, Sweden; Swedish Museum of Natural History, Sweden.
    Dehasque, Marianne
    Stockholm University, Faculty of Science, Department of Zoology. Centre for Palaeogenetics, Sweden; Swedish Museum of Natural History, Sweden.
    Stanton, David W. G.
    Emami Khoonsari, Payam
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nystedt, Björn
    Dalén, Love
    Stockholm University, Faculty of Science, Department of Zoology. Centre for Palaeogenetics, Sweden; Swedish Museum of Natural History, Sweden.
    Díez-del-Molino, David
    Stockholm University, Faculty of Science, Department of Zoology. Centre for Palaeogenetics, Sweden; Swedish Museum of Natural History, Sweden.
    GenErode: a bioinformatics pipeline to investigate genome erosion in endangered and extinct species. 2022. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 23, no 1, article id 228. Article in journal (Refereed)
    Abstract [en]

    Background: Many wild species have suffered drastic population size declines over the past centuries, which have led to 'genomic erosion' processes characterized by reduced genetic diversity, increased inbreeding, and accumulation of harmful mutations. Yet, genomic erosion estimates of modern-day populations often lack concordance with dwindling population sizes and conservation status of threatened species. One way to directly quantify the genomic consequences of population declines is to compare genome-wide data from pre-decline museum samples and modern samples. However, doing so requires computational data processing and analysis tools specifically adapted to comparative analyses of degraded, ancient or historical, DNA data with modern DNA data as well as personnel trained to perform such analyses.

    Results: Here, we present a highly flexible, scalable, and modular pipeline to compare patterns of genomic erosion using samples from disparate time periods. The GenErode pipeline uses state-of-the-art bioinformatics tools to simultaneously process whole-genome re-sequencing data from ancient/historical and modern samples, and to produce comparable estimates of several genomic erosion indices. No programming knowledge is required to run the pipeline and all bioinformatic steps are well-documented, making the pipeline accessible to users with different backgrounds. GenErode is written in Snakemake and Python3 and uses Conda and Singularity containers to achieve reproducibility on high-performance compute clusters. The source code is freely available on GitHub (https://github.com/NBISweden/GenErode).

    Conclusions: GenErode is a user-friendly and reproducible pipeline that enables the standardization of genomic erosion indices from temporally sampled whole genome re-sequencing data.

  • 11. Lassmann, Timo
    et al.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Automatic extraction of reliable regions from multiple sequence alignments. 2007. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 8 Suppl 5, p. S9. Article in journal (Refereed)
  • 12. Mahmudi, Owais
    et al.
    Sjöstrand, Joel
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sennblad, Bengt
    Lagergren, Jens
    Genome-wide probabilistic reconciliation analysis across vertebrates. 2013. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 14, no Suppl 15, p. S10. Article in journal (Refereed)
    Abstract [en]

    Gene duplication is considered to be a major driving force in evolution that enables the genome of a species to acquire new functions. A reconciliation - a mapping of gene tree vertices to the edges or vertices of a species tree - explains where gene duplications have occurred on the species tree. In this study, we sample reconciliations from a posterior over reconciliations, gene trees, edge lengths and other parameters, given a species tree and gene sequences. We employ a Bayesian analysis tool, based on the probabilistic model DLRS that integrates gene duplication, gene loss and sequence evolution under a relaxed molecular clock for substitution rates, to obtain this posterior.

    By applying these methods, we perform a genome-wide analysis of a nine-species dataset, OPTIC, and conclude that for many gene families, the most parsimonious reconciliation (MPR) - a reconciliation that minimizes the number of duplications - is far from the correct explanation of the evolutionary history. For the given dataset, we observe that approximately 19% of the sampled reconciliations are different from the MPR. This is in clear contrast with previous estimates, based on simpler models and less realistic assumptions, according to which 98% of the reconciliations can be expected to be identical to the MPR. We also generate heatmaps showing where in the species trees duplications have been most frequent during the evolution of these species.

  • 13.
    Marco Salas, Sergio
    et al.
    Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Gyllborg, Daniel
    Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Mattsson Langseth, Christoffer
    Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Nilsson, Mats
    Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Matisse: a MATLAB-based analysis toolbox for in situ sequencing expression maps. 2021. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 22, no 1, article id 391. Article in journal (Refereed)
    Abstract [en]

    Background: A range of spatially resolved transcriptomic methods has recently emerged as a way to spatially characterize the molecular and cellular diversity of a tissue. As a consequence, an increasing number of computational techniques are being developed to facilitate data analysis. There is also a need for versatile, user-friendly tools that can be used for de novo exploration of datasets.

    Results: Here we present the MATLAB-based Analysis toolbox for in situ sequencing (ISS) expression maps (Matisse). We demonstrate Matisse by characterizing the 2-dimensional spatial expression of 119 genes profiled in a mouse coronal section, exploring different levels of complexity. Additionally, in a comprehensive analysis, we further analyzed expression maps from a second technology, osmFISH, targeting a similar mouse brain region.

    Conclusion: Matisse proves to be a valuable tool for initial exploration of in situ sequencing datasets. The wide set of tools integrated allows for simple analysis, using the position of individual reads, up to more complex clustering and dimensional reduction approaches, taking cellular content into account. The toolbox can be used to analyze one or several samples at a time, even from different spatial technologies, and it includes different segmentation approaches that can be useful in the analysis of spatially resolved transcriptomic datasets.

  • 14.
    Merid, Simon Kebede
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Goranskaya, Daria
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Distinguishing between driver and passenger mutations in individual cancer genomes by network enrichment analysis. 2014. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 15, p. 308. Article in journal (Refereed)
    Abstract [en]

    Background: In somatic cancer genomes, delineating genuine driver mutations against a background of multiple passenger events is a challenging task. The difficulty of determining function from sequence data and the low frequency of mutations are increasingly hindering the search for novel, less common cancer drivers. The accumulation of extensive amounts of data on somatic point and copy number alterations necessitates the development of systematic methods for driver mutation analysis.

    Results: We introduce a framework for detecting driver mutations via functional network analysis, which is applied to individual genomes and does not require pooling multiple samples. It probabilistically evaluates 1) functional network links between different mutations in the same genome and 2) links between individual mutations and known cancer pathways. In addition, it can employ correlations of mutation patterns in pairs of genes. The method was used to analyze genomic alterations in two TCGA datasets, one for glioblastoma multiforme and another for ovarian carcinoma, which were generated using different approaches to mutation profiling. The proportions of drivers among the reported de novo point mutations in these cancers were estimated to be 57.8% and 16.8%, respectively. Both sets also included extended chromosomal regions with synchronous duplications or losses of multiple genes. We identified putative copy number driver events within many such segments. Finally, we summarized seemingly disparate mutations and discovered a functional network of collagen modifications in the glioblastoma. In order to select the most efficient network for use with this method, we used a novel, ROC curve-based procedure for benchmarking different network versions by their ability to recover pathway membership.

    Conclusions: The results of our network-based procedure were in good agreement with published gold standard sets of cancer genes and were shown to complement and expand frequency-based driver analyses. On the other hand, three sequence-based methods applied to the same data yielded poor agreement with each other and with our results. We review the difference in driver proportions discovered by different sequencing approaches and discuss the functional roles of novel driver mutations. The software used in this work and the global network of functional couplings are publicly available at http://research.scilifelab.se/andrej_alexeyenko/downloads.html.

  • 15. Moeller, Steffen
    et al.
    Afgan, Enis
    Banck, Michael
    Bonnal, Raoul J. P.
    Booth, Timothy
    Chilton, John
    Cock, Peter J. A.
    Gumbel, Markus
    Harris, Nomi
    Holland, Richard
    Kalas, Matus
    Kajan, Laszlo
    Kibukawa, Eri
    Powell, David R.
    Prins, Pjotr
    Quinn, Jacqueline
    Sallou, Olivier
    Strozzi, Francesco
    Seemann, Torsten
    Sloggett, Clare
    Soiland-Reyes, Stian
    Spooner, William
    Steinbiss, Sascha
    Tille, Andreas
    Travis, Anthony J.
    Valls Guimera, Roman
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Katayama, Toshiaki
    Chapman, Brad A.
    Community-driven development for computational biology at Sprints, Hackathons and Codefests. 2014. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 15, p. S7. Article in journal (Refereed)
    Abstract [en]

    Background: Computational biology comprises a wide range of technologies and approaches. Multiple technologies can be combined to create more powerful workflows if the individuals contributing the data or providing tools for its interpretation can find mutual understanding and consensus. Much conversation and joint investigation are required in order to identify and implement the best approaches. Traditionally, scientific conferences feature talks presenting novel technologies or insights, followed up by informal discussions during coffee breaks. In multi-institution collaborations, in order to reach agreement on implementation details or to transfer deeper insights into a technology and practical skills, a representative of one group typically visits the other. However, this does not scale well when the number of technologies or research groups is large. Conferences have responded to this issue by introducing Birds-of-a-Feather (BoF) sessions, which offer an opportunity for individuals with common interests to intensify their interaction. However, parallel BoF sessions often make it hard for participants to join multiple BoFs and find common ground between the different technologies, and BoFs are generally too short to allow time for participants to program together.

    Results: This report summarises our experience with computational biology Codefests, Hackathons and Sprints, which are interactive developer meetings. They are structured to reduce the limitations of traditional scientific meetings described above by strengthening the interaction among peers and letting the participants determine the schedule and topics. These meetings are commonly run as loosely scheduled unconferences (self-organized identification of participants and topics for meetings) over at least two days, with early introductory talks to welcome and organize contributors, followed by intensive collaborative coding sessions. We summarise some prominent achievements of those meetings and describe differences in how these are organised, how their audience is addressed, and their outreach to their respective communities.

    Conclusions: Hackathons, Codefests and Sprints share a stimulating atmosphere that encourages participants to jointly brainstorm and tackle problems of shared interest in a self-driven proactive environment, as well as providing an opportunity for new participants to get involved in collaborative projects.

  • 16. Nellåker, Christoffer
    et al.
    Uhrzander, Fredrik
    Tyrcha, Joanna
    Stockholm University, Faculty of Science, Department of Mathematics. Mathematical Statistics.
    Karlsson, Håkan
    Mixture models for analysis of melting temperature data. 2008. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 9:370. Article in journal (Refereed)
    Abstract [en]

    Background

    In addition to their use in detecting undesired real-time PCR products, melting temperatures are useful for detecting variations in the desired target sequences. Methodological improvements in recent years allow the generation of high-resolution melting-temperature (Tm) data. However, there is currently no convention on how to statistically analyze such high-resolution Tm data.

    Results

    Mixture model analysis was applied to Tm data. Models were selected based on Akaike's information criterion. Mixture model analysis correctly identified categories in Tm data obtained for known plasmid targets. Using simulated data, we investigated the number of observations required for model construction. The precision of the reported mixing proportions from data fitted to a preconstructed model was also evaluated.

    Conclusion

    Mixture model analysis of Tm data allows the minimum number of different sequences in a set of amplicons and their relative frequencies to be determined. This approach allows Tm data to be analyzed, classified, and compared in an unbiased manner.
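The model-selection step described in this abstract (fit mixture models with different numbers of components, then choose by Akaike's information criterion) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the 1-D Gaussian components, the plain EM fit, the quantile initialisation, the helper names (`normal_pdf`, `fit_mixture`, `aic`, `select_by_aic`) and the simulated Tm values.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def fit_mixture(data, k, iters=200):
    """Fit a k-component 1-D Gaussian mixture by EM; return its log-likelihood."""
    n = len(data)
    ordered = sorted(data)
    grand_mean = sum(data) / n
    # Assumed initialisation: means at evenly spaced quantiles, shared variance.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    variances = [max(sum((x - grand_mean) ** 2 for x in data) / n, 1e-3)] * k
    weights = [1.0 / k] * k

    def component_densities(x):
        return [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]

    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in data:
            dens = component_densities(x)
            total = max(sum(dens), 1e-300)  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate mixing proportions, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(spread, 1e-3)  # floor keeps components from collapsing

    return sum(math.log(max(sum(component_densities(x)), 1e-300)) for x in data)


def aic(log_likelihood, k):
    """AIC = 2p - 2 ln L, with p = 3k - 1 free parameters for k components."""
    return 2 * (3 * k - 1) - 2 * log_likelihood


def select_by_aic(data, max_k=3):
    """Return the component count with the lowest AIC, plus all scores."""
    scores = {k: aic(fit_mixture(data, k), k) for k in range(1, max_k + 1)}
    return min(scores, key=scores.get), scores


# Simulated (hypothetical) Tm values from two well-separated sequence variants.
rng = random.Random(1)
tm_values = [rng.gauss(80.0, 0.3) for _ in range(60)] + [rng.gauss(85.0, 0.3) for _ in range(60)]
best_k, scores = select_by_aic(tm_values)
```

With two clearly separated groups, the two-component model should score a much lower AIC than the single-component model. Whether a third component is preferred depends on the data, so the sketch reports all scores rather than a single verdict.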

  • 17.
    Ohlson, Tomas
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    ProfNet, a method to derive profile-profile alignment scoring functions that improves the alignments of distantly related proteins. 2005. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 6, no 253, p. 1-9. Article in journal (Refereed)
  • 18.
    Persson, Emma
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kaduk, Mateusz
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Forslund, Sofia K.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Domainoid: domain-oriented orthology inference. 2019. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 20, no 1, article id 523. Article in journal (Refereed)
    Abstract [en]

    Background: Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding and recurring regions, domains. The domain architecture of a protein is vital for its function, and recombination events mean individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool aiming to overcome these challenges faced by full-length orthology methods by inferring orthology on the domain level. It employs the InParanoid algorithm on single domains separately, to infer groups of orthologous domains.

    Results: This domain-oriented approach allows detection of discordant domain orthologs, cases where different domains on the same protein have different evolutionary histories. In addition to domain level analysis, protein level orthology based on the fraction of domains that are orthologous can be inferred. Domainoid orthology assignments were compared to those yielded by the conventional full-length approach InParanoid, and were validated in a standard benchmark.

    Conclusions: Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches.
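The idea of deriving protein-level orthology from the fraction of orthologous domains can be sketched as follows. This is only an illustration under stated assumptions: the function name, the way domains are counted (all distinct domains of both proteins in the denominator) and the example identifiers are hypothetical, and the abstract does not specify how Domainoid computes or thresholds this fraction.

```python
def orthologous_domain_fraction(domains_a, domains_b, ortholog_pairs):
    """Fraction of the distinct domains of two proteins that have an
    orthologous partner in the other protein.

    ortholog_pairs is a set of (domain_in_a, domain_in_b) tuples, assumed to
    come from a domain-level orthology inference step.
    """
    set_a, set_b = set(domains_a), set(domains_b)
    matched_a = {a for a in set_a if any((a, b) in ortholog_pairs for b in set_b)}
    matched_b = {b for b in set_b if any((a, b) in ortholog_pairs for a in set_a)}
    total = len(set_a) + len(set_b)
    return (len(matched_a) + len(matched_b)) / total if total else 0.0


# Hypothetical example: one of two domains in protein A matches one of three in B.
fraction = orthologous_domain_fraction(
    ["A1", "A2"], ["B1", "B2", "B3"], {("A1", "B1")}
)
```

A caller could then compare `fraction` with a cut-off of its own choosing to decide whether to treat the two proteins as orthologs. The cut-off is a design decision that this sketch leaves open.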

  • 19.
    Prager, Maria
    et al.
    Stockholm University, Faculty of Science, Department of Ecology, Environment and Plant Sciences. Stockholm University, Science for Life Laboratory (SciLifeLab). Karolinska Institutet, Sweden.
    Lundin, Daniel
    Ronquist, Fredrik
    Andersson, Anders F.
    ASV portal: an interface to DNA-based biodiversity data in the Living Atlas. 2023. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 24, no 1, article id 6. Article in journal (Refereed)
    Abstract [en]

    Background: The Living Atlas is an open source platform used to collect, visualise and analyse biodiversity data from multiple sources, and serves as the national biodiversity data hub in many countries. Although powerful, the Living Atlas has had limited functionality for species occurrence data derived from DNA sequences. As a step toward integrating this fast-growing data source into the platform, we developed the Amplicon Sequence Variant (ASV) portal: a web interface to sequence-based biodiversity observations in the Living Atlas.

    Results: The ASV portal allows data providers to submit denoised metabarcoding output to the Living Atlas platform via an intermediary ASV database. It also enables users to search for existing ASVs and associated Living Atlas records using the Basic Local Alignment Search Tool, or via filters on taxonomy and sequencing details. The ASV portal is a Python-Flask/jQuery web interface, implemented as a multi-container docker service, and is an integral part of the Swedish Biodiversity Data Infrastructure.

    Conclusion: The ASV portal is a web interface that effectively integrates biodiversity data derived from DNA sequences into the Living Atlas platform.

  • 20. Sahlin, Kristoffer
    et al.
    Vezzi, Francesco
    Nystedt, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Lundeberg, Joakim
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    BESST - Efficient scaffolding of large fragmented assemblies. 2014. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 15, article id 281. Article in journal (Refereed)
    Abstract [en]

    Background

    The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features.

    We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software’s general performance. 

    Results

    We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST compares favorably with the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when the library insert size distribution is wide.

    Conclusion

    We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding. 

  • 21.
    Sjöstrand, Joel
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab).
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Lagergren, Jens
    Sennblad, Bengt
    GenPhyloData: realistic simulation of gene family evolution. 2013. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 14, article id 209. Article in journal (Refereed)
    Abstract [en]

    Background: PrIME-GenPhyloData is a suite of tools for creating realistic simulated phylogenetic trees, in particular for families of homologous genes. It supports generation of trees based on a birth-death process and, perhaps more interestingly, also supports generation of gene family trees guided by a known (synthetic or biological) species tree while accounting for events such as gene duplication, gene loss, and lateral gene transfer (LGT). The suite also supports a wide range of branch rate models enabling relaxation of the molecular clock.

    Result: Simulated data created with PrIME-GenPhyloData can be used for benchmarking phylogenetic approaches, or for characterizing models or model parameters with respect to biological data.

    Conclusion: The concept of tree-in-tree evolution can also be used to model, for instance, biogeography or host-parasite co-evolution.

  • 22. Vicedomini, Riccardo
    et al.
    Vezzi, Francesco
    Scalabrin, Simone
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Policriti, Alberto
    GAM-NGS: genomic assemblies merger for next generation sequencing. 2013. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 14, no Suppl.7, p. S6-. Article in journal (Refereed)
    Abstract [en]

    Background: In recent years more than 20 assemblers have been proposed to tackle the hard task of assembling NGS data. A common heuristic when assembling a genome is to use several assemblers and then select the best assembly according to some criteria. However, recent results clearly show that some assemblers lead to better statistics than others on specific regions but are outperformed on other regions or on different evaluation measures. To limit these problems we developed GAM-NGS (Genomic Assemblies Merger for Next Generation Sequencing), whose primary goal is to merge two or more assemblies in order to enhance contiguity and correctness of both. GAM-NGS does not rely on global alignment: regions of the two assemblies representing the same genomic locus (called blocks) are identified through reads' alignments and stored in a weighted graph. The merging phase is carried out with the help of this weighted graph, which allows an optimal resolution of local problematic regions.

    Results: GAM-NGS has been tested on six different datasets and compared to other assembly reconciliation tools. The availability of a reference sequence for three of them allowed us to show that GAM-NGS is able to output an improved, reliable set of sequences. GAM-NGS is also a very efficient tool, able to merge assemblies using substantially less computational resources than comparable tools. In order to achieve such goals, GAM-NGS avoids global alignment between contigs, making its strategy unique among assembly reconciliation tools.

    Conclusions: The difficulty of obtaining correct and reliable assemblies using a single assembler is forcing the introduction of new algorithms able to enhance de novo assemblies. GAM-NGS is a tool able to merge two or more assemblies in order to improve contiguity and correctness. It can be used on all NGS-based assembly projects and it shows its full potential with multi-library Illumina-based projects. With more than 20 available assemblers it is hard to select the best tool. In this context we propose a tool that improves assemblies (and, as a by-product, perhaps even assemblers) by merging them and selecting the one that is most likely to be correct.

  • 23.
    Wängberg, Tobias
    et al.
    Stockholm University, Faculty of Science, Department of Mathematics.
    Tyrcha, Joanna
    Stockholm University, Faculty of Science, Department of Mathematics.
    Li, Chun-Biu
    Stockholm University, Faculty of Science, Department of Mathematics.
    Shape-aware stochastic neighbor embedding for robust data visualisations. 2022. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 23, no 1, article id 477. Article in journal (Refereed)
    Abstract [en]

    Background: The t-distributed Stochastic Neighbor Embedding (t-SNE) algorithm has emerged as one of the leading methods for visualising high-dimensional (HD) data in a wide variety of fields, especially for revealing cluster structure in HD single-cell transcriptomics data. However, t-SNE often fails to correctly represent hierarchical relationships between clusters and creates spurious patterns in the embedding. In this work we generalised t-SNE using shape-aware graph distances to mitigate some of the limitations of t-SNE. Although many methods have been recently proposed to circumvent the shortcomings of t-SNE, notably Uniform Manifold Approximation and Projection (UMAP) and Potential of heat diffusion for affinity-based transition embedding (PHATE), we see a clear advantage of the proposed graph-based method.

    Results: The superior performance of the proposed method is first demonstrated on simulated data, where a significant improvement compared to t-SNE, UMAP and PHATE, based on quantitative validation indices, is observed when visualising imbalanced, nonlinear, continuous and hierarchically structured data. Thereafter, the ability of the proposed method, compared to the competing methods, to create faithful low-dimensional embeddings is shown on two real-world data sets: single-cell transcriptomics data and the MNIST image data. In addition, the only hyper-parameter of the method can be chosen automatically in a data-driven way, and this choice is consistently optimal across all test cases in this study.

    Conclusions: In this work we show that the proposed shape-aware stochastic neighbor embedding method creates low-dimensional visualisations that robustly and accurately reveal key structures of high-dimensional data.
