  • 1.
    Björkholm, Patrik
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Daniluk, Pawel
    Kryshtafovych, Andriy
    Fidelis, Krzysztof
    Andersson, Robin
    Hvidsten, Torgeir R.
    Using multi-data hidden Markov models trained on local neighborhoods of protein structure to predict residue-residue contacts. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 10, p. 1264-1270. Article in journal (Refereed)
    Abstract [en]

    Motivation: Correct prediction of residue-residue contacts in proteins that lack good templates with known structure would take ab initio protein structure prediction a large step forward. The lack of correct contacts, and in particular long-range contacts, is considered the main reason why these methods often fail.

    Results: We propose a novel hidden Markov model (HMM)-based method for predicting residue-residue contacts from protein sequences using as training data homologous sequences, predicted secondary structure and a library of local neighborhoods (local descriptors of protein structure). The library consists of recurring structural entities incorporating short-, medium- and long-range interactions and is general enough to reassemble the cores of nearly all proteins in the PDB. The method is tested on an external test set of 606 domains with no significant sequence similarity to the training set as well as 151 domains with SCOP folds not present in the training set. Considering the top 0.2 · L predictions (L = sequence length), our HMMs obtained an accuracy of 22.8% for long-range interactions in new fold targets, and an average accuracy of 28.6% for long-, medium- and short-range contacts. This is a significant performance increase over currently available methods when comparing against results published in the literature.

  • 2. Brameier, Markus
    et al.
    Krings, Andrea
    Stockholm University.
    MacCallum, Robert M.
    NucPred - Predicting nuclear localization of proteins. 2007. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 9, p. 1159-1160. Article in journal (Refereed)
    Abstract [en]

    NucPred analyzes patterns in eukaryotic protein sequences and predicts if a protein spends at least some time in the nucleus or no time at all. Subcellular location of proteins represents functional information, which is important for understanding protein interactions, for the diagnosis of human diseases and for drug discovery. NucPred is a novel web tool based on regular expression matching and multiple program classifiers induced by genetic programming. A likelihood score is derived from the programs for each input sequence and each residue position. Different forms of visualization are provided to assist the detection of nuclear localization signals (NLSs). The NucPred server also provides access to additional sources of biological information (real and predicted) for a better validation and interpretation of results.

  • 3. Conley, Christopher J.
    et al.
    Smith, Rob
    Torgrip, Ralf
    Stockholm University, Faculty of Science, Department of Analytical Chemistry.
    Taylor, Ryan M.
    Tautenhahn, Ralf
    Prince, John T.
    Massifquant: open-source Kalman filter-based XC-MS isotope trace feature detection. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 18, p. 2636-2643. Article in journal (Refereed)
    Abstract [en]

    Motivation: Isotope trace (IT) detection is a fundamental step for liquid or gas chromatography mass spectrometry (XC-MS) data analysis that faces a multitude of technical challenges on complex samples. The Kalman filter (KF) application to IT detection addresses some of these challenges; it discriminates closely eluting ITs in the m/z dimension, flexibly handles heteroscedastic m/z variances and does not bin the m/z axis. Yet, the behavior of this KF application has not been fully characterized, as no cost-free open-source implementation exists and incomplete evaluation standards for IT detection persist.

    Results: Massifquant is an open-source solution for KF IT detection that has been subjected to novel and rigorous methods of performance evaluation. The presented evaluation with accompanying annotations and optimization guide sets a new standard for comparative IT detection. Compared with centWave, matchedFilter and MZMine2 (alternative IT detection engines), Massifquant detected more true ITs in a real LC-MS complex sample, especially low-intensity ITs. It also offers competitive specificity and equally effective quantitation accuracy.

  • 4. Dimou, Niki L.
    et al.
    Tsirigos, Konstantinos D.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Bagos, Pantelis G.
    GWAR: robust analysis and meta-analysis of genome-wide association studies. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 10, p. 1521-1527. Article in journal (Refereed)
    Abstract [en]

    Motivation: In the context of genome-wide association studies (GWAS), there is a variety of statistical techniques in order to conduct the analysis, but, in most cases, the underlying genetic model is usually unknown. Under these circumstances, the classical Cochran-Armitage trend test (CATT) is suboptimal. Robust procedures that maximize the power and preserve the nominal type I error rate are preferable. Moreover, performing a meta-analysis using robust procedures is of great interest and has never been addressed in the past. The primary goal of this work is to implement several robust methods for analysis and meta-analysis in the statistical package Stata and subsequently to make the software available to the scientific community.

    Results: The CATT under a recessive, additive and dominant model of inheritance as well as robust methods based on the Maximum Efficiency Robust Test statistic, the MAX statistic and the MIN2 were implemented in Stata. Concerning MAX and MIN2, we calculated their asymptotic null distributions relying on numerical integration resulting in a great gain in computational time without losing accuracy. All the aforementioned approaches were employed in a fixed or a random effects meta-analysis setting using summary data with weights equal to the reciprocal of the combined cases and controls. Overall, this is the first complete effort to implement procedures for analysis and meta-analysis in GWAS using Stata.

  • 5.
    Ewels, Philip
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Magnusson, Måns
    Lundin, Sverker
    Käller, Max
    MultiQC: summarize analysis results for multiple tools and samples in a single report. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 19, p. 3047-3048. Article in journal (Refereed)
    Abstract [en]

    Motivation: Fast and accurate quality control is essential for studies involving next-generation sequencing data. Whilst numerous tools exist to quantify QC metrics, there is no common approach to flexibly integrate these across tools and large sample sets. Assessing analysis results across an entire project can be time consuming and error prone; batch effects and outlier samples can easily be missed in the early stages of analysis.

    Results: We present MultiQC, a tool to create a single report visualising output from multiple tools across many samples, enabling global trends and biases to be quickly identified. MultiQC can plot data from many common bioinformatics tools and is built to allow easy extension and customization.

  • 6. Forslund, Kristoffer
    et al.
    Pereira, Cecile
    Capella-Gutierrez, Salvador
    Sousa da Silva, Alan
    Altenhoff, Adrian
    Huerta-Cepas, Jaime
    Muffato, Matthieu
    Patricio, Mateus
    Vandepoele, Klaas
    Ebersberger, Ingo
    Blake, Judith
    Fernandez Breis, Jesualdo Tomas
    Boeckmann, Brigitte
    Gabaldon, Toni
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Dessimoz, Christophe
    Lewis, Suzanna
    Gearing up to handle the mosaic nature of life in the quest for orthologs. 2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 2, p. 323-329. Article in journal (Refereed)
    Abstract [en]

    The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing sources of and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.

  • 7.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Benchmarking homology detection procedures with low complexity filters. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 19, p. 2500-2505. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences.

    RESULTS: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.

    CONCLUSION: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated.

    AVAILABILITY: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz

  • 8.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting protein function from domain content. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 15, p. 1681-1687. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions.

    RESULTS: Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.

    AVAILABILITY: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar

  • 9. Garg, Shilpa
    et al.
    Martin, Marcel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Marschall, Tobias
    Read-based phasing of related individuals. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 12, p. 234-242. Article in journal (Refereed)
    Abstract [en]

    Motivation: Read-based phasing deduces the haplotypes of an individual from sequencing reads that cover multiple variants, while genetic phasing takes only genotypes as input and applies the rules of Mendelian inheritance to infer haplotypes within a pedigree of individuals. Combining both into an approach that uses these two independent sources of information (reads and pedigree) has the potential to deliver results better than each individually.

    Results: We provide a theoretical framework combining read-based phasing with genetic haplotyping, and describe a fixed-parameter algorithm and its implementation for finding an optimal solution. We show that leveraging reads of related individuals jointly in this way yields more phased variants and at a higher accuracy than when phased separately, both in simulated and real data. Coverages as low as 2× for each member of a trio yield haplotypes that are as accurate as when analyzed separately at 15× coverage per individual.

  • 10.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden; Swedish eScience Research Center, Sweden.
    MaxLink: network-based prioritization of genes tightly linked to a disease seed set. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 18, p. 2689-2690. Article in journal (Refereed)
    Abstract [en]

    Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to other areas than cancer. The algorithm has also been made faster by re-implementation in C++, and the Web site uses FunCoup 3.0 as the underlying network.

  • 11.
    Haider, Christian
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Kavic, Marina
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    TreeDom: a graphical web tool for analysing domain architecture evolution. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 15, p. 2384-2385. Article in journal (Refereed)
    Abstract [en]

    We present TreeDom, a web tool for graphically analysing the evolutionary history of domains in multi-domain proteins. Individual domains on the same protein chain may have distinct evolutionary histories, which is important to grasp in order to understand protein function. For instance, it may be important to know whether a domain was duplicated recently or long ago, to know the origin of inserted domains, or to know the pattern of domain loss within a protein family. TreeDom uses the Pfam database as the source of domain annotations, and displays these on a sequence tree. An advantage of TreeDom is that the user can limit the analysis to N sequences that are most similar to a query, or provide a list of sequence IDs to include. Using the Pfam alignment of the selected sequences, a tree is built and displayed together with the domain architecture of each sequence.

  • 12.
    Hayat, Sikander
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Ranking models of transmembrane beta-barrel proteins using Z-coordinate predictions. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 12, p. i90-i96. Article in journal (Refereed)
    Abstract [en]

    Motivation: Transmembrane beta-barrels exist in the outer membrane of gram-negative bacteria as well as in chloroplast and mitochondria. They are often involved in transport processes and are promising antimicrobial drug targets. Structures of only a few beta-barrel protein families are known. Therefore, a method that could automatically generate such models would be valuable. The symmetrical arrangement of the barrels suggests that an approach based on idealized geometries may be successful.

    Results: Here, we present tobmodel, a method for generating 3D models of beta-barrel transmembrane proteins. First, alternative topologies are obtained from the BOCTOPUS topology predictor. Thereafter, several 3D models are constructed by using different angles of the beta-sheets. Finally, the best model is selected based on agreement with a novel predictor, ZPRED3, which predicts the distance from the center of the membrane for each residue, i.e. the Z-coordinate. The Z-coordinate prediction has an average error of 1.61 Å. Tobmodel predicts the correct topology for 75% of the proteins in the dataset, which is a slight improvement over BOCTOPUS alone. More importantly, however, tobmodel provides a Cα template with an average RMSD of 7.24 Å from the native structure.

  • 13. Hayat, Sikander
    et al.
    Peters, Christoph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tsirigos, Konstantinos D.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Inclusion of dyad-repeat pattern improves topology prediction of transmembrane beta-barrel proteins. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 10, p. 1571-1573. Article in journal (Refereed)
    Abstract [en]

    Accurate topology prediction of transmembrane beta-barrels is still an open question. Here, we present BOCTOPUS2, an improved topology prediction method for transmembrane beta-barrels that can also identify the barrel domain, predict the topology and identify the orientation of residues in transmembrane beta-strands. The major novelty of BOCTOPUS2 is the use of the dyad-repeat pattern of lipid and pore facing residues observed in transmembrane beta-barrels. In a cross-validation test on a benchmark set of 42 proteins, BOCTOPUS2 predicts the correct topology in 69% of the proteins, an improvement of more than 10% over the best earlier method (BOCTOPUS) and in addition, it produces significantly fewer erroneous predictions on non-transmembrane beta-barrel proteins.

  • 14.
    Hennerdal, Aron
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Rapid membrane protein topology prediction. 2011. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no 9, p. 1322-1323. Article in journal (Refereed)
    Abstract [en]

    State-of-the-art methods for topology of α-helical membrane proteins are based on the use of time-consuming multiple sequence alignments obtained from PSI-BLAST or other sources. Here, we examine if it is possible to use the consensus of topology prediction methods that are based on single sequences to obtain a similar accuracy as the more accurate multiple sequence-based methods. Here, we show that TOPCONS-single performs better than any of the other topology prediction methods tested here, but ~6% worse than the best method that is utilizing multiple sequence alignments.

    AVAILABILITY AND IMPLEMENTATION: TOPCONS-single is available as a web server from http://single.topcons.net/ and is also included for local installation from the web site. In addition, consensus-based topology predictions for the entire international protein index (IPI) are available from the web server and will be updated at regular intervals.

  • 15. Hollich, Volker
    et al.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    PfamAlyzer: domain-centric homology search. 2007. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 24, p. 3382-3383. Article in journal (Refereed)
    Abstract [en]

    PfamAlyzer is a Java applet that enables exploration of Pfam domain architectures using a user-friendly graphical interface. It can search the UniProt protein database for a domain pattern. Domain patterns similar to the query are presented graphically by PfamAlyzer either in a ranked list or pinned to the tree of life. Such domain-centric homology search can assist identification of distant homologs with shared domain architecture.

  • 16.
    Höhna, Sebastian
    Stockholm University, Faculty of Science, Department of Mathematics.
    Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes. 2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 11, p. 1367-1374. Article in journal (Refereed)
    Abstract [en]

    Motivation: Diversification rates and patterns may be inferred from reconstructed phylogenies. Both the time-dependent and the diversity-dependent birth-death process can produce the same observed patterns of diversity over time. To develop and test new models describing the macro-evolutionary process of diversification, generic and fast algorithms to simulate under these models are necessary. Simulations are not only important for testing and developing models but play an influential role in the assessment of model fit.

    Results: In the present article, I consider as the model a global time-dependent birth-death process where each species has the same rates but rates may vary over time. For this model, I derive the likelihood of the speciation times from a reconstructed phylogenetic tree and show that each speciation event is independent and identically distributed. This fact can be used to simulate efficiently reconstructed phylogenetic trees when conditioning on the number of species, the time of the process or both. I show the usability of the simulation by approximating the posterior predictive distribution of a birth-death process with decreasing diversification rates applied on a published bird phylogeny (family Cettiidae).

    Availability: The methods described in this manuscript are implemented in the R package TESS, available from the repository CRAN (http://cran.r-project.org/web/packages/TESS/).

  • 17.
    Höhna, Sebastian
    et al.
    Stockholm University, Faculty of Science, Department of Mathematics. University of California, USA.
    May, Michael R.
    Moore, Brian R.
    TESS: an R package for efficiently simulating phylogenetic trees and performing Bayesian inference of lineage diversification rates. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 5, p. 789-791. Article in journal (Refereed)
    Abstract [en]

    Many fundamental questions in evolutionary biology entail estimating rates of lineage diversification (speciation-extinction) that are modeled using birth-death branching processes. We leverage recent advances in branching-process theory to develop a flexible Bayesian framework for specifying diversification models (where rates are constant, vary continuously, or change episodically through time) and implement numerical methods to estimate parameters of these models from molecular phylogenies, even when species sampling is incomplete. We enable both statistical inference and efficient simulation under these models. We also provide robust methods for comparing the relative and absolute fit of competing branching-process models to a given tree, thereby providing rigorous tests of biological hypotheses regarding patterns and processes of lineage diversification.

  • 18.
    Kaduk, Mateusz
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Improved orthology inference with Hieranoid 2. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 8, p. 1154-1159. Article in journal (Refereed)
    Abstract [en]

    Motivation: The initial step in many orthology inference methods is the computationally demanding establishment of all pairwise protein similarities across all analysed proteomes. The quadratic scaling with proteomes has become a major bottleneck. A remedy is offered by the Hieranoid algorithm which reduces the complexity to linear by hierarchically aggregating ortholog groups from InParanoid along a species tree.

    Results: We have further developed the Hieranoid algorithm in many ways. Major improvements have been made to the construction of multiple sequence alignments and consensus sequences. Hieranoid version 2 was evaluated with standard benchmarks that reveal a dramatic increase in the coverage/accuracy tradeoff over version 1, such that it now compares favourably with the best methods. The new parallelized cluster mode allows Hieranoid to be run on large data sets in a much shorter timespan than InParanoid, yet at similar accuracy.

  • 19.
    Larsson, Per
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Skwark, Marcin J.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Wallner, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Improved predictions by Pcons.net using multiple templates. 2011. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no 3, p. 426-427. Article in journal (Refereed)
    Abstract [en]

    Multiple templates can often be used to build more accurate homology models than models built from a single template. Here we introduce PconsM, an automated protocol that uses multiple templates to build protein models. PconsM has been among the top-performing methods in the recent CASP experiments and consistently performs better than the single template models used in Pcons.net. In particular for the easier targets with many alternative templates with a high degree of sequence identity, quality is readily improved with a few percentages over the highest ranked model built on a single template. PconsM is available as an additional pipeline within the Pcons.net protein structure prediction server.

  • 20.
    Messina, David N.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    DASher: a stand-alone protein sequence client for DAS, the Distributed Annotation System. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 10, p. 1333-1334. Article in journal (Refereed)
    Abstract [en]

    The rise in biological sequence data has led to a proliferation of separate, specialized databases. While there is great value in having many independent annotations, it is critical that there be a way to integrate them in one combined view. The Distributed Annotation System (DAS) was developed for that very purpose. There are currently no DAS clients that are open source, specialized for aggregating and comparing protein sequence annotation, and that can run as a self-contained application outside of a web browser. The speed, flexibility and extensibility that come with a stand-alone application motivated us to create DASher, an open-source Java DAS client. Given a UniProt sequence identifier, DASher automatically queries DAS-supporting servers worldwide for any information on that sequence and then displays the annotations in an interactive viewer for easy comparison. DASher is a fast, Java-based DAS client optimized for viewing protein sequence annotation and compliant with the latest DAS protocol specification 1.53E. AVAILABILITY: DASher is available for direct use and download at http://dasher.sbc.su.se including examples and source code under the GPLv3 licence. Java version 6 or higher is required.

  • 21.
    Michel, Mirco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Hayat, Sikander
    Skwark, Marcin J.
    Sander, Chris
    Marks, Debora S.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    PconsFold: improved contact predictions improve protein models. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 17, p. 1482-1488. Article in journal (Refereed)
    Abstract [en]

    Motivation: Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, since the first studies, contact prediction methods have improved. Here, we ask how much the final models are improved if improved contact predictions are used.

    Results: In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by on average 33% using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved by 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved.

  • 22.
    Michel, Mirco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Hurtado, David M.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Uziela, Karolis
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Large-scale structure prediction by improved contact predictions and model quality assessment. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 14, p. 123-129. Article in journal (Refereed)
    Abstract [en]

    Motivation: Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very large protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known. Results: We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that can be reliably identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Of these, 415 have not been reported before. Availability: Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net. All programs used here are freely available.

  • 23.
    Michel, Mirco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Skwark, Marcin J.
    Hurtado, David M.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Ekeberg, Magnus
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting accurate contacts in thousands of Pfam domain families using PconsC3. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811. Article in journal (Refereed)
    Abstract [en]

    Motivation: A few years ago it was shown that by using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment it is possible to significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, for about half of all Pfam domain families no structure is known, but unfortunately most of these families have at most a few hundred members, i.e. are too small for such contact prediction methods.

    Results: To extend accurate contact predictions to the thousands of smaller protein families we present PconsC3, a fast and improved method for protein contact predictions that can be used for families with even 100 effective sequence members. PconsC3 significantly outperforms direct coupling analysis (DCA) methods, independent of family size, secondary structure content, contact range, or the number of selected contacts.

    Availability: PconsC3 is available as a web server and downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and licensed under the GNU General Public License, version 2. At this site contact predictions for most Pfam families are also available. We estimate that more than 4000 contact maps for Pfam families of unknown structure have more than 50% of the top-ranked contacts predicted correctly.

  • 24.
    Michel, Mirco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Skwark, Marcin J.
    Menéndez Hurtado, David
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Ekeberg, Magnus
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Predicting accurate contacts in thousands of Pfam domain families using PconsC3. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 18, p. 2859-2866. Article in journal (Refereed)
    Abstract [en]

    Motivation: A few years ago it was shown that by using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment it is possible to significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, for about half of all Pfam domain families no structure is known, but unfortunately most of these families have at most a few hundred members, i.e. are too small for such contact prediction methods. Results: To extend accurate contact predictions to the thousands of smaller protein families we present PconsC3, a fast and improved method for protein contact predictions that can be used for families with even 100 effective sequence members. PconsC3 significantly outperforms direct coupling analysis (DCA) methods, independent of family size, secondary structure content, contact range, or the number of selected contacts. Availability and implementation: PconsC3 is available as a web server and downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and licensed under the GNU General Public License, version 2. At this site contact predictions for most Pfam families are also available. We estimate that more than 4000 contact maps for Pfam families of unknown structure have more than 50% of the top-ranked contacts predicted correctly. Contact: arne@bioinfo.se Supplementary information: Supplementary data are available at Bioinformatics online.

  • 25.
    Peters, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tsirigos, Konstantinos D.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Improved topology prediction using the terminal hydrophobic helices rule. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 8, p. 1158-1162. Article in journal (Refereed)
    Abstract [en]

    Motivation: The translocon recognizes sufficiently hydrophobic regions of a protein and inserts them into the membrane. Computational methods try to determine which hydrophobic regions are recognized by the translocon. Although these predictions are quite accurate, many methods still fail to distinguish between marginally hydrophobic transmembrane (TM) helices and equally hydrophobic regions in soluble protein domains. In vivo, this problem is most likely avoided by targeting of the TM proteins, so that non-TM proteins never see the translocon. Proteins are targeted to the translocon by an N-terminal signal peptide. The targeting is also aided by the fact that the N-terminal helix is more hydrophobic than other TM helices. In addition, we recently found that the C-terminal helix is more hydrophobic than central helices. This information has not been used in earlier topology predictors.

    Results: Here, we use the fact that the N- and C-terminal helices are more hydrophobic to develop a new version of the first-principle-based topology predictor, SCAMPI. The new predictor has two main advantages: first, it can be used to efficiently separate membrane and non-membrane proteins directly without the use of an extra prefilter, and second, it shows improved performance for predicting the topology of membrane proteins that contain large non-membrane domains.

    Availability and implementation: The predictor, a web server and all datasets are available at http://scampi.bioinfo.se/.

  • 26.
    Pronk, Sander
    et al.
    Pall, Szilard
    Schulz, Roland
    Larsson, Per
    Bjelkmar, Pär
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Apostolov, Rossen
    Shirts, Michael R.
    Smith, Jeremy C.
    Kasson, Peter M.
    van der Spoel, David
    Hess, Berk
    Lindahl, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit. 2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 7, p. 845-854. Article in journal (Refereed)
    Abstract [en]

    Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on a massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations.

  • 27.
    Ray, Arjun
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lindahl, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Wallner, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Model quality assessment for membrane proteins. 2010. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 26, no 24, p. 3067-3074. Article in journal (Refereed)
    Abstract [en]

    Motivation: Learning-based model quality assessment programs have been quite successful at discriminating between high- and low-quality protein structures. Here, we show that it is possible to improve this performance significantly by restricting the learning space to a specific context, in this case membrane proteins. Since these are among the most important structures from a pharmaceutical point of view, it is particularly interesting to resolve local model quality for regions corresponding, e.g., to binding sites. Results: Our new ProQM method uses a support vector machine with a combination of general and membrane protein-specific features. For the transmembrane region, ProQM clearly outperforms all methods developed for generic proteins, and it does so while maintaining performance for extra-membrane domains; in this region it is only matched by ProQres. The predictor is shown to accurately predict quality both on the global and local level when applied to GPCR models, and clearly outperforms consensus-based scoring. Finally, the combination of ProQM and the Rosetta low-resolution energy function achieves a 7-fold enrichment in selection of near-native structural models, at very limited computational cost.

  • 28.
    Sahlin, Kristoffer
    et al.
    Chikhi, Rayan
    Arvestad, Lars
    Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Swedish e-Science Research Centre, Sweden.
    Assembly scaffolding with PE-contaminated mate-pair libraries. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 13, p. 1925-1932. Article in journal (Refereed)
    Abstract [en]

    Motivation: Scaffolding is often an essential step in a genome assembly process, in which contigs are ordered and oriented using read pairs from a combination of paired-end libraries and longer-range mate-pair libraries. Although a simple idea, scaffolding is unfortunately hard to get right in practice. One source of problems is so-called PE-contamination in mate-pair libraries, in which a non-negligible fraction of the read pairs get the wrong orientation and a much smaller insert size than what is expected. This contamination has been discussed before, in relation to integrated scaffolders, but solutions rely on the orientation being observable, e.g. by finding the junction adapter sequence in the reads. This is not always possible, making orientation and insert size of a read pair stochastic. To our knowledge, there is neither previous work on modeling PE-contamination, nor a study on the effect PE-contamination has on scaffolding quality. Results: We have addressed PE-contamination in an update to our scaffolder BESST. We formulate the problem as an integer linear program which is solved using an efficient heuristic. The new method shows significant improvement over both integrated and stand-alone scaffolders in our experiments. The impact of modeling PE-contamination is quantified by comparing with the previous BESST model. We also show how other scaffolders are vulnerable to PE-contaminated libraries, resulting in an increased number of misassemblies, more conservative scaffolding and inflated assembly sizes.

  • 29.
    Sahlin, Kristoffer
    et al.
    Street, Nathaniel
    Lundeberg, Joakim
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Improved gap size estimation for scaffolding algorithms. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 17, p. 2215-2222. Article in journal (Refereed)
    Abstract [en]

    Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subsequent analysis, it is important to provide unbiased estimation of contig distance.

    Results: In this article, we show that state-of-the-art programs for scaffolding are using an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe the different cases of bias we are facing. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, structural variation detection and library insert-size estimation as is commonly performed by read aligners.

  • 30.
    Salvatore, Marco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Warholm, Per
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Basile, Walter
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    SubCons: a new ensemble method for improved human subcellular localization predictions. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 16, p. 2464-2470. Article in journal (Refereed)
    Abstract [en]

    Motivation: Knowledge of the correct protein subcellular localization is necessary for understanding the function of a protein. Unfortunately, large-scale experimental studies are limited in their accuracy. Therefore, the development of prediction methods has been limited by the amount of accurate experimental data. However, recently large-scale experimental studies have provided new data that can be used to evaluate the accuracy of subcellular predictions in human cells. Using this data we examined the performance of state-of-the-art methods and developed SubCons, an ensemble method that combines four predictors using a Random Forest classifier. Results: SubCons outperforms earlier methods in a dataset of proteins where two independent methods confirm the subcellular localization. Given nine subcellular localizations, SubCons achieves an F1-Score of 0.79 compared to 0.70 of the second best method. Furthermore, at a false positive rate (FPR) of 1% the true positive rate (TPR) is over 58% for SubCons compared to less than 50% for the best individual predictor.

  • 31.
    Saripella, Ganapathi Varma
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Forslund, Kristoffer
    Benchmarking the next generation of homology inference tools. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 17, p. 2636-2641. Article in journal (Refereed)
    Abstract [en]

    Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the 'next generation' of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile-based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA. Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM+Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics were calculated. We further measured congruence of domain architecture assignments in the three domain databases. Results: CS-BLAST and PHMMER had overall highest accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization. Conclusion: Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity.

  • 32.
    Shu, Nanjiang
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    KalignP: Improved multiple sequence alignments using position specific gap penalties in Kalign2. 2011. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no 12, p. 1702-1703. Article in journal (Refereed)
    Abstract [en]

    Kalign2 is one of the fastest and most accurate methods for multiple alignments. However, in contrast to other methods Kalign2 does not allow externally supplied position specific gap penalties. Here, we present a modification to Kalign2, KalignP, so that it accepts such penalties. Further, we show that KalignP using position specific gap penalties obtained from predicted secondary structures makes steady improvement over Kalign2 when tested on Balibase 3.0 as well as on a dataset derived from Pfam-A seed alignments.

  • 33.
    Shu, Nanjiang
    et al.
    Stockholm University, Faculty of Science, Department of Physical, Inorganic and Structural Chemistry, Structural Chemistry.
    Zhou, Tuping
    Stockholm University, Faculty of Science, Department of Physical, Inorganic and Structural Chemistry.
    Hovmöller, Sven
    Stockholm University, Faculty of Science, Department of Physical, Inorganic and Structural Chemistry.
    Prediction of zinc-binding sites in proteins from sequence. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 6, p. 775-782. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Motivated by the abundance, importance and unique functionality of zinc, both biologically and physiologically, we have developed an improved method for the prediction of zinc-binding sites in proteins from their amino acid sequences. RESULTS: By combining support vector machine (SVM) and homology-based predictions, our method predicts zinc-binding Cys, His, Asp and Glu with 75% precision (86% for Cys and His only) at 50% recall according to a 5-fold cross-validation on a non-redundant set of protein chains from the Protein Data Bank (PDB) (2727 chains, 235 of which bind zinc). Consequently, our method predicts zinc-binding Cys and His with 10% higher precision at different recall levels compared to a recently published method when tested on the same dataset. AVAILABILITY: The program is available for download at www.fos.su.se/~nanjiang/zincpred/download/

  • 34.
    Sjöstrand, Joel
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Sennblad, Bengt
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Lagergren, Jens
    DLRS: Gene tree evolution in light of a species tree. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 22, p. 2994-2995. Article in journal (Refereed)
    Abstract [en]

    PrIME-DLRS (or colloquially: 'Delirious') is a phylogenetic software tool to simultaneously infer and reconcile a gene tree given a species tree. It accounts for duplication and loss events and a relaxed molecular clock, and is intended for the study of homologous gene families, for example in a comparative genomics setting involving multiple species. PrIME-DLRS uses a Bayesian MCMC framework, where the input is a known species tree with divergence times and a multiple sequence alignment, and the output is a posterior distribution over gene trees and model parameters.

  • 35.
    Skwark, Marcin J.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Abdel-Rehim, Abbi
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    PconsC: combination of direct information methods and alignments improves contact prediction. 2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 14, p. 1815-1816. Article in journal (Refereed)
    Abstract [en]

    Recently, several new contact prediction methods have been published. They (i) use large sets of multiple aligned sequences and (ii) assume that correlations between columns in these alignments can be the results of indirect interaction. These methods are clearly superior to earlier methods when it comes to predicting contacts in proteins. Here, we demonstrate that combining predictions from two prediction methods, PSICOV and plmDCA, and two alignment methods, HHblits and jackhmmer, at four different e-value cut-offs, provides a relative improvement of 20% in comparison with the best single method, exceeding 70% correct predictions for one contact prediction per residue.

  • 36.
    Skwark, Marcin J.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    PconsD: Ultra rapid, accurate model quality assessment for protein structure prediction. 2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 14, p. 1817-1818. Article in journal (Refereed)
    Abstract [en]

    Clustering methods are often needed for accurately assessing the quality of modeled protein structures. Recent blind evaluation of quality assessment methods in CASP10 showed that there is very little difference between many different methods as far as ranking models and selecting the best model are concerned. When comparing many models, the computational cost of the model comparison can become significant. Here, we present PconsD, a very fast, stream-computing method for distance-driven model quality assessment that runs on consumer hardware. PconsD is at least one order of magnitude faster than other methods of comparable accuracy.

  • 37.
    Studham, Matthew E.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nordling, Torbjörn E. M.
    Nelander, Sven
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Functional association networks as priors for gene regulatory network inference. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 12, p. 130-138. Article in journal (Refereed)
    Abstract [en]

    Motivation: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. Results: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data.

  • 38.
    Tsirigos, Konstantinos D.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Thessaly, Greece.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Bagos, Pantelis G.
    PRED-TMBB2: improved topology prediction and detection of beta-barrel outer membrane proteins. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 17, p. 665-671. Article in journal (Refereed)
    Abstract [en]

    Motivation: The PRED-TMBB method is based on Hidden Markov Models and is capable of predicting the topology of beta-barrel outer membrane proteins and discriminating them from water-soluble ones. Here, we present an updated version of the method, PRED-TMBB2, with several newly developed features that improve its performance. The inclusion of a properly defined end state allows for better modeling of the beta-barrel domain, while different emission probabilities for the adjacent residues in strands are used to incorporate knowledge concerning the asymmetric amino acid distribution occurring there. Furthermore, the training was performed using newly developed algorithms in order to optimize the labels of the training sequences. Moreover, the method is retrained on a larger, non-redundant dataset which includes recently solved structures, and a newly developed decoding method was added to the already available options. Finally, the method now allows the incorporation of evolutionary information in the form of multiple sequence alignments. Results: The results of a strict cross-validation procedure show that PRED-TMBB2 with homology information performs significantly better compared to other available prediction methods. It yields 76% in correct topology predictions and outperforms the best available predictor by 7%, with an overall SOV of 0.9. Regarding detection of beta-barrel proteins, PRED-TMBB2, using just the query sequence as input, achieves an MCC value of 0.92, outperforming even predictors that are designed for this task and are much slower.

  • 39.
    Uziela, Karolis
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Menéndez Hurtado, David
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Wallner, Björn
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    ProQ3D: improved model quality assessments using deep learning. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 10, p. 1578-1580. Article in journal (Refereed)
    Abstract [en]

    Protein quality assessment is a long-standing problem in bioinformatics. For more than a decade we have developed state-of-the-art predictors by carefully selecting and optimising inputs to a machine learning method. The correlation has increased from 0.60 in ProQ to 0.81 in ProQ2 and 0.85 in ProQ3, mainly by adding a large set of carefully tuned descriptions of a protein. Here, we show that a substantial improvement can be obtained using exactly the same inputs as in ProQ2 or ProQ3 but replacing the support vector machine by a deep neural network. This improves the Pearson correlation to 0.90 (0.85 using ProQ2 input features).

  • 40.
    Uziela, Karolis
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Wallner, Björn
    ProQ2: estimation of model accuracy implemented in Rosetta. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 9, p. 1411-1413. Article in journal (Refereed)
    Abstract [en]

    Motivation: Model quality assessment programs are used to predict the quality of modeled protein structures. They can be divided into two groups depending on the information they use: ensemble methods, which use the consensus of many alternative models, and single-model methods, which use only a single model to make their predictions. The consensus methods excel in achieving high correlations between prediction and true quality measures. However, they frequently fail to pick out the best possible model and cannot be used to generate and score new structures. Single-model methods, on the other hand, do not have these inherent shortcomings and can be used both to sample new structures and to improve existing consensus methods. Results: Here, we present an implementation of the ProQ2 program to estimate both local and global model accuracy as part of the Rosetta modeling suite. The current implementation not only makes it possible to run large batch runs locally, but also opens up a whole new arena for conformational sampling using machine-learned scoring functions and for incorporating model accuracy estimation into various existing modeling schemes. ProQ2 participated in CASP11, and results from CASP11 are used to benchmark the current implementation. Based on results from CASP11 and CAMEO-QE, a continuous benchmark of quality estimation methods, it is clear that ProQ2 is the single-model method that performs best in both local and global model accuracy.

  • 41.
    Zhou, Tuping
    et al.
    Stockholm University, Faculty of Science, Department of Materials and Environmental Chemistry (MMK).
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Materials and Environmental Chemistry (MMK).
    Hovmöller, Sven
    Stockholm University, Faculty of Science, Department of Materials and Environmental Chemistry (MMK).
    A Novel Method for Accurate One-dimensional Protein Structure Prediction Based on Fragment Matching. 2010. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 26, no 4, p. 470-477. Article in journal (Refereed)
    Abstract [en]

    Motivation: The precise prediction of one-dimensional (1D) protein structure, as represented by the protein secondary structure and the 1D string of discrete states of dihedral angles (i.e. Shape Strings), is a prerequisite for the successful prediction of three-dimensional (3D) structure as well as protein-protein interaction. We have developed a novel 1D structure prediction method, called Frag1D, based on a straightforward fragment matching algorithm and demonstrated its success in the prediction of three sets of 1D structural alphabets, i.e. the classical three-state secondary structure, three-state Shape Strings and eight-state Shape Strings.

    Results: By exploiting the vast protein sequence and protein structure data available, we have brought secondary structure prediction closer to the expected theoretical limit. When tested by leave-one-out cross-validation on a non-redundant PDB set cut at 30% sequence identity and containing 5860 protein chains, the overall per-residue accuracy for secondary structure prediction, i.e. Q3, is 82.9%. The overall per-residue accuracies for three-state and eight-state Shape Strings are 85.1% and 71.5%, respectively. We have also benchmarked our program against the latest version of PSIPRED for secondary structure prediction, and our program predicted 0.3% better in Q3 when tested on 2241 chains with the same training set. For Shape Strings, we compared our method with a recently published method, using the same dataset and definition as used by that method. Our program predicted 2.2% better in accuracy for three-state Shape Strings. By quantitatively investigating the effect of database size on 1D structure prediction, we show that the accuracy increases by about 1% with every doubling of the database size.
