  • 1.
    Alexeyenko, Andrey
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Schmitt, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dmitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Comparative interactomics with Funcoup 2.0. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no D1, D821-D828 p. Article in journal (Refereed)
    Abstract [en]

    FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.

  • 2.
    Alexeyenko, Andrey
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L L
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Global networks of functional coupling in eukaryotes from comprehensive data integration. 2009. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 19, no 6, 1107-16 p. Article in journal (Refereed)
    Abstract [en]

    No single experimental method can discover all connections in the interactome. A computational approach can help by integrating data from multiple, often unrelated, proteomics and genomics pipelines. Reconstructing global networks of functional coupling (FC) faces the challenges of scale and heterogeneity--how to efficiently integrate huge amounts of diverse data from multiple organisms, yet ensuring high accuracy. We developed FunCoup, an optimized Bayesian framework, to resolve these issues. Because interactomes comprise functional coupling of many types, FunCoup annotates network edges with confidence scores in support of different kinds of interactions: physical interaction, protein complex member, metabolic, or signaling link. This capability boosted overall accuracy. On the whole, the constructed framework was comprehensively tested to optimize the overall confidence and ensure seamless, automated incorporation of new data sets of heterogeneous types. Using over 50 data sets in seven organisms and extensively transferring information between orthologs, FunCoup predicted global networks in eight eukaryotes. For the Ciona intestinalis network, only orthologous information was used, and it recovered a significant number of experimental facts. FunCoup predictions were validated on independent cancer mutation data. We show how FunCoup can be used for discovering candidate members of the Parkinson and Alzheimer pathways. Cross-species pathway conservation analysis provided further support to these observations.

  • 3.
    Alexeyenko, Andrey
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Wassenberg, Deena M.
    Lobenhofer, Edward K.
    Yen, Jerry
    Linney, Elwood
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Meyer, Joel N.
    Dynamic Zebrafish Interactome Reveals Transcriptional Mechanisms of Dioxin Toxicity. 2010. In: PLOS ONE, ISSN 1932-6203, Vol. 5, no 5, e10465- p. Article in journal (Refereed)
    Abstract [en]

    Background: In order to generate hypotheses regarding the mechanisms by which 2,3,7,8-tetrachlorodibenzo-p-dioxin (dioxin) causes toxicity, we analyzed global gene expression changes in developing zebrafish embryos exposed to this potent toxicant in the context of a dynamic gene network. For this purpose, we also computationally inferred a zebrafish (Danio rerio) interactome based on orthologs and interaction data from other eukaryotes. Methodology/Principal Findings: Using novel computational tools to analyze this interactome, we distinguished between dioxin-dependent and dioxin-independent interactions between proteins, and tracked the temporal propagation of dioxin-dependent transcriptional changes from a few genes that were altered initially, to large groups of biologically coherent genes at later times. The most notable processes altered at later developmental stages were calcium and iron metabolism, embryonic morphogenesis including neuronal and retinal development, a variety of mitochondria-related functions, and generalized stress response (not including induction of antioxidant genes). Within the interactome, many of these responses were connected to cytochrome P4501A (cyp1a) as well as other genes that were dioxin-regulated one day after exposure. This suggests that cyp1a may play a key role initiating the toxic dysregulation of those processes, rather than serving simply as a passive marker of dioxin exposure, as suggested by earlier research. Conclusions/Significance: Thus, a powerful microarray experiment coupled with a flexible interactome and multi-pronged interactome tools (which are now made publicly available for microarray analysis and related work) suggest the hypothesis that dioxin, best known in fish as a potent cardioteratogen, has many other targets. Many of these types of toxicity have been observed in mammalian species and are potentially caused by alterations to cyp1a.

  • 4. Altenhoff, Adrian M.
    Boeckmann, Brigitte
    Capella-Gutierrez, Salvador
    Dalquen, Daniel A.
    DeLuca, Todd
    Forslund, Kristoffer
    Huerta-Cepas, Jaime
    Linard, Benjamin
    Pereira, Cecile
    Pryszcz, Leszek P.
    Schreiber, Fabian
    da Silva, Alan Sousa
    Szklarczyk, Damian
    Train, Clement-Marie
    Bork, Peer
    Lecompte, Odile
    von Mering, Christian
    Xenarios, Ioannis
    Sjölander, Kimmen
    Juhl Jensen, Lars
    Martin, Maria J.
    Muffato, Matthieu
    Gabaldon, Toni
    Lewis, Suzanna E.
    Thomas, Paul D.
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Dessimoz, Christophe
    Standardized benchmarking in the quest for orthologs. 2016. In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 13, no 5, 425-+ p. Article in journal (Refereed)
    Abstract [en]

    Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.

  • 5. Berglund, Ann-Charlotte
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Östlund, Gabriel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    InParanoid 6: eukaryotic ortholog clusters with inparalogs. 2008. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 36, D263-D266 p. Article in journal (Refereed)
    Abstract [en]

    The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.

  • 6.
    Björkholm, Patrik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L L
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Comparative analysis and unification of domain-domain interaction networks. 2009. In: Bioinformatics (Oxford, England), ISSN 1367-4811, Vol. 25, no 22, 3020-5 p. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Certain protein domains are known to preferentially interact with other domains. Several approaches have been proposed to predict domain-domain interactions, and over nine datasets are available. Our aim is to analyse the coverage and quality of the existing resources, as well as the extent of their overlap. With this knowledge, we have the opportunity to merge individual domain interaction networks to construct a comprehensive and reliable database. RESULTS: In this article we introduce a new approach towards comparing domain-domain interaction networks. This approach is used to compare nine predicted domain and protein interaction networks. The networks were used to generate a database of unified domain interactions, UniDomInt. Each interaction in the dataset is scored according to the benchmarked reliability of the sources. The performance of UniDomInt is an improvement compared to the underlying source networks and to another composite resource, Domine. AVAILABILITY: http://sonnhammer.sbc.su.se/download/UniDomInt/

  • 7. Dessimoz, Christophe
    Gabaldón, Toni
    Roos, David S
    Sonnhammer, Erik L L
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Herrero, Javier
    Toward community standards in the quest for orthologs. 2012. In: Bioinformatics (Oxford, England), ISSN 1367-4811, Vol. 28, no 6, 900-4 p. Article in journal (Refereed)
    Abstract [en]

    The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.

  • 8. Finn, Robert D.
    Bateman, Alex
    Clements, Jody
    Coggill, Penelope
    Eberhardt, Ruth Y.
    Eddy, Sean R.
    Heger, Andreas
    Hetherington, Kirstie
    Holm, Liisa
    Mistry, Jaina
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tate, John
    Punta, Marco
    Pfam: the protein families database. 2014. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 42, no D1, D222-D230 p. Article in journal (Refereed)
    Abstract [en]

    Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

  • 9. Finn, Robert D.
    Mistry, Jaina
    Tate, John
    Coggill, Penny
    Heger, Andreas
    Pollington, Joanne E.
    Gavin, O. Luke
    Gunasekaran, Prasad
    Ceric, Goran
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Holm, Liisa
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Eddy, Sean R.
    Bateman, Alex
    The Pfam protein families database. 2010. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 38, D211-D222 p. Article in journal (Refereed)
    Abstract [en]

    Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

  • 10.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Henricson, Anna
    Hollich, Volker
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain tree-based analysis of protein architecture evolution. 2008. In: Molecular biology and evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 25, no 2, 254-264 p. Article in journal (Refereed)
    Abstract [en]

    Understanding the dynamics behind domain architecture evolution is of great importance to unravel the functions of proteins. Complex architectures have been created throughout evolution by rearrangement and duplication events. An interesting question is how many times a particular architecture has been created, a form of convergent evolution or domain architecture reinvention. Previous studies have approached this issue by comparing architectures found in different species. We wanted to achieve a finer-grained analysis by reconstructing protein architectures on complete domain trees. The prevalence of domain architecture reinvention in 96 genomes was investigated with a novel domain tree-based method that uses maximum parsimony for inferring ancestral protein architectures. Domain architectures were taken from Pfam. To ensure robustness, we applied the method to bootstrap trees and only considered results with strong statistical support. We detected multiple origins for 12.4% of the scored architectures. In a much smaller data set, the subset of completely domain-assigned proteins, the figure was 5.6%. These results indicate that domain architecture reinvention is a much more common phenomenon than previously thought. We also determined which domains are most frequent in multiply created architectures and assessed whether specific functions could be attributed to them. However, no strong functional bias was found in architectures with multiple origins.

  • 11.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Pekkari, Isabella
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain architecture conservation in orthologs. 2011. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 12, 326- p. Article in journal (Refereed)
    Abstract [en]

    Background. As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.

    Results. The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.

    Conclusions. On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.

  • 12.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Schreiber, Fabian
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Thanintorn, Nattaphon
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    OrthoDisease: tracking disease gene orthologs across 100 species. 2011. In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 12, no 5, 463-473 p. Article in journal (Refereed)
    Abstract [en]

    Orthology is one of the most important tools available to modern biology, as it allows making inferences from easily studied model systems to much less tractable systems of interest, such as ourselves. This becomes important not least in the study of genetic diseases. We here review work on the orthology of disease-associated genes and also present an updated version of the InParanoid-based disease orthology database and web site OrthoDisease, with 14-fold increased species coverage since the previous version. Using this resource, we survey the taxonomic distribution of orthologs of human genes involved in different disease categories. The hypothesis that paralogs can mask the effect of deleterious mutations predicts that known heritable disease genes should have fewer close paralogs. We found large-scale support for this hypothesis as significantly fewer duplications were observed for disease genes in the OrthoDisease ortholog groups.

  • 13.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Swedish e-Science Research Center.
    Evolution of Protein Domain Architectures. 2012. In: Evolutionary Genomics: Statistical and Computational Methods, Vol 2 / [ed] Anisimova, M, Totowa, NJ: Humana Press, 2012, 187-216 p. Chapter in book (Refereed)
    Abstract [en]

    This chapter reviews the current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly impacts which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multidomain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly).

  • 14.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Benchmarking homology detection procedures with low complexity filters. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 25, no 19, 2500-2505 p. Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences.

    RESULTS: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.

    CONCLUSION: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated.

    AVAILABILITY: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz

  • 15.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting protein function from domain content. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 24, no 15, 1681-1687 p. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions.

    RESULTS: Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.

    AVAILABILITY: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar

  • 16.
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    MGclus: network clustering employing shared neighbors. 2013. In: Molecular BioSystems, ISSN 1742-206X, Vol. 9, no 7, 1670-1675 p. Article in journal (Refereed)
    Abstract [en]

    Network analysis is an important tool for functional annotation of genes and proteins. A common approach to discern structure in a global network is to infer network clusters, or modules, and assume a functional coherence within each module, which may represent a complex or a pathway. It is however not trivial to define optimal modules. Although many methods have been proposed, it is unclear which methods perform best in general. It seems that most methods produce far from optimal results but in different ways. MGclus is a new algorithm designed to detect modules with a strongly interconnected neighborhood in large scale biological interaction networks. In our benchmarks we found MGclus to outperform other methods when applied to random graphs with varying degree of noise, and to perform equally or better when applied to biological protein interaction networks. MGclus is implemented in Java and utilizes the JGraphT graph library. It has an easy to use command-line interface and is available for download from http://sonnhammer.sbc.su.se/download/software/MGclus/.

  • 17.
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Augsten, Martin
    Tobin, Nicholas P.
    Carlson, Joseph
    Paulsson, Janna
    Pena, Cristina
    Olsson, Eleonor
    Veerla, Srinivas
    Bergh, Jonas
    Östman, Arne
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish Escience Research Center, Sweden.
    Prognostic Significance in Breast Cancer of a Gene Signature Capturing Stromal PDGF Signaling. 2013. In: American Journal of Pathology, ISSN 0002-9440, E-ISSN 1525-2191, Vol. 182, no 6, 2037-2047 p. Article in journal (Refereed)
    Abstract [en]

    In this study, we describe a novel gene expression signature of platelet-derived growth factor (PDGF) activated fibroblasts, which is able to identify breast cancers with a PDGF-stimulated fibroblast stroma and displays an independent and strong prognostic significance. Global gene expression was compared between PDGF-stimulated human fibroblasts and cultured resting fibroblasts. The most differentially expressed genes were reduced to a gene expression signature of 113 genes. The biological significance and prognostic capacity of this signature were investigated using four independent clinical breast cancer data sets. Concomitant high expression of PDGF beta receptor and its cognate ligands is associated with a high PDGF signature score. This supports the notion that the signature detects tumors with PDGF-activated stroma. Subsequent analyses indicated significant associations between high PDGF signature score and clinical characteristics, including human epidermal growth factor receptor 2 positivity, estrogen receptor negativity, high tumor grade, and large tumor size. A high PDGF signature score is associated with shorter survival in univariate analysis. Furthermore, the high PDGF signature score acts as a significant marker of poor prognosis in multivariate survival analyses, including classic prognostic markers, Ki-67 status, a proliferation gene signature, or other recently described stroma-derived gene expression signatures.

  • 18.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Augsten, Martin
    Tobin, Nicholas P.
    Carlson, Joseph
    Paulsson, Janna
    Pena, Cristina
    Olsson, Eleonor
    Veerla, Sunny
    Bergh, Jonas
    Östman, Arne
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Prognostic significance in breast cancer of a gene signature capturing stromal PDGF signaling. In: American Journal of Pathology, ISSN 0002-9440, E-ISSN 1525-2191. Article in journal (Refereed)
  • 19.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Mank, Judith E.
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Network Analysis of Functional Genomics Data: Application to Avian Sex-Biased Gene Expression. 2012. In: Scientific World Journal, ISSN 1537-744X, 130491- p. Article in journal (Refereed)
    Abstract [en]

    Gene expression analysis is often used to investigate the molecular and functional underpinnings of a phenotype. However, differential expression of individual genes is limited in that it does not consider how the genes interact with each other in networks. To address this shortcoming we propose a number of network-based analyses that give additional functional insights into the studied process. These were applied to a dataset of sex-specific gene expression in the chicken gonad and brain at different developmental stages. We first constructed a global chicken interaction network. Combining the network with the expression data showed that most sex-biased genes tend to have lower network connectivity, that is, act within local network environments, although some interesting exceptions were found. Genes of the same sex bias were generally more strongly connected with each other than expected. We further studied the fates of duplicated sex-biased genes and found that there is a significant trend to keep the same pattern of sex bias after duplication. We also identified sex-biased modules in the network, which reveal pathways or complexes involved in sex-specific processes. Altogether, this work integrates evolutionary genomics with systems biology in a novel way, offering new insights into the modular nature of sex-biased genes.

  • 20. Gabaldón, Toni
    et al.
    Dessimoz, Christophe
    Huxley-Jones, Julie
    Vilella, Albert J
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lewis, Suzanna
    Joining forces in the quest for orthologs. 2009. In: Genome Biology, ISSN 1465-6914, Vol. 10, no 9, 403- p. Article in journal (Refereed)
    Abstract [en]

    Better orthology-prediction resources would be beneficial for the whole biological community. A recent meeting discussed how to coordinate and leverage current efforts.

  • 21.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Bernhem, Kristoffer
    Ait Blal, Hammou
    Lundberg, Emma
    Brismar, Hjalmar
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Experimental validation of predicted cancer genes using FRET. Manuscript (preprint) (Other academic)
  • 22.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden; Swedish eScience Research Center, Sweden.
    MaxLink: network-based prioritization of genes tightly linked to a disease seed set. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 18, 2689-2690 p. Article in journal (Refereed)
    Abstract [en]

    Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to areas other than cancer. The algorithm has also been made faster by re-implementation in C++, and the Web site uses FunCoup 3.0 as the underlying network.
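
    The guilt-by-association ranking described in this abstract can be illustrated with a minimal sketch. This is not the MaxLink implementation: it only counts, for each non-query gene, how many network links it has to the query list and sorts by that count. The statistical selection of candidates mentioned in the abstract is omitted, the function name is hypothetical, and the gene labels are placeholders. Edges are assumed to be undirected and listed once.

```python
from collections import Counter

def rank_by_query_links(edges, query):
    """Rank non-query genes by their number of links to the query genes.

    edges: iterable of (gene_a, gene_b) pairs, undirected, each listed once.
    query: iterable of query gene identifiers.
    Returns a list of (gene, link_count), highest count first.
    """
    query = set(query)
    counts = Counter()
    for a, b in edges:
        if a in query and b not in query:
            counts[b] += 1
        elif b in query and a not in query:
            counts[a] += 1
    # Sort by descending link count, then by name for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Toy network with placeholder gene labels (illustrative only).
toy_edges = [("A", "X"), ("B", "X"), ("A", "Y"), ("Y", "Z"), ("A", "B")]
ranking = rank_by_query_links(toy_edges, ["A", "B"])
```

    In this toy network, X is linked to both query genes and Y to one, so X ranks first; Z has no direct link to the query list and is not ranked.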

  • 23.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A large-scale benchmark of gene prioritization methods. 2017. In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 7, 46598. Article in journal (Refereed)
    Abstract [en]

    In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.

  • 24.
    Haider, Christian
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Kavic, Marina
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    TreeDom: a graphical web tool for analysing domain architecture evolution. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 15, 2384-2385 p. Article in journal (Refereed)
    Abstract [en]

    We present TreeDom, a web tool for graphically analysing the evolutionary history of domains in multi-domain proteins. Individual domains on the same protein chain may have distinct evolutionary histories, which is important to grasp in order to understand protein function. For instance, it may be important to know whether a domain was duplicated recently or long ago, to know the origin of inserted domains, or to know the pattern of domain loss within a protein family. TreeDom uses the Pfam database as the source of domain annotations, and displays these on a sequence tree. An advantage of TreeDom is that the user can limit the analysis to N sequences that are most similar to a query, or provide a list of sequence IDs to include. Using the Pfam alignment of the selected sequences, a tree is built and displayed together with the domain architecture of each sequence.

  • 25.
    Henricson, Anna
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Orthology confers intron position conservation. 2010. In: BMC Genomics, ISSN 1471-2164, Vol. 11, 412. Article in journal (Refereed)
    Abstract [en]

    Background: With the wealth of genomic data available it has become increasingly important to assign putative protein function through functional transfer between orthologs. Therefore, correct elucidation of the evolutionary relationships among genes is a critical task, and attempts should be made to further improve the phylogenetic inference by adding relevant discriminating features. It has been shown that introns can maintain their position over long evolutionary timescales. For this reason, it could be possible to use conservation of intron positions as a discriminating factor when assigning orthology. Therefore, we wanted to investigate whether orthologs have a higher degree of intron position conservation (IPC) compared to non-orthologous sequences that are equally similar in sequence.

    Results: To this end, we developed a new score for IPC and applied it to ortholog groups between human and six other species. For comparison, we also gathered the closest non-orthologs, meaning sequences close in sequence space, yet falling just outside the ortholog cluster. We found that ortholog-ortholog gene pairs on average have a significantly higher degree of IPC compared to ortholog-closest non-ortholog pairs. Also pairs of inparalogs were found to have a higher IPC score than inparalog-closest non-inparalog pairs. We verified that these differences cannot simply be attributed to the generally higher sequence identity of the ortholog-ortholog and the inparalog-inparalog pairs. Furthermore, we analyzed the agreement between IPC score and the ortholog score assigned by the InParanoid algorithm, and found that it was consistently high for all species comparisons. In a minority of cases, the IPC and InParanoid score ranked inparalogs differently. These represent cases where sequence and intron position divergence are discordant. We further analyzed the discordant clusters to identify any possible preference for protein functions by looking for enriched GO terms and Pfam protein domains. They were enriched for functions important for multicellularity, which implies a connection between shifts in intronic structure and the origin of multicellularity.

    Conclusions: We conclude that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes. As a consequence, our IPC score is useful as an additional discriminating factor when assigning orthology.

  • 26. Hollich, Volker
    et al.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    PfamAlyzer: domain-centric homology search. 2007. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 23, no 24, 3382-3383 p. Article in journal (Refereed)
    Abstract [en]

    PfamAlyzer is a Java applet that enables exploration of Pfam domain architectures using a user-friendly graphical interface. It can search the UniProt protein database for a domain pattern. Domain patterns similar to the query are presented graphically by PfamAlyzer either in a ranked list or pinned to the tree of life. Such domain-centric homology search can assist identification of distant homologs with shared domain architecture.

  • 27.
    Kaduk, Mateusz
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Riegler, Christian
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). FH OÖ - University of Applied Sciences Upper Austria, Austria.
    Lemp, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). FH OÖ - University of Applied Sciences Upper Austria, Austria.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    HieranoiDB: a database of orthologs inferred by Hieranoid. 2017. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 45, no D1, D687-D690 p. Article in journal (Refereed)
    Abstract [en]

    HieranoiDB (http://hieranoiDB.sbc.su.se) is a freely available on-line database for hierarchical groups of orthologs inferred by the Hieranoid algorithm. It infers orthologs at each node in a species guide tree with the InParanoid algorithm as it progresses from the leaves to the root. Here we present a database HieranoiDB with a web interface that makes it easy to search and visualize the output of Hieranoid, and to download it in various formats. Searching can be performed using protein description, identifier or sequence. In this first version, orthologs are available for the 66 Quest for Orthologs reference proteomes. The ortholog trees are shown graphically and interactively with marked speciation and duplication nodes that show the inferred evolutionary scenario, and allow for correct extraction of predicted orthologs from the Hieranoid trees.

  • 28.
    Kaduk, Mateusz
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Improved orthology inference with Hieranoid 2. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 8, 1154-1159 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: The initial step in many orthology inference methods is the computationally demanding establishment of all pairwise protein similarities across all analysed proteomes. The quadratic scaling with proteomes has become a major bottleneck. A remedy is offered by the Hieranoid algorithm which reduces the complexity to linear by hierarchically aggregating ortholog groups from InParanoid along a species tree. Results: We have further developed the Hieranoid algorithm in many ways. Major improvements have been made to the construction of multiple sequence alignments and consensus sequences. Hieranoid version 2 was evaluated with standard benchmarks that reveal a dramatic increase in the coverage/accuracy tradeoff over version 1, such that it now compares favourably with the best methods. The new parallelized cluster mode allows Hieranoid to be run on large data sets in a much shorter timespan than InParanoid, yet at similar accuracy.

  • 29.
    Klammer, Martin
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Messina, David N.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Schmitt, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    MetaTM - a consensus method for transmembrane protein topology prediction. 2009. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, 314- p. Article in journal (Refereed)
    Abstract [en]

    Transmembrane (TM) proteins are proteins that span a biological membrane one or more times. As their 3-D structures are hard to determine, experiments focus on identifying their topology (i.e., which parts of the amino acid sequence are buried in the membrane and which are located on either side of the membrane), but only a few topologies are known. Consequently, various computational TM topology predictors have been developed, but their accuracies are far from perfect. The prediction quality can be improved by applying a consensus approach, which combines results of several predictors to yield a more reliable result. RESULTS: A novel TM consensus method, named MetaTM, is proposed in this work. MetaTM is based on support vector machine models and combines the results of six TM topology predictors and two signal peptide predictors. On a large data set comprising 1460 sequences of TM proteins with known topologies and 2362 globular protein sequences it correctly predicts 86.7% of all topologies. CONCLUSION: Combining several TM predictors in a consensus prediction framework improves overall accuracy compared to any of the individual methods. Our proposed SVM-based system also has higher accuracy than a previous consensus predictor. MetaTM is made available both as downloadable source code and as DAS server at http://MetaTM.sbc.su.se.

  • 30.
    Kutsenko, Alexey
    et al.
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute. Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Svensson, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nystedt, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Uppsala University, Sweden.
    Lundeberg, Joakim
    Björk, Petra
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Giacomello, Stefania
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Visa, Neus
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    Wieslander, Lars
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    The Chironomus tentans genome sequence and the organization of the Balbiani ring genes. 2014. In: BMC Genomics, ISSN 1471-2164, Vol. 15, 819- p. Article in journal (Refereed)
    Abstract [en]

    Background: The polytene nuclei of the dipteran Chironomus tentans (Ch. tentans) with their Balbiani ring (BR) genes constitute an exceptional model system for studies of the expression of endogenous eukaryotic genes. Here, we report the first draft genome of Ch. tentans and characterize its gene expression machineries and genomic architecture of the BR genes. Results: The genome of Ch. tentans is approximately 200 Mb in size, and has a low GC content (31%) and a low repeat fraction (15%) compared to other Dipteran species. Phylogenetic inference revealed that Ch. tentans is a sister clade to mosquitoes, with a split 150-250 million years ago. To characterize the Ch. tentans gene expression machineries, we identified potential orthologous sequences to more than 600 Drosophila melanogaster (D. melanogaster) proteins involved in the expression of protein-coding genes. We report novel data on the organization of the BR gene loci, including a novel putative BR gene, and we present a model for the organization of chromatin bundles in the BR2 puff based on genic and intergenic in situ hybridizations. Conclusions: We show that the molecular machineries operating in gene expression are largely conserved between Ch. tentans and D. melanogaster, and we provide enhanced insight into the organization and expression of the BR genes. Our data strengthen the generality of the BR genes as a unique model system and provide essential background for in-depth studies of the biogenesis of messenger ribonucleoprotein complexes.

  • 31. Lassmann, Timo
    et al.
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. 2009. In: Nucleic Acids Research, ISSN 1362-4962, Vol. 37, no 3, 858-65 p. Article in journal (Refereed)
    Abstract [en]

    In the growing field of genomics, multiple alignment programs are confronted with ever increasing amounts of data. To address this growing issue we have dramatically improved the running time and memory requirement of Kalign, while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows for external sequence annotation to be included into the alignment procedure. We demonstrate that Kalign2 is exceptionally fast and memory-efficient, permitting accurate alignment of very large numbers of sequences. The accuracy of Kalign2 compares well to the best methods in the case of protein alignments while its accuracy on nucleotide alignments is generally superior. In addition, we demonstrate the potential of using known or predicted sequence annotation to improve the alignment accuracy. Kalign2 is freely available for download from the Kalign web site (http://msa.sbc.su.se/).

  • 32. Lindberg, Julia
    et al.
    Alexeyenko, Andrey
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Perez-Bercoff, Åsa
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Overview and comparison of ortholog databases. 2006. In: Drug Discovery Today: Technologies, ISSN 1740-6749, Vol. 3, no 2. Article in journal (Refereed)
    Abstract [en]

    Orthologs are an indispensable bridge to transfer biological knowledge between species, from protein annotations to sophisticated disease models. However, orthology assignment is not trivial. A large number of resources now exist, each with its own idiosyncrasies. The goal of this review is to compare their contents and clarify which database is most suited for a certain task.

  • 33.
    McCormack, Theodore
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Statistical Assessment of Crosstalk Enrichment between Gene Groups in Biological Networks. 2013. In: PLoS ONE, ISSN 1932-6203, Vol. 8, no 1, e54945- p. Article in journal (Refereed)
    Abstract [en]

    Motivation: Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Results: Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). Availability and Implementation: CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
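
    The z-score named in this abstract can be illustrated as follows. This is a minimal sketch, not CrossTalkZ itself: it assumes the number of links between two gene groups has already been counted in the real network and in a set of randomized networks, and it leaves out the network randomization step, which is where the four methods compared in the abstract differ. The function name and all numbers are illustrative.

```python
from statistics import mean, stdev

def crosstalk_z_score(observed_links, null_link_counts):
    """Standard z-score of an observed link count against a null sample.

    observed_links: links between two gene groups in the real network.
    null_link_counts: the same count measured in randomized networks.
    """
    mu = mean(null_link_counts)
    sigma = stdev(null_link_counts)  # sample standard deviation
    if sigma == 0:
        raise ValueError("null distribution has zero variance")
    return (observed_links - mu) / sigma

# Illustrative numbers only: 16 observed links, three randomized networks.
z = crosstalk_z_score(16, [8, 10, 12])  # null mean 10, sd 2, so z = 3.0
```

    A positive z indicates more links between the groups than in the randomized networks, a negative z fewer. Converting z to a p-value assumes the null counts are approximately normal, which is why the abstract mentions a test of normality.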

  • 34.
    Messina, David
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lysholm, Fredrik
    Department of Cell and Molecular Biology, Karolinska Institutet.
    Allander, Tobias
    Department of Microbiology, Tumor- and Cell Biology, Karolinska Institutet.
    Andersson, Björn
    Department of Cell and Molecular Biology, Karolinska Institutet.
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Discovery of novel protein families in metagenomic samples. Manuscript (preprint) (Other academic)
    Abstract [en]

    Despite the steady rise in gene sequence information, there is a persistent, significant fraction of genes which do not match any previously known sequence. These genes are called ORFans, and metagenomic samples, where DNA is extracted from a mixed population of unknown and often uncultivable species, are a rich source of ORFans. Viral infections cause significant morbidity and mortality, and identifying ORFan viral gene families from human metagenomic samples represents a route to understanding molecular processes that affect human health. Few methods exist for metagenomic gene-finding, and most of them rely on sequence similarity, which cannot be used to detect ORFans. Furthermore, nonsimilarity-based methods are hard to apply to the complex mixture of short, high-error-rate sequence fragments which are typical of metagenomic projects. Here we present an approach to detect ORFan protein families in short-read data, and apply it to 937 Mbp (megabase pairs) of sequence from 17 virus-enriched libraries made from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. After isolating approximately 450 putative ORFan families from clusters of sequence contigs, we applied RNAcode, a gene finder developed for use on high-quality genome sequences, and calibrated it for error-prone short sequence reads. Additional predictive measures such as sequence complexity and length were then used to rank and filter candidates into a high-quality set of 32 putative novel gene families, only two of which show significant similarity to known genes.

  • 35.
    Messina, David N.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    DASher: a stand-alone protein sequence client for DAS, the Distributed Annotation System. 2009. In: Bioinformatics, ISSN 1367-4803, Vol. 25, no 10, 1333-1334 p. Article in journal (Refereed)
    Abstract [en]

    The rise in biological sequence data has led to a proliferation of separate, specialized databases. While there is great value in having many independent annotations, it is critical that there be a way to integrate them in one combined view. The Distributed Annotation System (DAS) was developed for that very purpose. There are currently no DAS clients that are open source, specialized for aggregating and comparing protein sequence annotation, and that can run as a self-contained application outside of a web browser. The speed, flexibility and extensibility that come with a stand-alone application motivated us to create DASher, an open-source Java DAS client. Given a UniProt sequence identifier, DASher automatically queries DAS-supporting servers worldwide for any information on that sequence and then displays the annotations in an interactive viewer for easy comparison. DASher is a fast, Java-based DAS client optimized for viewing protein sequence annotation and compliant with the latest DAS protocol specification 1.53E. AVAILABILITY: DASher is available for direct use and download at http://dasher.sbc.su.se including examples and source code under the GPLv3 licence. Java version 6 or higher is required.

  • 36.
    Ogris, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dimitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Helleday, Thomas
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A novel method for crosstalk analysis of biological networks: improving accuracy of pathway annotation. 2017. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 45, no 2, e8. Article in journal (Refereed)
    Abstract [en]

    Analyzing gene expression patterns is a mainstay for gaining functional insights into biological systems. A plethora of tools exist to identify significant enrichment of pathways for a set of differentially expressed genes. Most tools analyze gene overlap between gene sets and are therefore severely hampered by the current state of pathway annotation, yet at the same time they run a high risk of false assignments. A way to improve both true positive and false positive rates (FPRs) is to use a functional association network and instead look for enrichment of network connections between gene sets. We present a new network crosstalk analysis method BinoX that determines the statistical significance of network link enrichment or depletion between gene sets, using the binomial distribution. This is a much more appropriate statistical model than previous methods have employed, and as a result BinoX yields substantially better true positive and FPRs than was possible before. A number of benchmarks were performed to assess the accuracy of BinoX and competing methods. We demonstrate examples of how BinoX finds many biologically meaningful pathway annotations for gene sets from cancer and other diseases, which are not found by other methods.
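
    The use of the binomial distribution for link enrichment or depletion, as described in this abstract, can be illustrated with a minimal sketch. This is not the BinoX implementation. It assumes that the number of possible links between two gene sets (n) and the per-link probability under the null model (p) are already known; how BinoX estimates these from randomized networks is not shown here. Function names and inputs are hypothetical.

```python
from math import comb

def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p)."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def binomial_upper_tail(k, n, p):
    """P(X >= k): probability of at least k links, used for enrichment."""
    return sum(binomial_pmf(i, n, p) for i in range(k, n + 1))

def binomial_lower_tail(k, n, p):
    """P(X <= k): probability of at most k links, used for depletion."""
    return sum(binomial_pmf(i, n, p) for i in range(0, k + 1))

# Illustrative inputs only: 12 observed links out of 200 possible gene pairs,
# with an assumed null link probability of 0.02 per pair.
enrichment_p = binomial_upper_tail(12, 200, 0.02)
depletion_p = binomial_lower_tail(12, 200, 0.02)
```

    A small upper-tail probability suggests more links between the two sets than the null model predicts (enrichment), and a small lower-tail probability suggests fewer (depletion).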

  • 37.
    Ogris, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dimitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kaduk, Mateusz
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    FunCoup 4: New species, data, and visualization. Manuscript (preprint) (Other academic)
  • 38.
    Ogris, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Helleday, Thomas
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    PathwAX: a web server for network crosstalk based pathway annotation. 2016. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 44, no W1, W105-W109 p. Article in journal (Refereed)
    Abstract [en]

    Pathway annotation of gene lists is often used to functionally analyse biomolecular data such as gene expression in order to establish which processes are activated in a given experiment. Databases such as KEGG or GO represent collections of how genes are known to be organized in pathways, and the challenge is to compare a given gene list with the known pathways such that all true relations are identified. Most tools apply statistical measures to the gene overlap between the gene list and pathway. It is, however, difficult to avoid false negatives and false positives when only using the gene overlap. The PathwAX web server (http://pathwAX.sbc.su.se/) applies a different approach, which is based on network crosstalk. It uses the comprehensive network FunCoup to analyse network crosstalk between a query gene list and KEGG pathways. PathwAX runs the BinoX algorithm, which employs Monte-Carlo sampling of randomized networks and estimates a binomial distribution, for estimating the statistical significance of the crosstalk. This results in substantially higher accuracy than gene overlap methods. The system was optimized for speed and allows interactive web usage. We illustrate the usage and output of PathwAX.

  • 39.
    Punta, Marco
    et al.
    Coggill, Penny C.
    Eberhardt, Ruth Y.
    Mistry, Jaina
    Tate, John
    Boursnell, Chris
    Pang, Ningze
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Ceric, Goran
    Clements, Jody
    Heger, Andreas
    Holm, Liisa
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Eddy, Sean R.
    Bateman, Alex
    Finn, Robert D.
    The Pfam protein families database. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no D1, D290-D301 p. Article in journal (Refereed)
    Abstract [en]

    Pfam is a widely used database of protein families, currently containing more than 13 000 manually curated protein families as of release 26.0. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/). Here, we report on changes that have occurred since our 2010 NAR paper (release 24.0). Over the last 2 years, we have generated 1840 new families and increased coverage of the UniProt Knowledgebase (UniProtKB) to nearly 80%. Notably, we have taken the step of opening up the annotation of our families to the Wikipedia community, by linking Pfam families to relevant Wikipedia pages and encouraging the Pfam and Wikipedia communities to improve and expand those pages. We continue to improve the Pfam website and add new visualizations, such as the 'sunburst' representation of taxonomic distribution of families. In this work we additionally address two topics that will be of particular interest to the Pfam community. First, we explain the definition and use of family-specific, manually curated gathering thresholds. Second, we discuss some of the features of domains of unknown function (also known as DUFs), which constitute a rapidly growing class of families within Pfam.

  • 40.
    Saripella, Ganapathi Varma
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Forslund, Kristoffer
    Benchmarking the next generation of homology inference tools. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 17, 2636-2641 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: Over the last decades, vast numbers of sequences were deposited in public databases. Bioinformatics tools allow homology and consequently functional inference for these sequences. New profile-based homology search tools have been introduced, allowing reliable detection of remote homologs, but have not been systematically benchmarked. To provide such a comparison, which can guide bioinformatics workflows, we extend and apply our previously developed benchmark approach to evaluate the 'next generation' of profile-based approaches, including CS-BLAST, HHSEARCH and PHMMER, in comparison with the non-profile based search tools NCBI-BLAST, USEARCH, UBLAST and FASTA. Method: We generated challenging benchmark datasets based on protein domain architectures within either the PFAM+Clan, SCOP/Superfamily or CATH/Gene3D domain definition schemes. From each dataset, homologous and non-homologous protein pairs were aligned using each tool, and standard performance metrics calculated. We further measured congruence of domain architecture assignments in the three domain databases. Results: CS-BLAST and PHMMER had overall highest accuracy. FASTA, UBLAST and USEARCH showed large trade-offs of accuracy for speed optimization. Conclusion: Profile methods are superior at inferring remote homologs but the difference in accuracy between methods is relatively small. PHMMER and CS-BLAST stand out with the highest accuracy, yet still at a reasonable computational cost. Additionally, we show that less than 0.1% of Swiss-Prot protein pairs considered homologous by one database are considered non-homologous by another, implying that these classifications represent equivalent underlying biological phenomena, differing mostly in coverage and granularity.

  • 41.
    Schmitt, Thomas
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Messina, David N.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Schreiber, Fabian
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Letter to the Editor: SeqXML and OrthoXML: standards for sequence and orthology information. 2011. In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 12, no 5, 485-488 p. Article in journal (Refereed)
    Abstract [en]

    There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.

  • 42.
    Schmitt, Thomas
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Ogris, Christoph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    FunCoup 3.0: database of genome-wide functional coupling networks. 2014. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 42, no D1, D380-D388 p. Article in journal (Refereed)
    Abstract [en]

    We present an update of the FunCoup database (http://FunCoup.sbc.su.se) of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in the understanding of higher level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidence and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction.

  • 43.
    Schreiber, Fabian
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish E Sci Res Ctr, Stockholm, Sweden.
    Hieranoid: Hierarchical Orthology Inference. 2013. In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 425, no 11, 2072-2081 p. Article in journal (Refereed)
    Abstract [en]

    An accurate inference of orthologs is essential in many research fields such as comparative genomics, molecular evolution, and genome annotation. Existing methods for genome-scale orthology inference are mostly based on all-versus-all similarity searches that scale quadratically with the number of species. This limits their application to the increasing number of available large-scale datasets. Here, we present Hieranoid, a new orthology inference method using a hierarchical approach. Hieranoid performs pairwise orthology analysis using InParanoid at each node in a guide tree as it progresses from its leaves to the root. This concept reduces the total runtime complexity from a quadratic to a linear function of the number of species. The tree hierarchy provides a natural structure in multi-species ortholog groups, and the aggregation of multiple sequences allows for multiple alignment similarity searching techniques, which can yield more accurate ortholog groups. Using the recently published orthobench benchmark, Hieranoid showed the overall best performance. Our progressive approach presents a new way to infer orthologs that combines efficient graph-based methodology with aspects of compute-intensive tree-based methods. The linear scaling with the number of species is a major advantage for large-scale applications and makes Hieranoid well suited to cope with vast amounts of sequenced genomes in the future. Hieranoid is open source and can be downloaded at Hieranoid.sbc.su.se.

  • 44.
    Sonnhammer, Erik L. L.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Östlund, Gabriel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    InParanoid 8: orthology analysis between 273 proteomes, mostly eukaryotic. 2015. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 43, no D1, D234-D239 p. Article in journal (Refereed)
    Abstract [en]

    The InParanoid database (http://InParanoid.sbc.su.se) provides a user interface to orthologs inferred by the InParanoid algorithm. As there are now international efforts to curate and standardize complete proteomes, we have switched to using these resources rather than gathering and curating the proteomes ourselves. InParanoid release 8 is based on the 66 reference proteomes that the 'Quest for Orthologs' community has agreed on using, plus 207 additional proteomes from the UniProt complete proteomes, in total 273 species. These represent 246 eukaryotes, 20 bacteria and seven archaea. Compared to the previous release, this increases the number of species by 173% and the number of pairwise species comparisons by 650%. In turn, the number of ortholog groups has increased by 423%. We present the contents and usages of InParanoid 8, and a detailed analysis of how the proteome content has changed since the previous release.
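
    The two growth figures quoted in the abstract are consistent with each other, as a short arithmetic check in Python suggests. Two assumptions are made here and are not stated in the abstract: that the previous release covered 100 species (inferred from the 173% increase to 273), and that pairwise comparisons are counted as unordered species pairs.

```python
def unordered_pairs(n):
    """Number of distinct species pairs among n species: n choose 2."""
    return n * (n - 1) // 2


def percent_increase(old, new):
    """Relative growth from old to new, in percent."""
    return 100.0 * (new - old) / old


previous_species = 100  # assumption: inferred from the stated 173% increase
current_species = 273   # stated in the abstract

species_growth = percent_increase(previous_species, current_species)
pair_growth = percent_increase(
    unordered_pairs(previous_species), unordered_pairs(current_species)
)

# Rounded to whole percentages for comparison with the abstract.
print(round(species_growth), round(pair_growth))
```

    Because the number of pairs grows roughly with the square of the number of species, a 2.73-fold increase in species gives about a 7.5-fold increase in pairwise comparisons.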

  • 45.
    Studham, Matthew E.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nordling, Torbjörn E. M.
    Nelander, Sven
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Functional association networks as priors for gene regulatory network inference. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 12, 130-138 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences. Results: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data.

  • 46.
    Tjärnberg, Andreas
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Linköping University, Sweden.
    Morgan, Daniel C.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Studham, Matthew
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nordling, Torbjörn E. M.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    GeneSPIDER - gene regulatory network inference benchmarking with controlled network and data properties. 2017. In: Molecular Biosystems, ISSN 1742-206X, E-ISSN 1742-2051, Vol. 13, no 7, 1304-1312 p. Article in journal (Refereed)
    Abstract [en]

    A key question in network inference that has not been properly answered is what accuracy can be expected for a given biological dataset and inference method. We present GeneSPIDER - a Matlab package for tuning, running, and evaluating inference algorithms that allows independent control of network and data properties to enable data-driven benchmarking. GeneSPIDER is uniquely suited to address this question by first extracting salient properties from the experimental data and then generating simulated networks and data that closely match these properties. It enables data-driven algorithm selection, estimation of inference accuracy from biological data, and more multifaceted benchmarking. Included are generic pipelines for the design of perturbation experiments, bootstrapping, analysis of linear dependence, sample selection, scaling of SNR, and performance evaluation. With GeneSPIDER we aim to move the goal of network inference benchmarks from simple performance measurement to a deeper understanding of how the accuracy of an algorithm is determined by different combinations of network and data properties.

  • 47.
    Tjärnberg, Andreas
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nordling, Torbjörn E. M.
    Morgan, Daniel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Studham, Matthew
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish eScience Research Center.
    GeneSPIDER - Generation and Simulation Package for Informative Data ExploRation. Manuscript (preprint) (Other academic)
    Abstract [en]

    A range of tools are available to model, simulate and analyze gene regulatory networks (GRNs). However, these tools provide limited ability to control network topology, system dynamics, design of experiments, data properties, or noise characteristics. Independent control of these properties is the key to drawing conclusions on which inference method to use and what result to expect from it, as well as obtaining desired approximations of real biological systems. To draw conclusions on the relation between a network or data property and the performance of an inference method in simulations, system approximations with varying properties are needed. We present a Matlab package, GeneSPIDER, for generation and analysis of networks and data in a dynamical systems framework with focus on the ability to vary properties. It supplies not only essential components that have been missing, but also wrappers to existing tools in common use. In particular, it contains tools for controlling and analyzing network topology (random, small-world, scale-free), stability of linear time-invariant systems, signal to noise ratio (SNR), and Interampatteness. It also contains methods for design of perturbation experiments, bootstrapping, analysis of linear dependence, sample selection, scaling of the SNR, and performance evaluation. GeneSPIDER offers control of network and data properties in simulations, together with tools to analyze these properties and draw conclusions on the quality of inferred GRNs. It can be fetched freely from the online git repository https://bitbucket.org/sonnhammergrni/genespider.

  • 48.
    Tjärnberg, Andreas
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nordling, Torbjörn E. M.
    Studham, Matthew
    Nelander, Sven
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish eScience Research Center, Sweden.
    Avoiding pitfalls in L-1-regularised inference of gene networks. 2015. In: Molecular Biosystems, ISSN 1742-206X, E-ISSN 1742-2051, Vol. 11, no 1, 287-296 p. Article in journal (Refereed)
    Abstract [en]

    Statistical regularisation methods such as LASSO and related L-1 regularised regression methods are commonly used to construct models of gene regulatory networks. Although they can theoretically infer the correct network structure, they have been shown in practice to make errors, i.e. leave out existing links and include non-existing links. We show that L-1 regularisation methods typically produce a poor network model when the analysed data are ill-conditioned, i.e. the gene expression data matrix has a high condition number, even if it contains enough information for correct network inference. However, for informative data (data with a signal to noise ratio high enough that existing links can be proven to exist), the correct structure of network models can still be obtained when these methods fail, either by using least-squares regression and setting small parameters to zero, or by using robust network inference, a recent method that takes the intersection of all non-rejectable models. Since available experimental data sets are generally ill-conditioned, we recommend checking the condition number of the data matrix to avoid this pitfall of L-1 regularised inference, and also considering alternative methods.

  • 49.
    Tjärnberg, Andreas
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nordling, Torbjörn E. M.
    Studham, Matthew
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Optimal Sparsity Criteria for Network Inference. 2013. In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 20, no 5, 398-408 p. Article in journal (Refereed)
    Abstract [en]

    Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call zeta (ζ), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular, for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of zeta. In order to avoid such poor choices, we propose a method for optimization of zeta, which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the zeta value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on network inference as well as our method for optimization of zeta. We demonstrate that our zeta optimization method for two widely used inference algorithms, Glmnet and NIR, gives accurate and informative estimates of the network structure, given that the data is informative enough.

  • 50.
    Östlund, Gabriel
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lindskog, Mats
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Network-based Identification of Novel Cancer Genes. 2010. In: Molecular & Cellular Proteomics, ISSN 1535-9476, E-ISSN 1535-9484, Vol. 9, no 4, 648-655 p. Article in journal (Refereed)
    Abstract [en]

    Genes involved in cancer susceptibility and progression can serve as templates for searching protein networks for novel cancer genes. To this end, we introduce a general network searching method, MaxLink, and apply it to find and rank cancer gene candidates by their connectivity to known cancer genes. Using a comprehensive protein interaction network, we searched for genes connected to known cancer genes. First, we compiled a new set of 812 genes involved in cancer, more than twice the number in the Cancer Gene Census. Their network neighbors were then extracted. This candidate list was refined by selecting genes with unexpectedly high levels of connectivity to cancer genes and without previous association to cancer. This produced a list of 1891 new cancer candidates with up to 55 connections to known cancer genes. We validated our method by cross-validation, Gene Ontology term bias, and differential expression in cancer versus normal tissue. An example novel cancer gene candidate is presented with detailed analysis of the local network and neighbor annotation. Our study provides a ranked list of high priority targets for further studies in cancer research. Supplemental material is included.
