  • 1. Alexeyenko, Andrey
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Schmitt, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dmitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Comparative interactomics with Funcoup 2.0 (2012). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no D1, p. D821-D828. Article in journal (Refereed)
    Abstract [en]

    FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.

  • 2. Alexeyenko, Andrey
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L L
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Global networks of functional coupling in eukaryotes from comprehensive data integration (2009). In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 19, no 6, p. 1107-16. Article in journal (Refereed)
    Abstract [en]

    No single experimental method can discover all connections in the interactome. A computational approach can help by integrating data from multiple, often unrelated, proteomics and genomics pipelines. Reconstructing global networks of functional coupling (FC) faces the challenges of scale and heterogeneity--how to efficiently integrate huge amounts of diverse data from multiple organisms, yet ensuring high accuracy. We developed FunCoup, an optimized Bayesian framework, to resolve these issues. Because interactomes comprise functional coupling of many types, FunCoup annotates network edges with confidence scores in support of different kinds of interactions: physical interaction, protein complex member, metabolic, or signaling link. This capability boosted overall accuracy. On the whole, the constructed framework was comprehensively tested to optimize the overall confidence and ensure seamless, automated incorporation of new data sets of heterogeneous types. Using over 50 data sets in seven organisms and extensively transferring information between orthologs, FunCoup predicted global networks in eight eukaryotes. For the Ciona intestinalis network, only orthologous information was used, and it recovered a significant number of experimental facts. FunCoup predictions were validated on independent cancer mutation data. We show how FunCoup can be used for discovering candidate members of the Parkinson and Alzheimer pathways. Cross-species pathway conservation analysis provided further support to these observations.

  • 3. Alexeyenko, Andrey
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Wassenberg, Deena M.
    Lobenhofer, Edward K.
    Yen, Jerry
    Linney, Elwood
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Meyer, Joel N.
    Dynamic Zebrafish Interactome Reveals Transcriptional Mechanisms of Dioxin Toxicity (2010). In: PLOS ONE, ISSN 1932-6203, Vol. 5, no 5, p. e10465. Article in journal (Refereed)
    Abstract [en]

    Background: In order to generate hypotheses regarding the mechanisms by which 2,3,7,8-tetrachlorodibenzo-p-dioxin (dioxin) causes toxicity, we analyzed global gene expression changes in developing zebrafish embryos exposed to this potent toxicant in the context of a dynamic gene network. For this purpose, we also computationally inferred a zebrafish (Danio rerio) interactome based on orthologs and interaction data from other eukaryotes. Methodology/Principal Findings: Using novel computational tools to analyze this interactome, we distinguished between dioxin-dependent and dioxin-independent interactions between proteins, and tracked the temporal propagation of dioxin-dependent transcriptional changes from a few genes that were altered initially, to large groups of biologically coherent genes at later times. The most notable processes altered at later developmental stages were calcium and iron metabolism, embryonic morphogenesis including neuronal and retinal development, a variety of mitochondria-related functions, and generalized stress response (not including induction of antioxidant genes). Within the interactome, many of these responses were connected to cytochrome P4501A (cyp1a) as well as other genes that were dioxin-regulated one day after exposure. This suggests that cyp1a may play a key role initiating the toxic dysregulation of those processes, rather than serving simply as a passive marker of dioxin exposure, as suggested by earlier research. Conclusions/Significance: Thus, a powerful microarray experiment coupled with a flexible interactome and multi-pronged interactome tools (which are now made publicly available for microarray analysis and related work) suggest the hypothesis that dioxin, best known in fish as a potent cardioteratogen, has many other targets. Many of these types of toxicity have been observed in mammalian species and are potentially caused by alterations to cyp1a.

  • 4. Altenhoff, Adrian M.
    Boeckmann, Brigitte
    Capella-Gutierrez, Salvador
    Dalquen, Daniel A.
    DeLuca, Todd
    Forslund, Kristoffer
    Huerta-Cepas, Jaime
    Linard, Benjamin
    Pereira, Cecile
    Pryszcz, Leszek P.
    Schreiber, Fabian
    da Silva, Alan Sousa
    Szklarczyk, Damian
    Train, Clement-Marie
    Bork, Peer
    Lecompte, Odile
    von Mering, Christian
    Xenarios, Ioannis
    Sjölander, Kimmen
    Juhl Jensen, Lars
    Martin, Maria J.
    Muffato, Matthieu
    Gabaldon, Toni
    Lewis, Suzanna E.
    Thomas, Paul D.
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Dessimoz, Christophe
    Standardized benchmarking in the quest for orthologs (2016). In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 13, no 5, p. 425-+. Article in journal (Refereed)
    Abstract [en]

    Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.

  • 5. Barrientos-Somarribas, Mauricio
    Messina, David N.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Pou, Christian
    Lysholm, Fredrik
    Bjerkner, Annelie
    Allander, Tobias
    Andersson, Bjorn
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Discovering viral genomes in human metagenomic data by predicting unknown protein families (2018). In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 8, article id 28. Article in journal (Refereed)
    Abstract [en]

    Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows no or little homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up from a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the microviridae family, hence it was named bacteriophage HFM.

  • 6. Berglund, Ann-Charlotte
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Östlund, Gabriel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    InParanoid 6: eukaryotic ortholog clusters with inparalogs (2008). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 36, p. D263-D266. Article in journal (Refereed)
    Abstract [en]

    The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.

  • 7. Berglund, Emelie
    Maaskola, Jonas
    Schultz, Niklas
    Friedrich, Stefanie
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Marklund, Maja
    Bergenstråhle, Joseph
    Tarish, Firas
    Tanoglidi, Anna
    Vickovic, Sanja
    Larsson, Ludvig
    Salmén, Fredrik
    Ogris, Christoph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Wallenborg, Karolina
    Lagergren, Jens
    Ståhl, Patrik
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Helleday, Thomas
    Lundeberg, Joakim
    Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity (2018). In: Nature Communications, ISSN 2041-1723, E-ISSN 2041-1723, Vol. 9, article id 2419. Article in journal (Refereed)
    Abstract [en]

    Intra-tumor heterogeneity is one of the biggest challenges in cancer treatment today. Here we investigate tissue-wide gene expression heterogeneity throughout a multifocal prostate cancer using the spatial transcriptomics (ST) technology. Utilizing a novel approach for deconvolution, we analyze the transcriptomes of nearly 6750 tissue regions and extract distinct expression profiles for the different tissue components, such as stroma, normal and PIN glands, immune cells and cancer. We distinguish healthy and diseased areas and thereby provide insight into gene expression changes during the progression of prostate cancer. Compared to pathologist annotations, we delineate the extent of cancer foci more accurately, interestingly without link to histological changes. We identify gene expression gradients in stroma adjacent to tumor regions that allow for re-stratification of the tumor microenvironment. The establishment of these profiles is the first step towards an unbiased view of prostate cancer and can serve as a dictionary for future studies.

  • 8. Björkholm, Patrik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L L
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Comparative analysis and unification of domain-domain interaction networks (2009). In: Bioinformatics (Oxford, England), ISSN 1367-4811, Vol. 25, no 22, p. 3020-5. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Certain protein domains are known to preferentially interact with other domains. Several approaches have been proposed to predict domain-domain interactions, and over nine datasets are available. Our aim is to analyse the coverage and quality of the existing resources, as well as the extent of their overlap. With this knowledge, we have the opportunity to merge individual domain interaction networks to construct a comprehensive and reliable database. RESULTS: In this article we introduce a new approach towards comparing domain-domain interaction networks. This approach is used to compare nine predicted domain and protein interaction networks. The networks were used to generate a database of unified domain interactions, UniDomInt. Each interaction in the dataset is scored according to the benchmarked reliability of the sources. The performance of UniDomInt is an improvement compared to the underlying source networks and to another composite resource, Domine. AVAILABILITY: http://sonnhammer.sbc.su.se/download/UniDomInt/

  • 9. Carreras-Puigvert, Jordi
    Zitnik, Marinka
    Jemth, Ann-Sofie
    Carter, Megan
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Unterlass, Judith E.
    Hallström, Björn
    Loseva, Olga
    Karem, Zhir
    Calderón-Montaño, José Manuel
    Lindskog, Cecilia
    Edqvist, Per-Henrik
    Matuszewski, Damian J.
    Blal, Hammou Ait
    Berntsson, Ronnie P. A.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Häggblad, Maria
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Martens, Ulf
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Studham, Matthew
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Lundgren, Bo
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Wählby, Carolina
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Lundberg, Emma
    Stenmark, Pål
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Zupan, Blaz
    Helleday, Thomas
    A comprehensive structural, biochemical and biological profiling of the human NUDIX hydrolase family (2017). In: Nature Communications, ISSN 2041-1723, E-ISSN 2041-1723, Vol. 8, article id 1541. Article in journal (Refereed)
    Abstract [en]

    The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.

  • 10. Dessimoz, Christophe
    Gabaldón, Toni
    Roos, David S
    Sonnhammer, Erik L L
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Herrero, Javier
    Toward community standards in the quest for orthologs (2012). In: Bioinformatics (Oxford, England), ISSN 1367-4811, Vol. 28, no 6, p. 900-4. Article in journal (Refereed)
    Abstract [en]

    The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.

  • 11. El-Gebali, Sara
    Mistry, Jaina
    Bateman, Alex
    Eddy, Sean R.
    Luciani, Aurelien
    Potter, Simon C.
    Qureshi, Matloob
    Richardson, Lorna J.
    Salazar, Gustavo A.
    Smart, Alfredo
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Hirsh, Layla
    Paladin, Lisanna
    Piovesan, Damiano
    Tosatto, Silvio C. E.
    Finn, Robert D.
    The Pfam protein families database in 2019 (2019). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 47, no D1, p. D427-D432. Article in journal (Refereed)
    Abstract [en]

    The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD) that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.

  • 12. Finn, Robert D.
    Bateman, Alex
    Clements, Jody
    Coggill, Penelope
    Eberhardt, Ruth Y.
    Eddy, Sean R.
    Heger, Andreas
    Hetherington, Kirstie
    Holm, Liisa
    Mistry, Jaina
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tate, John
    Punta, Marco
    Pfam: the protein families database (2014). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 42, no D1, p. D222-D230. Article in journal (Refereed)
    Abstract [en]

    Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

  • 13. Finn, Robert D.
    Mistry, Jaina
    Tate, John
    Coggill, Penny
    Heger, Andreas
    Pollington, Joanne E.
    Gavin, O. Luke
    Gunasekaran, Prasad
    Ceric, Goran
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Holm, Liisa
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Eddy, Sean R.
    Bateman, Alex
    The Pfam protein families database (2010). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 38, p. D211-D222. Article in journal (Refereed)
    Abstract [en]

    Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

  • 14. Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Henricson, Anna
    Hollich, Volker
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain tree-based analysis of protein architecture evolution (2008). In: Molecular Biology and Evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 25, no 2, p. 254-264. Article in journal (Refereed)
    Abstract [en]

    Understanding the dynamics behind domain architecture evolution is of great importance to unravel the functions of proteins. Complex architectures have been created throughout evolution by rearrangement and duplication events. An interesting question is how many times a particular architecture has been created, a form of convergent evolution or domain architecture reinvention. Previous studies have approached this issue by comparing architectures found in different species. We wanted to achieve a finer-grained analysis by reconstructing protein architectures on complete domain trees. The prevalence of domain architecture reinvention in 96 genomes was investigated with a novel domain tree-based method that uses maximum parsimony for inferring ancestral protein architectures. Domain architectures were taken from Pfam. To ensure robustness, we applied the method to bootstrap trees and only considered results with strong statistical support. We detected multiple origins for 12.4% of the scored architectures. In a much smaller data set, the subset of completely domain-assigned proteins, the figure was 5.6%. These results indicate that domain architecture reinvention is a much more common phenomenon than previously thought. We also determined which domains are most frequent in multiply created architectures and assessed whether specific functions could be attributed to them. However, no strong functional bias was found in architectures with multiple origins.

  • 15. Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Pekkari, Isabella
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain architecture conservation in orthologs (2011). In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, p. 326. Article in journal (Refereed)
    Abstract [en]

    Background. As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.

    Results. The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.

    Conclusions. On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.

  • 16. Forslund, Kristoffer
    Pereira, Cecile
    Capella-Gutierrez, Salvador
    Sousa da Silva, Alan
    Altenhoff, Adrian
    Huerta-Cepas, Jaime
    Muffato, Matthieu
    Patricio, Mateus
    Vandepoele, Klaas
    Ebersberger, Ingo
    Blake, Judith
    Fernandez Breis, Jesualdo Tomas
    Boeckmann, Brigitte
    Gabaldon, Toni
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Dessimoz, Christophe
    Lewis, Suzanna
    Gearing up to handle the mosaic nature of life in the quest for orthologs (2018). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no 2, p. 323-329. Article in journal (Refereed)
    Abstract [en]

    The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing sources of and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.

  • 17.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Schreiber, Fabian
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Thanintorn, Nattaphon
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    OrthoDisease: tracking disease gene orthologs across 100 species
    2011. In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 12, no 5, p. 463-473
    Article in journal (Refereed)
    Abstract [en]

    Orthology is one of the most important tools available to modern biology, as it allows making inferences from easily studied model systems to much less tractable systems of interest, such as ourselves. This becomes important not least in the study of genetic diseases. We here review work on the orthology of disease-associated genes and also present an updated version of the InParanoid-based disease orthology database and web site OrthoDisease, with 14-fold increased species coverage since the previous version. Using this resource, we survey the taxonomic distribution of orthologs of human genes involved in different disease categories. The hypothesis that paralogs can mask the effect of deleterious mutations predicts that known heritable disease genes should have fewer close paralogs. We found large-scale support for this hypothesis as significantly fewer duplications were observed for disease genes in the OrthoDisease ortholog groups.

  • 18.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Swedish e-Science Research Center.
    Evolution of Protein Domain Architectures
    2012. In: Evolutionary Genomics: Statistical and Computational Methods, Vol 2 / [ed] Anisimova, M, Totowa, NJ: Humana Press, 2012, p. 187-216
    Chapter in book (Refereed)
    Abstract [en]

    This chapter reviews the current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly impacts which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multidomain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly).

  • 19.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Benchmarking homology detection procedures with low complexity filters
    2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 19, p. 2500-2505
    Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences.

    RESULTS: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.

    CONCLUSION: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated.

    AVAILABILITY: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz

  • 20.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting protein function from domain content
    2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 15, p. 1681-1687
    Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions.

    RESULTS: Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.

    AVAILABILITY: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar

  • 21.
    Friedrich, Stefanie
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Barbulescu, Remus
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Helleday, Thomas
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    MetaCNV - a consensus approach to infer accurate copy numbers from low coverage data
    Manuscript (preprint) (Other academic)
    Abstract [en]

    Background: The majority of copy number callers require high read coverage data that is often achieved with elevated material input, which increases the heterogeneity of tissue samples. However, to gain insights into smaller areas within a tissue sample, e.g. a cancerous area in a heterogeneous tissue sample, less material is used for sequencing, which results in lower read coverage. Therefore, more focus needs to be put on copy number calling that is sensitive enough for low coverage data.

    Results: We present MetaCNV, a copy number caller that infers reliable copy numbers for human genomes with a consensus approach. MetaCNV specializes in low coverage data, but also performs well on normal and high coverage data. MetaCNV integrates the results of multiple copy number callers and infers absolute and unbiased copy numbers for the entire genome. MetaCNV is based on a meta-model that bypasses the weaknesses of current calling models while combining the strengths of existing approaches. Here we apply MetaCNV based on ReadDepth, SVDetect, and CNVnator to real and simulated datasets in order to demonstrate how the approach improves copy number calling. 

    Conclusions: MetaCNV, available at https://bitbucket.org/sonnhammergroup/metacnv, provides accurate copy number prediction on low coverage data and performs well on high coverage data.

  • 22.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    MGclus: network clustering employing shared neighbors
    2013. In: Molecular BioSystems, ISSN 1742-206X, Vol. 9, no 7, p. 1670-1675
    Article in journal (Refereed)
    Abstract [en]

    Network analysis is an important tool for functional annotation of genes and proteins. A common approach to discern structure in a global network is to infer network clusters, or modules, and assume a functional coherence within each module, which may represent a complex or a pathway. It is however not trivial to define optimal modules. Although many methods have been proposed, it is unclear which methods perform best in general. It seems that most methods produce far from optimal results but in different ways. MGclus is a new algorithm designed to detect modules with a strongly interconnected neighborhood in large scale biological interaction networks. In our benchmarks we found MGclus to outperform other methods when applied to random graphs with varying degree of noise, and to perform equally or better when applied to biological protein interaction networks. MGclus is implemented in Java and utilizes the JGraphT graph library. It has an easy to use command-line interface and is available for download from http://sonnhammer.sbc.su.se/download/software/MGclus/.

  • 23.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Augsten, Martin
    Tobin, Nicholas P.
    Carlson, Joseph
    Paulsson, Janna
    Pena, Cristina
    Olsson, Eleonor
    Veerla, Srinivas
    Bergh, Jonas
    Östman, Arne
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish Escience Research Center, Sweden.
    Prognostic Significance in Breast Cancer of a Gene Signature Capturing Stromal PDGF Signaling
    2013. In: American Journal of Pathology, ISSN 0002-9440, E-ISSN 1525-2191, Vol. 182, no 6, p. 2037-2047
    Article in journal (Refereed)
    Abstract [en]

    In this study, we describe a novel gene expression signature of platelet-derived growth factor (PDGF) activated fibroblasts, which is able to identify breast cancers with a PDGF-stimulated fibroblast stroma and displays an independent and strong prognostic significance. Global gene expression was compared between PDGF-stimulated human fibroblasts and cultured resting fibroblasts. The most differentially expressed genes were reduced to a gene expression signature of 113 genes. The biological significance and prognostic capacity of this signature were investigated using four independent clinical breast cancer data sets. Concomitant high expression of PDGF beta receptor and its cognate ligands is associated with a high PDGF signature score. This supports the notion that the signature detects tumors with PDGF-activated stroma. Subsequent analyses indicated significant associations between high PDGF signature score and clinical characteristics, including human epidermal growth factor receptor 2 positivity, estrogen receptor negativity, high tumor grade, and large tumor size. A high PDGF signature score is associated with shorter survival in univariate analysis. Furthermore, the high PDGF signature score acts as a significant marker of poor prognosis in multivariate survival analyses, including classic prognostic markers, Ki-67 status, a proliferation gene signature, or other recently described stroma-derived gene expression signatures.

  • 24.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Augsten, Martin
    Tobin, Nicholas P.
    Carlson, Joseph
    Paulsson, Janna
    Pena, Cristina
    Olsson, Eleonor
    Veerla, Sunny
    Bergh, Jonas
    Östman, Arne
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Prognostic significance in breast cancer of a gene signature capturing stromal PDGF signaling
    In: American Journal of Pathology, ISSN 0002-9440, E-ISSN 1525-2191
    Article in journal (Refereed)
  • 25.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Mank, Judith E.
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Network Analysis of Functional Genomics Data: Application to Avian Sex-Biased Gene Expression
    2012. In: Scientific World Journal, ISSN 1537-744X, E-ISSN 1537-744X, p. 130491-
    Article in journal (Refereed)
    Abstract [en]

    Gene expression analysis is often used to investigate the molecular and functional underpinnings of a phenotype. However, differential expression of individual genes is limited in that it does not consider how the genes interact with each other in networks. To address this shortcoming we propose a number of network-based analyses that give additional functional insights into the studied process. These were applied to a dataset of sex-specific gene expression in the chicken gonad and brain at different developmental stages. We first constructed a global chicken interaction network. Combining the network with the expression data showed that most sex-biased genes tend to have lower network connectivity, that is, act within local network environments, although some interesting exceptions were found. Genes of the same sex bias were generally more strongly connected with each other than expected. We further studied the fates of duplicated sex-biased genes and found that there is a significant trend to keep the same pattern of sex bias after duplication. We also identified sex-biased modules in the network, which reveal pathways or complexes involved in sex-specific processes. Altogether, this work integrates evolutionary genomics with systems biology in a novel way, offering new insights into the modular nature of sex-biased genes.

  • 26.
    Gabaldón, Toni
    et al.
    Dessimoz, Christophe
    Huxley-Jones, Julie
    Vilella, Albert J
    Sonnhammer, Erik Ll
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lewis, Suzanna
    Joining forces in the quest for orthologs
    2009. In: Genome biology, ISSN 1465-6914, Vol. 10, no 9, p. 403-
    Article in journal (Refereed)
    Abstract [en]

    Better orthology-prediction resources would be beneficial for the whole biological community. A recent meeting discussed how to coordinate and leverage current efforts.

  • 27.
    Glover, Natasha
    et al.
    Dessimoz, Christophe
    Ebersberger, Ingo
    Forslund, Sofia K.
    Gabaldón, Toni
    Huerta-Cepas, Jaime
    Martin, Maria-Jesus
    Muffato, Matthieu
    Patricio, Mateus
    Pereira, Cécile
    da Silva, Alan Sousa
    Wang, Yan
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Thomas, Paul D.
    Advances and Applications in the Quest for Orthologs
    2019. In: Molecular biology and evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 36, no 10, p. 2157-2164
    Article, review/survey (Refereed)
    Abstract [en]

    Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.

  • 28.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Bernhem, Kristoffer
    Ait Blal, Hammou
    Lundberg, Emma
    Brismar, Hjalmar
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Experimental validation of predicted cancer genes using FRET
    Manuscript (preprint) (Other academic)
  • 29.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Bernhem, Kristoffer
    Blal, Hammou Ait
    Jans, Daniel
    Lundberg, Emma
    Brismar, Hjalmar
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Experimental validation of predicted cancer genes using FRET
    2018. In: Methods and applications in fluorescence, ISSN 2050-6120, Vol. 6, no 3, article id 035007
    Article in journal (Refereed)
    Abstract [en]

    Huge amounts of data are generated in genome wide experiments, designed to investigate diseases with complex genetic causes. Follow up of all potential leads produced by such experiments is currently cost prohibitive and time consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large scale in silico benchmark. An experimental validation of predictions made by MaxLink has however been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle with polygenic diseases.

  • 30.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden; Swedish eScience Research Center, Sweden.
    MaxLink: network-based prioritization of genes tightly linked to a disease seed set
    2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 18, p. 2689-2690
    Article in journal (Refereed)
    Abstract [en]

    Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to other areas than cancer. The algorithm has also been made faster by re-implementation in C++, and the Web site uses FunCoup 3.0 as the underlying network.
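
    As a rough illustration of the guilt-by-association idea in the summary above, the following Python sketch ranks non-seed genes by the number of distinct seed genes they are directly linked to. It is a simplification and not the MaxLink implementation: the function name rank_by_seed_links, the toy edge list, and the gene labels are all hypothetical, and the statistical selection of candidate genes mentioned in the abstract is omitted.

```python
def rank_by_seed_links(edges, seeds):
    """Rank non-seed genes by how many distinct seed genes they link to.

    Simplified illustration only; this is not the MaxLink algorithm itself.
    """
    seeds = set(seeds)

    # Build an undirected adjacency map, ignoring self-links.
    neighbors = {}
    for a, b in edges:
        if a == b:
            continue
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Count direct links to the seed set for every non-seed gene.
    ranked = []
    for gene, linked in neighbors.items():
        if gene in seeds:
            continue
        n_seed_links = len(linked & seeds)
        if n_seed_links > 0:
            ranked.append((gene, n_seed_links))

    # Most seed links first; ties broken alphabetically for determinism.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


# Toy network with hypothetical gene labels (S1-S3 are the seed genes).
edges = [
    ("S1", "A"), ("S2", "A"), ("S3", "A"),
    ("S1", "B"), ("S2", "B"),
    ("S3", "C"),
    ("C", "D"),
]
ranking = rank_by_seed_links(edges, ["S1", "S2", "S3"])
print(ranking)
```

    In this toy network, gene D is linked only to the non-seed gene C, so it does not appear in the ranking at all.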

  • 31.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A large-scale benchmark of gene prioritization methods
    2017. In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 7, article id 46598
    Article in journal (Refereed)
    Abstract [en]

    In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.

  • 32.
    Haider, Christian
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Kavic, Marina
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    TreeDom: a graphical web tool for analysing domain architecture evolution
    2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 15, p. 2384-2385
    Article in journal (Refereed)
    Abstract [en]

    We present TreeDom, a web tool for graphically analysing the evolutionary history of domains in multi-domain proteins. Individual domains on the same protein chain may have distinct evolutionary histories, which is important to grasp in order to understand protein function. For instance, it may be important to know whether a domain was duplicated recently or long ago, to know the origin of inserted domains, or to know the pattern of domain loss within a protein family. TreeDom uses the Pfam database as the source of domain annotations, and displays these on a sequence tree. An advantage of TreeDom is that the user can limit the analysis to N sequences that are most similar to a query, or provide a list of sequence IDs to include. Using the Pfam alignment of the selected sequences, a tree is built and displayed together with the domain architecture of each sequence.

  • 33.
    Henricson, Anna
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Orthology confers intron position conservation
    2010. In: BMC Genomics, ISSN 1471-2164, E-ISSN 1471-2164, Vol. 11:412
    Article in journal (Refereed)
    Abstract [en]

    Background: With the wealth of genomic data available it has become increasingly important to assign putative protein function through functional transfer between orthologs. Therefore, correct elucidation of the evolutionary relationships among genes is a critical task, and attempts should be made to further improve the phylogenetic inference by adding relevant discriminating features. It has been shown that introns can maintain their position over long evolutionary timescales. For this reason, it could be possible to use conservation of intron positions as a discriminating factor when assigning orthology. Therefore, we wanted to investigate whether orthologs have a higher degree of intron position conservation (IPC) compared to non-orthologous sequences that are equally similar in sequence.

    Results: To this end, we developed a new score for IPC and applied it to ortholog groups between human and six other species. For comparison, we also gathered the closest non-orthologs, meaning sequences close in sequence space, yet falling just outside the ortholog cluster. We found that ortholog-ortholog gene pairs on average have a significantly higher degree of IPC compared to ortholog-closest non-ortholog pairs. Also pairs of inparalogs were found to have a higher IPC score than inparalog-closest non-inparalog pairs. We verified that these differences cannot simply be attributed to the generally higher sequence identity of the ortholog-ortholog and the inparalog-inparalog pairs. Furthermore, we analyzed the agreement between IPC score and the ortholog score assigned by the InParanoid algorithm, and found that it was consistently high for all species comparisons. In a minority of cases, the IPC and InParanoid score ranked inparalogs differently. These represent cases where sequence and intron position divergence are discordant. We further analyzed the discordant clusters to identify any possible preference for protein functions by looking for enriched GO terms and Pfam protein domains. They were enriched for functions important for multicellularity, which implies a connection between shifts in intronic structure and the origin of multicellularity.

    Conclusions: We conclude that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes. As a consequence, our IPC score is useful as an additional discriminating factor when assigning orthology.

  • 34.
    Hollich, Volker
    et al.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    PfamAlyzer: domain-centric homology search
    2007. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 23, no 24, p. 3382-3383
    Article in journal (Refereed)
    Abstract [en]

    PfamAlyzer is a Java applet that enables exploration of Pfam domain architectures using a user-friendly graphical interface. It can search the UniProt protein database for a domain pattern. Domain patterns similar to the query are presented graphically by PfamAlyzer either in a ranked list or pinned to the tree of life. Such domain-centric homology search can assist identification of distant homologs with shared domain architecture.

  • 35.
    Kaduk, Mateusz
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Riegler, Christian
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). FH OÖ - University of Applied Sciences Upper Austria, Austria.
    Lemp, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). FH OÖ - University of Applied Sciences Upper Austria, Austria.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    HieranoiDB: a database of orthologs inferred by Hieranoid
    2017. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 45, no D1, p. D687-D690
    Article in journal (Refereed)
    Abstract [en]

    HieranoiDB (http://hieranoiDB.sbc.su.se) is a freely available on-line database for hierarchical groups of orthologs inferred by the Hieranoid algorithm. It infers orthologs at each node in a species guide tree with the InParanoid algorithm as it progresses from the leaves to the root. Here we present a database HieranoiDB with a web interface that makes it easy to search and visualize the output of Hieranoid, and to download it in various formats. Searching can be performed using protein description, identifier or sequence. In this first version, orthologs are available for the 66 Quest for Orthologs reference proteomes. The ortholog trees are shown graphically and interactively with marked speciation and duplication nodes that show the inferred evolutionary scenario, and allow for correct extraction of predicted orthologs from the Hieranoid trees.

  • 36.
    Kaduk, Mateusz
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Improved orthology inference with Hieranoid 2. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 8, p. 1154-1159. Article in journal (Refereed)
    Abstract [en]

    Motivation: The initial step in many orthology inference methods is the computationally demanding establishment of all pairwise protein similarities across all analysed proteomes. The quadratic scaling with proteomes has become a major bottleneck. A remedy is offered by the Hieranoid algorithm which reduces the complexity to linear by hierarchically aggregating ortholog groups from InParanoid along a species tree. Results: We have further developed the Hieranoid algorithm in many ways. Major improvements have been made to the construction of multiple sequence alignments and consensus sequences. Hieranoid version 2 was evaluated with standard benchmarks that reveal a dramatic increase in the coverage/accuracy tradeoff over version 1, such that it now compares favourably with the best methods. The new parallelized cluster mode allows Hieranoid to be run on large data sets in a much shorter timespan than InParanoid, yet at similar accuracy.

  • 37.
    Klammer, Martin
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Messina, David N.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Schmitt, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    MetaTM - a consensus method for transmembrane protein topology prediction. 2009. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 10, p. 314. Article in journal (Refereed)
    Abstract [en]

    Transmembrane (TM) proteins are proteins that span a biological membrane one or more times. As their 3-D structures are hard to determine, experiments focus on identifying their topology (i.e. which parts of the amino acid sequence are buried in the membrane and which are located on either side of the membrane), but only a few topologies are known. Consequently, various computational TM topology predictors have been developed, but their accuracies are far from perfect. The prediction quality can be improved by applying a consensus approach, which combines results of several predictors to yield a more reliable result. RESULTS: A novel TM consensus method, named MetaTM, is proposed in this work. MetaTM is based on support vector machine models and combines the results of six TM topology predictors and two signal peptide predictors. On a large data set comprising 1460 sequences of TM proteins with known topologies and 2362 globular protein sequences it correctly predicts 86.7% of all topologies. CONCLUSION: Combining several TM predictors in a consensus prediction framework improves overall accuracy compared to any of the individual methods. Our proposed SVM-based system also has higher accuracy than a previous consensus predictor. MetaTM is made available both as downloadable source code and as DAS server at http://MetaTM.sbc.su.se.

  • 38.
    Kutsenko, Alexey
    et al.
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute. Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Svensson, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nystedt, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Uppsala University, Sweden.
    Lundeberg, Joakim
    Björk, Petra
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Giacomello, Stefania
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Visa, Neus
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    Wieslander, Lars
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    The Chironomus tentans genome sequence and the organization of the Balbiani ring genes. 2014. In: BMC Genomics, ISSN 1471-2164, E-ISSN 1471-2164, Vol. 15, p. 819. Article in journal (Refereed)
    Abstract [en]

    Background: The polytene nuclei of the dipteran Chironomus tentans (Ch. tentans) with their Balbiani ring (BR) genes constitute an exceptional model system for studies of the expression of endogenous eukaryotic genes. Here, we report the first draft genome of Ch. tentans and characterize its gene expression machineries and genomic architecture of the BR genes. Results: The genome of Ch. tentans is approximately 200 Mb in size, and has a low GC content (31%) and a low repeat fraction (15%) compared to other Dipteran species. Phylogenetic inference revealed that Ch. tentans is a sister clade to mosquitoes, with a split 150-250 million years ago. To characterize the Ch. tentans gene expression machineries, we identified potential orthologous sequences to more than 600 Drosophila melanogaster (D. melanogaster) proteins involved in the expression of protein-coding genes. We report novel data on the organization of the BR gene loci, including a novel putative BR gene, and we present a model for the organization of chromatin bundles in the BR2 puff based on genic and intergenic in situ hybridizations. Conclusions: We show that the molecular machineries operating in gene expression are largely conserved between Ch. tentans and D. melanogaster, and we provide enhanced insight into the organization and expression of the BR genes. Our data strengthen the generality of the BR genes as a unique model system and provide essential background for in-depth studies of the biogenesis of messenger ribonucleoprotein complexes.

  • 39. Lassmann, Timo
    et al.
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L L
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Kalign2: high-performance multiple alignment of protein and nucleotide sequences allowing external features. 2009. In: Nucleic Acids Research, ISSN 1362-4962, Vol. 37, no 3, p. 858-65. Article in journal (Refereed)
    Abstract [en]

    In the growing field of genomics, multiple alignment programs are confronted with ever increasing amounts of data. To address this growing issue we have dramatically improved the running time and memory requirement of Kalign, while maintaining its high alignment accuracy. Kalign version 2 also supports nucleotide alignment, and a newly introduced extension allows for external sequence annotation to be included into the alignment procedure. We demonstrate that Kalign2 is exceptionally fast and memory-efficient, permitting accurate alignment of very large numbers of sequences. The accuracy of Kalign2 compares well to the best methods in the case of protein alignments while its accuracy on nucleotide alignments is generally superior. In addition, we demonstrate the potential of using known or predicted sequence annotation to improve the alignment accuracy. Kalign2 is freely available for download from the Kalign web site (http://msa.sbc.su.se/).

  • 40. Lindberg, Julia
    et al.
    Alexeyenko, Andrey
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Perez-Bercoff, Åsa
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Overview and comparison of ortholog databases. 2006. In: Drug Discovery Today: Technologies, ISSN 1740-6749, Vol. 3, no 2. Article in journal (Refereed)
    Abstract [en]

    Orthologs are an indispensable bridge to transfer biological knowledge between species, from protein annotations to sophisticated disease models. However, orthology assignment is not trivial. A large number of resources now exist, each with its own idiosyncrasies. The goal of this review is to compare their contents and clarify which database is most suited for a certain task.

  • 41. Marklund, Maja
    et al.
    Schultz, Niklas
    Friedrich, Stefanie
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Berglund, Emelie
    Tarish, Firas
    Maaskola, Jonas
    Bergenstråhle, Jonas
    Liu, Yao
    Tanoglidi, Anna
    Ståhl, Patrik
    Helleday, Thomas
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Lundeberg, Joakim
    Spatio-temporal analysis of prostate tumours suggests the pre-existence of ADT-resistant expression clones. Manuscript (preprint) (Other academic)
  • 42.
    McCormack, Theodore
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Statistical Assessment of Crosstalk Enrichment between Gene Groups in Biological Networks. 2013. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 8, no 1, p. e54945. Article in journal (Refereed)
    Abstract [en]

    Motivation: Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets. Results: Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA). Availability and Implementation: CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.

  • 43.
    Messina, David
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lysholm, Fredrik
    Department of Cell and Molecular Biology, Karolinska Institutet.
    Allander, Tobias
    Department of Microbiology, Tumor- and Cell Biology, Karolinska Institutet.
    Andersson, Björn
    Department of Cell and Molecular Biology, Karolinska Institutet.
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Discovery of novel protein families in metagenomic samples. Manuscript (preprint) (Other academic)
    Abstract [en]

    Despite the steady rise in gene sequence information, there is a persistent, significant fraction of genes which do not match any previously known sequence. These genes are called ORFans, and metagenomic samples, where DNA is extracted from a mixed population of unknown and often uncultivable species, are a rich source of ORFans. Viral infections cause significant morbidity and mortality, and identifying ORFan viral gene families from human metagenomic samples represents a route to understanding molecular processes that affect human health. Few methods exist for metagenomic gene-finding, and most of them rely on sequence similarity, which cannot be used to detect ORFans. Furthermore, nonsimilarity-based methods are hard to apply to the complex mixture of short, high-error-rate sequence fragments which are typical of metagenomic projects. Here we present an approach to detect ORFan protein families in short-read data, and apply it to 937 Mbp (megabase pairs) of sequence from 17 virus-enriched libraries made from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. After isolating approximately 450 putative ORFan families from clusters of sequence contigs, we applied RNAcode, a gene finder developed for use on high-quality genome sequences, and calibrated it for error-prone short sequence reads. Additional predictive measures such as sequence complexity and length were then used to rank and filter candidates into a high-quality set of 32 putative novel gene families, only two of which show significant similarity to known genes.

  • 44.
    Messina, David N.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    DASher: a stand-alone protein sequence client for DAS, the Distributed Annotation System. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no 10, p. 1333-1334. Article in journal (Refereed)
    Abstract [en]

    The rise in biological sequence data has led to a proliferation of separate, specialized databases. While there is great value in having many independent annotations, it is critical that there be a way to integrate them in one combined view. The Distributed Annotation System (DAS) was developed for that very purpose. There are currently no DAS clients that are open source, specialized for aggregating and comparing protein sequence annotation, and that can run as a self-contained application outside of a web browser. The speed, flexibility and extensibility that come with a stand-alone application motivated us to create DASher, an open-source Java DAS client. Given a UniProt sequence identifier, DASher automatically queries DAS-supporting servers worldwide for any information on that sequence and then displays the annotations in an interactive viewer for easy comparison. DASher is a fast, Java-based DAS client optimized for viewing protein sequence annotation and compliant with the latest DAS protocol specification 1.53E. AVAILABILITY: DASher is available for direct use and download at http://dasher.sbc.su.se including examples and source code under the GPLv3 licence. Java version 6 or higher is required.

  • 45.
    Morgan, Daniel
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Studham, Matthew
    Tjärnberg, Andreas
    Weishaupt, Holger
    Swartling, Fredrik
    Nordling, Torbjörn
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Perturbation-based gene regulatory network inference to unravel oncogenic mechanisms. Manuscript (preprint) (Other academic)
    Abstract [en]

    Motivation: Cancer is known to stem from multiple, independent mutations, the effects of which aggregate to drive the cell into a cancerous state. To understand the complex interplay between affected genes, their gene regulatory network (GRN) needs to be uncovered, to reveal detailed insights into regulatory mechanisms. We therefore decided to infer a reliable GRN from perturbation responses of 40 genes known or suspected to have a role in human cancers yet whose regulatory interactions are poorly known.

    Results: siRNA knockdown experiments of each gene were done in a human squamous carcinoma cell line, after which the transcriptomic response was measured. From these data GRNs were inferred using several methods, and the false discovery rate was controlled by the NestBoot framework. The best GRN was shown to be significantly more predictive than the null model, both in cross-validated benchmarks and for an independent dataset of the same genes but subjected to double perturbations. It agrees with many known links in addition to predicting a large number of novel interactions, a subset of which were experimentally validated. The inferred GRN captures regulatory interactions central to cancer-relevant processes and thus provides mechanistic insights that are useful for future cancer research.

  • 46.
    Morgan, Daniel
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Nordling, Torbjörn E. M.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A generalized framework for controlling FDR in gene regulatory network inference. 2019. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 6, p. 1026-1032. Article in journal (Refereed)
    Abstract [en]

    Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights of a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied.

    Results: To achieve this we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference, and by repeated bootstrap runs assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. An improved inference accuracy was observed in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences.

  • 47.
    Ogris, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dimitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Helleday, Thomas
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A novel method for crosstalk analysis of biological networks: improving accuracy of pathway annotation. 2017. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 45, no 2, article id e8. Article in journal (Refereed)
    Abstract [en]

    Analyzing gene expression patterns is a mainstay to gain functional insights of biological systems. A plethora of tools exist to identify significant enrichment of pathways for a set of differentially expressed genes. Most tools analyze gene overlap between gene sets and are therefore severely hampered by the current state of pathway annotation, yet at the same time they run a high risk of false assignments. A way to improve both true positive and false positive rates (FPRs) is to use a functional association network and instead look for enrichment of network connections between gene sets. We present a new network crosstalk analysis method BinoX that determines the statistical significance of network link enrichment or depletion between gene sets, using the binomial distribution. This is a much more appropriate statistical model than previous methods have employed, and as a result BinoX yields substantially better true positive and FPRs than was possible before. A number of benchmarks were performed to assess the accuracy of BinoX and competing methods. We demonstrate examples of how BinoX finds many biologically meaningful pathway annotations for gene sets from cancer and other diseases, which are not found by other methods.

  • 48.
    Ogris, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dimitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kaduk, Mateusz
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    FunCoup 4: new species, data, and visualization. 2018. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 46, no D1, p. D601-D607. Article in journal (Refereed)
    Abstract [en]

    This release of the FunCoup database (http://funcoup.sbc.su.se) is the fourth generation of one of the most comprehensive databases for genome-wide functional association networks. These functional associations are inferred via integrating various data types using a naive Bayesian algorithm and orthology based information transfer across different species. This approach provides high coverage of the included genomes as well as high quality of inferred interactions. In this update of FunCoup we introduce four new eukaryotic species: Schizosaccharomyces pombe, Plasmodium falciparum, Bos taurus, Oryza sativa and open the database to the prokaryotic domain by including networks for Escherichia coli and Bacillus subtilis. The latter allows us to also introduce a new class of functional association between genes - co-occurrence in the same operon. We also supplemented the existing classes of functional association: metabolic, signaling, complex and physical protein interaction with up-to-date information. In this release we switched to InParanoid v8 as the source of orthology and base for calculation of phylogenetic profiles. While populating all other evidence types with new data we introduce a new evidence type based on quantitative mass spectrometry data. Finally, the new JavaScript-based network viewer provides the user an intuitive and responsive platform to further evaluate the results.

  • 49.
    Ogris, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Helleday, Thomas
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    PathwAX: a web server for network crosstalk based pathway annotation. 2016. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 44, no W1, p. W105-W109. Article in journal (Refereed)
    Abstract [en]

    Pathway annotation of gene lists is often used to functionally analyse biomolecular data such as gene expression in order to establish which processes are activated in a given experiment. Databases such as KEGG or GO represent collections of how genes are known to be organized in pathways, and the challenge is to compare a given gene list with the known pathways such that all true relations are identified. Most tools apply statistical measures to the gene overlap between the gene list and pathway. It is however problematic to avoid false negatives and false positives when only using the gene overlap. The pathwAX web server (http://pathwAX.sbc.su.se/) applies a different approach which is based on network crosstalk. It uses the comprehensive network FunCoup to analyse network crosstalk between a query gene list and KEGG pathways. PathwAX runs the BinoX algorithm, which employs Monte-Carlo sampling of randomized networks and estimates a binomial distribution, for estimating the statistical significance of the crosstalk. This results in substantially higher accuracy than gene overlap methods. The system was optimized for speed and allows interactive web usage. We illustrate the usage and output of pathwAX.

  • 50.
    Persson, Emma
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kaduk, Mateusz
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Forslund, Sofia K.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Domainoid: domain-oriented orthology inference. 2019. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, no 1, article id 523. Article in journal (Refereed)
    Abstract [en]

    Background: Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding and recurring regions, domains. The domain architecture of a protein is vital for its function, and recombination events mean individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool aiming to overcome these challenges faced by full-length orthology methods by inferring orthology on the domain level. It employs the InParanoid algorithm on single domains separately, to infer groups of orthologous domains.

    Results: This domain-oriented approach allows detection of discordant domain orthologs, cases where different domains on the same protein have different evolutionary histories. In addition to domain level analysis, protein level orthology based on the fraction of domains that are orthologous can be inferred. Domainoid orthology assignments were compared to those yielded by the conventional full-length approach InParanoid, and were validated in a standard benchmark.

    Conclusions: Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches.
