  • 1. Acevedo, Nathalie
    et al.
    Scala, Giovanni
    Kebede Merid, Simon
    Frumento, Paolo
    Bruhn, Sören
    Andersson, Anna
    Ogris, Christoph
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab). Helmholtz Center Munich, Germany.
    Bottai, Matteo
    Pershagen, Göran
    Koppelman, Gerard H.
    Melén, Erik
    Sonnhammer, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Alm, Johan
    Söderhäll, Cilla
    Kere, Juha
    Greco, Dario
    Scheynius, Annika
    DNA Methylation Levels in Mononuclear Leukocytes from the Mother and Her Child Are Associated with IgE Sensitization to Allergens in Early Life. 2021. In: International Journal of Molecular Sciences, ISSN 1661-6596, E-ISSN 1422-0067, Vol. 22, no. 2, article ID 801. Journal article (Peer-reviewed)
    Abstract [en]

    DNA methylation changes may predispose becoming IgE-sensitized to allergens. We analyzed whether DNA methylation in peripheral blood mononuclear cells (PBMC) is associated with IgE sensitization at 5 years of age (5Y). DNA methylation was measured in 288 PBMC samples from 74 mother/child pairs from the birth cohort ALADDIN (Assessment of Lifestyle and Allergic Disease During INfancy) using the HumanMethylation450BeadChip (Illumina). PBMCs were obtained from the mothers during pregnancy and from their children in cord blood, at 2 years and 5Y. DNA methylation levels at each time point were compared between children with and without IgE sensitization to allergens at 5Y. For replication, CpG sites associated with IgE sensitization in ALADDIN were evaluated in whole blood DNA of 256 children, 4 years old, from the BAMSE (Swedish abbreviation for Children, Allergy, Milieu, Stockholm, Epidemiology) cohort. We found 34 differentially methylated regions (DMRs) associated with IgE sensitization to airborne allergens and 38 DMRs associated with sensitization to food allergens in children at 5Y (Sidak p <= 0.05). Genes associated with airborne sensitization were enriched in the pathway of endocytosis, while genes associated with food sensitization were enriched in focal adhesion, the bacterial invasion of epithelial cells, and leukocyte migration. Furthermore, 25 DMRs in maternal PBMCs were associated with IgE sensitization to airborne allergens in their children at 5Y, which were functionally annotated to the mTOR (mammalian Target of Rapamycin) signaling pathway. This study supports that DNA methylation is associated with IgE sensitization early in life and revealed new candidate genes for atopy. Moreover, our study provides evidence that maternal DNA methylation levels are associated with IgE sensitization in the child supporting early in utero effects on atopy predisposition.

  • 2.
    Alexeyenko, Andrey
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Schmitt, Thomas
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Guala, Dmitri
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Frings, Oliver
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Comparative interactomics with Funcoup 2.0. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no. D1, pp. D821-D828. Journal article (Peer-reviewed)
    Abstract [en]

    FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.

  • 3.
    Alexeyenko, Andrey
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L L
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Global networks of functional coupling in eukaryotes from comprehensive data integration. 2009. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 19, no. 6, pp. 1107-16. Journal article (Peer-reviewed)
    Abstract [en]

    No single experimental method can discover all connections in the interactome. A computational approach can help by integrating data from multiple, often unrelated, proteomics and genomics pipelines. Reconstructing global networks of functional coupling (FC) faces the challenges of scale and heterogeneity--how to efficiently integrate huge amounts of diverse data from multiple organisms, yet ensuring high accuracy. We developed FunCoup, an optimized Bayesian framework, to resolve these issues. Because interactomes comprise functional coupling of many types, FunCoup annotates network edges with confidence scores in support of different kinds of interactions: physical interaction, protein complex member, metabolic, or signaling link. This capability boosted overall accuracy. On the whole, the constructed framework was comprehensively tested to optimize the overall confidence and ensure seamless, automated incorporation of new data sets of heterogeneous types. Using over 50 data sets in seven organisms and extensively transferring information between orthologs, FunCoup predicted global networks in eight eukaryotes. For the Ciona intestinalis network, only orthologous information was used, and it recovered a significant number of experimental facts. FunCoup predictions were validated on independent cancer mutation data. We show how FunCoup can be used for discovering candidate members of the Parkinson and Alzheimer pathways. Cross-species pathway conservation analysis provided further support to these observations.

  • 4.
    Alexeyenko, Andrey
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Wassenberg, Deena M.
    Lobenhofer, Edward K.
    Yen, Jerry
    Linney, Elwood
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Meyer, Joel N.
    Dynamic Zebrafish Interactome Reveals Transcriptional Mechanisms of Dioxin Toxicity. 2010. In: PLOS ONE, ISSN 1932-6203, Vol. 5, no. 5, pp. e10465-. Journal article (Peer-reviewed)
    Abstract [en]

    Background: In order to generate hypotheses regarding the mechanisms by which 2,3,7,8-tetrachlorodibenzo-p-dioxin (dioxin) causes toxicity, we analyzed global gene expression changes in developing zebrafish embryos exposed to this potent toxicant in the context of a dynamic gene network. For this purpose, we also computationally inferred a zebrafish (Danio rerio) interactome based on orthologs and interaction data from other eukaryotes. Methodology/Principal Findings: Using novel computational tools to analyze this interactome, we distinguished between dioxin-dependent and dioxin-independent interactions between proteins, and tracked the temporal propagation of dioxin-dependent transcriptional changes from a few genes that were altered initially, to large groups of biologically coherent genes at later times. The most notable processes altered at later developmental stages were calcium and iron metabolism, embryonic morphogenesis including neuronal and retinal development, a variety of mitochondria-related functions, and generalized stress response (not including induction of antioxidant genes). Within the interactome, many of these responses were connected to cytochrome P4501A (cyp1a) as well as other genes that were dioxin-regulated one day after exposure. This suggests that cyp1a may play a key role initiating the toxic dysregulation of those processes, rather than serving simply as a passive marker of dioxin exposure, as suggested by earlier research. Conclusions/Significance: Thus, a powerful microarray experiment coupled with a flexible interactome and multi-pronged interactome tools (which are now made publicly available for microarray analysis and related work) suggest the hypothesis that dioxin, best known in fish as a potent cardioteratogen, has many other targets. Many of these types of toxicity have been observed in mammalian species and are potentially caused by alterations to cyp1a.

  • 5. Altenhoff, Adrian M.
    et al.
    Boeckmann, Brigitte
    Capella-Gutierrez, Salvador
    Dalquen, Daniel A.
    DeLuca, Todd
    Forslund, Kristoffer
    Huerta-Cepas, Jaime
    Linard, Benjamin
    Pereira, Cecile
    Pryszcz, Leszek P.
    Schreiber, Fabian
    da Silva, Alan Sousa
    Szklarczyk, Damian
    Train, Clement-Marie
    Bork, Peer
    Lecompte, Odile
    von Mering, Christian
    Xenarios, Ioannis
    Sjölander, Kimmen
    Juhl Jensen, Lars
    Martin, Maria J.
    Muffato, Matthieu
    Gabaldon, Toni
    Lewis, Suzanna E.
    Thomas, Paul D.
    Sonnhammer, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Dessimoz, Christophe
    Standardized benchmarking in the quest for orthologs. 2016. In: Nature Methods, ISSN 1548-7091, E-ISSN 1548-7105, Vol. 13, no. 5, pp. 425-+. Journal article (Peer-reviewed)
    Abstract [en]

    Achieving high accuracy in orthology inference is essential for many comparative, evolutionary and functional genomic analyses, yet the true evolutionary history of genes is generally unknown and orthologs are used for very different applications across phyla, requiring different precision-recall trade-offs. As a result, it is difficult to assess the performance of orthology inference methods. Here, we present a community effort to establish standards and an automated web-based service to facilitate orthology benchmarking. Using this service, we characterize 15 well-established inference methods and resources on a battery of 20 different benchmarks. Standardized benchmarking provides a way for users to identify the most effective methods for the problem at hand, sets a minimum requirement for new tools and resources, and guides the development of more accurate orthology inference methods.

  • 6. Altenhoff, Adrian M.
    et al.
    Garrayo-Ventas, Javier
    Cosentino, Salvatore
    Emms, David
    Glover, Natasha M.
    Hernández-Plaza, Ana
    Nevers, Yannis
    Sundesha, Vicky
    Szklarczyk, Damian
    Fernández, José M.
    Codó, Laia
    Li Gelpi, Josep
    Huerta-Cepas, Jaime
    Iwasaki, Wataru
    Kelly, Steven
    Lecompte, Odile
    Muffato, Matthieu
    Martin, Maria J.
    Capella-Gutierrez, Salvador
    Thomas, Paul D.
    Sonnhammer, Erik
    Stockholms universitet, Science for Life Laboratory (SciLifeLab). Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Dessimoz, Christophe
    The Quest for Orthologs benchmark service and consensus calls in 2020. 2020. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 48, no. W1, pp. W538-W545. Journal article (Peer-reviewed)
    Abstract [en]

    The identification of orthologs, genes in different species which descended from the same gene in their last common ancestor, is a prerequisite for many analyses in comparative genomics and molecular evolution. Numerous algorithms and resources have been conceived to address this problem, but benchmarking and interpreting them is fraught with difficulties (need to compare them on a common input dataset, absence of ground truth, computational cost of calling orthologs). To address this, the Quest for Orthologs consortium maintains a reference set of proteomes and provides a web server for continuous orthology benchmarking (http://orthology.benchmarkservice.org). Furthermore, consensus ortholog calls derived from public benchmark submissions are provided on the Alliance of Genome Resources website, the joint portal of NIH-funded model organism databases.

  • 7. Barrientos-Somarribas, Mauricio
    et al.
    Messina, David N.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Pou, Christian
    Lysholm, Fredrik
    Bjerkner, Annelie
    Allander, Tobias
    Andersson, Bjorn
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Discovering viral genomes in human metagenomic data by predicting unknown protein families. 2018. In: Scientific Reports, E-ISSN 2045-2322, Vol. 8, article ID 28. Journal article (Peer-reviewed)
    Abstract [en]

    Massive amounts of metagenomics data are currently being produced, and in all such projects a sizeable fraction of the resulting data shows no or little homology to known sequences. It is likely that this fraction contains novel viruses, but identification is challenging since they frequently lack homology to known viruses. To overcome this problem, we developed a strategy to detect ORFan protein families in shotgun metagenomics data, using similarity-based clustering and a set of filters to extract bona fide protein families. We applied this method to 17 virus-enriched libraries originating from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. This resulted in 32 predicted putative novel gene families. Some families showed detectable homology to sequences in metagenomics datasets and protein databases after reannotation. Notably, one predicted family matches an ORF from the highly variable Torque Teno virus (TTV). Furthermore, follow-up from a predicted ORFan resulted in the complete reconstruction of a novel circular genome. Its organisation suggests that it most likely corresponds to a novel bacteriophage in the microviridae family, hence it was named bacteriophage HFM.

  • 8. Berglund, Ann-Charlotte
    et al.
    Sjölund, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Östlund, Gabriel
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    InParanoid 6: eukaryotic ortholog clusters with inparalogs. 2008. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 36, pp. D263-D266. Journal article (Peer-reviewed)
    Abstract [en]

    The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.

  • 9. Berglund, Emelie
    et al.
    Maaskola, Jonas
    Schultz, Niklas
    Friedrich, Stefanie
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Marklund, Maja
    Bergenstråhle, Joseph
    Tarish, Firas
    Tanoglidi, Anna
    Vickovic, Sanja
    Larsson, Ludvig
    Salmén, Fredrik
    Ogris, Christoph
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Wallenborg, Karolina
    Lagergren, Jens
    Ståhl, Patrik
    Sonnhammer, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Helleday, Thomas
    Lundeberg, Joakim
    Spatial maps of prostate cancer transcriptomes reveal an unexplored landscape of heterogeneity. 2018. In: Nature Communications, E-ISSN 2041-1723, Vol. 9, article ID 2419. Journal article (Peer-reviewed)
    Abstract [en]

    Intra-tumor heterogeneity is one of the biggest challenges in cancer treatment today. Here we investigate tissue-wide gene expression heterogeneity throughout a multifocal prostate cancer using the spatial transcriptomics (ST) technology. Utilizing a novel approach for deconvolution, we analyze the transcriptomes of nearly 6750 tissue regions and extract distinct expression profiles for the different tissue components, such as stroma, normal and PIN glands, immune cells and cancer. We distinguish healthy and diseased areas and thereby provide insight into gene expression changes during the progression of prostate cancer. Compared to pathologist annotations, we delineate the extent of cancer foci more accurately, interestingly without link to histological changes. We identify gene expression gradients in stroma adjacent to tumor regions that allow for re-stratification of the tumor microenvironment. The establishment of these profiles is the first step towards an unbiased view of prostate cancer and can serve as a dictionary for future studies.

  • 10.
    Björkholm, Patrik
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L L
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Comparative analysis and unification of domain-domain interaction networks. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no. 22, pp. 3020-5. Journal article (Peer-reviewed)
    Abstract [en]

    MOTIVATION: Certain protein domains are known to preferentially interact with other domains. Several approaches have been proposed to predict domain-domain interactions, and over nine datasets are available. Our aim is to analyse the coverage and quality of the existing resources, as well as the extent of their overlap. With this knowledge, we have the opportunity to merge individual domain interaction networks to construct a comprehensive and reliable database. RESULTS: In this article we introduce a new approach towards comparing domain-domain interaction networks. This approach is used to compare nine predicted domain and protein interaction networks. The networks were used to generate a database of unified domain interactions, UniDomInt. Each interaction in the dataset is scored according to the benchmarked reliability of the sources. The performance of UniDomInt is an improvement compared to the underlying source networks and to another composite resource, Domine. AVAILABILITY: http://sonnhammer.sbc.su.se/download/UniDomInt/

  • 11.
    Buzzao, Davide
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Castresana-Aguirre, Miguel
    Guala, Dimitri
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    TOPAS, a network-based approach to detect disease modules in a top-down fashion. 2022. In: NAR Genomics and Bioinformatics, E-ISSN 2631-9268, Vol. 4, no. 4, article ID lqac093. Journal article (Peer-reviewed)
    Abstract [en]

    A vast scenario of potential disease mechanisms and remedies is yet to be discovered. The field of Network Medicine has grown thanks to the massive amount of high-throughput data and the emerging evidence that disease-related proteins form ‘disease modules’. Relying on prior disease knowledge, network-based disease module detection algorithms aim at connecting the list of known disease associated genes by exploiting interaction networks. Most existing methods extend disease modules by iteratively adding connector genes in a bottom-up fashion, while top-down approaches remain largely unexplored. We have created TOPAS, an iterative approach that aims at connecting the largest number of seed nodes in a top-down fashion through connectors that guarantee the highest flow of a Random Walk with Restart in a network of functional associations. We used a corpus of 382 manually selected functional gene sets to benchmark our algorithm against SCA, DIAMOnD, MaxLink and ROBUST across four interactomes. We demonstrate that TOPAS outperforms competing methods in terms of Seed Recovery Rate, Seed to Connector Ratio and consistency during module detection. We also show that TOPAS achieves competitive performance in terms of biological relevance of detected modules and scalability. 

  • 12. Carreras-Puigvert, Jordi
    et al.
    Zitnik, Marinka
    Jemth, Ann-Sofie
    Carter, Megan
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Unterlass, Judith E.
    Hallström, Björn
    Loseva, Olga
    Karem, Zhir
    Calderón-Montaño, José Manuel
    Lindskog, Cecilia
    Edqvist, Per-Henrik
    Matuszewski, Damian J.
    Blal, Hammou Ait
    Berntsson, Ronnie P. A.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Häggblad, Maria
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Martens, Ulf
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Studham, Matthew
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Lundgren, Bo
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Wählby, Carolina
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Lundberg, Emma
    Stenmark, Pål
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Zupan, Blaz
    Helleday, Thomas
    A comprehensive structural, biochemical and biological profiling of the human NUDIX hydrolase family. 2017. In: Nature Communications, E-ISSN 2041-1723, Vol. 8, article ID 1541. Journal article (Peer-reviewed)
    Abstract [en]

    The NUDIX enzymes are involved in cellular metabolism and homeostasis, as well as mRNA processing. Although highly conserved throughout all organisms, their biological roles and biochemical redundancies remain largely unclear. To address this, we globally resolve their individual properties and inter-relationships. We purify 18 of the human NUDIX proteins and screen 52 substrates, providing a substrate redundancy map. Using crystal structures, we generate sequence alignment analyses revealing four major structural classes. To a certain extent, their substrate preference redundancies correlate with structural classes, thus linking structure and activity relationships. To elucidate interdependence among the NUDIX hydrolases, we pairwise deplete them generating an epistatic interaction map, evaluate cell cycle perturbations upon knockdown in normal and cancer cells, and analyse their protein and mRNA expression in normal and cancer tissues. Using a novel FUSION algorithm, we integrate all data creating a comprehensive NUDIX enzyme profile map, which will prove fundamental to understanding their biological functionality.

  • 13.
    Castresana Aguirre, Miguel
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Guala, Dimitri
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Clustered Pathway Analysis. Manuscript (preprint) (Other academic)
    Abstract [en]

    Motivation: Functional analysis of gene sets derived from experiments is typically done by pathway annotation. Although many algorithms exist for analyzing the association between a gene set and a pathway, an issue which is generally ignored is that gene sets often represent multiple pathways. In such cases an association to a pathway is weakened by the presence of genes associated with other pathways. A way to counteract this is to cluster the gene set into more homogenous parts before performing pathway analysis on each cluster.

    Results: We explored whether network-based pre-clustering of a query gene set can improve pathway analysis. The methods MCL, Infomap, and MGclus were used to cluster the gene set projected onto the FunCoup network. We characterized how well these methods are able to detect individual pathways in multi-pathway gene sets, and applied each of the clustering methods in combination with four pathway analysis methods: Gene Enrichment Analysis, BinoX, NEAT, and ANUBIX. Using benchmarks constructed from the KEGG pathway database we found that clustering substantially increased the sensitivity of pathway analysis methods. For ANUBIX this came with almost no loss of specificity, while for BinoX and NEAT the specificity decreased roughly as much as the sensitivity increased. GEA had very low sensitivity both before and after clustering. The choice of clustering method only had a minor effect on the results. We conclude that clustering can improve overall pathway annotation performance, but only if the used enrichment method has a low false positive rate. 

    Availability and Implementation: https://bitbucket.org/sonnhammergroup/clustering-and-pathway-enrichment/

  • 14.
    Castresana-Aguirre, Miguel
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Guala, Dimitri
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Benefits and Challenges of Pre-clustered Network-Based Pathway Analysis. 2022. In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article ID 855766. Journal article (Peer-reviewed)
    Abstract [en]

    Functional analysis of gene sets derived from experiments is typically done by pathway annotation. Although many algorithms exist for analyzing the association between a gene set and a pathway, an issue which is generally ignored is that gene sets often represent multiple pathways. In such cases an association to a pathway is weakened by the presence of genes associated with other pathways. A way to counteract this is to cluster the gene set into more homogenous parts before performing pathway analysis on each module. We explored whether network-based pre-clustering of a query gene set can improve pathway analysis. The methods MCL, Infomap, and MGclus were used to cluster the gene set projected onto the FunCoup network. We characterized how well these methods are able to detect individual pathways in multi-pathway gene sets, and applied each of the clustering methods in combination with four pathway analysis methods: Gene Enrichment Analysis, BinoX, NEAT, and ANUBIX. Using benchmarks constructed from the KEGG pathway database we found that clustering can be beneficial by increasing the sensitivity of pathway analysis methods and by providing deeper insights of biological mechanisms related to the phenotype under study. However, keeping a high specificity is a challenge. For ANUBIX, clustering caused a minor loss of specificity, while for BinoX and NEAT it caused an unacceptable loss of specificity. GEA had very low sensitivity both before and after clustering. The choice of clustering method only had a minor effect on the results. We show examples of this approach and conclude that clustering can improve overall pathway annotation performance, but should only be used if the used enrichment method has a low false positive rate.

  • 15.
    Castresana-Aguirre, Miguel
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Persson, Emma
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    PathBIX - a web server for network-based pathway annotation with adaptive null models. 2021. In: Bioinformatics Advances, E-ISSN 2635-0041, Vol. 1, no. 1, article ID vbab010. Journal article (Refereed)
    Abstract [en]

    Motivation: Pathway annotation is a vital tool for interpreting and giving meaning to experimental data in life sciences. Numerous tools exist for this task, where the most recent generation of pathway enrichment analysis tools, network-based methods, utilize biological networks to gain a richer source of information as a basis of the analysis than merely the gene content. Network-based methods use the network crosstalk between the query gene set and the genes in known pathways, and compare this to a null model of random expectation.

    Results: We developed PathBIX, a novel web application for network-based pathway analysis, based on the recently published ANUBIX algorithm which has been shown to be more accurate than previous network-based methods. The PathBIX website performs pathway annotation for 21 species, and utilizes prefetched and preprocessed network data from FunCoup 5.0 networks and pathway data from three databases: KEGG, Reactome, and WikiPathways.

  • 16.
    Castresana-Aguirre, Miguel
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Pathway-specific model estimation for improved pathway annotation by network crosstalk. 2020. In: Scientific Reports, E-ISSN 2045-2322, Vol. 10, no. 1, article ID 13585. Journal article (Refereed)
    Abstract [en]

    Pathway enrichment analysis is the most common approach for understanding which biological processes are affected by altered gene activities under specific conditions. However, it has been challenging to find a method that efficiently avoids false positives while keeping a high sensitivity. Here we present ANUBIX, a new network-based method based on sampling random gene sets against intact pathways. Benchmarking shows that ANUBIX is considerably more accurate than previous network-crosstalk-based methods, which have the drawback of modelling pathways as random gene sets. We demonstrate that ANUBIX does not have a bias for finding certain pathways, which previous methods do, and show that ANUBIX finds biologically relevant pathways that are missed by other methods.
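
The phrase "sampling random gene sets against intact pathways" can be sketched as a Monte Carlo null model. The abstract does not give ANUBIX's statistical model, so the empirical p-value below is a generic stand-in, and the network, pathway, and gene names are hypothetical.

```python
import random


def crosstalk(query, pathway, edges):
    """Count network links that join a query gene to a pathway gene."""
    return sum(
        1
        for a, b in edges
        if (a in query and b in pathway) or (b in query and a in pathway)
    )


def empirical_pvalue(query, pathway, edges, universe, n_samples=2000, seed=0):
    """Estimate P(crosstalk >= observed) for random gene sets of the query's size.

    The pathway is kept intact; only the query side is randomised.
    """
    rng = random.Random(seed)
    observed = crosstalk(query, pathway, edges)
    genes = sorted(universe)
    hits = 0
    for _ in range(n_samples):
        sample = set(rng.sample(genes, len(query)))
        if crosstalk(sample, pathway, edges) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_samples + 1)


# Hypothetical toy network: 30 genes, one pathway of five genes, and
# three genes (g5, g6, g7) that are each linked to every pathway gene.
universe = {f"g{i}" for i in range(30)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
edges = [(p, q) for p in sorted(pathway) for q in ("g5", "g6", "g7")]

p_linked = empirical_pvalue({"g5", "g6", "g7"}, pathway, edges, universe)
p_unlinked = empirical_pvalue({"g20", "g21", "g22"}, pathway, edges, universe)
print(p_linked, p_unlinked)
```

Because the pathway is never shuffled, its internal structure and degree profile are preserved in the null, which is the contrast the abstract draws with methods that model pathways as random gene sets.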

  • 17. Dessimoz, Christophe
    et al.
    Gabaldón, Toni
    Roos, David S.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Herrero, Javier
    Toward community standards in the quest for orthologs. 2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no. 6, pp. 900-904. Journal article (Other academic)
    Abstract [en]

    The identification of orthologs (gene pairs descended from a common ancestor through speciation, rather than duplication) has emerged as an essential component of many bioinformatics applications, ranging from the annotation of new genomes to experimental target prioritization. Yet, the development and application of orthology inference methods is hampered by the lack of consensus on source proteomes, file formats and benchmarks. The second 'Quest for Orthologs' meeting brought together stakeholders from various communities to address these challenges. We report on achievements and outcomes of this meeting, focusing on topics of particular relevance to the research community at large. The Quest for Orthologs consortium is an open community that welcomes contributions from all researchers interested in orthology research and applications.

  • 18. El-Gebali, Sara
    et al.
    Mistry, Jaina
    Bateman, Alex
    Eddy, Sean R.
    Luciani, Aurelien
    Potter, Simon C.
    Qureshi, Matloob
    Richardson, Lorna J.
    Salazar, Gustavo A.
    Smart, Alfredo
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Hirsh, Layla
    Paladin, Lisanna
    Piovesan, Damiano
    Tosatto, Silvio C. E.
    Finn, Robert D.
    The Pfam protein families database in 2019. 2019. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 47, no. D1, pp. D427-D432. Journal article (Refereed)
    Abstract [en]

    The last few years have witnessed significant changes in Pfam (https://pfam.xfam.org). The number of families has grown substantially to a total of 17,929 in release 32.0. New additions have been coupled with efforts to improve existing families, including refinement of domain boundaries, their classification into Pfam clans, as well as their functional annotation. We recently began to collaborate with the RepeatsDB resource to improve the definition of tandem repeat families within Pfam. We carried out a significant comparison to the structural classification database, namely the Evolutionary Classification of Protein Domains (ECOD), that led to the creation of 825 new families based on their set of uncharacterized families (EUFs). Furthermore, we also connected Pfam entries to the Sequence Ontology (SO) through mapping of the Pfam type definitions to SO terms. Since Pfam has many community contributors, we recently enabled the linking between authorship of all Pfam entries with the corresponding authors' ORCID identifiers. This effectively permits authors to claim credit for their Pfam curation and link them to their ORCID record.

  • 19. Finn, Robert D.
    et al.
    Bateman, Alex
    Clements, Jody
    Coggill, Penelope
    Eberhardt, Ruth Y.
    Eddy, Sean R.
    Heger, Andreas
    Hetherington, Kirstie
    Holm, Liisa
    Mistry, Jaina
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Tate, John
    Punta, Marco
    Pfam: the protein families database. 2014. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 42, no. D1, pp. D222-D230. Journal article (Refereed)
    Abstract [en]

    Pfam, available via servers in the UK (http://pfam.sanger.ac.uk/) and the USA (http://pfam.janelia.org/), is a widely used database of protein families, containing 14 831 manually curated entries in the current release, version 27.0. Since the last update article 2 years ago, we have generated 1182 new families and maintained sequence coverage of the UniProt Knowledgebase (UniProtKB) at nearly 80%, despite a 50% increase in the size of the underlying sequence database. Since our 2012 article describing Pfam, we have also undertaken a comprehensive review of the features that are provided by Pfam over and above the basic family data. For each feature, we determined the relevance, computational burden, usage statistics and the functionality of the feature in a website context. As a consequence of this review, we have removed some features, enhanced others and developed new ones to meet the changing demands of computational biology. Here, we describe the changes to Pfam content. Notably, we now provide family alignments based on four different representative proteome sequence data sets and a new interactive DNA search interface. We also discuss the mapping between Pfam and known 3D structures.

  • 20. Finn, Robert D.
    et al.
    Mistry, Jaina
    Tate, John
    Coggill, Penny
    Heger, Andreas
    Pollington, Joanne E.
    Gavin, O. Luke
    Gunasekaran, Prasad
    Ceric, Goran
    Forslund, Kristoffer
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Holm, Liisa
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Eddy, Sean R.
    Bateman, Alex
    The Pfam protein families database. 2010. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 38, pp. D211-D222. Journal article (Refereed)
    Abstract [en]

    Pfam is a widely used database of protein families and domains. This article describes a set of major updates that we have implemented in the latest release (version 24.0). The most important change is that we now use HMMER3, the latest version of the popular profile hidden Markov model package. This software is approximately 100 times faster than HMMER2 and is more sensitive due to the routine use of the forward algorithm. The move to HMMER3 has necessitated numerous changes to Pfam that are described in detail. Pfam release 24.0 contains 11 912 families, of which a large number have been significantly updated during the past two years. Pfam is available via servers in the UK (http://pfam.sanger.ac.uk/), the USA (http://pfam.janelia.org/) and Sweden (http://pfam.sbc.su.se/).

  • 21.
    Forslund, Kristoffer
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Henricson, Anna
    Hollich, Volker
    Sonnhammer, Erik L.L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Domain tree-based analysis of protein architecture evolution. 2008. In: Molecular Biology and Evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 25, no. 2, pp. 254-264. Journal article (Refereed)
    Abstract [en]

    Understanding the dynamics behind domain architecture evolution is of great importance to unravel the functions of proteins. Complex architectures have been created throughout evolution by rearrangement and duplication events. An interesting question is how many times a particular architecture has been created, a form of convergent evolution or domain architecture reinvention. Previous studies have approached this issue by comparing architectures found in different species. We wanted to achieve a finer-grained analysis by reconstructing protein architectures on complete domain trees. The prevalence of domain architecture reinvention in 96 genomes was investigated with a novel domain tree-based method that uses maximum parsimony for inferring ancestral protein architectures. Domain architectures were taken from Pfam. To ensure robustness, we applied the method to bootstrap trees and only considered results with strong statistical support. We detected multiple origins for 12.4% of the scored architectures. In a much smaller data set, the subset of completely domain-assigned proteins, the figure was 5.6%. These results indicate that domain architecture reinvention is a much more common phenomenon than previously thought. We also determined which domains are most frequent in multiply created architectures and assessed whether specific functions could be attributed to them. However, no strong functional bias was found in architectures with multiple origins.
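
To make the maximum-parsimony step concrete, here is a minimal sketch using Fitch's algorithm on a rooted binary tree. The abstract does not specify the parsimony variant the authors used, so Fitch's algorithm is a swapped-in textbook technique, and the tree shape, protein names, and architecture labels are hypothetical.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass on a rooted binary tree.

    `tree` is either a leaf name (str) or a pair (left, right).
    Returns (candidate states at this node, minimum number of state changes).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, leaf_states)
    right_states, right_cost = fitch(right, leaf_states)
    common = left_states & right_states
    if common:
        # Children agree on at least one state: no change needed here.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_states | right_states, left_cost + right_cost + 1


# Hypothetical domain tree with five proteins; "A+B" is a two-domain
# architecture and "A" a single-domain one.
tree = ((("p1", "p2"), "p3"), ("p4", "p5"))
architectures = {"p1": "A+B", "p2": "A", "p3": "A", "p4": "A+B", "p5": "A"}

root_states, changes = fitch(tree, architectures)
print(root_states, changes)
```

Here the most parsimonious root architecture is "A" with two changes, so "A+B" is gained on two separate branches; this is the kind of pattern the abstract counts as an architecture with multiple origins.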

  • 22.
    Forslund, Kristoffer
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Pekkari, Isabella
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Domain architecture conservation in orthologs. 2011. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 12, pp. 326-. Journal article (Refereed)
    Abstract [en]

    Background. As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.

    Results. The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.

    Conclusions. On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.

  • 23. Forslund, Kristoffer
    et al.
    Pereira, Cecile
    Capella-Gutierrez, Salvador
    Sousa da Silva, Alan
    Altenhoff, Adrian
    Huerta-Cepas, Jaime
    Muffato, Matthieu
    Patricio, Mateus
    Vandepoele, Klaas
    Ebersberger, Ingo
    Blake, Judith
    Fernandez Breis, Jesualdo Tomas
    Boeckmann, Brigitte
    Gabaldon, Toni
    Sonnhammer, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Dessimoz, Christophe
    Lewis, Suzanna
    Gearing up to handle the mosaic nature of life in the quest for orthologs. 2018. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 34, no. 2, pp. 323-329. Journal article (Refereed)
    Abstract [en]

    The Quest for Orthologs (QfO) is an open collaboration framework for experts in comparative phylogenomics and related research areas who have an interest in highly accurate orthology predictions and their applications. We here report highlights and discussion points from the QfO meeting 2015 held in Barcelona. Achievements in recent years have established a basis to support developments for improved orthology prediction and to explore new approaches. Central to the QfO effort is proper benchmarking of methods and services, as well as design of standardized datasets and standardized formats to allow sharing and comparison of results. Simultaneously, analysis pipelines have been improved, evaluated and adapted to handle large datasets. All this would not have occurred without the long-term collaboration of Consortium members. Meeting regularly to review and coordinate complementary activities from a broad spectrum of innovative researchers clearly benefits the community. Highlights of the meeting include addressing sources of and legitimacy of disagreements between orthology calls, the context dependency of orthology definitions, special challenges encountered when analyzing very anciently rooted orthologies, orthology in the light of whole-genome duplications, and the concept of orthologous versus paralogous relationships at different levels, including domain-level orthology. Furthermore, particular needs for different applications (e.g. plant genomics, ancient gene families and others) and the infrastructure for making orthology inferences available (e.g. interfaces with model organism databases) were discussed, with several ongoing efforts that are expected to be reported on during the upcoming 2017 QfO meeting.

  • 24.
    Forslund, Kristoffer
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Schreiber, Fabian
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Thanintorn, Nattaphon
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    OrthoDisease: tracking disease gene orthologs across 100 species. 2011. In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 12, no. 5, pp. 463-473. Journal article (Refereed)
    Abstract [en]

    Orthology is one of the most important tools available to modern biology, as it allows making inferences from easily studied model systems to much less tractable systems of interest, such as ourselves. This becomes important not least in the study of genetic diseases. We here review work on the orthology of disease-associated genes and also present an updated version of the InParanoid-based disease orthology database and web site OrthoDisease, with 14-fold increased species coverage since the previous version. Using this resource, we survey the taxonomic distribution of orthologs of human genes involved in different disease categories. The hypothesis that paralogs can mask the effect of deleterious mutations predicts that known heritable disease genes should have fewer close paralogs. We found large-scale support for this hypothesis as significantly fewer duplications were observed for disease genes in the OrthoDisease ortholog groups.

  • 25.
    Forslund, Kristoffer
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Swedish e-Science Research Center.
    Evolution of Protein Domain Architectures. 2012. In: Evolutionary Genomics: Statistical and Computational Methods, Vol. 2 / [ed] Anisimova, M., Totowa, NJ: Humana Press, 2012, pp. 187-216. Chapter in book, part of anthology (Refereed)
    Abstract [en]

    This chapter reviews the current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly impacts which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multidomain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly).

  • 26.
    Forslund, Kristoffer
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L.L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Benchmarking homology detection procedures with low complexity filters. 2009. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 25, no. 19, pp. 2500-2505. Journal article (Refereed)
    Abstract [en]

    BACKGROUND: Low-complexity sequence regions present a common problem in finding true homologs to a protein query sequence. Several solutions to this have been suggested, but a detailed comparison between these on challenging data has so far been lacking. A common benchmark for homology detection procedures is to use SCOP/ASTRAL domain sequences belonging to the same or different superfamilies, but these contain almost no low complexity sequences.

    RESULTS: We here introduce an alternative benchmarking strategy based around Pfam domains and clans on whole-proteome data sets. This gives a realistic level of low complexity sequences. We used it to evaluate all six built-in BLAST low complexity filter settings as well as a range of settings in the MSPcrunch post-processing filter. The effect on alignment length was also assessed.

    CONCLUSION: Score matrix adjustment methods provide a low false positive rate at a relatively small loss in sensitivity relative to no filtering, across the range of test conditions we apply. MSPcrunch achieved even less loss in sensitivity, but at a higher false positive rate. A drawback of the score matrix adjustment methods is however that the alignments often become truncated.

    AVAILABILITY: Perl scripts for MSPcrunch BLAST filtering and for generating the benchmark dataset are available at http://sonnhammer.sbc.su.se/download/software/MSPcrunch+Blixem/benchmark.tar.gz

  • 27.
    Forslund, Kristoffer
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L.L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Predicting protein function from domain content. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no. 15, pp. 1681-1687. Journal article (Refereed)
    Abstract [en]

    MOTIVATION: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions.

    RESULTS: Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.

    AVAILABILITY: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar

  • 28.
    Friedrich, Stefanie
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Barbulescu, Remus
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Helleday, Thomas
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    MetaCNV - a consensus approach to infer accurate copy numbers from low coverage data. 2020. In: BMC Medical Genomics, E-ISSN 1755-8794, Vol. 13, article ID 76. Journal article (Refereed)
    Abstract [en]

    Background: The majority of copy number callers require high read coverage data that is often achieved with elevated material input, which increases the heterogeneity of tissue samples. However, to gain insights into smaller areas within a tissue sample, e.g. a cancerous area in a heterogeneous tissue sample, less material is used for sequencing, which results in lower read coverage. Therefore, more focus needs to be put on copy number calling that is sensitive enough for low coverage data.

    Results: We present MetaCNV, a copy number caller that infers reliable copy numbers for human genomes with a consensus approach. MetaCNV specializes in low coverage data, but also performs well on normal and high coverage data. MetaCNV integrates the results of multiple copy number callers and infers absolute and unbiased copy numbers for the entire genome. MetaCNV is based on a meta-model that bypasses the weaknesses of current calling models while combining the strengths of existing approaches. Here we apply MetaCNV based on ReadDepth, SVDetect, and CNVnator to real and simulated datasets in order to demonstrate how the approach improves copy number calling.

    Conclusions: MetaCNV, available at https://bitbucket.org/sonnhammergroup/metacnv, provides accurate copy number prediction on low coverage data and performs well on high coverage data.

  • 29.
    Friedrich, Stefanie
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Fusion transcript detection using spatial transcriptomics. 2020. In: BMC Medical Genomics, E-ISSN 1755-8794, Vol. 13, no. 1, article ID 110. Journal article (Refereed)
    Abstract [en]

    Background: Fusion transcripts are involved in tumourigenesis and play a crucial role in tumour heterogeneity, tumour evolution and cancer treatment resistance. However, fusion transcripts have not been studied at high spatial resolution in tissue sections due to the lack of full-length transcripts with spatial information. New high-throughput technologies like spatial transcriptomics measure the transcriptome of tissue sections on almost single-cell level. While this technique does not allow for direct detection of fusion transcripts, we show that they can be inferred using the relative poly(A) tail abundance of the involved parental genes.

    Method: We present a new method, STfusion, which uses spatial transcriptomics to infer the presence and absence of poly(A) tails. A fusion transcript lacks a poly(A) tail for the 5' gene and has an elevated number of poly(A) tails for the 3' gene. Its expression level is defined by the upstream promoter of the 5' gene. STfusion measures the difference between the observed and expected number of poly(A) tails with a novel C-score.

    Results: We verified STfusion's ability to predict fusion transcripts on HeLa cells with known fusions. STfusion and C-score applied to clinical prostate cancer data revealed the spatial distribution of the cis-SAGe SLC45A3-ELK4 in 12 tissue sections with almost single-cell resolution. The cis-SAGe occurred in disease areas, e.g. inflamed, prostatic intraepithelial neoplastic, or cancerous areas, and occasionally in normal glands.

    Conclusions: STfusion detects fusion transcripts in cancer cell line and clinical tissue data, and distinguishes chimeric transcripts from chimeras caused by trans-splicing events. With STfusion and the use of C-scores, fusion transcripts can be spatially localised in clinical tissue sections on almost single cell level.

  • 30.
    Frings, Oliver
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    MGclus: network clustering employing shared neighbors. 2013. In: Molecular BioSystems, ISSN 1742-206X, Vol. 9, no. 7, pp. 1670-1675. Journal article (Refereed)
    Abstract [en]

    Network analysis is an important tool for functional annotation of genes and proteins. A common approach to discern structure in a global network is to infer network clusters, or modules, and assume a functional coherence within each module, which may represent a complex or a pathway. It is however not trivial to define optimal modules. Although many methods have been proposed, it is unclear which methods perform best in general. It seems that most methods produce far from optimal results but in different ways. MGclus is a new algorithm designed to detect modules with a strongly interconnected neighborhood in large scale biological interaction networks. In our benchmarks we found MGclus to outperform other methods when applied to random graphs with varying degree of noise, and to perform equally or better when applied to biological protein interaction networks. MGclus is implemented in Java and utilizes the JGraphT graph library. It has an easy to use command-line interface and is available for download from http://sonnhammer.sbc.su.se/download/software/MGclus/.

  • 31.
    Frings, Oliver
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Augsten, Martin
    Tobin, Nicholas P.
    Carlson, Joseph
    Paulsson, Janna
    Pena, Cristina
    Olsson, Eleonor
    Veerla, Srinivas
    Bergh, Jonas
    Östman, Arne
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab). Swedish Escience Research Center, Sweden.
    Prognostic Significance in Breast Cancer of a Gene Signature Capturing Stromal PDGF Signaling. 2013. In: American Journal of Pathology, ISSN 0002-9440, E-ISSN 1525-2191, Vol. 182, no. 6, pp. 2037-2047. Journal article (Refereed)
    Abstract [en]

    In this study, we describe a novel gene expression signature of platelet-derived growth factor (PDGF) activated fibroblasts, which is able to identify breast cancers with a PDGF-stimulated fibroblast stroma and displays an independent and strong prognostic significance. Global gene expression was compared between PDGF-stimulated human fibroblasts and cultured resting fibroblasts. The most differentially expressed genes were reduced to a gene expression signature of 113 genes. The biological significance and prognostic capacity of this signature were investigated using four independent clinical breast cancer data sets. Concomitant high expression of PDGF beta receptor and its cognate ligands is associated with a high PDGF signature score. This supports the notion that the signature detects tumors with PDGF-activated stroma. Subsequent analyses indicated significant associations between high PDGF signature score and clinical characteristics, including human epidermal growth factor receptor 2 positivity, estrogen receptor negativity, high tumor grade, and large tumor size. A high PDGF signature score is associated with shorter survival in univariate analysis. Furthermore, the high PDGF signature score acts as a significant marker of poor prognosis in multivariate survival analyses, including classic prognostic markers, Ki-67 status, a proliferation gene signature, or other recently described stroma-derived gene expression signatures.

  • 32.
    Frings, Oliver
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Augsten, Martin
    Tobin, Nicholas P.
    Carlson, Joseph
    Paulsson, Janna
    Pena, Cristina
    Olsson, Eleonor
    Veerla, Sunny
    Bergh, Jonas
    Östman, Arne
    Sonnhammer, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Prognostic significance in breast cancer of a gene signature capturing stromal PDGF signaling. In: American Journal of Pathology, ISSN 0002-9440, E-ISSN 1525-2191. Journal article (Refereed)
  • 33.
    Frings, Oliver
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Mank, Judith E.
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Network Analysis of Functional Genomics Data: Application to Avian Sex-Biased Gene Expression. 2012. In: Scientific World Journal, E-ISSN 1537-744X, p. 130491. Journal article (Refereed)
    Abstract [en]

    Gene expression analysis is often used to investigate the molecular and functional underpinnings of a phenotype. However, differential expression of individual genes is limited in that it does not consider how the genes interact with each other in networks. To address this shortcoming we propose a number of network-based analyses that give additional functional insights into the studied process. These were applied to a dataset of sex-specific gene expression in the chicken gonad and brain at different developmental stages. We first constructed a global chicken interaction network. Combining the network with the expression data showed that most sex-biased genes tend to have lower network connectivity, that is, act within local network environments, although some interesting exceptions were found. Genes of the same sex bias were generally more strongly connected with each other than expected. We further studied the fates of duplicated sex-biased genes and found that there is a significant trend to keep the same pattern of sex bias after duplication. We also identified sex-biased modules in the network, which reveal pathways or complexes involved in sex-specific processes. Altogether, this work integrates evolutionary genomics with systems biology in a novel way, offering new insights into the modular nature of sex-biased genes.

  • 34. Gabaldón, Toni
    et al.
    Dessimoz, Christophe
    Huxley-Jones, Julie
    Vilella, Albert J
    Sonnhammer, Erik Ll
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Lewis, Suzanna
    Joining forces in the quest for orthologs. 2009. In: Genome biology, ISSN 1465-6914, Vol. 10, no. 9, p. 403. Journal article (Refereed)
    Abstract [en]

    Better orthology-prediction resources would be beneficial for the whole biological community. A recent meeting discussed how to coordinate and leverage current efforts.

  • 35. Glover, Natasha
    et al.
    Dessimoz, Christophe
    Ebersberger, Ingo
    Forslund, Sofia K.
    Gabaldón, Toni
    Huerta-Cepas, Jaime
    Martin, Maria-Jesus
    Muffato, Matthieu
    Patricio, Mateus
    Pereira, Cécile
    da Silva, Alan Sousa
    Wang, Yan
    Sonnhammer, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Thomas, Paul D.
    Advances and Applications in the Quest for Orthologs. 2019. In: Molecular biology and evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 36, no. 10, pp. 2157-2164. Article, review/survey (Refereed)
    Abstract [en]

    Gene families evolve by the processes of speciation (creating orthologs), gene duplication (paralogs), and horizontal gene transfer (xenologs), in addition to sequence divergence and gene loss. Orthologs in particular play an essential role in comparative genomics and phylogenomic analyses. With the continued sequencing of organisms across the tree of life, the data are available to reconstruct the unique evolutionary histories of tens of thousands of gene families. Accurate reconstruction of these histories, however, is a challenging computational problem, and the focus of the Quest for Orthologs Consortium. We review the recent advances and outstanding challenges in this field, as revealed at a symposium and meeting held at the University of Southern California in 2017. Key advances have been made both at the level of orthology algorithm development and with respect to coordination across the community of algorithm developers and orthology end-users. Applications spanned a broad range, including gene function prediction, phylostratigraphy, genome evolution, and phylogenomics. The meetings highlighted the increasing use of meta-analyses integrating results from multiple different algorithms, and discussed ongoing challenges in orthology inference as well as the next steps toward improvement and integration of orthology resources.

  • 36.
    Guala, Dimitri
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Bernhem, Kristoffer
    Ait Blal, Hammou
    Lundberg, Emma
    Brismar, Hjalmar
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Experimental validation of predicted cancer genes using FRET. Manuscript (preprint) (Other academic)
  • 37.
    Guala, Dimitri
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Bernhem, Kristoffer
    Blal, Hammou Ait
    Jans, Daniel
    Lundberg, Emma
    Brismar, Hjalmar
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Experimental validation of predicted cancer genes using FRET. 2018. In: Methods and applications in fluorescence, ISSN 2050-6120, Vol. 6, no. 3, article ID 035007. Journal article (Refereed)
    Abstract [en]

    Huge amounts of data are generated in genome-wide experiments designed to investigate diseases with complex genetic causes. Follow-up of all potential leads produced by such experiments is currently cost-prohibitive and time-consuming. Gene prioritization tools alleviate these constraints by directing further experimental efforts towards the most promising candidate targets. Recently a gene prioritization tool called MaxLink was shown to outperform other widely used state-of-the-art prioritization tools in a large-scale in silico benchmark. An experimental validation of predictions made by MaxLink has, however, been lacking. In this study we used Fluorescence Resonance Energy Transfer, an established experimental technique for detection of protein-protein interactions, to validate potential cancer genes predicted by MaxLink. Our results provide confidence in the use of MaxLink for selection of new targets in the battle with polygenic diseases.

  • 38.
    Guala, Dimitri
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholm Bioinformatics Centre, Sweden.
    Sjölund, Erik
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholm Bioinformatics Centre, Sweden.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholm Bioinformatics Centre, Sweden; Swedish eScience Research Center, Sweden.
    MaxLink: network-based prioritization of genes tightly linked to a disease seed set. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no. 18, pp. 2689-2690. Journal article (Refereed)
    Abstract [en]

    Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to other areas than cancer. The algorithm has also been made faster by re-implementation in C++, and the Web site uses FunCoup 3.0 as the underlying network.

  • 39.
    Guala, Dimitri
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    A large-scale benchmark of gene prioritization methods. 2017. In: Scientific Reports, E-ISSN 2045-2322, Vol. 7, article ID 46598. Journal article (Refereed)
    Abstract [en]

    In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.

  • 40.
    Guala, Dimitri
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab). Merck AB, Sweden.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    Network Crosstalk as a Basis for Drug Repurposing. 2022. In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article ID 792090. Journal article (Refereed)
    Abstract [en]

    The need for systematic drug repurposing has seen a steady increase over the past decade and may be particularly valuable to quickly remedy unexpected pandemics. The abundance of functional interaction data has allowed mapping of substantial parts of the human interactome modeled using functional association networks, favoring network-based drug repurposing. Network crosstalk-based approaches have never been tested for drug repurposing despite their success in the related and more mature field of pathway enrichment analysis. We have, therefore, evaluated the top performing crosstalk-based approaches for drug repurposing. Additionally, the volume of new interaction data as well as more sophisticated network integration approaches compelled us to construct a new benchmark for performance assessment of network-based drug repurposing tools, which we used to compare network crosstalk-based methods with a state-of-the-art technique. We find that network crosstalk-based drug repurposing is able to rival the state-of-the-art method and in some cases outperform it.

  • 41.
    Haider, Christian
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Kavic, Marina
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab). University of Applied Sciences Upper Austria, Austria.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik. Stockholms universitet, Science for Life Laboratory (SciLifeLab).
    TreeDom: a graphical web tool for analysing domain architecture evolution. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no. 15, pp. 2384-2385. Journal article (Refereed)
    Abstract [en]

    We present TreeDom, a web tool for graphically analysing the evolutionary history of domains in multi-domain proteins. Individual domains on the same protein chain may have distinct evolutionary histories, which is important to grasp in order to understand protein function. For instance, it may be important to know whether a domain was duplicated recently or long ago, to know the origin of inserted domains, or to know the pattern of domain loss within a protein family. TreeDom uses the Pfam database as the source of domain annotations, and displays these on a sequence tree. An advantage of TreeDom is that the user can limit the analysis to N sequences that are most similar to a query, or provide a list of sequence IDs to include. Using the Pfam alignment of the selected sequences, a tree is built and displayed together with the domain architecture of each sequence.

  • 42.
    Henricson, Anna
    et al.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Forslund, Kristoffer
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Sonnhammer, Erik L. L.
    Stockholms universitet, Naturvetenskapliga fakulteten, Institutionen för biokemi och biofysik.
    Orthology confers intron position conservation. 2010. In: BMC Genomics, E-ISSN 1471-2164, Vol. 11:412. Journal article (Refereed)
    Abstract [en]

    Background: With the wealth of genomic data available it has become increasingly important to assign putative protein function through functional transfer between orthologs. Therefore, correct elucidation of the evolutionary relationships among genes is a critical task, and attempts should be made to further improve the phylogenetic inference by adding relevant discriminating features. It has been shown that introns can maintain their position over long evolutionary timescales. For this reason, it could be possible to use conservation of intron positions as a discriminating factor when assigning orthology. Therefore, we wanted to investigate whether orthologs have a higher degree of intron position conservation (IPC) compared to non-orthologous sequences that are equally similar in sequence.

    Results: To this end, we developed a new score for IPC and applied it to ortholog groups between human and six other species. For comparison, we also gathered the closest non-orthologs, meaning sequences close in sequence space, yet falling just outside the ortholog cluster. We found that ortholog-ortholog gene pairs on average have a significantly higher degree of IPC compared to ortholog-closest non-ortholog pairs. Pairs of inparalogs were also found to have a higher IPC score than inparalog-closest non-inparalog pairs. We verified that these differences cannot simply be attributed to the generally higher sequence identity of the ortholog-ortholog and the inparalog-inparalog pairs. Furthermore, we analyzed the agreement between IPC score and the ortholog score assigned by the InParanoid algorithm, and found that it was consistently high for all species comparisons. In a minority of cases, the IPC and InParanoid score ranked inparalogs differently. These represent cases where sequence and intron position divergence are discordant. We further analyzed the discordant clusters to identify any possible preference for protein functions by looking for enriched GO terms and Pfam protein domains. They were enriched for functions important for multicellularity, which implies a connection between shifts in intronic structure and the origin of multicellularity.

    Conclusions: We conclude that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes. As a consequence, our IPC score is useful as an additional discriminating factor when assigning orthology.

  • 43. Herr, Patrick