Sonnhammer, Erik L. L. (ORCID iD: orcid.org/0000-0002-9015-5588)
Publications (10 of 101)
Buzzao, D., Steininger, L., Guala, D. & Sonnhammer, E. L. L. (2025). The FunCoup Cytoscape App: Multi-species network analysis and visualization. Bioinformatics, 41(1), Article ID btae739.
2025 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 41, no 1, article id btae739. Article in journal (Refereed). Published.
Abstract [en]

Motivation: Functional association networks, such as FunCoup, are crucial for analyzing complex gene interactions. To facilitate the analysis and visualization of such genome-wide networks, there is a need for seamless integration with powerful network analysis tools like Cytoscape. Results: The FunCoup Cytoscape App integrates the FunCoup web service API with Cytoscape, allowing users to visualize and analyze gene interaction networks for 640 species. Users can input gene identifiers and customize search parameters, using various network expansion algorithms like group or independent gene search, MaxLink, and TOPAS. The app maintains consistent visualizations with the FunCoup website, providing detailed node and link information, including tissue and pathway gene annotations. The integration with Cytoscape plugins, such as ClusterMaker2, enhances the analytical capabilities of FunCoup, as exemplified by the identification of the Myasthenia gravis disease module along with potential new therapeutic targets.

National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-240400 (URN); 10.1093/bioinformatics/btae739 (DOI); 001388812400001 (ISI); 39700425 (PubMedID); 2-s2.0-85214320388 (Scopus ID)
Available from: 2025-03-10. Created: 2025-03-10. Last updated: 2025-03-10. Bibliographically approved.
Paysan-Lafosse, T., Andreeva, A., Blum, M., Chuguransky, S. R., Grego, T., Pinto, B. L., . . . Bateman, A. (2025). The Pfam protein families database: Embracing AI/ML. Nucleic Acids Research, 53(D1), D523-D534
2025 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 53, no D1, p. D523-D534. Article in journal (Refereed). Published.
Abstract [en]

The Pfam protein families database is a comprehensive collection of protein domains and families used for genome annotation and protein structure and function analysis (https://www.ebi.ac.uk/interpro/). This update describes major developments in Pfam since 2020, including decommissioning the Pfam website and integration with InterPro, harmonization with the ECOD structural classification, and expanded curation of metagenomic, microprotein and repeat-containing families. We highlight how AlphaFold structure predictions are being leveraged to refine domain boundaries and identify new domains. New families discovered through large-scale sequence similarity analysis of AlphaFold models are described. We also detail the development of Pfam-N, which uses deep learning to expand family coverage, achieving an 8.8% increase in UniProtKB coverage compared to standard Pfam. We discuss plans for more frequent Pfam releases integrated with InterPro and the potential for artificial intelligence to further assist curation. Despite recent advances, many protein families remain to be classified, and Pfam continues working toward comprehensive coverage of the protein universe.

National Category
Molecular Biology
Identifiers
urn:nbn:se:su:diva-240064 (URN); 10.1093/nar/gkae997 (DOI); 001354632400001 (ISI); 39540428 (PubMedID); 2-s2.0-85214397377 (Scopus ID)
Available from: 2025-03-03. Created: 2025-03-03. Last updated: 2025-03-03. Bibliographically approved.
Lundqvist, N., Garbulowski, M., Hillerton, T. & Sonnhammer, E. L. L. (2025). Topology-based metrics for finding the optimal sparsity in gene regulatory network inference. Bioinformatics, 41(5), Article ID btaf120.
2025 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 41, no 5, article id btaf120. Article in journal (Refereed). Published.
Abstract [en]

Motivation: Gene regulatory network (GRN) inference is a complex task aiming to unravel regulatory interactions between genes in a cell. A major shortcoming of most GRN inference methods is that they do not attempt to find the optimal sparsity, i.e. the single best GRN, which is important when applying GRN inference in a real situation. Instead, the sparsity tends to be controlled by an arbitrarily set hyperparameter. Results: In this paper, two new methods for predicting the optimal sparsity of GRNs are formulated and benchmarked on simulated perturbation-based gene expression data using four GRN inference methods: LASSO, Zscore, LSCON, and GENIE3. Both sparsity prediction methods are defined using the hypothesis that the topology of real GRNs is scale-free, and are evaluated based on their ability to predict the sparsity of the true GRN. The results show that the new topology-based approaches reliably predict a sparsity close to the true one. This ability is valuable for real-world applications where a single GRN is inferred from real data. In such situations, it is vital to be able to infer a GRN with the correct sparsity.

National Category
Biochemistry
Identifiers
urn:nbn:se:su:diva-243341 (URN); 10.1093/bioinformatics/btaf120 (DOI); 001483462800001 (ISI); 2-s2.0-105004690157 (Scopus ID)
Available from: 2025-05-22. Created: 2025-05-22. Last updated: 2025-05-22. Bibliographically approved.
Buzzao, D., Castresana-Aguirre, M., Guala, D. & Sonnhammer, E. L. L. (2024). Benchmarking enrichment analysis methods with the disease pathway network. Briefings in Bioinformatics, 25(2), Article ID bbae069.
2024 (English). In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 25, no 2, article id bbae069. Article in journal (Refereed). Published.
Abstract [en]

Enrichment analysis (EA) is a common approach to gain functional insights from genome-scale experiments. As a consequence, a large number of EA methods have been developed, yet it is unclear from previous studies which method is the best for a given dataset. The main issues with previous benchmarks include the complexity of correctly assigning true pathways to a test dataset, and lack of generality of the evaluation metrics, for which the rank of a single target pathway is commonly used. We here provide a generalized EA benchmark and apply it to the most widely used EA methods, representing all four categories of current approaches. The benchmark employs a new set of 82 curated gene expression datasets from DNA microarray and RNA-Seq experiments for 26 diseases, of which only 13 are cancers. In order to address the shortcomings of the single target pathway approach and to enhance the sensitivity evaluation, we present the Disease Pathway Network, in which related Kyoto Encyclopedia of Genes and Genomes pathways are linked. We introduce a novel approach to evaluate pathway EA by combining sensitivity and specificity to provide a balanced evaluation of EA methods. This approach identifies Network Enrichment Analysis methods as the overall top performers compared with overlap-based methods. By using randomized gene expression datasets, we explore the null hypothesis bias of each method, revealing that most of them produce skewed P-values.
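For readers unfamiliar with the overlap-based category mentioned in the abstract, the following is a minimal sketch of a generic hypergeometric over-representation test. It is an illustration only, not the benchmark's code or any specific tool evaluated in the paper; the function name and the example sizes are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(universe_size, pathway_size, query_size, overlap):
    """Upper-tail hypergeometric p-value for a gene-set overlap.

    Returns P(X >= overlap), where X is the number of pathway genes seen
    when query_size genes are drawn without replacement from a universe
    of universe_size genes, pathway_size of which belong to the pathway.
    All argument names are illustrative assumptions.
    """
    max_overlap = min(pathway_size, query_size)
    total = comb(universe_size, query_size)
    # Sum the probability mass for every overlap at least as large as observed.
    tail = sum(
        comb(pathway_size, k) * comb(universe_size - pathway_size, query_size - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


# Hypothetical example: 20000 genes, a 100-gene pathway,
# a 200-gene query list, and 8 genes in common.
p_value = hypergeom_enrichment_p(20000, 100, 200, 8)
```

A test of this kind treats genes as independent draws and ignores links between genes, which is the property that network-based methods are designed to address.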

Keywords
disease pathway network, functional enrichment, gene expression data, gene set enrichment analysis, pathway enrichment analysis, systems biology
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-235218 (URN); 10.1093/bib/bbae069 (DOI); 001281650100007 (ISI); 2-s2.0-85186679428 (Scopus ID)
Funder
Swedish Research Council, 2022-06725; Swedish Research Council, 2018-05973; Swedish Research Council, 2019-04095; Stockholm University
Available from: 2024-11-01. Created: 2024-11-01. Last updated: 2025-02-07. Bibliographically approved.
Garbulowski, M., Hillerton, T., Morgan, D., Seçilmiş, D., Sonnhammer, L., Tjärnberg, A., . . . Sonnhammer, E. L. L. (2024). GeneSPIDER2: large scale GRN simulation and benchmarking with perturbed single-cell data. NAR Genomics and Bioinformatics, 6(3), Article ID lqae121.
2024 (English). In: NAR Genomics and Bioinformatics, E-ISSN 2631-9268, Vol. 6, no 3, article id lqae121. Article in journal (Refereed). Published.
Abstract [en]

Single-cell data is increasingly used for gene regulatory network (GRN) inference, and benchmarks for this have been developed based on simulated data. However, existing single-cell simulators cannot model the effects of gene perturbations. A further challenge lies in generating large-scale GRNs that often struggle with computational and stability issues. We present GeneSPIDER2, an update of the GeneSPIDER MATLAB toolbox for GRN benchmarking, inference, and analysis. Several software modules have improved capabilities and performance, and new functionalities have been added. A major improvement is the ability to generate large GRNs with biologically realistic topological properties in terms of scale-free degree distribution and modularity. Another major addition is a simulation of single-cell data, which is becoming increasingly popular as input for GRN inference. Specifically, we introduced the unique feature to generate single-cell data based on genetic perturbations. Finally, the simulated single-cell data was compared to real single-cell Perturb-seq data from two cell lines, showing that the synthetic and real data exhibit similar properties.

National Category
Biochemistry; Molecular Biology
Identifiers
urn:nbn:se:su:diva-237834 (URN); 10.1093/nargab/lqae121 (DOI); 2-s2.0-85204555831 (Scopus ID)
Available from: 2025-01-16. Created: 2025-01-16. Last updated: 2025-02-20. Bibliographically approved.
Altenhoff, A., Nevers, Y., Tran, V., Jyothi, D., Martin, M., Cosentino, S., . . . Sonnhammer, E. L. L. (2024). New developments for the Quest for Orthologs benchmark service. NAR Genomics and Bioinformatics, 6(4), Article ID lqae167.
2024 (English). In: NAR Genomics and Bioinformatics, E-ISSN 2631-9268, Vol. 6, no 4, article id lqae167. Article in journal (Refereed). Published.
Abstract [en]

The Quest for Orthologs (QfO) orthology benchmark service (https://orthology.benchmarkservice.org) hosts a wide range of standardized benchmarks for orthology inference evaluation. It is supported and maintained by the QfO consortium, and is used to gather ortholog predictions and to examine strengths and weaknesses of newly developed and existing orthology inference methods. The web server allows different inference methods to be compared in a standardized way using the same proteome data. The benchmark results are useful for developing new methods and can help researchers to guide their choice of orthology method for applications in comparative genomics and phylogenetic analysis. We here present a new release of the Orthology Benchmark Service with a new benchmark based on feature architecture similarity as well as updated reference proteomes. We further provide a meta-analysis of the public predictions from 18 different orthology assignment methods to reveal how they relate in terms of ortholog predictions and benchmark performance. These results can guide users of orthologs to the best suited method for their purpose.

National Category
Biochemistry
Identifiers
urn:nbn:se:su:diva-240704 (URN); 10.1093/nargab/lqae167 (DOI); 001374275400001 (ISI); 2-s2.0-85211996425 (Scopus ID)
Available from: 2025-03-14. Created: 2025-03-14. Last updated: 2025-03-14. Bibliographically approved.
Langschied, F., Bordin, N., Cosentino, S., Fuentes-Palacios, D., Glover, N., Hiller, M., . . . Ebersberger, I. (2024). Quest for Orthologs in the Era of Biodiversity Genomics. Genome Biology and Evolution, 16(10), Article ID evae224.
2024 (English). In: Genome Biology and Evolution, E-ISSN 1759-6653, Vol. 16, no 10, article id evae224. Article, review/survey (Refereed). Published.
Abstract [en]

The era of biodiversity genomics is characterized by large-scale genome sequencing efforts that aim to represent each living taxon with an assembled genome. Generating knowledge from this wealth of data has not kept up with this pace. We here discuss major challenges to integrating these novel genomes into a comprehensive functional and evolutionary network spanning the tree of life. In summary, the expanding datasets create a need for scalable gene annotation methods. To trace gene function across species, new methods must seek to increase the resolution of ortholog analyses, e.g. by extending analyses to the protein domain level and by accounting for alternative splicing. Additionally, the scope of orthology prediction should be pushed beyond well-investigated proteomes. This demands the development of specialized methods for the identification of orthologs to short proteins and noncoding RNAs and for the functional characterization of novel gene families. Furthermore, protein structures predicted by machine learning are now readily available, but this new information is yet to be integrated with orthology-based analyses. Finally, an increasing focus should be placed on making orthology assignments adhere to the findable, accessible, interoperable, and reusable (FAIR) principles. This fosters green bioinformatics by avoiding redundant computations and helps integrating diverse scientific communities sharing the need for comparative genetics and genomics information. It should also help with communicating orthology-related concepts in a format that is accessible to the public, to counteract existing misinformation about evolution.

Keywords
annotation transfer, domain architecture, FAIR, noncoding RNA, ortholog search, protein structure
National Category
Biochemistry
Identifiers
urn:nbn:se:su:diva-237226 (URN); 10.1093/gbe/evae224 (DOI); 39404012 (PubMedID); 2-s2.0-85208099282 (Scopus ID)
Available from: 2025-01-09. Created: 2025-01-09. Last updated: 2025-01-09. Bibliographically approved.
Persson, E. & Sonnhammer, E. L. L. (2023). InParanoiDB 9: Ortholog Groups for Protein Domains and Full-Length Proteins. Journal of Molecular Biology, 435(14), Article ID 168001.
2023 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 435, no 14, article id 168001. Article in journal (Refereed). Published.
Abstract [en]

Prediction of orthologs is an important bioinformatics pursuit that is frequently used for inferring protein function and evolutionary analyses. The InParanoid database is a well known resource of ortholog predictions between a wide variety of organisms. Although orthologs have historically been inferred at the level of full-length protein sequences, many proteins consist of several independent protein domains that may be orthologous to domains in other proteins in a way that differs from the full-length protein case. To be able to capture all types of orthologous relations, conventional full-length protein orthologs can be complemented with orthologs inferred at the domain level. We here present InParanoiDB 9, covering 640 species and providing orthologs for both protein domains and full-length proteins. InParanoiDB 9 was built using the faster InParanoid-DIAMOND algorithm for orthology analysis, as well as Domainoid and Pfam to infer orthologous domains. InParanoiDB 9 is based on proteomes from 447 eukaryotes, 158 bacteria and 35 archaea, and includes over one billion predicted ortholog groups. A new website has been built for the database, providing multiple search options as well as visualization of groups of orthologs and orthologous domains. This release constitutes a major upgrade of the InParanoid database in terms of the number of species as well as the new capability to operate on the domain level. InParanoiDB 9 is available at https://inparanoidb.sbc.su.se/.

Keywords
ortholog, InParanoid, orthologous domain, protein domain, ortholog database
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-220951 (URN); 10.1016/j.jmb.2023.168001 (DOI); 001054111000001 (ISI); 36764355 (PubMedID); 2-s2.0-85148362111 (Scopus ID)
Available from: 2023-09-15. Created: 2023-09-15. Last updated: 2025-02-07. Bibliographically approved.
Castresana-Aguirre, M., Guala, D. & Sonnhammer, E. L. L. (2022). Benefits and Challenges of Pre-clustered Network-Based Pathway Analysis. Frontiers in Genetics, 13, Article ID 855766.
2022 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article id 855766. Article in journal (Refereed). Published.
Abstract [en]

Functional analysis of gene sets derived from experiments is typically done by pathway annotation. Although many algorithms exist for analyzing the association between a gene set and a pathway, an issue which is generally ignored is that gene sets often represent multiple pathways. In such cases an association to a pathway is weakened by the presence of genes associated with other pathways. A way to counteract this is to cluster the gene set into more homogenous parts before performing pathway analysis on each module. We explored whether network-based pre-clustering of a query gene set can improve pathway analysis. The methods MCL, Infomap, and MGclus were used to cluster the gene set projected onto the FunCoup network. We characterized how well these methods are able to detect individual pathways in multi-pathway gene sets, and applied each of the clustering methods in combination with four pathway analysis methods: Gene Enrichment Analysis, BinoX, NEAT, and ANUBIX. Using benchmarks constructed from the KEGG pathway database we found that clustering can be beneficial by increasing the sensitivity of pathway analysis methods and by providing deeper insights of biological mechanisms related to the phenotype under study. However, keeping a high specificity is a challenge. For ANUBIX, clustering caused a minor loss of specificity, while for BinoX and NEAT it caused an unacceptable loss of specificity. GEA had very low sensitivity both before and after clustering. The choice of clustering method only had a minor effect on the results. We show examples of this approach and conclude that clustering can improve overall pathway annotation performance, but should only be used if the used enrichment method has a low false positive rate.

Keywords
functional association networks, network clustering, biological mechanisms, pathway enrichment analysis, sensitivity increase
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-207111 (URN); 10.3389/fgene.2022.855766 (DOI); 000802261100001 (ISI); 35620466 (PubMedID)
Available from: 2022-07-06. Created: 2022-07-06. Last updated: 2023-02-23. Bibliographically approved.
Hillerton, T., Seçilmiş, D., Nelander, S. & Sonnhammer, E. L. L. (2022). Fast and accurate gene regulatory network inference by normalized least squares regression. Bioinformatics, 38(8), 2263-2268, Article ID btac103.
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no 8, p. 2263-2268, article id btac103. Article in journal (Refereed). Published.
Abstract [en]

Motivation: Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to find a suitable balance between the maximum number of true positive and the minimum number of false-positive interactions. Another key feature is that the inference method can handle the large size of modern experimental data, meaning the method needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both these criteria, however as it is based on least squares it is vulnerable to known issues of amplifying extreme values, small or large. In GRN this manifests itself with genes that are erroneously hyper-connected to a large fraction of all genes due to extremely low value fold changes.

Results: We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm by regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization used is based on normalization, which removes effects of extreme values on the fit. We benchmarked LSCON and compared it to Genie3, LASSO, LSCO and Ridge regression, in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO.

National Category
Biological Sciences; Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-203209 (URN); 10.1093/bioinformatics/btac103 (DOI); 000761598600001 (ISI); 35176145 (PubMedID); 2-s2.0-85128723779 (Scopus ID)
Available from: 2022-03-28. Created: 2022-03-28. Last updated: 2023-09-14. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-9015-5588