Sonnhammer, Erik L. L.
ORCID iD: orcid.org/0000-0002-9015-5588
Publications (10 of 94)
Persson, E. & Sonnhammer, E. L. L. (2023). InParanoiDB 9: Ortholog Groups for Protein Domains and Full-Length Proteins. Journal of Molecular Biology, 435(14), Article ID 168001.
2023 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 435, no. 14, article id 168001. Article in journal (Refereed), Published
Abstract [en]

Prediction of orthologs is an important bioinformatics pursuit that is frequently used for inferring protein function and evolutionary analyses. The InParanoid database is a well known resource of ortholog predictions between a wide variety of organisms. Although orthologs have historically been inferred at the level of full-length protein sequences, many proteins consist of several independent protein domains that may be orthologous to domains in other proteins in a way that differs from the full-length protein case. To be able to capture all types of orthologous relations, conventional full-length protein orthologs can be complemented with orthologs inferred at the domain level. We here present InParanoiDB 9, covering 640 species and providing orthologs for both protein domains and full-length proteins. InParanoiDB 9 was built using the faster InParanoid-DIAMOND algorithm for orthology analysis, as well as Domainoid and Pfam to infer orthologous domains. InParanoiDB 9 is based on proteomes from 447 eukaryotes, 158 bacteria and 35 archaea, and includes over one billion predicted ortholog groups. A new website has been built for the database, providing multiple search options as well as visualization of groups of orthologs and orthologous domains. This release constitutes a major upgrade of the InParanoid database in terms of the number of species as well as the new capability to operate on the domain level. InParanoiDB 9 is available at https://inparanoidb.sbc.su.se/.

Keywords
ortholog, InParanoid, orthologous domain, protein domain, ortholog database
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:su:diva-220951 (URN); 10.1016/j.jmb.2023.168001 (DOI); 001054111000001 (ISI); 36764355 (PubMedID); 2-s2.0-85148362111 (Scopus ID)
Available from: 2023-09-15. Created: 2023-09-15. Last updated: 2023-10-16. Bibliographically approved
Castresana-Aguirre, M., Guala, D. & Sonnhammer, E. L. L. (2022). Benefits and Challenges of Pre-clustered Network-Based Pathway Analysis. Frontiers in Genetics, 13, Article ID 855766.
2022 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article id 855766. Article in journal (Refereed), Published
Abstract [en]

Functional analysis of gene sets derived from experiments is typically done by pathway annotation. Although many algorithms exist for analyzing the association between a gene set and a pathway, an issue which is generally ignored is that gene sets often represent multiple pathways. In such cases an association to a pathway is weakened by the presence of genes associated with other pathways. A way to counteract this is to cluster the gene set into more homogenous parts before performing pathway analysis on each module. We explored whether network-based pre-clustering of a query gene set can improve pathway analysis. The methods MCL, Infomap, and MGclus were used to cluster the gene set projected onto the FunCoup network. We characterized how well these methods are able to detect individual pathways in multi-pathway gene sets, and applied each of the clustering methods in combination with four pathway analysis methods: Gene Enrichment Analysis, BinoX, NEAT, and ANUBIX. Using benchmarks constructed from the KEGG pathway database we found that clustering can be beneficial by increasing the sensitivity of pathway analysis methods and by providing deeper insights of biological mechanisms related to the phenotype under study. However, keeping a high specificity is a challenge. For ANUBIX, clustering caused a minor loss of specificity, while for BinoX and NEAT it caused an unacceptable loss of specificity. GEA had very low sensitivity both before and after clustering. The choice of clustering method only had a minor effect on the results. We show examples of this approach and conclude that clustering can improve overall pathway annotation performance, but should only be used if the used enrichment method has a low false positive rate.

Keywords
functional association networks, network clustering, biological mechanisms, pathway enrichment analysis, sensitivity increase
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-207111 (URN); 10.3389/fgene.2022.855766 (DOI); 000802261100001 (ISI); 35620466 (PubMedID)
Available from: 2022-07-06. Created: 2022-07-06. Last updated: 2023-02-23. Bibliographically approved
Hillerton, T., Seçilmiş, D., Nelander, S. & Sonnhammer, E. L. L. (2022). Fast and accurate gene regulatory network inference by normalized least squares regression. Bioinformatics, 38(8), 2263-2268, Article ID btac103.
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no. 8, p. 2263-2268, article id btac103. Article in journal (Refereed), Published
Abstract [en]

Motivation: Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to find a suitable balance between the maximum number of true positive and the minimum number of false-positive interactions. Another key feature is that the inference method can handle the large size of modern experimental data, meaning the method needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both these criteria, however as it is based on least squares it is vulnerable to known issues of amplifying extreme values, small or large. In GRN this manifests itself with genes that are erroneously hyper-connected to a large fraction of all genes due to extremely low value fold changes.

Results: We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm by regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization used is based on normalization, which removes effects of extreme values on the fit. We benchmarked LSCON and compared it to Genie3, LASSO, LSCO and Ridge regression, in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO.

National Category
Biological Sciences Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-203209 (URN); 10.1093/bioinformatics/btac103 (DOI); 000761598600001 (ISI); 35176145 (PubMedID); 2-s2.0-85128723779 (Scopus ID)
Available from: 2022-03-28. Created: 2022-03-28. Last updated: 2023-09-14. Bibliographically approved
Zhivkoplias, E. K., Vavulov, O., Hillerton, T. & Sonnhammer, E. L. L. (2022). Generation of Realistic Gene Regulatory Networks by Enriching for Feed-Forward Loops. Frontiers in Genetics, 13, Article ID 815692.
2022 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article id 815692. Article in journal (Refereed), Published
Abstract [en]

The regulatory relationships between genes and proteins in a cell form a gene regulatory network (GRN) that controls the cellular response to changes in the environment. A number of inference methods to reverse engineer the original GRN from large-scale expression data have recently been developed. However, the absence of ground-truth GRNs when evaluating the performance makes realistic simulations of GRNs necessary. One aspect of this is that local network motif analysis of real GRNs indicates that the feed-forward loop (FFL) is significantly enriched. To simulate this properly, we developed a novel motif-based preferential attachment algorithm, FFLatt, which outperformed the popular GeneNetWeaver network generation tool in reproducing the FFL motif occurrence observed in literature-based biological GRNs. It also preserves important topological properties such as scale-free topology, sparsity, and average in/out-degree per node. We conclude that FFLatt is well-suited as a network generation module for a benchmarking framework with the aim to provide fair and robust performance evaluation of GRN inference methods.

Keywords
network biology, gene regulatory networks, gene-gene interaction, network motif structure, network generation, network simulation, benchmarking
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-202880 (URN); 10.3389/fgene.2022.815692 (DOI); 000761447700001 (ISI); 35222536 (PubMedID)
Available from: 2022-03-18. Created: 2022-03-18. Last updated: 2023-09-14. Bibliographically approved
Seçilmiş, D., Hillerton, T. & Sonnhammer, E. L. L. (2022). GRNbenchmark - a web server for benchmarking directed gene regulatory network inference methods. Nucleic Acids Research, 50(W1), W398-W404.
2022 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 50, no. W1, p. W398-W404. Article in journal (Refereed), Published
Abstract [en]

Accurate inference of gene regulatory networks (GRN) is an essential component of systems biology, and there is a constant development of new inference methods. The most common approach to assess accuracy for publications is to benchmark the new method against a selection of existing algorithms. This often leads to a very limited comparison, potentially biasing the results, which may stem from tuning the benchmark's properties or incorrect application of other methods. These issues can be avoided by a web server with a broad range of data properties and inference algorithms, that makes it easy to perform comprehensive benchmarking of new methods, and provides a more objective assessment. Here we present https://GRNbenchmark.org/ - a new web server for benchmarking GRN inference methods, which provides the user with a set of benchmarks with several datasets, each spanning a range of properties including multiple noise levels. As soon as the web server has performed the benchmarking, the accuracy results are made privately available to the user via interactive summary plots and underlying curves. The user can then download these results for any purpose, and decide whether or not to make them public to share with the community.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-207117 (URN); 10.1093/nar/gkac377 (DOI); 000799563700001 (ISI); 35609981 (PubMedID)
Available from: 2022-07-06. Created: 2022-07-06. Last updated: 2022-07-06. Bibliographically approved
Persson, E. & Sonnhammer, E. L. L. (2022). InParanoid-DIAMOND: faster orthology analysis with the InParanoid algorithm. Bioinformatics, 38(10), 2918-2919.
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no. 10, p. 2918-2919. Article in journal (Refereed), Published
Abstract [en]

Predicting orthologs, genes in different species having shared ancestry, is an important task in bioinformatics. Orthology prediction tools are required to make accurate and fast predictions, in order to analyze large amounts of data within a feasible time frame. InParanoid is a well-known algorithm for orthology analysis, shown to perform well in benchmarks, but having the major limitation of long runtimes on large datasets. Here, we present an update to the InParanoid algorithm that can use the faster tool DIAMOND instead of BLAST for the homolog search step. We show that it reduces the runtime by 94%, while still obtaining similar performance in the Quest for Orthologs benchmark. 

National Category
Other Biological Topics
Identifiers
urn:nbn:se:su:diva-204487 (URN); 10.1093/bioinformatics/btac194 (DOI); 000785761400001 (ISI); 35357425 (PubMedID); 2-s2.0-85132369777 (Scopus ID)
Available from: 2022-05-09. Created: 2022-05-09. Last updated: 2024-06-10. Bibliographically approved
Seçilmiş, D., Hillerton, T., Tjärnberg, A., Nelander, S., Nordling, T. E. M. & Sonnhammer, E. L. L. (2022). Knowledge of the perturbation design is essential for accurate gene regulatory network inference. Scientific Reports, 12(1), Article ID 16531.
2022 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 12, no. 1, article id 16531. Article in journal (Refereed), Published
Abstract [en]

The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation we used targeted perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-210751 (URN); 10.1038/s41598-022-19005-x (DOI); 000865282300021 (ISI); 36192495 (PubMedID); 2-s2.0-85139173448 (Scopus ID)
Available from: 2022-10-26. Created: 2022-10-26. Last updated: 2023-09-14. Bibliographically approved
Guala, D. & Sonnhammer, E. L. L. (2022). Network Crosstalk as a Basis for Drug Repurposing. Frontiers in Genetics, 13, Article ID 792090.
2022 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article id 792090. Article in journal (Refereed), Published
Abstract [en]

The need for systematic drug repurposing has seen a steady increase over the past decade and may be particularly valuable to quickly remedy unexpected pandemics. The abundance of functional interaction data has allowed mapping of substantial parts of the human interactome modeled using functional association networks, favoring network-based drug repurposing. Network crosstalk-based approaches have never been tested for drug repurposing despite their success in the related and more mature field of pathway enrichment analysis. We have, therefore, evaluated the top performing crosstalk-based approaches for drug repurposing. Additionally, the volume of new interaction data as well as more sophisticated network integration approaches compelled us to construct a new benchmark for performance assessment of network-based drug repurposing tools, which we used to compare network crosstalk-based methods with a state-of-the-art technique. We find that network crosstalk-based drug repurposing is able to rival the state-of-the-art method and in some cases outperform it.

Keywords
drug repurposing, drug repositioning, network-based, benchmark, functional association network, network crosstalk, shortest path
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-204486 (URN); 10.3389/fgene.2022.792090 (DOI); 000775321100001 (ISI); 35350247 (PubMedID); 2-s2.0-85127303370 (Scopus ID)
Note
For corrigendum, see DOI: https://doi.org/10.3389/fgene.2022.921286
Available from: 2022-05-09. Created: 2022-05-09. Last updated: 2023-02-23. Bibliographically approved
Seçilmiş, D., Nelander, S. & Sonnhammer, E. L. L. (2022). Optimal Sparsity Selection Based on an Information Criterion for Accurate Gene Regulatory Network Inference. Frontiers in Genetics, 13, Article ID 855770.
2022 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article id 855770. Article in journal (Refereed), Published
Abstract [en]

Accurate inference of gene regulatory networks (GRNs) is important to unravel unknown regulatory mechanisms and processes, which can lead to the identification of treatment targets for genetic diseases. A variety of GRN inference methods have been proposed that, under suitable data conditions, perform well in benchmarks that consider the entire spectrum of false-positives and -negatives. However, it is very challenging to predict which single network sparsity gives the most accurate GRN. Lacking criteria for sparsity selection, a simplistic solution is to pick the GRN that has a certain number of links per gene, which is guessed to be reasonable. However, this does not guarantee finding the GRN that has the correct sparsity or is the most accurate one. In this study, we provide a general approach for identifying the most accurate and sparsity-wise relevant GRN within the entire space of possible GRNs. The algorithm, called SPA, applies a “GRN information criterion” (GRNIC) that is inspired by two commonly used model selection criteria, Akaike and Bayesian Information Criterion (AIC and BIC) but adapted to GRN inference. The results show that the approach can, in most cases, find the GRN whose sparsity is close to the true sparsity and close to as accurate as possible with the given GRN inference method and data. The datasets and source code can be found at https://bitbucket.org/sonnhammergrni/spa/. 

Keywords
gene expression data, gene regulatory network inference, information criteria, noise in gene expression, sparsity selection, accuracy, algorithm, Article, bootstrapping, controlled study, diagnostic test accuracy study, DNA extraction, gene expression, gene regulatory network, human, machine learning, photoreceptor
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:su:diva-212106 (URN); 10.3389/fgene.2022.855770 (DOI); 000892435200001 (ISI); 2-s2.0-85137009269 (Scopus ID)
Available from: 2022-12-01. Created: 2022-12-01. Last updated: 2023-02-23. Bibliographically approved
Ogris, C., Castresana-Aguirre, M. & Sonnhammer, E. L. L. (2022). PathwAX II: network-based pathway analysis with interactive visualization of network crosstalk. Bioinformatics, 38(9), 2659-2660.
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no. 9, p. 2659-2660. Article in journal (Refereed), Published
Abstract [en]

Motivation: Pathway annotation tools are indispensable for the interpretation of a wide range of experiments in life sciences. Network-based algorithms have recently been developed which are more sensitive than traditional overlap-based algorithms, but there is still a lack of good online tools for network-based pathway analysis.

Results: We present PathwAX II, a pathway analysis web tool based on network crosstalk analysis using the BinoX algorithm. It offers several new features compared with the first version, including interactive graphical network visualization of the crosstalk between a query gene set and an enriched pathway, and the addition of Reactome pathways.

National Category
Biological Sciences Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-204489 (URN); 10.1093/bioinformatics/btac153 (DOI); 000785759800001 (ISI); 35266519 (PubMedID)
Available from: 2022-05-09. Created: 2022-05-09. Last updated: 2022-05-09. Bibliographically approved