Publications (9 of 9)
Garbulowski, M., Hillerton, T., Morgan, D., Seçilmiş, D., Sonnhammer, L., Tjärnberg, A., . . . Sonnhammer, E. L. L. (2024). GeneSPIDER2: large scale GRN simulation and benchmarking with perturbed single-cell data. NAR Genomics and Bioinformatics, 6(3), Article ID lqae121.
2024 (English). In: NAR Genomics and Bioinformatics, E-ISSN 2631-9268, Vol. 6, no 3, article id lqae121. Article in journal (Refereed). Published.
Abstract [en]

Single-cell data is increasingly used for gene regulatory network (GRN) inference, and benchmarks for this have been developed based on simulated data. However, existing single-cell simulators cannot model the effects of gene perturbations. A further challenge lies in generating large-scale GRNs that often struggle with computational and stability issues. We present GeneSPIDER2, an update of the GeneSPIDER MATLAB toolbox for GRN benchmarking, inference, and analysis. Several software modules have improved capabilities and performance, and new functionalities have been added. A major improvement is the ability to generate large GRNs with biologically realistic topological properties in terms of scale-free degree distribution and modularity. Another major addition is a simulation of single-cell data, which is becoming increasingly popular as input for GRN inference. Specifically, we introduced the unique feature to generate single-cell data based on genetic perturbations. Finally, the simulated single-cell data was compared to real single-cell Perturb-seq data from two cell lines, showing that the synthetic and real data exhibit similar properties.

National Category
Biochemistry Molecular Biology
Identifiers
urn:nbn:se:su:diva-237834 (URN); 10.1093/nargab/lqae121 (DOI); 001314667300003 (); 2-s2.0-85204555831 (Scopus ID)
Available from: 2025-01-16. Created: 2025-01-16. Last updated: 2025-10-03. Bibliographically approved.
Hillerton, T., Seçilmiş, D., Nelander, S. & Sonnhammer, E. L. L. (2022). Fast and accurate gene regulatory network inference by normalized least squares regression. Bioinformatics, 38(8), 2263-2268, Article ID btac103.
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no 8, p. 2263-2268, article id btac103. Article in journal (Refereed). Published.
Abstract [en]

Motivation: Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to find a suitable balance between the maximum number of true positive and the minimum number of false-positive interactions. Another key feature is that the inference method can handle the large size of modern experimental data, meaning the method needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both these criteria, however as it is based on least squares it is vulnerable to known issues of amplifying extreme values, small or large. In GRN this manifests itself with genes that are erroneously hyper-connected to a large fraction of all genes due to extremely low value fold changes.

Results: We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm by regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization used is based on normalization, which removes effects of extreme values on the fit. We benchmarked LSCON and compared it to Genie3, LASSO, LSCO and Ridge regression, in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO.
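The least-squares-with-cut-off idea described in this abstract can be illustrated with a minimal sketch. It assumes the linear steady-state model Y = -A⁻¹P (an assumption not stated in the abstract), and the function name `lsco`, the threshold `zeta`, and the toy network are hypothetical. It shows plain LSCO only; the normalization step that distinguishes LSCON is not specified here and is not reproduced.

```python
import numpy as np

def lsco(Y, P, zeta):
    """Least squares estimate of the network followed by a hard cut-off.

    Assumes the steady-state model Y = -inv(A) @ P (an assumption of this
    sketch), so the least squares estimate is A = -P @ pinv(Y).
    Links with absolute weight below zeta are set to zero.
    """
    A_ls = -P @ np.linalg.pinv(Y)
    return np.where(np.abs(A_ls) >= zeta, A_ls, 0.0)

# Hypothetical 3-gene network; rows are targets, columns are regulators.
A_true = np.array([[-1.0, 0.0, 0.5],
                   [0.8, -1.0, 0.0],
                   [0.0, 0.0, -1.0]])
P = np.eye(3)                      # one single-gene perturbation per experiment
Y = -np.linalg.inv(A_true) @ P     # noise-free steady-state response
A_hat = lsco(Y, P, zeta=0.1)
```

On noise-free data the estimate recovers the toy network; with noisy data the choice of `zeta` determines how sparse the inferred network is.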

National Category
Biological Sciences; Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-203209 (URN); 10.1093/bioinformatics/btac103 (DOI); 000761598600001 (); 35176145 (PubMedID); 2-s2.0-85128723779 (Scopus ID)
Available from: 2022-03-28. Created: 2022-03-28. Last updated: 2023-09-14. Bibliographically approved.
Seçilmiş, D., Hillerton, T. & Sonnhammer, E. L. L. (2022). GRNbenchmark - a web server for benchmarking directed gene regulatory network inference methods. Nucleic Acids Research, 50(W1), W398-W404
2022 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 50, no W1, p. W398-W404. Article in journal (Refereed). Published.
Abstract [en]

Accurate inference of gene regulatory networks (GRN) is an essential component of systems biology, and there is a constant development of new inference methods. The most common approach to assess accuracy for publications is to benchmark the new method against a selection of existing algorithms. This often leads to a very limited comparison, potentially biasing the results, which may stem from tuning the benchmark's properties or incorrect application of other methods. These issues can be avoided by a web server with a broad range of data properties and inference algorithms, that makes it easy to perform comprehensive benchmarking of new methods, and provides a more objective assessment. Here we present https://GRNbenchmark.org/ - a new web server for benchmarking GRN inference methods, which provides the user with a set of benchmarks with several datasets, each spanning a range of properties including multiple noise levels. As soon as the web server has performed the benchmarking, the accuracy results are made privately available to the user via interactive summary plots and underlying curves. The user can then download these results for any purpose, and decide whether or not to make them public to share with the community.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-207117 (URN); 10.1093/nar/gkac377 (DOI); 000799563700001 (); 35609981 (PubMedID)
Available from: 2022-07-06. Created: 2022-07-06. Last updated: 2022-07-06. Bibliographically approved.
Seçilmiş, D., Hillerton, T., Tjärnberg, A., Nelander, S., Nordling, T. E. M. & Sonnhammer, E. L. L. (2022). Knowledge of the perturbation design is essential for accurate gene regulatory network inference. Scientific Reports, 12(1), Article ID 16531.
2022 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 12, no 1, article id 16531. Article in journal (Refereed). Published.
Abstract [en]

The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation we used targeted perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-210751 (URN); 10.1038/s41598-022-19005-x (DOI); 000865282300021 (); 36192495 (PubMedID); 2-s2.0-85139173448 (Scopus ID)
Available from: 2022-10-26. Created: 2022-10-26. Last updated: 2023-09-14. Bibliographically approved.
Seçilmiş, D., Nelander, S. & Sonnhammer, E. L. L. (2022). Optimal Sparsity Selection Based on an Information Criterion for Accurate Gene Regulatory Network Inference. Frontiers in Genetics, 13, Article ID 855770.
2022 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article id 855770. Article in journal (Refereed). Published.
Abstract [en]

Accurate inference of gene regulatory networks (GRNs) is important to unravel unknown regulatory mechanisms and processes, which can lead to the identification of treatment targets for genetic diseases. A variety of GRN inference methods have been proposed that, under suitable data conditions, perform well in benchmarks that consider the entire spectrum of false-positives and -negatives. However, it is very challenging to predict which single network sparsity gives the most accurate GRN. Lacking criteria for sparsity selection, a simplistic solution is to pick the GRN that has a certain number of links per gene, which is guessed to be reasonable. However, this does not guarantee finding the GRN that has the correct sparsity or is the most accurate one. In this study, we provide a general approach for identifying the most accurate and sparsity-wise relevant GRN within the entire space of possible GRNs. The algorithm, called SPA, applies a “GRN information criterion” (GRNIC) that is inspired by two commonly used model selection criteria, Akaike and Bayesian Information Criterion (AIC and BIC) but adapted to GRN inference. The results show that the approach can, in most cases, find the GRN whose sparsity is close to the true sparsity and close to as accurate as possible with the given GRN inference method and data. The datasets and source code can be found at https://bitbucket.org/sonnhammergrni/spa/. 
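The idea of choosing a sparsity level with an information criterion can be sketched as follows. This is not the GRNIC used by SPA, whose definition is not given in the abstract; the sketch swaps in the plain Bayesian Information Criterion as a stand-in. It also assumes the linear model A·Y + P ≈ 0, and the function names, thresholds, and toy data are all hypothetical.

```python
import numpy as np

def bic_score(A, Y, P):
    """Plain BIC for a candidate network A (stand-in, not GRNIC).

    The residual is A @ Y + P under the assumed linear model; the number
    of parameters is the number of non-zero links in A.
    """
    n = Y.size
    rss = float(np.sum((A @ Y + P) ** 2))
    k = int(np.count_nonzero(A))
    # Small constant guards against log(0) on a perfect fit.
    return n * np.log(rss / n + 1e-12) + k * np.log(n)

def select_sparsity(A_dense, Y, P, thresholds):
    """Threshold a dense estimate at each level and keep the lowest-BIC network."""
    candidates = [np.where(np.abs(A_dense) >= t, A_dense, 0.0) for t in thresholds]
    scores = [bic_score(A, Y, P) for A in candidates]
    best = int(np.argmin(scores))
    return thresholds[best], candidates[best], scores

# Hypothetical toy data: 3 genes, 3 replicates of single-gene perturbations.
rng = np.random.default_rng(0)
A_true = np.array([[-1.0, 0.0, 0.5],
                   [0.8, -1.0, 0.0],
                   [0.0, 0.0, -1.0]])
P = np.hstack([np.eye(3)] * 3)
Y = -np.linalg.inv(A_true) @ P + rng.normal(scale=0.01, size=P.shape)
A_dense = -P @ np.linalg.pinv(Y)   # dense least squares estimate
thresholds = [0.0, 0.1, 0.3, 0.6, 0.9]
best_t, A_best, scores = select_sparsity(A_dense, Y, P, thresholds)
```

The penalty term grows with the number of retained links, so the criterion trades goodness of fit against network density instead of fixing a number of links per gene in advance.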

Keywords
gene expression data, gene regulatory network inference, information criteria, noise in gene expression, sparsity selection, accuracy, algorithm, Article, bootstrapping, controlled study, diagnostic test accuracy study, DNA extraction, gene expression, gene regulatory network, human, machine learning, photoreceptor
National Category
Biochemistry Molecular Biology
Identifiers
urn:nbn:se:su:diva-212106 (URN); 10.3389/fgene.2022.855770 (DOI); 000892435200001 (); 2-s2.0-85137009269 (Scopus ID)
Available from: 2022-12-01. Created: 2022-12-01. Last updated: 2025-02-20. Bibliographically approved.
Seçilmiş, D. (2021). Improving the accuracy of gene regulatory network inference from noisy data. (Doctoral dissertation). Stockholm: Department of Biochemistry and Biophysics, Stockholm University
2021 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Gene regulatory networks (GRNs) control physiological and pathological processes in a living organism, and their accurate inference from measured gene expression can identify therapeutic mechanisms for complex diseases such as cancers. The biggest obstacle in achieving the accurate reconstruction of GRNs is called ‘noise’, which considerably alters the measured gene expression because the noise generally dominates the biological signal. This situation needs to be addressed carefully so that GRN inference methods do not estimate a fit to the noise instead of the underlying biological signal. Potential noise compensation approaches are a must if the goal is to reconstruct the true system. 

To this end, within the scope of this doctoral thesis, I developed two methods that, in different ways, overcome the obstacles introduced by noise in gene expression data. Method 1 allows the collection of more informative subsets of genes whose expression is not as highly affected as those which cause the system to be overall uninformative. Method 2 infers a perturbation design that is better suited to the gene expression data than the originally intended design, and therefore produces more accurate GRNs at high noise levels. Furthermore, a benchmark study was carried out which compares the methodological backgrounds of GRN inference methods in terms of whether they utilize knowledge of the perturbation design or not, which clearly shows that utilization of the perturbation design is essential for accurate inference of GRNs. Finally a method is presented to improve GRN inference accuracy by selecting the GRN with the optimal sparsity based on information theoretical criteria. 

The three new methods (PAPERS I, II and IV) can also be used together, which is shown in this thesis to improve the GRN inference accuracy considerably more than the methods separately. As inference of accurate GRNs is a major challenge in gene regulation, the methods presented in this thesis represent an important contribution to move the field forward.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2021. p. 58
National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-196153 (URN); 978-91-7911-560-9 (ISBN); 978-91-7911-561-6 (ISBN)
Public defence
2021-10-15, Air & Fire, SciLifeLab, Tomtebodavägen 23 A and online via Zoom https://stockholmuniversity.zoom.us/j/64931329555, Solna, 14:00 (English)
Available from: 2021-09-22. Created: 2021-09-01. Last updated: 2025-02-07. Bibliographically approved.
Seçilmiş, D., Hillerton, T., Nelander, S. & Sonnhammer, E. L. L. (2021). Inferring the experimental design for accurate gene regulatory network inference. Bioinformatics, 37(20), 3553-3559
2021 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 37, no 20, p. 3553-3559. Article in journal (Refereed). Published.
Abstract [en]

Motivation: Accurate inference of gene regulatory interactions is of importance for understanding the mechanisms of underlying biological processes. For gene expression data gathered from targeted perturbations, gene regulatory network (GRN) inference methods that use the perturbation design are the top performing methods. However, the connection between the perturbation design and gene expression can be obfuscated due to problems, such as experimental noise or off-target effects, limiting the methods’ ability to reconstruct the true GRN.

Results: In this study, we propose an algorithm, IDEMAX, to infer the effective perturbation design from gene expression data in order to eliminate the potential risk of fitting a disconnected perturbation design to gene expression. We applied IDEMAX to synthetic data from two different data generation tools, GeneNetWeaver and GeneSPIDER, and assessed its effect on the experiment design matrix as well as the accuracy of the GRN inference, followed by application to a real dataset. The results show that our approach consistently improves the accuracy of GRN inference compared to using the intended perturbation design when much of the signal is hidden by noise, which is often the case for real data.

National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-196149 (URN); 10.1093/bioinformatics/btab367 (DOI); 000733829400023 ()
Available from: 2021-09-01. Created: 2021-09-01. Last updated: 2025-02-07. Bibliographically approved.
Seçilmiş, D., Hillerton, T., Morgan, D., Tjärnberg, A., Nelander, S., Nordling, T. E. M. & Sonnhammer, E. L. L. (2020). Uncovering cancer gene regulation by accurate regulatory network inference from uninformative data. npj Systems Biology and Applications, 6(1), Article ID 37.
2020 (English). In: npj Systems Biology and Applications, E-ISSN 2056-7189, Vol. 6, no 1, article id 37. Article in journal (Refereed). Published.
Abstract [en]

The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights of a biological system. In order to explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data where ~1000 genes have been perturbed and their expression levels have been quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR) level causing them to be too uninformative to infer accurate GRNs. We developed a gene reduction pipeline in which we eliminate uninformative genes from the system using a selection criterion based on SNR, until reaching an informative subset. The results show that our pipeline can identify an informative subset in an overall uninformative dataset, allowing inference of accurate subset GRNs. The accurate GRNs were functionally characterized and potential novel cancer-related regulatory interactions were identified.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-188144 (URN); 10.1038/s41540-020-00154-6 (DOI); 000588081000001 (); 33168813 (PubMedID)
Available from: 2021-01-04. Created: 2021-01-04. Last updated: 2024-08-30. Bibliographically approved.
Secilmis, D., Morgan, D., Tjärnberg, A., Nelander, S., Nordling, T. & Sonnhammer, E. A Subset Selection Method for Accurate Gene Regulatory Network Inference of Uninformative Datasets.
(English). Manuscript (preprint) (Other academic).
Abstract [en]

Motivation: The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights of a biological system. In order to explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data where approximately 1000 genes have been perturbed and their expression levels have been quantified in 9 cancer cell lines. First we identified key properties of the datasets, i.e., signal-to-noise ratio (SNR) and condition number which we have shown to affect the performance of various inference methods.

Results: We found that all L1000 datasets have a very low SNR level, making them highly uninformative and unsuitable for inferring accurate GRNs. Therefore, we have developed a gene reduction pipeline in which we eliminate the uninformative genes from the system using a selection criterion based on SNR until reaching an informative subset. The results show that our pipeline can identify an informative subset in an uninformative dataset, improving the accuracy of the GRN inference significantly.

National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-164634 (URN)
Available from: 2019-01-17. Created: 2019-01-17. Last updated: 2025-02-07. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0001-8284-356x
