Publications (8 of 8)
Hillerton, T. (2023). In silico modelling for refining gene regulatory network inference. (Doctoral dissertation). Stockholm: Department of Biochemistry and Biophysics, Stockholm University
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Gene regulation is at the centre of all cellular functions, regulating the cell's healthy and pathological responses. The interconnected system of regulatory interactions is known as the gene regulatory network (GRN), where genes influence each other to maintain strict and robust control. Today a large number of methods exist for inferring GRNs, which necessitates benchmarking to determine which method is most suitable for a specific goal. Paper I presents such a benchmark focusing on the effect of using known perturbations to infer GRNs. 

A further challenge when studying GRNs is that experimental data contains high levels of noise and that artefacts may be introduced by the experiment itself. The LSCON method was developed in paper II to reduce the effect of one such artefact that can occur if the expression of a gene shows no or minimal change across most or all experiments. 

 With few fully determined biological GRNs available, it is problematic to use these to evaluate an inference method's correctness. Instead, the GRN field relies on simulated data, using a known GRN and generating the corresponding data. When simulating GRNs, capturing the topological properties of the biological GRN is vital. The FFLatt algorithm was developed in paper III to create scale-free, feed-forward loop motif-enriched GRNs, capturing two of the most prominent topological features in biological GRNs. 

 Once a high-quality GRN is obtained, the next step is to simulate gene expression data corresponding to the GRN. In paper IV, building on the FFLatt method, an open-source Python simulation tool called GeneSNAKE was developed to generate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties and improves on previous tools by featuring a variety of perturbation schemes along with the ability to control noise and modify the perturbation strength.
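
As a rough illustration of what generating expression data from a known GRN can look like, the sketch below assumes a simple linear steady-state model, `A @ Y = -P`, with one knockdown experiment per gene. This model, the function name `simulate_expression`, and the parameters `strength` and `noise_sd` are assumptions made for the example; they are not GeneSNAKE's model or API.

```python
import numpy as np


def simulate_expression(A, strength=1.0, noise_sd=0.1, seed=0):
    """Simulate noisy steady-state expression for a known GRN.

    Assumed linear model (not GeneSNAKE's): A @ Y = -P, where
    A is the genes x genes network matrix and P the perturbation design.
    """
    A = np.asarray(A, dtype=float)
    n_genes = A.shape[0]
    # One experiment per gene: a knockdown of adjustable strength.
    P = -strength * np.eye(n_genes)
    # Noise-free response, solving A @ Y = -P for Y.
    Y_clean = np.linalg.solve(A, -P)
    # Additive Gaussian measurement noise with a controllable level.
    rng = np.random.default_rng(seed)
    Y = Y_clean + rng.normal(0.0, noise_sd, size=Y_clean.shape)
    return Y, P


# Hypothetical two-gene network in which gene 0 activates gene 1.
A_true = np.array([[-1.0, 0.0],
                   [0.5, -1.0]])
Y, P = simulate_expression(A_true, strength=1.0, noise_sd=0.05)
```

Because the true network is known, any inference method run on `Y` (and optionally `P`) can be scored against `A_true`.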

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023. p. 49
Keywords
Gene regulatory networks, simulation, benchmarking, method development
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with a specialization in bioinformatics
Identifiers
urn:nbn:se:su:diva-221155 (URN); 978-91-8014-504-6 (ISBN); 978-91-8014-505-3 (ISBN)
Public defence
2023-10-27, Air and Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Available from: 2023-10-04. Created: 2023-09-14. Last updated: 2023-09-29. Bibliographically approved
Hillerton, T., Seçilmiş, D., Nelander, S. & Sonnhammer, E. L. L. (2022). Fast and accurate gene regulatory network inference by normalized least squares regression. Bioinformatics, 38(8), 2263-2268, Article ID btac103.
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no. 8, p. 2263-2268, article id btac103. Article in journal (Refereed). Published
Abstract [en]

Motivation: Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to find a suitable balance between the maximum number of true-positive and the minimum number of false-positive interactions. Another key feature is that the inference method can handle the large size of modern experimental data, meaning the method needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both these criteria; however, as it is based on least squares, it is vulnerable to the known issue of amplifying extreme values, small or large. In GRN inference this manifests itself as genes that are erroneously hyper-connected to a large fraction of all genes due to extremely low fold-change values.

Results: We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm by regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization used is based on normalization, which removes effects of extreme values on the fit. We benchmarked LSCON and compared it to Genie3, LASSO, LSCO and Ridge regression, in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO.
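
The idea described above can be sketched in a few lines, under assumptions that the abstract does not spell out. The sketch assumes a linear model `A @ Y = -P` (genes x experiments expression `Y`, perturbation design `P`) and assumes that the normalization scales each regulator's column by that gene's mean absolute fold change. The function names `lsco`, `lscon` and `cut_off` are hypothetical and this is not the authors' implementation.

```python
import numpy as np


def lsco(Y, P):
    """Plain least-squares estimate of A under the assumed model A @ Y = -P."""
    return -P @ np.linalg.pinv(Y)


def lscon(Y, P):
    """Least squares followed by an assumed per-gene normalization.

    A gene whose fold changes are all close to zero produces a very large
    column in the least-squares estimate. Scaling column j by the mean
    absolute fold change of gene j counteracts that amplification.
    """
    A = lsco(Y, P)
    scale = np.mean(np.abs(Y), axis=1)  # one value per gene (row of Y)
    return A * scale[np.newaxis, :]     # scales column j by scale[j]


def cut_off(A, threshold):
    """Keep only links whose absolute weight reaches the threshold."""
    return np.where(np.abs(A) >= threshold, A, 0.0)


# Toy data: four single-gene knockdowns; gene 2 barely responds.
P = -np.eye(4)
Y = np.diag([-1.0, -1.0, -1e-3, -1.0])

A_ls = lsco(Y, P)     # entry [2, 2] is amplified by the tiny fold change
A_norm = lscon(Y, P)  # all diagonal entries end up on the same scale
```

In this toy case a single cut-off on `A_ls` would keep only the artificially large entry for gene 2, whereas the normalized estimate treats all four genes alike.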

National Category
Biological Sciences; Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-203209 (URN); 10.1093/bioinformatics/btac103 (DOI); 000761598600001 (); 35176145 (PubMedID); 2-s2.0-85128723779 (Scopus ID)
Available from: 2022-03-28. Created: 2022-03-28. Last updated: 2023-09-14. Bibliographically approved
Zhivkoplias, E. K., Vavulov, O., Hillerton, T. & Sonnhammer, E. L. L. (2022). Generation of Realistic Gene Regulatory Networks by Enriching for Feed-Forward Loops. Frontiers in Genetics, 13, Article ID 815692.
2022 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article id 815692. Article in journal (Refereed). Published
Abstract [en]

The regulatory relationships between genes and proteins in a cell form a gene regulatory network (GRN) that controls the cellular response to changes in the environment. A number of inference methods to reverse engineer the original GRN from large-scale expression data have recently been developed. However, the absence of ground-truth GRNs when evaluating the performance makes realistic simulations of GRNs necessary. One aspect of this is that local network motif analysis of real GRNs indicates that the feed-forward loop (FFL) is significantly enriched. To simulate this properly, we developed a novel motif-based preferential attachment algorithm, FFLatt, which outperformed the popular GeneNetWeaver network generation tool in reproducing the FFL motif occurrence observed in literature-based biological GRNs. It also preserves important topological properties such as scale-free topology, sparsity, and average in/out-degree per node. We conclude that FFLatt is well-suited as a network generation module for a benchmarking framework with the aim to provide fair and robust performance evaluation of GRN inference methods.
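
To make the motif concrete: a feed-forward loop is a triple of genes where a regulates b, b regulates c, and a also regulates c directly. The sketch below counts such triples in a directed edge list. It illustrates the motif definition only; it is not the FFLatt generation algorithm, and the function name `count_ffl` is hypothetical.

```python
def count_ffl(edges):
    """Count feed-forward loops (a -> b, b -> c, a -> c) in a directed graph.

    `edges` is an iterable of (regulator, target) pairs. Self-loops are ignored.
    """
    successors = {}
    for source, target in edges:
        if source != target:
            successors.setdefault(source, set()).add(target)

    count = 0
    for a, targets_of_a in successors.items():
        for b in targets_of_a:
            # Any c regulated by both a and b closes a feed-forward loop.
            shared = successors.get(b, set()) & targets_of_a
            count += len(shared - {a, b})
    return count


# One FFL (A -> B -> C with A -> C) and a plain three-gene cycle for contrast.
ffl_edges = [("A", "B"), ("B", "C"), ("A", "C")]
cycle_edges = [("A", "B"), ("B", "C"), ("C", "A")]
print(count_ffl(ffl_edges), count_ffl(cycle_edges))
```

Comparing such a count between a generated network and a reference network is one simple way to check whether a generator reproduces the motif enrichment.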

Keywords
network biology, gene regulatory networks, gene-gene interaction, network motif structure, network generation, network simulation, benchmarking
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-202880 (URN); 10.3389/fgene.2022.815692 (DOI); 000761447700001 (); 35222536 (PubMedID)
Available from: 2022-03-18. Created: 2022-03-18. Last updated: 2023-09-14. Bibliographically approved
Seçilmiş, D., Hillerton, T. & Sonnhammer, E. L. L. (2022). GRNbenchmark - a web server for benchmarking directed gene regulatory network inference methods. Nucleic Acids Research, 50(W1), W398-W404
2022 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 50, no. W1, p. W398-W404. Article in journal (Refereed). Published
Abstract [en]

Accurate inference of gene regulatory networks (GRN) is an essential component of systems biology, and there is a constant development of new inference methods. The most common approach to assess accuracy for publications is to benchmark the new method against a selection of existing algorithms. This often leads to a very limited comparison, potentially biasing the results, which may stem from tuning the benchmark's properties or incorrect application of other methods. These issues can be avoided by a web server with a broad range of data properties and inference algorithms, that makes it easy to perform comprehensive benchmarking of new methods, and provides a more objective assessment. Here we present https://GRNbenchmark.org/ - a new web server for benchmarking GRN inference methods, which provides the user with a set of benchmarks with several datasets, each spanning a range of properties including multiple noise levels. As soon as the web server has performed the benchmarking, the accuracy results are made privately available to the user via interactive summary plots and underlying curves. The user can then download these results for any purpose, and decide whether or not to make them public to share with the community.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-207117 (URN); 10.1093/nar/gkac377 (DOI); 000799563700001 (); 35609981 (PubMedID)
Available from: 2022-07-06. Created: 2022-07-06. Last updated: 2022-07-06. Bibliographically approved
Seçilmiş, D., Hillerton, T., Tjärnberg, A., Nelander, S., Nordling, T. E. M. & Sonnhammer, E. L. L. (2022). Knowledge of the perturbation design is essential for accurate gene regulatory network inference. Scientific Reports, 12(1), Article ID 16531.
2022 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 12, no. 1, article id 16531. Article in journal (Refereed). Published
Abstract [en]

The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation we used targeted perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-210751 (URN); 10.1038/s41598-022-19005-x (DOI); 000865282300021 (); 36192495 (PubMedID); 2-s2.0-85139173448 (Scopus ID)
Available from: 2022-10-26. Created: 2022-10-26. Last updated: 2023-09-14. Bibliographically approved
Seçilmiş, D., Hillerton, T., Nelander, S. & Sonnhammer, E. L. L. (2021). Inferring the experimental design for accurate gene regulatory network inference. Bioinformatics, 37(20), 3553-3559
2021 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 37, no. 20, p. 3553-3559. Article in journal (Refereed). Published
Abstract [en]

Motivation: Accurate inference of gene regulatory interactions is of importance for understanding the mechanisms of underlying biological processes. For gene expression data gathered from targeted perturbations, gene regulatory network (GRN) inference methods that use the perturbation design are the top performing methods. However, the connection between the perturbation design and gene expression can be obfuscated due to problems, such as experimental noise or off-target effects, limiting the methods' ability to reconstruct the true GRN.

Results: In this study, we propose an algorithm, IDEMAX, to infer the effective perturbation design from gene expression data in order to eliminate the potential risk of fitting a disconnected perturbation design to gene expression. We applied IDEMAX to synthetic data from two different data generation tools, GeneNetWeaver and GeneSPIDER, and assessed its effect on the experiment design matrix as well as the accuracy of the GRN inference, followed by application to a real dataset. The results show that our approach consistently improves the accuracy of GRN inference compared to using the intended perturbation design when much of the signal is hidden by noise, which is often the case for real data.

National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:su:diva-196149 (URN); 10.1093/bioinformatics/btab367 (DOI); 000733829400023 ()
Available from: 2021-09-01. Created: 2021-09-01. Last updated: 2022-01-04. Bibliographically approved
Seçilmiş, D., Hillerton, T., Morgan, D., Tjärnberg, A., Nelander, S., Nordling, T. E. M. & Sonnhammer, E. L. L. (2020). Uncovering cancer gene regulation by accurate regulatory network inference from uninformative data. npj Systems Biology and Applications, 6(1), Article ID 37.
2020 (English). In: npj Systems Biology and Applications, E-ISSN 2056-7189, Vol. 6, no. 1, article id 37. Article in journal (Refereed). Published
Abstract [en]

The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights into a biological system. In order to explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data where ~1000 genes have been perturbed and their expression levels have been quantified in 9 cancer cell lines. We found that these datasets have a very low signal-to-noise ratio (SNR) level causing them to be too uninformative to infer accurate GRNs. We developed a gene reduction pipeline in which we eliminate uninformative genes from the system using a selection criterion based on SNR, until reaching an informative subset. The results show that our pipeline can identify an informative subset in an overall uninformative dataset, allowing inference of accurate subset GRNs. The accurate GRNs were functionally characterized and potential novel cancer-related regulatory interactions were identified.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-188144 (URN); 10.1038/s41540-020-00154-6 (DOI); 000588081000001 (); 33168813 (PubMedID)
Available from: 2021-01-04. Created: 2021-01-04. Last updated: 2024-08-30. Bibliographically approved
Hillerton, T., Zhivkoplias, E. K., Garbulowski, M. & Sonnhammer, E. L. L. GeneSNAKE: a Python package for benchmarking and simulation of gene regulatory networks and expression data.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Understanding how genes interact with and regulate each other is a key challenge in systems biology. One of the primary methods to study this is through gene regulatory networks (GRNs). The field of GRN inference, however, faces many challenges, such as the complexity of gene regulation and high noise levels, which necessitates effective tools for evaluating inference methods. For this purpose, data that corresponds to a known GRN, from various conditions and experimental setups, is necessary, which is only possible to attain via simulation. Existing tools for simulating data for GRN inference have limitations either in the way networks are constructed or data is produced, and are often not flexible for adjusting the algorithm or parameters.

To overcome these issues we present GeneSNAKE, a Python package designed to allow users to generate biologically realistic GRNs, and from a GRN simulate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties. GeneSNAKE improves on previous work in the field by adding a perturbation model that allows for a greater range of perturbation schemes along with the ability to control noise and modify the perturbation strength. 

For benchmarking, GeneSNAKE offers a number of functions both for comparing a true GRN to an inferred GRN, and to study properties in data and GRN models. These functions can in addition be used to study properties of biological data to produce simulated data with more realistic properties.  GeneSNAKE is an open-source, comprehensive simulation and benchmarking package with powerful capabilities that are not combined in any other single package, and thanks to the Python implementation it is simple to extend and modify by a user.
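
Comparing a true GRN to an inferred one typically comes down to counting correctly and incorrectly predicted links. The sketch below shows one common way to do this with precision, recall and F1 on the presence or absence of links, ignoring sign and weight. The function name `compare_grns` and the returned keys are hypothetical and do not reflect GeneSNAKE's own functions.

```python
import numpy as np


def compare_grns(A_true, A_inferred):
    """Score an inferred GRN against the true one on link presence only."""
    true_links = np.asarray(A_true) != 0
    pred_links = np.asarray(A_inferred) != 0

    tp = int(np.sum(true_links & pred_links))    # correctly predicted links
    fp = int(np.sum(~true_links & pred_links))   # predicted but not real
    fn = int(np.sum(true_links & ~pred_links))   # real but missed

    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy example: three true links, of which two are recovered plus one false one.
A_true = np.array([[1, 0],
                   [1, 1]])
A_inferred = np.array([[1, 1],
                       [0, 1]])
scores = compare_grns(A_true, A_inferred)
```

Sweeping the sparsity threshold of an inference method and recomputing these scores at each step yields the precision-recall curves often reported in benchmarks.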

Keywords
Gene regulatory networks, simulation, benchmarking, method development
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with a specialization in bioinformatics
Identifiers
urn:nbn:se:su:diva-221154 (URN)
Available from: 2023-09-14. Created: 2023-09-14. Last updated: 2023-09-14
Identifiers
ORCID iD: orcid.org/0000-0002-6362-0659
