In silico modelling for refining gene regulatory network inference
Hillerton, Thomas
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. ORCID iD: 0000-0002-6362-0659
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Gene regulation is at the centre of all cellular functions, regulating the cell's healthy and pathological responses. The interconnected system of regulatory interactions is known as the gene regulatory network (GRN), where genes influence each other to maintain strict and robust control. Today a large number of methods exist for inferring GRNs, which necessitates benchmarking to determine which method is most suitable for a specific goal. Paper I presents such a benchmark focusing on the effect of using known perturbations to infer GRNs. 

A further challenge when studying GRNs is that experimental data contains high levels of noise and that artefacts may be introduced by the experiment itself. The LSCON method was developed in paper II to reduce the effect of one such artefact that can occur if the expression of a gene shows no or minimal change across most or all experiments. 

With few fully determined biological GRNs available, it is problematic to use these to evaluate an inference method's correctness. Instead, the GRN field relies on simulated data, using a known GRN and generating the corresponding data. When simulating GRNs, capturing the topological properties of the biological GRN is vital. The FFLatt algorithm was developed in paper III to create scale-free, feed-forward loop motif-enriched GRNs, capturing two of the most prominent topological features in biological GRNs.

Once a high-quality GRN is obtained, the next step is to simulate gene expression data corresponding to the GRN. In paper IV, building on the FFLatt method, an open-source Python simulation tool called GeneSNAKE was developed to generate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties and improves on previous tools by featuring a variety of perturbation schemes along with the ability to control noise and modify the perturbation strength.
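As a rough illustration of what simulating knockdown data from a known GRN involves, the sketch below integrates a simple linear model, dx/dt = A x + p, to its steady state and then adds Gaussian measurement noise. The linear model, the function name `simulate_knockdown` and all of its parameters are assumptions made for this example; they are not GeneSNAKE's actual model or API.

```python
import random


def simulate_knockdown(A, target, strength=1.0, noise_sd=0.0,
                       dt=0.01, steps=5000, seed=None):
    """Simulate the steady-state response to knocking down one gene.

    A is the (hypothetical) GRN as a list of rows, where A[i][j] is the
    effect of gene j on gene i. The model dx/dt = A x + p is an assumed
    linear simplification, integrated with explicit Euler steps.
    """
    rng = random.Random(seed)
    n = len(A)
    # Perturbation vector: push the targeted gene down by `strength`.
    p = [0.0] * n
    p[target] = -strength
    x = [0.0] * n
    for _ in range(steps):
        dx = [sum(A[i][j] * x[j] for j in range(n)) + p[i] for i in range(n)]
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    # Additive Gaussian noise stands in for measurement noise.
    return [xi + rng.gauss(0.0, noise_sd) for xi in x]


# Two-gene toy network: gene 0 activates gene 1, both self-degrade.
toy_grn = [[-1.0, 0.0],
           [0.5, -1.0]]
clean = simulate_knockdown(toy_grn, target=0)
noisy = simulate_knockdown(toy_grn, target=0, noise_sd=0.1, seed=1)
```

Here `strength` and `noise_sd` play the roles of the perturbation strength and noise level mentioned above. The Euler scheme only converges for a stable network (all eigenvalues of A with negative real part) and a sufficiently small `dt`.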

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023, p. 49
Keywords [en]
Gene regulatory networks, simulation, benchmarking, method development
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-221155
ISBN: 978-91-8014-504-6 (print)
ISBN: 978-91-8014-505-3 (electronic)
OAI: oai:DiVA.org:su-221155
DiVA, id: diva2:1797464
Public defence
2023-10-27, Air and Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Available from: 2023-10-04 Created: 2023-09-14 Last updated: 2023-09-29 Bibliographically approved
List of papers
1. Knowledge of the perturbation design is essential for accurate gene regulatory network inference
2022 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 12, no 1, article id 16531. Article in journal (Refereed). Published
Abstract [en]

The gene regulatory network (GRN) of a cell executes genetic programs in response to environmental and internal cues. Two distinct classes of methods are used to infer regulatory interactions from gene expression: those that only use observed changes in gene expression, and those that use both the observed changes and the perturbation design, i.e. the targets used to cause the changes in gene expression. Considering that the GRN by definition converts input cues to changes in gene expression, it may be conjectured that the latter methods would yield more accurate inferences, but this has not previously been investigated. To address this question, we evaluated a number of popular GRN inference methods that either use the perturbation design or not. For the evaluation, we used targeted-perturbation knockdown gene expression datasets with varying noise levels generated by two different packages, GeneNetWeaver and GeneSpider. The accuracy was evaluated on each dataset using a variety of measures. The results show that on all datasets, methods using the perturbation design matrix consistently and significantly outperform methods not using it. This was also found to be the case on a smaller experimental dataset from E. coli. Targeted gene perturbations combined with inference methods that use the perturbation design are indispensable for accurate GRN inference.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-210751 (URN)
10.1038/s41598-022-19005-x (DOI)
000865282300021 ()
36192495 (PubMedID)
2-s2.0-85139173448 (Scopus ID)
Available from: 2022-10-26 Created: 2022-10-26 Last updated: 2023-09-14 Bibliographically approved
2. Fast and accurate gene regulatory network inference by normalized least squares regression
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no 8, p. 2263-2268, article id btac103. Article in journal (Refereed). Published
Abstract [en]

Motivation: Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to find a suitable balance between maximizing the number of true-positive interactions and minimizing the number of false-positive ones. Another key requirement is that the inference method can handle the large size of modern experimental data, meaning the method needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both these criteria; however, as it is based on least squares, it is vulnerable to the known issue of amplifying extreme values, small or large. In GRN inference this manifests itself as genes that are erroneously hyper-connected to a large fraction of all genes due to extremely low fold-change values.

Results: We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm by regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization used is based on normalization, which removes effects of extreme values on the fit. We benchmarked LSCON and compared it to Genie3, LASSO, LSCO and Ridge regression, in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO.
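To make the least squares cut-off idea concrete, the following is a minimal sketch under stated assumptions. It assumes the linear steady-state model Y = -A^{-1}P, restricts itself to the square, invertible case (one experiment per gene) so that the estimate is A = -P Y^{-1}, and then zeroes entries below a cut-off. The function names are hypothetical, and the normalization step that distinguishes LSCON from LSCO is not reproduced here.

```python
def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment each row with the matching row of the identity matrix.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def lsco_square(Y, P, cutoff):
    """Estimate A = -P Y^{-1} (assumed model), then drop weak links."""
    estimate = [[-v for v in row] for row in matmul(P, invert(Y))]
    return [[v if abs(v) >= cutoff else 0.0 for v in row] for row in estimate]


# Toy data: each gene knocked down once (P = -I), noise-free response Y.
P = [[-1.0, 0.0], [0.0, -1.0]]
Y = [[-1.0, 0.0], [-0.5, -1.0]]
A_hat = lsco_square(Y, P, cutoff=0.1)
```

With more experiments than genes, the inverse would be replaced by a pseudo-inverse. Very small values in Y inflate the entries of its inverse, which is one way to picture the hyper-connected genes described in the motivation.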

National Category
Biological Sciences
Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-203209 (URN)
10.1093/bioinformatics/btac103 (DOI)
000761598600001 ()
35176145 (PubMedID)
2-s2.0-85128723779 (Scopus ID)
Available from: 2022-03-28 Created: 2022-03-28 Last updated: 2023-09-14 Bibliographically approved
3. Generation of Realistic Gene Regulatory Networks by Enriching for Feed-Forward Loops
2022 (English). In: Frontiers in Genetics, E-ISSN 1664-8021, Vol. 13, article id 815692. Article in journal (Refereed). Published
Abstract [en]

The regulatory relationships between genes and proteins in a cell form a gene regulatory network (GRN) that controls the cellular response to changes in the environment. A number of inference methods to reverse engineer the original GRN from large-scale expression data have recently been developed. However, the absence of ground-truth GRNs when evaluating the performance makes realistic simulations of GRNs necessary. One aspect of this is that local network motif analysis of real GRNs indicates that the feed-forward loop (FFL) is significantly enriched. To simulate this properly, we developed a novel motif-based preferential attachment algorithm, FFLatt, which outperformed the popular GeneNetWeaver network generation tool in reproducing the FFL motif occurrence observed in literature-based biological GRNs. It also preserves important topological properties such as scale-free topology, sparsity, and average in/out-degree per node. We conclude that FFLatt is well-suited as a network generation module for a benchmarking framework with the aim to provide fair and robust performance evaluation of GRN inference methods.
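The feed-forward loop motif referred to above is a triple of genes in which a regulator a controls a target c both directly (a -> c) and through an intermediate b (a -> b -> c). The sketch below counts such triples in a directed edge list. It is a brute-force illustration of the motif definition only, not the FFLatt algorithm, and the function name is hypothetical.

```python
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c.

    Self-loops are ignored because the three nodes must be distinct.
    Occurrences are counted as subgraphs, so extra edges among the three
    nodes do not disqualify a triple.
    """
    edge_set = set(edges)
    nodes = {node for edge in edges for node in edge}
    count = 0
    for a, b, c in permutations(nodes, 3):
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set:
            count += 1
    return count


toy_edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("A", "D")]
print(count_feed_forward_loops(toy_edges))  # prints 2: (A, B, C) and (A, C, D)
```

The cubic loop over node triples is fine for small toy networks. For GRN-sized graphs one would iterate over edges and intersect neighbour sets instead.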

Keywords
network biology, gene regulatory networks, gene-gene interaction, network motif structure, network generation, network simulation, benchmarking
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-202880 (URN)
10.3389/fgene.2022.815692 (DOI)
000761447700001 ()
35222536 (PubMedID)
Available from: 2022-03-18 Created: 2022-03-18 Last updated: 2023-09-14 Bibliographically approved
4. GeneSNAKE: a Python package for benchmarking and simulation of gene regulatory networks and expression data.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Understanding how genes interact with and regulate each other is a key challenge in systems biology. One of the primary methods to study this is through gene regulatory networks (GRNs). The field of GRN inference, however, faces many challenges, such as the complexity of gene regulation and high noise levels, which necessitates effective tools for evaluating inference methods. For this purpose, data that corresponds to a known GRN, from various conditions and experimental setups, is necessary, which is only possible to attain via simulation. Existing tools for simulating data for GRN inference have limitations either in the way networks are constructed or in the way data is produced, and often offer little flexibility for adjusting the algorithm or parameters.

To overcome these issues we present GeneSNAKE, a Python package designed to allow users to generate biologically realistic GRNs, and from a GRN simulate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties. GeneSNAKE improves on previous work in the field by adding a perturbation model that allows for a greater range of perturbation schemes along with the ability to control noise and modify the perturbation strength. 

For benchmarking, GeneSNAKE offers a number of functions both for comparing a true GRN to an inferred GRN and for studying properties of data and GRN models. These functions can in addition be used to study properties of biological data in order to produce simulated data with more realistic properties. GeneSNAKE is an open-source, comprehensive simulation and benchmarking package with powerful capabilities that are not combined in any other single package, and thanks to the Python implementation it is simple for a user to extend and modify.
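Comparing a true GRN to an inferred one usually comes down to treating each possible link as a binary prediction. The sketch below computes precision, recall and F1 from two adjacency matrices. It illustrates the general idea only: the function names are hypothetical and are not GeneSNAKE's benchmarking API, and link signs and weights are ignored.

```python
def links(adjacency, tol=0.0):
    """Return the set of (row, column) index pairs whose weight exceeds tol."""
    return {(i, j)
            for i, row in enumerate(adjacency)
            for j, weight in enumerate(row)
            if abs(weight) > tol}


def compare_grns(true_adj, inferred_adj):
    """Score an inferred network against the true one, link by link."""
    true_links = links(true_adj)
    inferred_links = links(inferred_adj)
    tp = len(true_links & inferred_links)   # links found correctly
    fp = len(inferred_links - true_links)   # links predicted but absent
    fn = len(true_links - inferred_links)   # links missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


true_grn = [[1, 0, 0],
            [1, 1, 0],
            [0, 1, 1]]
inferred_grn = [[1, 0, 1],
                [1, 1, 0],
                [0, 0, 1]]
scores = compare_grns(true_grn, inferred_grn)
```

In this toy case four of the five true links are recovered and one spurious link is predicted, so precision, recall and F1 all equal 0.8.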

Keywords
Gene regulatory networks, simulation, benchmarking, method development
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-221154 (URN)
Available from: 2023-09-14 Created: 2023-09-14 Last updated: 2023-09-14

Open Access in DiVA

In silico modelling for refining gene regulatory network inference (1767 kB), 454 downloads
File information
File name: FULLTEXT01.pdf
File size: 1767 kB
Checksum (SHA-512): 1a2e52d4c4786d82b3efdb532bf6d5cfd343bc10786288da824dc833f31e8a60c5ee7b1b4ff1a7f09d2a20f7b8b5fc9adf8dad9d204f6507938588d22f05ace8
Type: fulltext
Mimetype: application/pdf

Authority records

Hillerton, Thomas
