Towards Reliable Gene Regulatory Network Inference
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. SciLifeLab. (Sonnhammer) ORCID iD: 0000-0001-8326-6178
2019 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Phenotypic traits are now known to stem from the interplay between genetic variables across many, if not all, levels of biology. The field of gene regulatory network (GRN) inference is concerned with understanding the regulatory interactions between genes in a cell, in order to build a model that captures the behaviour of the system. Perturbation biology, whereby genes or RNAs are targeted and their activity altered, is of great value for the GRN field. By first systematically perturbing the system and then reading the system's reaction as a whole, we can feed these data into various methods to reverse engineer the key agents of change.

The initial study lays the groundwork for the rest and deals with finding common ground among the sundry methods in order to compare and rank their performance in an unbiased setting. The GeneSPIDER (GS) MATLAB package is an inference benchmarking platform to which methods can be added via a wrapper and tested in competition with one another. Synthetic datasets and networks spanning a wide range of conditions can be created for this purpose. Evaluating methods across these conditions in the benchmark demonstrates which properties influence the accuracy of which methods, and thus which methods are more suitable for use under a given characterized condition.

The second study introduces a novel framework, NestBoot, for increasing inference accuracy within the GS environment by independent, nested bootstraps, i.e. repeated inference trials. Under low to medium noise levels, this allows support to be gathered for links occurring most often, while spurious links are discarded through comparison to an estimated null distribution of shuffled links. While noise continues to plague every method, nested bootstrapping in this way is shown to increase the accuracy of several different methods.

The third study applies NestBoot to real data to infer a reliable GRN from a small interfering RNA (siRNA) perturbation dataset covering 40 genes known or suspected to have a role in human cancers. Methods were developed to benchmark the accuracy of an inferred GRN in the absence of a true known GRN, by assessing how well it fits the data compared to a null model of shuffled topologies. A network of high confidence was recovered containing many regulatory links known in the literature, as well as a slew of novel links.

The fourth study seeks to infer reliable networks on a large scale, utilizing the high-dimensional biological datasets of the LINCS L1000 project. This dataset has too much noise for accurate GRN inference as a whole, hence we developed a method to select a subset that is sufficiently informative to accurately infer GRNs. This is a first step in the direction of identifying probable submodules within a greater genome-scale GRN yet to be uncovered.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2019. p. 40
Keywords [en]
GRN, network inference, biological systems
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-164642
ISBN: 978-91-7797-600-4 (print)
ISBN: 978-91-7797-601-1 (electronic)
OAI: oai:DiVA.org:su-164642
DiVA, id: diva2:1279914
Public defence
2019-04-05, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Opponent
Supervisors
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Manuscript. Paper 4: Manuscript.

Available from: 2019-03-13 Created: 2019-01-17 Last updated: 2019-03-18 Bibliographically approved
List of papers
1. GeneSPIDER - gene regulatory network inference benchmarking with controlled network and data properties
2017 (English) In: Molecular Biosystems, ISSN 1742-206X, E-ISSN 1742-2051, Vol. 13, no 7, p. 1304-1312. Article in journal (Refereed) Published
Abstract [en]

A key question in network inference, that has not been properly answered, is what accuracy can be expected for a given biological dataset and inference method. We present GeneSPIDER - a Matlab package for tuning, running, and evaluating inference algorithms that allows independent control of network and data properties to enable data-driven benchmarking. GeneSPIDER is uniquely suited to address this question by first extracting salient properties from the experimental data and then generating simulated networks and data that closely match these properties. It enables data-driven algorithm selection, estimation of inference accuracy from biological data, and a more multifaceted benchmarking. Included are generic pipelines for the design of perturbation experiments, bootstrapping, analysis of linear dependence, sample selection, scaling of SNR, and performance evaluation. With GeneSPIDER we aim to move the goal of network inference benchmarks from simple performance measurement to a deeper understanding of how the accuracy of an algorithm is determined by different combinations of network and data properties.

National Category
Biological Sciences
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-145342 (URN) 10.1039/c7mb00058h (DOI) 000404471900005 () 28485748 (PubMedID)
Available from: 2017-07-27 Created: 2017-07-27 Last updated: 2019-02-11 Bibliographically approved
2. A generalized framework for controlling FDR in gene regulatory network inference
2019 (English) In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 6, p. 1026-1032. Article in journal (Refereed) Published
Abstract [en]

Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights into a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data are not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied.

Results: To achieve this we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference, and by repeated bootstrap runs assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. An improved inference accuracy was observed in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences.
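
The following is a minimal Python sketch of the nested bootstrapping idea described above. It is not the GeneSPIDER implementation, which is a MATLAB package. The toy inference step `infer_links`, the data layout (a list of `(perturbed_gene, response)` pairs), the choice to keep each link's minimum support across outer runs, and all function names are assumptions made purely for illustration.

```python
import random

def infer_links(samples, cutoff):
    """Toy stand-in for an inference method (assumption): call a link i -> j
    when gene j's mean response to perturbing gene i exceeds cutoff."""
    sums, counts = {}, {}
    for perturbed, response in samples:
        for target, value in enumerate(response):
            key = (perturbed, target)
            sums[key] = sums.get(key, 0.0) + value
            counts[key] = counts.get(key, 0) + 1
    return {k for k in sums if abs(sums[k] / counts[k]) > cutoff}

def bootstrap_support(samples, cutoff, n_boot, rng):
    """Fraction of bootstrap resamples in which each link is inferred."""
    hits = {}
    for _ in range(n_boot):
        resample = [rng.choice(samples) for _ in samples]
        for link in infer_links(resample, cutoff):
            hits[link] = hits.get(link, 0) + 1
    return {link: n / n_boot for link, n in hits.items()}

def nested_support(samples, cutoff, n_outer, n_boot, rng):
    """Outer loop: repeat the whole bootstrap run and keep each link's
    lowest support, a conservative summary of its stability (assumption)."""
    runs = [bootstrap_support(samples, cutoff, n_boot, rng) for _ in range(n_outer)]
    links = set().union(*runs)
    return {l: min(r.get(l, 0.0) for r in runs) for l in links}

def shuffle_data(samples, rng):
    """Null input: permute all response values across samples and genes."""
    flat = [v for _, resp in samples for v in resp]
    rng.shuffle(flat)
    it = iter(flat)
    return [(p, [next(it) for _ in resp]) for p, resp in samples]

def links_at_fdr(samples, cutoff, fdr, n_outer=5, n_boot=50, seed=0):
    """Return links at the most lenient support threshold whose estimated
    FDR (null links / measured links above the threshold) is within fdr."""
    rng = random.Random(seed)
    real = nested_support(samples, cutoff, n_outer, n_boot, rng)
    null = nested_support(shuffle_data(samples, rng), cutoff, n_outer, n_boot, rng)
    for t in sorted(set(real.values())):
        n_real = sum(s >= t for s in real.values())
        n_null = sum(s >= t for s in null.values())
        if n_null / n_real <= fdr:
            return {l for l, s in real.items() if s >= t}
    return set()
```

Calling `links_at_fdr(samples, cutoff=0.4, fdr=0.05)` on replicated perturbation samples returns the links that survive the comparison with the shuffled run. If the data cannot be told apart from their shuffled counterpart, no threshold meets the target and the result is empty.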

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-164632 (URN) 10.1093/bioinformatics/bty764 (DOI) 000462709200016 ()
Available from: 2019-01-17 Created: 2019-01-17 Last updated: 2019-04-29 Bibliographically approved
3. Perturbation-based gene regulatory network inference to unravel oncogenic mechanisms
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Motivation: Cancer is known to stem from multiple, independent mutations, the effects of which aggregate to drive the cell into a cancerous state. To understand the complex interplay between affected genes, their gene regulatory network (GRN) needs to be uncovered, to reveal detailed insights into regulatory mechanisms. We therefore decided to infer a reliable GRN from perturbation responses of 40 genes known or suspected to have a role in human cancers yet whose regulatory interactions are poorly known.

Results: siRNA knockdown experiments of each gene were done in a human squamous carcinoma cell line, after which the transcriptomic response was measured. From these data GRNs were inferred using several methods, and the false discovery rate was controlled by the NestBoot framework. The best GRN was shown to be significantly more predictive than the null model, both in crossvalidated benchmarks and for an independent dataset of the same genes but subjected to double perturbations. It agrees with many known links in addition to predicting a large number of novel interactions, a subset of which were experimentally validated. The inferred GRN captures regulatory interactions central to cancer-relevant processes and thus provides mechanistic insights that are useful for future cancer research.
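
As a rough illustration of comparing an inferred GRN with a null model, the Python sketch below scores how well a network fits the data and contrasts that score with networks whose topology has been shuffled. The linear steady-state relation `A·Y = -P` (network matrix `A`, response matrix `Y`, perturbation design `P`), the residual used as the fit score, the shuffling scheme, and all function names are assumptions of this sketch, not the procedure used in the manuscript.

```python
import random

def residual(A, Y, P):
    """Squared Frobenius norm of A*Y + P; zero when the data satisfy
    the assumed steady-state relation A*Y = -P exactly."""
    n, m = len(A), len(Y[0])
    total = 0.0
    for i in range(n):
        for k in range(m):
            r = sum(A[i][j] * Y[j][k] for j in range(n)) + P[i][k]
            total += r * r
    return total

def shuffle_topology(A, rng):
    """Null network: the same off-diagonal weights placed at random
    off-diagonal positions; the diagonal (self-regulation) is kept."""
    n = len(A)
    off = [(i, j) for i in range(n) for j in range(n) if i != j]
    weights = [A[i][j] for i, j in off]
    rng.shuffle(weights)
    B = [row[:] for row in A]
    for (i, j), w in zip(off, weights):
        B[i][j] = w
    return B

def empirical_p(A, Y, P, n_null=1000, seed=0):
    """Share of shuffled networks that fit the data at least as well
    as the inferred one (with the usual +1 correction)."""
    rng = random.Random(seed)
    observed = residual(A, Y, P)
    null = [residual(shuffle_topology(A, rng), Y, P) for _ in range(n_null)]
    better = sum(r <= observed for r in null)
    return (better + 1) / (n_null + 1)
```

A small value of `empirical_p` indicates that few shuffled topologies explain the data as well as the inferred network does. Scoring on held-out data rather than on the data used for inference would be the natural way to make this a cross-validated comparison.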

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-164633 (URN)
Available from: 2019-01-17 Created: 2019-01-17 Last updated: 2019-02-11 Bibliographically approved
4. A Subset Selection Method for Accurate Gene Regulatory Network Inference of Uninformative Datasets
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Motivation: The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights into a biological system. In order to explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data, where approximately 1000 genes have been perturbed and their expression levels have been quantified in 9 cancer cell lines. First we identified key properties of the datasets, i.e., signal-to-noise ratio (SNR) and condition number, which we have shown to affect the performance of various inference methods.

Results: We found that all L1000 datasets have a very low SNR, which makes them highly uninformative and unsuitable for inferring accurate GRNs. Therefore, we have developed a gene reduction pipeline in which we eliminate the uninformative genes from the system using a selection criterion based on SNR until reaching an informative subset. The results show that our pipeline can identify an informative subset in an uninformative dataset, improving the accuracy of the GRN inference significantly.
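
The gene reduction idea can be sketched in Python as below. The per-gene SNR used here (absolute mean response divided by the standard deviation across replicates), the stopping rule, and the names `gene_snr` and `select_informative` are hypothetical simplifications introduced for illustration; the abstract does not specify the manuscript's SNR definition or its selection criterion in this detail.

```python
import math

def gene_snr(replicates):
    """Hypothetical per-gene SNR: |mean response| divided by the
    sample standard deviation across replicates."""
    n = len(replicates)
    mean = sum(replicates) / n
    var = sum((x - mean) ** 2 for x in replicates) / (n - 1)
    return abs(mean) / math.sqrt(var) if var > 0 else math.inf

def select_informative(data, min_snr, min_genes=2):
    """Remove the lowest-SNR gene repeatedly until every remaining gene
    reaches min_snr or only min_genes genes are left."""
    kept = {gene: gene_snr(values) for gene, values in data.items()}
    while len(kept) > min_genes:
        worst = min(kept, key=kept.get)
        if kept[worst] >= min_snr:
            break
        del kept[worst]
    return sorted(kept)

# Made-up replicate responses for four genes (illustrative values only).
data = {
    "g1": [2.0, 2.1, 1.9],
    "g2": [0.1, -0.2, 0.1],
    "g3": [-1.5, -1.4, -1.6],
    "g4": [0.0, 0.3, -0.3],
}
subset = select_informative(data, min_snr=3.0)
```

Because this toy SNR is computed per gene and does not change as other genes are removed, the loop is equivalent to a simple filter. With a system-level SNR, the value would have to be recomputed on the reduced dataset after each removal.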

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-164634 (URN)
Available from: 2019-01-17 Created: 2019-01-17 Last updated: 2019-01-23 Bibliographically approved

Open Access in DiVA

Towards Reliable Gene Regulatory Network Inference (1175 kB) 128 downloads
File information
File name FULLTEXT01.pdf
File size 1175 kB
Checksum SHA-512
8749c6429d961c2f452f415620758369a9c8cafbb8ed68a31b3988214c6b6ddadbb0c061fc48ed77304caa28014ce1cf57fd16c51882b1e10bfc14791a71cd88
Type fulltext
Mimetype application/pdf

Search in DiVA

By author/editor
Morgan, Daniel
By organisation
Department of Biochemistry and Biophysics
Bioinformatics and Systems Biology
