A Subset Selection Method for Accurate Gene Regulatory Network Inference of Uninformative Datasets
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0001-8326-6178
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Motivation: The interactions among the components of a living cell that constitute the gene regulatory network (GRN) can be inferred from perturbation-based gene expression data. Such networks are useful for providing mechanistic insights into a biological system. To explore the feasibility and quality of GRN inference at a large scale, we used the L1000 data, in which approximately 1000 genes have been perturbed and their expression levels quantified in 9 cancer cell lines. We first identified key properties of the datasets, i.e., signal-to-noise ratio (SNR) and condition number, which we have shown to affect the performance of various inference methods.

Results: We found that all L1000 datasets have a very low SNR, making them highly uninformative and unsuitable for inferring accurate GRNs. We therefore developed a gene reduction pipeline that eliminates uninformative genes from the system, using a selection criterion based on SNR, until an informative subset is reached. The results show that our pipeline can identify an informative subset in an uninformative dataset, significantly improving the accuracy of GRN inference.
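
The gene reduction idea can be illustrated with a minimal sketch. This is not the authors' pipeline: the abstract does not give the SNR definition or the stopping rule, so the per-gene SNR used here (norm of a gene's response divided by the norm of its noise estimate), the threshold, and the names `row_snr` and `select_informative_subset` are all assumptions made for illustration. The sketch assumes a square matrix `Y` of expression responses (genes by single-gene perturbation experiments) and a same-shaped noise estimate `E`, so that dropping a gene also drops its experiment.

```python
import math


def row_snr(Y, E, g, kept):
    """Hypothetical per-gene SNR: response norm over noise norm, restricted to kept experiments."""
    signal = math.sqrt(sum(Y[g][j] ** 2 for j in kept))
    noise = math.sqrt(sum(E[g][j] ** 2 for j in kept))
    return signal / noise if noise > 0 else math.inf


def select_informative_subset(Y, E, snr_threshold=1.0, min_genes=2):
    """Greedily drop the lowest-SNR gene until every remaining gene meets the threshold.

    Assumption: gene g is perturbed in experiment g, so removing a gene
    removes both its row and its column from consideration.
    """
    kept = list(range(len(Y)))
    while len(kept) > min_genes:
        snrs = {g: row_snr(Y, E, g, kept) for g in kept}
        worst = min(kept, key=lambda g: snrs[g])
        if snrs[worst] >= snr_threshold:
            break  # all remaining genes are informative under this criterion
        kept.remove(worst)
    return kept


# Toy example: gene 2 barely responds relative to its noise level.
Y = [[-1.0, 0.1, 0.0],
     [0.2, -1.0, 0.0],
     [0.0, 0.01, -0.05]]
E = [[0.1, 0.1, 0.1] for _ in range(3)]
print(select_informative_subset(Y, E, snr_threshold=1.0))
```

SNR values are recomputed after every removal because dropping a gene also drops a column, which changes the signal and noise norms of the genes that remain.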

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-164634
OAI: oai:DiVA.org:su-164634
DiVA, id: diva2:1279866
Available from: 2019-01-17 Created: 2019-01-17 Last updated: 2019-01-23 Bibliographically approved
In thesis
1. Towards Reliable Gene Regulatory Network Inference
2019 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Phenotypic traits are now known to stem from the interplay between genetic variables across many if not every level of biology. The field of gene regulatory network (GRN) inference is concerned with understanding the regulatory interactions between genes in a cell, in order to build a model that captures the behaviour of the system. Perturbation biology, whereby genes or RNAs are targeted and their activity altered, is of great value for the GRN field. By first systematically perturbing the system and then reading the system's reaction as a whole, we can feed this data into various methods to reverse engineer the key agents of change.

The initial study sets the groundwork for the rest, and deals with finding common ground among the sundry methods in order to compare and rank performance in an unbiased setting. The GeneSPIDER (GS) MATLAB package is an inference benchmarking platform to which methods can be added via a wrapper for testing in competition with one another. Synthetic datasets and networks spanning a wide range of conditions can be created for this purpose. The benchmark's evaluation of methods across these conditions shows which properties influence the accuracy of which methods, and thus which methods are more suitable under a given characterized condition.

The second study introduces a novel framework, NestBoot, for increasing inference accuracy within the GS environment by independent, nested bootstraps, i.e., repeated inference trials. Under low to medium noise levels, this allows support to be gathered for the links that occur most often, while spurious links are discarded through comparison to an estimated null distribution of shuffled links. While noise continues to plague every method, nested bootstrapping in this way is shown to increase the accuracy of several different methods.
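
The core idea of bootstrap support with a shuffled-link null can be sketched as follows. This is a simplified, single-level illustration and not the NestBoot implementation: the nesting of independent bootstrap runs is omitted, and the function name `bootstrap_supported_links`, the `infer` callback, the `all_links` list of candidate links, and the quantile cutoff are all assumptions made for this example.

```python
import random
from collections import Counter


def bootstrap_supported_links(samples, infer, all_links, n_boot=100,
                              null_quantile=0.95, seed=0):
    """Keep links whose bootstrap support exceeds a quantile of a shuffled-link null.

    samples   : list of experiments (any objects `infer` understands)
    infer     : hypothetical callback mapping a resample to an iterable of links
    all_links : non-empty list of every candidate link
    """
    rng = random.Random(seed)
    observed, null = Counter(), Counter()
    for _ in range(n_boot):
        # Resample experiments with replacement and rerun the inference.
        resample = [rng.choice(samples) for _ in samples]
        links = set(infer(resample))
        observed.update(links)
        # Null: the same number of links placed at random candidate positions.
        null.update(rng.sample(all_links, len(links)))
    null_support = sorted(null[link] / n_boot for link in all_links)
    idx = min(len(null_support) - 1, int(null_quantile * len(null_support)))
    cutoff = null_support[idx]
    return {link for link in all_links if observed[link] / n_boot > cutoff}


# Toy usage with a stub inference method that always reports one link.
candidates = [("A", "B"), ("A", "C"), ("B", "A"),
              ("B", "C"), ("C", "A"), ("C", "B")]
stable = bootstrap_supported_links([0, 1, 2], lambda r: {("A", "B")},
                                   candidates, n_boot=60)
print(stable)
```

A link that recurs in nearly every resample accumulates support far above what randomly placed links achieve, whereas a link that appears only sporadically is indistinguishable from the null and is dropped.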

The third study applies NestBoot to real data to infer a reliable GRN from a small interfering RNA (siRNA) perturbation dataset covering 40 genes known or suspected to have a role in human cancers. Methods were developed to benchmark the accuracy of an inferred GRN in the absence of a true known GRN, by assessing how well it fits the data compared to a null model of shuffled topologies. A network of high confidence was recovered containing many regulatory links known in the literature, as well as a slew of novel links.

The fourth study seeks to infer reliable networks at large scale, utilizing the high-dimensional biological datasets of the LINCS L1000 project. This dataset as a whole has too much noise for accurate GRN inference; hence we developed a method to select a subset that is sufficiently informative to accurately infer GRNs. This is a first step in the direction of identifying probable submodules within a greater genome-scale GRN yet to be uncovered.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2019. p. 40
Keywords
GRN, network inference, biological systems
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-164642 (URN)
978-91-7797-600-4 (ISBN)
978-91-7797-601-1 (ISBN)
Public defence
2019-04-05, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Manuscript. Paper 4: Manuscript.

Available from: 2019-03-13 Created: 2019-01-17 Last updated: 2019-03-18 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Search in DiVA

By author/editor
Secilmis, Deniz; Morgan, Daniel; Sonnhammer, Erik