Exploring the Boundaries of Gene Regulatory Network Inference
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. (Sonnhammer)
2015 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

To understand how the components of a complex system like the biological cell interact and regulate each other, we need to collect data on how the components respond to system perturbations. Such data can then be used to solve the inverse problem of inferring a network that describes how the pieces influence each other. The work in this thesis deals with modelling the cell regulatory system, often represented as a network, with tools and concepts derived from systems biology. The first investigation focuses on network sparsity and algorithmic biases introduced by penalised network inference procedures. Many contemporary network inference methods rely on a sparsity parameter such as the L1 penalty term used in the LASSO. However, a poor choice of the sparsity parameter can give highly incorrect network estimates. In order to avoid such poor choices, we devised a method to optimise the sparsity parameter, which maximises the accuracy of the inferred network. We showed that it is effective on in silico data sets with a reasonable level of informativeness and demonstrated that accurate prediction of network sparsity is key to elucidating the correct network parameters. The second investigation focuses on how knowledge from association networks can be transferred to regulatory network inference procedures. The quality of expression data is often inadequate for reliable gene regulatory network inference. Therefore, we constructed an algorithm to incorporate prior knowledge and demonstrated that it increases the accuracy of network inference when the quality of the data is low. The third investigation aimed to understand the influence of system and data properties on network inference accuracy. L1 regularisation methods commonly produce poor network estimates when the data used for inference are ill-conditioned, even when the signal-to-noise ratio is so high that all links in the network can be proven to exist at the given significance level. In this study we elucidated some general principles for the conditions under which we expect strongly degraded accuracy. Moreover, this allowed us to estimate the expected accuracy from the conditions of simulated data, which was used to predict the performance of inference algorithms on biological data. Finally, we built a software package, GeneSPIDER, for solving problems encountered during the previous investigations. The software package supports highly controllable network and data generation as well as data analysis and exploration in the context of network inference.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2015. 42 p.
Keyword [en]
GRN, gene regulatory network, network inference, signal to noise ratio, model selection, variable selection, data properties, reverse engineering, ordinary differential equations, gene networks, linear regression, lasso
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-122149
ISBN: 978-91-7649-299-4 (print)
OAI: oai:DiVA.org:su-122149
DiVA: diva2:865190
Public defence
2015-12-11, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 13:30 (English)
Opponent
Supervisors
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 4: Manuscript.

 

Available from: 2015-11-19 Created: 2015-10-27 Last updated: 2015-11-10 Bibliographically approved
List of papers
1. Optimal Sparsity Criteria for Network Inference
2013 (English). In: Journal of Computational Biology, ISSN 1066-5277, E-ISSN 1557-8666, Vol. 20, no 5, 398-408 p. Article in journal (Refereed) Published
Abstract [en]

Gene regulatory network inference (that is, determination of the regulatory interactions between a set of genes) provides mechanistic insights of central importance to research in systems biology. Most contemporary network inference methods rely on a sparsity/regularization coefficient, which we call zeta (ζ), to determine the degree of sparsity of the network estimates, that is, the total number of links between the nodes. However, they offer little or no advice on how to select this sparsity coefficient, in particular for biological data with few samples. We show that an empty network is more accurate than estimates obtained for a poor choice of zeta. In order to avoid such poor choices, we propose a method for optimization of zeta, which maximizes the accuracy of the inferred network for any sparsity-dependent inference method and data set. Our procedure is based on leave-one-out cross-optimization and selection of the zeta value that minimizes the prediction error. We also illustrate the adverse effects of noise, few samples, and uninformative experiments on network inference as well as on our method for optimization of zeta. We demonstrate that our zeta optimization method, applied to two widely used inference algorithms (Glmnet and NIR), gives accurate and informative estimates of the network structure, provided that the data are informative enough.
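The selection step described in this abstract (fit on all samples but one, predict the held-out sample, keep the zeta with the lowest prediction error) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain LASSO for a single response solved by coordinate descent rather than Glmnet or NIR, and the function names, toy data, and candidate grid are hypothetical.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_fit(X, y, zeta, n_sweeps=200):
    """Minimise (1/2n)*||y - Xw||^2 + zeta*||w||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of column j with the partial residual
            norm = 0.0  # squared norm of column j
            for i in range(n):
                partial = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                norm += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, zeta) / (norm / n) if norm > 0 else 0.0
    return w


def loo_prediction_error(X, y, zeta):
    """Mean squared error when each sample is predicted from all the others."""
    total = 0.0
    for i in range(len(X)):
        w = lasso_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], zeta)
        prediction = sum(x_ij * w_j for x_ij, w_j in zip(X[i], w))
        total += (y[i] - prediction) ** 2
    return total / len(X)


def select_zeta(X, y, candidates):
    """Return the candidate zeta with the smallest leave-one-out error."""
    return min(candidates, key=lambda zeta: loo_prediction_error(X, y, zeta))


# Hypothetical toy data: the response depends on the first regulator only.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [1.0, 2.0], [2.0, 0.0]]
y = [2.05, -0.03, 1.96, 4.02, 2.04, 3.97]
candidates = [0.01, 0.1, 1.0, 100.0]
best = select_zeta(X, y, candidates)
print("selected zeta:", best)
print("weights at selected zeta:", lasso_fit(X, y, best))
```

With a very large zeta every weight is shrunk to zero, which corresponds to the empty network mentioned in the abstract; the leave-one-out error makes that choice directly comparable with the denser estimates.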

Keyword
algorithms, gene networks, linear algebra
National Category
Biochemistry and Molecular Biology Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-91295 (URN)
10.1089/cmb.2012.0268 (DOI)
000318854500004 ()
Note

AuthorCount:4;

Available from: 2013-06-27 Created: 2013-06-24 Last updated: 2017-12-06 Bibliographically approved
2. Functional association networks as priors for gene regulatory network inference
2014 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 12, 130-138 p. Article in journal (Refereed) Published
Abstract [en]

Motivation: Gene regulatory network (GRN) inference reveals the influences genes have on one another in cellular regulatory systems. If the experimental data are inadequate for reliable inference of the network, informative priors have been shown to improve the accuracy of inferences.

Results: This study explores the potential of undirected, confidence-weighted networks, such as those in functional association databases, as a prior source for GRN inference. Such networks often erroneously indicate symmetric interaction between genes and may contain mostly correlation-based interaction information. Despite these drawbacks, our testing on synthetic datasets indicates that even noisy priors reflect some causal information that can improve GRN inference accuracy. Our analysis on yeast data indicates that using the functional association databases FunCoup and STRING as priors can give a small improvement in GRN inference accuracy with biological data.

National Category
Biochemistry and Molecular Biology Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-106341 (URN)
10.1093/bioinformatics/btu285 (DOI)
000338109200016 ()
Note

AuthorCount:5;

Available from: 2014-08-08 Created: 2014-08-04 Last updated: 2017-12-05 Bibliographically approved
3. Avoiding pitfalls in L1-regularised inference of gene networks
2015 (English). In: Molecular Biosystems, ISSN 1742-206X, E-ISSN 1742-2051, Vol. 11, no 1, 287-296 p. Article in journal (Refereed) Published
Abstract [en]

Statistical regularisation methods such as LASSO and related L1-regularised regression methods are commonly used to construct models of gene regulatory networks. Although they can theoretically infer the correct network structure, they have been shown in practice to make errors, i.e. to leave out existing links and include non-existing links. We show that L1 regularisation methods typically produce a poor network model when the analysed data are ill-conditioned, i.e. when the gene expression data matrix has a high condition number, even if the data contain enough information for correct network inference. However, when these methods fail on informative data (data with a signal-to-noise ratio high enough that existing links can be proven to exist), the correct network structure can still be obtained, either by using least-squares regression and setting small parameters to zero, or by using robust network inference, a recent method that takes the intersection of all non-rejectable models. Since available experimental data sets are generally ill-conditioned, we recommend checking the condition number of the data matrix to avoid this pitfall of L1-regularised inference, and also considering alternative methods.
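The recommended check can be sketched as follows. The condition number of a data matrix is the ratio of its largest to its smallest singular value; the abstract does not prescribe a procedure or a cut-off, so the one-sided Jacobi routine, the function names, and the toy matrices below are illustrative assumptions. In practice a numerical linear algebra library would normally be used instead of this pure-Python version.

```python
import math


def singular_values(matrix, n_sweeps=30, tol=1e-12):
    """Singular values via one-sided Jacobi rotations, largest first."""
    a = [[float(v) for v in row] for row in matrix]
    n_rows, n_cols = len(a), len(a[0])
    for _ in range(n_sweeps):
        rotated = False
        for p in range(n_cols - 1):
            for q in range(p + 1, n_cols):
                alpha = sum(a[i][p] ** 2 for i in range(n_rows))
                beta = sum(a[i][q] ** 2 for i in range(n_rows))
                gamma = sum(a[i][p] * a[i][q] for i in range(n_rows))
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue  # columns p and q are already orthogonal
                rotated = True
                # Rotation angle that makes columns p and q orthogonal.
                tau = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, tau) / (abs(tau) + math.sqrt(1.0 + tau * tau))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                for i in range(n_rows):
                    x, y = a[i][p], a[i][q]
                    a[i][p] = c * x - s * y
                    a[i][q] = s * x + c * y
        if not rotated:
            break
    # Once the columns are mutually orthogonal, their norms are the singular values.
    norms = [math.sqrt(sum(a[i][j] ** 2 for i in range(n_rows))) for j in range(n_cols)]
    return sorted(norms, reverse=True)


def condition_number(matrix):
    """Largest singular value divided by the smallest one."""
    if len(matrix) < len(matrix[0]):
        # Transposing keeps the non-zero singular values and avoids zero columns.
        matrix = [list(column) for column in zip(*matrix)]
    values = singular_values(matrix)
    return math.inf if values[-1] == 0.0 else values[0] / values[-1]


# Hypothetical expression matrices (rows: genes, columns: experiments).
well_conditioned = [[3.0, 0.0], [0.0, 1.0]]
nearly_collinear = [[1.0, 1.0], [1.0, 1.0001]]
print("well conditioned:", condition_number(well_conditioned))
print("nearly collinear:", condition_number(nearly_collinear))
```

A large value for the second matrix signals that its two experiments carry almost the same information, which is the situation in which the abstract reports degraded L1-regularised estimates.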

National Category
Biochemistry and Molecular Biology Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-113114 (URN)
10.1039/c4mb00419a (DOI)
000345897600028 ()
Note

AuthorCount:5;

Available from: 2015-02-23 Created: 2015-01-23 Last updated: 2017-12-04 Bibliographically approved
4. GeneSPIDER - Generation and Simulation Package for Informative Data ExploRation
(English). Manuscript (preprint) (Other academic)
Abstract [en]

A range of tools are available to model, simulate and analyze gene regulatory networks (GRNs). However, these tools provide limited ability to control network topology, system dynamics, design of experiments, data properties, or noise characteristics. Independent control of these properties is the key to drawing conclusions on which inference method to use and what result to expect from it, as well as to obtaining desired approximations of real biological systems. To draw conclusions on the relation between a network or data property and the performance of an inference method in simulations, system approximations with varying properties are needed. We present a Matlab package, GeneSPIDER, for generation and analysis of networks and data in a dynamical systems framework, with a focus on the ability to vary properties. It supplies not only essential components that have been missing, but also wrappers to existing tools in common use. In particular, it contains tools for controlling and analyzing network topology (random, small-world, scale-free), stability of linear time-invariant systems, signal-to-noise ratio (SNR), and interampatteness. It also contains methods for design of perturbation experiments, bootstrapping, analysis of linear dependence, sample selection, scaling of the SNR, and performance evaluation. GeneSPIDER offers control of network and data properties in simulations, together with tools to analyze these properties and draw conclusions on the quality of inferred GRNs. It can be fetched freely from the online git repository https://bitbucket.org/sonnhammergrni/genespider.

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-122146 (URN)
Available from: 2015-10-27 Created: 2015-10-27 Last updated: 2016-10-27 Bibliographically approved

Open Access in DiVA

Exploring the Boundaries of Gene Regulatory Network Inference (494 kB), 252 downloads
File information
File name: FULLTEXT02.pdf
File size: 494 kB
Checksum (SHA-512): 437c6228109dbe7b2343057573e6451fac1f0227c1209b6344513f0baa4a65da06488494a144723cead541bb75b2635dcf36ee5568c869c5e19fdcad3984c7b3
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Tjärnberg, Andreas
By organisation
Department of Biochemistry and Biophysics
Bioinformatics and Systems Biology
