Fast and accurate gene regulatory network inference by normalized least squares regression
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0002-6362-0659
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0001-8284-356x
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0002-9015-5588
Number of authors: 5
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no. 8, pp. 2263-2268, article id btac103. Journal article (Refereed). Published
Abstract [en]

Motivation: Inferring an accurate gene regulatory network (GRN) has long been a key goal in the field of systems biology. To do this, it is important to find a suitable balance between maximizing the number of true-positive interactions and minimizing the number of false-positive interactions. Another key requirement is that the inference method can handle the large size of modern experimental data, meaning the method needs to be both fast and accurate. The Least Squares Cut-Off (LSCO) method can fulfill both these criteria; however, because it is based on least squares, it is vulnerable to the known issue of amplifying extreme values, small or large. In GRN inference this manifests as genes that are erroneously hyper-connected to a large fraction of all genes because of extremely low fold-change values.
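
The trade-off between true-positive and false-positive interactions can be made concrete with a small sketch. The code below is illustrative only: the edge-weight dictionary, the gene names and the helper names `confusion_at_cutoff` and `f1_score` are assumptions introduced here, not part of the published method. It counts true and false positives for an inferred network at several cut-offs against a known gold-standard network.

```python
def confusion_at_cutoff(weights, true_edges, cutoff):
    """Count TP, FP and FN when keeping edges with |weight| >= cutoff.

    weights: dict mapping (regulator, target) -> inferred weight
             (hypothetical input format, chosen for illustration)
    true_edges: set of (regulator, target) pairs in the gold standard
    """
    predicted = {edge for edge, w in weights.items() if abs(w) >= cutoff}
    tp = len(predicted & true_edges)   # predicted and real
    fp = len(predicted - true_edges)   # predicted but not real
    fn = len(true_edges - predicted)   # real but missed
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when nothing is recovered."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy data: three true edges and five scored candidate edges.
true_edges = {("g1", "g2"), ("g2", "g3"), ("g3", "g1")}
weights = {
    ("g1", "g2"): 0.9,
    ("g2", "g3"): -0.7,
    ("g3", "g1"): 0.2,
    ("g1", "g3"): 0.4,
    ("g2", "g1"): -0.1,
}

for cutoff in (0.05, 0.3, 0.8):
    tp, fp, fn = confusion_at_cutoff(weights, true_edges, cutoff)
    print(cutoff, tp, fp, fn, round(f1_score(tp, fp, fn), 3))
```

A low cut-off recovers every true edge but admits false positives, while a high cut-off removes the false positives at the cost of missed edges. A benchmark would typically scan many cut-offs and summarize the result with a single score.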

Results: We developed a GRN inference method called Least Squares Cut-Off with Normalization (LSCON) that tackles this problem. LSCON extends the LSCO algorithm by regularization to avoid hyper-connected genes and thereby reduce false positives. The regularization used is based on normalization, which removes effects of extreme values on the fit. We benchmarked LSCON and compared it to Genie3, LASSO, LSCO and Ridge regression, in terms of accuracy, speed and tendency to predict hyper-connected genes. The results show that LSCON achieves better or equal accuracy compared to LASSO, the best existing method, especially for data with extreme values. Thanks to the speed of least squares regression, LSCON does this an order of magnitude faster than LASSO.
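
The idea can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a linear steady-state model `A @ Y = -P` (genes by experiments), so that the least squares estimate is `A = -P @ pinv(Y)`, and it uses one plausible normalization, rescaling each gene's outgoing links by that gene's mean absolute fold change. The exact normalization in the paper may differ, and the function names `lsco`, `lscon` and `out_degree` are hypothetical.

```python
import numpy as np


def lsco(Y, P, cutoff):
    """Least squares network estimate followed by a hard cut-off.

    Assumes A @ Y = -P, hence A = -P @ pinv(Y).
    Y: genes x experiments fold changes; P: genes x experiments perturbations.
    """
    A = -P @ np.linalg.pinv(Y)
    return np.where(np.abs(A) >= cutoff, A, 0.0)


def lscon(Y, P, cutoff):
    """LSCO with an illustrative normalization (an assumption, see lead-in).

    Column j of A holds the outgoing links of gene j. It is rescaled by the
    mean absolute fold change of gene j relative to the average gene, so
    genes with near-zero fold changes no longer dominate the network.
    """
    A = -P @ np.linalg.pinv(Y)
    gene_scale = np.mean(np.abs(Y), axis=1)
    A = A * (gene_scale / gene_scale.mean())  # broadcasts over columns
    return np.where(np.abs(A) >= cutoff, A, 0.0)


def out_degree(A):
    """Number of nonzero off-diagonal entries per column (outgoing links)."""
    off_diag = A - np.diag(np.diag(A))
    return np.count_nonzero(off_diag, axis=0)


# Toy data: four genes, each perturbed once, with no true cross-regulation.
# Gene 3 barely responds to its own perturbation (fold change 0.01), while
# the other genes show noise-level values (0.02) in that experiment.
P = np.eye(4)
Y = np.array([
    [1.0, 0.0, 0.0, 0.02],
    [0.0, 1.0, 0.0, 0.02],
    [0.0, 0.0, 1.0, 0.02],
    [0.0, 0.0, 0.0, 0.01],
])

print(out_degree(lsco(Y, P, cutoff=0.5)))   # [0 0 0 3]
print(out_degree(lscon(Y, P, cutoff=0.5)))  # [0 0 0 0]
```

In this toy case plain least squares divides noise by the tiny fold change of gene 3, so gene 3 appears to regulate every other gene. After rescaling, those spurious links fall below the cut-off.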

Place, publisher, year, edition, pages
2022. Vol. 38, no. 8, pp. 2263-2268, article id btac103
National subject category
Biological Sciences; Computer and Information Sciences
Identifiers
URN: urn:nbn:se:su:diva-203209
DOI: 10.1093/bioinformatics/btac103
ISI: 000761598600001
PubMedID: 35176145
Scopus ID: 2-s2.0-85128723779
OAI: oai:DiVA.org:su-203209
DiVA, id: diva2:1647800
Available from: 2022-03-28. Created: 2022-03-28. Last updated: 2023-09-14. Bibliographically approved.
Part of thesis
1. In silico modelling for refining gene regulatory network inference
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Gene regulation is at the centre of all cellular functions, regulating the cell's healthy and pathological responses. The interconnected system of regulatory interactions is known as the gene regulatory network (GRN), where genes influence each other to maintain strict and robust control. Today a large number of methods exist for inferring GRNs, which necessitates benchmarking to determine which method is most suitable for a specific goal. Paper I presents such a benchmark focusing on the effect of using known perturbations to infer GRNs. 

A further challenge when studying GRNs is that experimental data contains high levels of noise and that artefacts may be introduced by the experiment itself. The LSCON method was developed in paper II to reduce the effect of one such artefact that can occur if the expression of a gene shows no or minimal change across most or all experiments. 

With few fully determined biological GRNs available, it is problematic to use these to evaluate an inference method's correctness. Instead, the GRN field relies on simulated data, using a known GRN and generating the corresponding data. When simulating GRNs, capturing the topological properties of the biological GRN is vital. The FFLatt algorithm was developed in paper III to create scale-free, feed-forward loop motif-enriched GRNs, capturing two of the most prominent topological features in biological GRNs.
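
The two topological features named above can be measured on any directed network. The sketch below does not reproduce the FFLatt algorithm; it only illustrates, on a hypothetical toy edge list, how feed-forward loops (a regulates b, b regulates c, and a also regulates c directly) and out-degrees can be counted. The function names are assumptions introduced here.

```python
from collections import Counter
from itertools import permutations


def count_feed_forward_loops(edges):
    """Count ordered triples (a, b, c) with edges a->b, b->c and a->c."""
    edge_set = set(edges)
    nodes = {node for edge in edge_set for node in edge}
    return sum(
        1
        for a, b, c in permutations(nodes, 3)
        if (a, b) in edge_set and (b, c) in edge_set and (a, c) in edge_set
    )


def out_degrees(edges):
    """Map each regulator to its number of distinct targets."""
    return Counter(regulator for regulator, _ in set(edges))


# Toy network with two feed-forward loops: (A, B, C) and (A, C, D).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("A", "D")]

print(count_feed_forward_loops(edges))  # 2
print(dict(out_degrees(edges)))
```

In a scale-free network most genes have few outgoing links while a small number of hubs have many, so the out-degree counts would be strongly skewed. The brute-force triple scan is cubic in the number of genes and is meant for small examples only.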

Once a high-quality GRN is obtained, the next step is to simulate gene expression data corresponding to the GRN. In paper IV, building on the FFLatt method, an open-source Python simulation tool called GeneSNAKE was developed to generate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties and improves on previous tools by featuring a variety of perturbation schemes along with the ability to control noise and modify the perturbation strength.
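
To illustrate what generating expression data from a known GRN can look like, here is a minimal NumPy sketch. It does not use or imitate the GeneSNAKE API; the function `simulate_steady_state`, its parameters and the linear steady-state model `A @ Y = -P` are assumptions made for this example. Perturbation strength and noise level are exposed as simple parameters.

```python
import numpy as np


def simulate_steady_state(A, P, noise_sd=0.0, seed=0):
    """Simulate noisy steady-state responses under the model A @ Y = -P.

    A: genes x genes network matrix (must be invertible)
    P: genes x experiments perturbation matrix
    noise_sd: standard deviation of additive Gaussian noise
    """
    rng = np.random.default_rng(seed)
    Y_clean = -np.linalg.solve(A, P)  # solves A @ Y_clean = -P
    noise = noise_sd * rng.standard_normal(Y_clean.shape)
    return Y_clean + noise


# Toy network: self-degradation on the diagonal plus two regulatory links.
A = np.array([
    [-1.0,  0.0,  0.0],
    [ 0.5, -1.0,  0.0],
    [ 0.0, -0.8, -1.0],
])

strength = 2.0            # perturbation strength
P = strength * np.eye(3)  # each gene perturbed once

Y_clean = simulate_steady_state(A, P)
Y_noisy = simulate_steady_state(A, P, noise_sd=0.1, seed=1)
print(np.round(Y_clean, 2))
```

Because the true network `A` is known, data generated this way can serve as a gold standard when benchmarking an inference method at different noise levels.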

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023. p. 49
Keywords
Gene regulatory networks, simulation, benchmarking, method development
National subject category
Bioinformatics and Computational Biology
Research subject
Biochemistry with a specialisation in bioinformatics
Identifiers
urn:nbn:se:su:diva-221155 (URN)
978-91-8014-504-6 (ISBN)
978-91-8014-505-3 (ISBN)
Public defence
2023-10-27, Air and Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Available from: 2023-10-04. Created: 2023-09-14. Last updated: 2025-02-07. Bibliographically approved.

People
Hillerton, Thomas; Seçilmiş, Deniz; Sonnhammer, Erik L. L.
