Big data networks and orthology analysis
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
ORCID iD: 0000-0003-0532-8251
2023 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Understanding biological systems in complex organisms is central to the life sciences, because complex diseases arise from the interplay of genes, proteins, and compounds. Because biological systems are intricate, bioinformatics tools, models, and algorithms are essential for seeing the bigger picture and extracting biological meaning from the vast amounts of data produced by experiments and predictions. To draw accurate conclusions and make predictions, bioinformatics programs and algorithms depend not only on experimental data but also on output generated by other tools.

Orthologs are genes that share a common ancestor and were separated by a speciation event. Predicted orthologs are important building blocks for a wide variety of tools and analysis pipelines, because they can be used to transfer gene function between species. For example, orthologs can be used to map genes of model organisms to human genes in studies of drug targets. They are also used extensively in functional association networks to transfer information between species. Functional association networks model associations between genes or proteins. These associations can be derived from different types of experimental evidence, either from the species itself or transferred from other species via orthologs. The networks can be used to explore the context and neighbors of a gene, and also for a variety of higher-level analyses such as network-based pathway enrichment analysis, in which the networks are used to contextualize experimental gene sets and annotate them with biological functions. Because these tools depend on each other, it is important that the networks used in pathway enrichment analysis are comprehensive and accurate, and that the orthologs used in the networks are relevant and significant.

In this thesis, the development and improvement of five bioinformatics tools in three areas of bioinformatics are presented: orthology analysis, functional association networks, and pathway enrichment analysis. Although the tools belong to slightly different areas, they build on one another, and each can, at its own level, improve our understanding of biological function and meaning.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023. p. 67
Keywords [en]
Ortholog, protein domain, functional association network, pathway enrichment analysis
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-222146
ISBN: 978-91-8014-548-0 (print)
ISBN: 978-91-8014-549-7 (electronic)
OAI: oai:DiVA.org:su-222146
DiVA, id: diva2:1805130
Public defence
2023-12-01, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, and online via Zoom (a public link is available on the department website), 15:00 (English)
Available from: 2023-11-08 Created: 2023-10-16 Last updated: 2023-10-27 Bibliographically approved
List of papers
1. Domainoid: domain-oriented orthology inference
2019 (English) In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 20, no 1, article id 523. Article in journal (Refereed), Published
Abstract [en]

Background: Orthology inference is normally based on full-length protein sequences. However, most proteins contain independently folding and recurring regions, domains. The domain architecture of a protein is vital for its function, and recombination events mean individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. We have developed Domainoid, a new tool aiming to overcome these challenges faced by full-length orthology methods by inferring orthology on the domain level. It employs the InParanoid algorithm on single domains separately, to infer groups of orthologous domains.
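InParanoid is the algorithm that Domainoid applies to single domains. It seeds each ortholog group with a pair of sequences that are each other's best hit between the two proteomes, known as a two-way or reciprocal best hit. The minimal Python sketch below shows only that seeding step. It assumes a hypothetical input of per-query (subject, bit score) hit lists. It leaves out InParanoid's score cutoffs, inparalog clustering, and confidence values, so it is not the Domainoid or InParanoid implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical hit table: query ID -> list of (subject ID, bit score)
# from a homology search of one proteome against the other.
Hits = Dict[str, List[Tuple[str, float]]]


def best_hit(hits: Hits, query: str) -> Optional[str]:
    """Highest-scoring subject for a query, or None if the query has no hits."""
    candidates = hits.get(query, [])
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[1])[0]


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Pairs (a, b) where b is a's best hit in B and a is b's best hit in A."""
    pairs = []
    for a in a_vs_b:
        b = best_hit(a_vs_b, a)
        if b is not None and best_hit(b_vs_a, b) == a:
            pairs.append((a, b))
    return pairs


# Toy example with made-up scores.
a_vs_b = {"A1": [("B1", 250.0), ("B2", 90.0)], "A2": [("B1", 120.0)]}
b_vs_a = {"B1": [("A1", 245.0), ("A2", 118.0)], "B2": [("A2", 60.0)]}
print(reciprocal_best_hits(a_vs_b, b_vs_a))  # [('A1', 'B1')]
```

In this example A2 stays unpaired because its best hit, B1, scores A1 higher. The full algorithm would then check whether A2 qualifies as an inparalog of A1.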

Results: This domain-oriented approach allows detection of discordant domain orthologs, cases where different domains on the same protein have different evolutionary histories. In addition to domain level analysis, protein level orthology based on the fraction of domains that are orthologous can be inferred. Domainoid orthology assignments were compared to those yielded by the conventional full-length approach InParanoid, and were validated in a standard benchmark.
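The abstract does not say how the fraction of orthologous domains becomes a protein-level call. The sketch below is one plausible reading and rests on three assumptions:

- Every domain occurrence has a unique ID.
- The fraction counts the domains on both proteins that take part in a domain ortholog pair linking them.
- A hypothetical cutoff of 0.5 decides the protein-level call.

Domainoid's actual definition and cutoff may differ.

```python
from typing import List, Set, Tuple


def orthologous_fraction(domains_a: List[str], domains_b: List[str],
                         domain_orthologs: Set[Tuple[str, str]]) -> float:
    """Fraction of all domains on proteins A and B that take part in at least
    one domain ortholog pair (a, b) linking A to B (hypothetical definition)."""
    total = len(domains_a) + len(domains_b)
    if total == 0:
        return 0.0
    matched_a = sum(1 for a in domains_a
                    if any((a, b) in domain_orthologs for b in domains_b))
    matched_b = sum(1 for b in domains_b
                    if any((a, b) in domain_orthologs for a in domains_a))
    return (matched_a + matched_b) / total


def is_protein_ortholog(domains_a: List[str], domains_b: List[str],
                        domain_orthologs: Set[Tuple[str, str]],
                        threshold: float = 0.5) -> bool:
    """Protein-level call; the 0.5 default is a hypothetical cutoff."""
    return orthologous_fraction(domains_a, domains_b, domain_orthologs) >= threshold


# Protein A has three domains and protein B has two; two domain pairs are orthologous.
a = ["A:d1", "A:d2", "A:d3"]
b = ["B:d1", "B:d2"]
pairs = {("A:d1", "B:d1"), ("A:d2", "B:d2")}
print(orthologous_fraction(a, b, pairs))  # 0.8
```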

Conclusions: Our results show that domain-based orthology inference can reveal many orthologous relationships that are not found by full-length sequence approaches.

Keywords
Orthology, Domain ortholog, Protein domain
National Category
Biological Sciences
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-177520 (URN)
10.1186/s12859-019-3137-2 (DOI)
000502350400001
31660857 (PubMedID)
Available from: 2020-01-08 Created: 2020-01-08 Last updated: 2023-10-16 Bibliographically approved
2. InParanoid-DIAMOND: faster orthology analysis with the InParanoid algorithm
2022 (English) In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no 10, p. 2918-2919. Article in journal (Refereed), Published
Abstract [en]

Predicting orthologs, genes in different species having shared ancestry, is an important task in bioinformatics. Orthology prediction tools are required to make accurate and fast predictions, in order to analyze large amounts of data within a feasible time frame. InParanoid is a well-known algorithm for orthology analysis, shown to perform well in benchmarks, but having the major limitation of long runtimes on large datasets. Here, we present an update to the InParanoid algorithm that can use the faster tool DIAMOND instead of BLAST for the homolog search step. We show that it reduces the runtime by 94%, while still obtaining similar performance in the Quest for Orthologs benchmark. 
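The reported 94% runtime reduction can also be stated as a speedup factor. This assumes it compares the total runtimes of the BLAST-based and DIAMOND-based runs on the same input, with T denoting wall-clock time:

```latex
T_{\text{DIAMOND}} = (1 - 0.94)\,T_{\text{BLAST}} = 0.06\,T_{\text{BLAST}}
\quad\Longrightarrow\quad
\frac{T_{\text{BLAST}}}{T_{\text{DIAMOND}}} = \frac{1}{0.06} \approx 16.7
```

Under that assumption, a run that previously took 100 hours would take about 6 hours.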

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-204487 (URN)
10.1093/bioinformatics/btac194 (DOI)
000785761400001
35357425 (PubMedID)
2-s2.0-85132369777 (Scopus ID)
Available from: 2022-05-09 Created: 2022-05-09 Last updated: 2023-10-16 Bibliographically approved
3. InParanoiDB 9: Ortholog Groups for Protein Domains and Full-Length Proteins
2023 (English) In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 435, no 14, article id 168001. Article in journal (Refereed), Published
Abstract [en]

Prediction of orthologs is an important bioinformatics pursuit that is frequently used for inferring protein function and for evolutionary analyses. The InParanoid database is a well-known resource of ortholog predictions between a wide variety of organisms. Although orthologs have historically been inferred at the level of full-length protein sequences, many proteins consist of several independent protein domains that may be orthologous to domains in other proteins in a way that differs from the full-length protein case. To capture all types of orthologous relations, conventional full-length protein orthologs can be complemented with orthologs inferred at the domain level. Here we present InParanoiDB 9, covering 640 species and providing orthologs for both protein domains and full-length proteins. InParanoiDB 9 was built using the faster InParanoid-DIAMOND algorithm for orthology analysis, as well as Domainoid and Pfam to infer orthologous domains. InParanoiDB 9 is based on proteomes from 447 eukaryotes, 158 bacteria and 35 archaea, and includes over one billion predicted ortholog groups. A new website has been built for the database, providing multiple search options as well as visualization of groups of orthologs and orthologous domains. This release constitutes a major upgrade of the InParanoid database in terms of the number of species as well as the new capability to operate on the domain level. InParanoiDB 9 is available at https://inparanoidb.sbc.su.se/.

Keywords
ortholog, InParanoid, orthologous domain, protein domain, ortholog database
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:su:diva-220951 (URN)
10.1016/j.jmb.2023.168001 (DOI)
001054111000001
36764355 (PubMedID)
2-s2.0-85148362111 (Scopus ID)
Available from: 2023-09-15 Created: 2023-09-15 Last updated: 2023-10-16 Bibliographically approved
4. FunCoup 5: Functional Association Networks in All Domains of Life, Supporting Directed Links and Tissue-Specificity
2021 (English) In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 433, article id 166835. Article in journal (Refereed), Published
Abstract [en]

FunCoup (https://funcoup.sbc.su.se) is one of the most comprehensive functional association networks of genes/proteins available. Functional associations are inferred by integrating different types of evidence using a redundancy-weighted naïve Bayesian approach, combined with orthology transfer. FunCoup's high coverage comes from using eleven different types of evidence and extensive transfer of information between species. Since the latest update of the database, the availability of source data has improved drastically, and user expectations of a tool for functional associations have grown. To meet these requirements, we have made a new release of FunCoup with updated source data and improved functionality. FunCoup 5 now includes 22 species from all domains of life, and the source data for evidence, gold standards, and genomes have been updated to the latest available versions. In this new release, directed regulatory links inferred from transcription factor binding can be visualized in the network viewer for the human interactome. Another new feature is the option to filter the network viewer by genes expressed in a given tissue. FunCoup 5 further includes the SARS-CoV-2 proteome, allowing users to visualize and analyze interactions between SARS-CoV-2 and human proteins in order to better understand COVID-19. This new release of FunCoup constitutes a major advance for users, with updated sources, new species, and improved functionality for analysis of the networks.
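In naïve Bayesian integration, each evidence type yields a log-likelihood ratio (LLR) for a gene pair. Under the independence assumption, these LLRs are simply summed. Redundancy weighting relaxes that assumption by down-weighting evidence that is correlated with evidence already counted. The sketch below assumes natural-log LLRs and uses a hypothetical weighting rule: evidence is processed from strongest to weakest, and each LLR is scaled by one minus its largest correlation with a stronger type. FunCoup's actual redundancy scheme and its orthology transfer step are not reproduced here, and the evidence names and values in the example are made up.

```python
import math
from typing import Dict, List, Tuple


def integrate_llrs(llrs: Dict[str, float], corr: Dict[Tuple[str, str], float]) -> float:
    """Redundancy-weighted sum of per-evidence LLRs for one gene pair.
    Hypothetical rule: strongest evidence first; each later LLR is scaled by
    (1 - its largest |correlation| with an already counted evidence type)."""
    ordered = sorted(llrs, key=lambda ev: llrs[ev], reverse=True)
    counted: List[str] = []
    total = 0.0
    for ev in ordered:
        overlap = max(
            (abs(corr.get((ev, prev), corr.get((prev, ev), 0.0))) for prev in counted),
            default=0.0,
        )
        total += (1.0 - min(overlap, 1.0)) * llrs[ev]
        counted.append(ev)
    return total


def posterior(total_llr: float, prior: float) -> float:
    """Naive Bayes: posterior log-odds = prior log-odds + summed LLR (natural log)."""
    log_odds = math.log(prior / (1.0 - prior)) + total_llr
    return 1.0 / (1.0 + math.exp(-log_odds))


llrs = {"coexpression": 2.0, "ppi": 1.5, "domain": 0.5}  # made-up values
corr = {("ppi", "coexpression"): 0.5}                    # made-up correlation
print(integrate_llrs(llrs, corr))  # 3.25 (plain naive Bayes would give 4.0)
```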

Keywords
Bayesian integration, SARS-CoV-2, functional association network, gene regulatory network, protein network, tissue-specific network
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-195046 (URN)
10.1016/j.jmb.2021.166835 (DOI)
000648520800016
Available from: 2021-08-02 Created: 2021-08-02 Last updated: 2023-10-16 Bibliographically approved
5. PathBIX—a web server for network-based pathway annotation with adaptive null models
2021 (English) In: Bioinformatics Advances, E-ISSN 2635-0041, Vol. 1, no 1, article id vbab010. Article in journal (Refereed), Published
Abstract [en]

Motivation: Pathway annotation is a vital tool for interpreting and giving meaning to experimental data in the life sciences. Numerous tools exist for this task. The most recent generation of pathway enrichment analysis tools, network-based methods, use biological networks as a richer source of information for the analysis than gene content alone. Network-based methods measure the network crosstalk between the query gene set and the genes in known pathways, and compare it to a null model of random expectation.
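The crosstalk comparison described above can be sketched in a few lines. Here crosstalk is taken to be the number of network links with one end in the query set and the other in a pathway. The null model is plain random resampling of same-size query sets from all network genes, which gives an empirical p-value. This simple permutation null is a swapped-in illustration, not the ANUBIX null model used by PathBIX, and the function names are hypothetical.

```python
import random
from typing import Iterable, List, Set, Tuple

Edge = Tuple[str, str]


def crosstalk(edges: Iterable[Edge], query: Set[str], pathway: Set[str]) -> int:
    """Number of undirected links with one end in the query set and the other in the pathway."""
    return sum(
        1
        for u, v in edges
        if (u in query and v in pathway) or (v in query and u in pathway)
    )


def empirical_p_value(edges: Iterable[Edge], query: Set[str], pathway: Set[str],
                      genes: Set[str], n_perm: int = 1000, seed: int = 0) -> float:
    """Permutation p-value: how often a random query set of the same size, drawn
    from all network genes, has crosstalk at least as high as the observed one."""
    edge_list: List[Edge] = list(edges)
    observed = crosstalk(edge_list, query, pathway)
    gene_pool = sorted(genes)  # sorted so results are reproducible for a given seed
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_perm):
        random_query = set(rng.sample(gene_pool, len(query)))
        if crosstalk(edge_list, random_query, pathway) >= observed:
            at_least += 1
    # Add-one correction so the estimate is never exactly zero.
    return (at_least + 1) / (n_perm + 1)


edges = [("q1", "p1"), ("q2", "p1"), ("q2", "p2"), ("x1", "x2")]
print(crosstalk(edges, {"q1", "q2"}, {"p1", "p2"}))  # 3
```

Uniform resampling ignores differences in node degree between genes, so it is only a rough baseline. Network-based enrichment methods generally need a more careful null model.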

Results: We developed PathBIX, a novel web application for network-based pathway analysis, based on the recently published ANUBIX algorithm which has been shown to be more accurate than previous network-based methods. The PathBIX website performs pathway annotation for 21 species, and utilizes prefetched and preprocessed network data from FunCoup 5.0 networks and pathway data from three databases: KEGG, Reactome, and WikiPathways.

National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:su:diva-195030 (URN)
10.1093/bioadv/vbab010 (DOI)
Available from: 2021-08-02 Created: 2021-08-02 Last updated: 2023-10-16 Bibliographically approved

Open Access in DiVA

Big data networks and orthology analysis (9000 kB)
File information
File name: FULLTEXT01.pdf
File size: 9000 kB
Checksum (SHA-512):
62e405e148c75dd6156468b9dfe8bf87eb4c98afaa0191a3f870ea26a0bc3071c16cb89c53e03eca85df197d8efb2e262be7fc6256e2d035537264851d80297b
Type: fulltext
Mimetype: application/pdf

Authority records

Persson, Emma
