Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Domainoid: Domain-oriented orthology inference
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Motivation: Orthology inference is normally based on full length protein sequences. However, most proteins contain independently folding and recurring regions termed domains. The domain architecture of a protein is vital to its function, and recombination events mean individual domains can have different evolutionary histories. It has previously been shown that orthologous proteins may differ in domain architecture, creating challenges for orthology inference methods operating on full-length sequences. For these reasons it becomes interesting to investigate orthology inference on the domain level.

Results: We have developed and evaluated an algorithm for domain-based orthology inference called Domainoid. It employs the InParanoid algorithm on single domains separately, then calls protein-level orthology based on the fraction of domains that are orthologous. Domainoid orthology assignments were compared to those yielded by the full length approach, and were validated in a standard benchmark. We present examples of orthologs that are found by our domain based approach but not by orthology prediction methods that rely on full-length protein sequences. Our results suggest that domain based orthology inference is a useful extension to full length sequence approaches for identifying orthologs.

National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-155094OAI: oai:DiVA.org:su-155094DiVA, id: diva2:1196893
Available from: 2018-04-11 Created: 2018-04-11 Last updated: 2018-04-27Bibliographically approved
In thesis
1. Functional Inference from Orthology and Domain Architecture
Open this publication in new window or tab >>Functional Inference from Orthology and Domain Architecture
2018 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are the basic building blocks of all living organisms. They play a central role in determining the structure of living beings and are required for essential chemical reactions. One of the main challenges in bioinformatics is to characterize the function of all proteins. The problem of understanding protein function can be approached by understanding their evolutionary history. Orthology analysis plays an important role in studying the evolutionary relation of proteins. Proteins are termed orthologs if they derive from a single gene in the species' last common ancestor, i.e. if they were separated by a speciation event. Orthologs are useful because they retain their function more often than other homologs. 

Inference of a complete set of orthologs for many species is computationally intensive. Currently, the fastest algorithms rely on graph-based approaches, which compare all-vs-all sequences and then cluster top hits into groups of orthologs. The initial step of performing all-vs-all comparisons is usually the primary computational challenge as it scales quadratically with the number of species. 

A new, more scalable and less computationally demanding method was developed to solve this problem without sacrificing accuracy. The Hieranoid 2 algorithm reduces computational complexity to almost linear by overcoming the necessity to perform all-vs-all similarity searches. The algorithm progresses along a known species tree, from leaves to root. Starting at the leaves, ortholog groups are predicted conventionally and then summarized at internal nodes to form pseudo-species. These pseudo-species are then re-used to search against other (pseudo-)species higher in the tree. This way the algorithm aggregates new ortholog groups hierarchically. The hierarchy is a natural structure to store and view large multi-species ortholog groups, and provides a complete picture of inferred evolutionary events. 

To facilitate explorative analysis of hierarchical groups of orthologs, a new online tool was created. The HieranoiDB website provides precomputed hierarchical groups of orthologs for a set of 66 species. It allows the user to search for orthology assignments using protein description, protein sequence, or species. Evolutionary events and meta information is added to the hierarchical groups of orthologs, which are shown graphically as interactive trees. This representation allows exploring, searching, and easier visual inspection of multi-species ortholog groups.

The majority of orthology prediction methods focus on treating the whole protein sequence as a single evolutionary unit. However, proteins are often composed of individual units, called protein domains, that can have different evolutionary histories. To extend the full sequence based methodology to a domain-aware method, a new approach called Domainoid is proposed. Here, domains are extracted from full-length sequences and subjected to orthology inference. This allows Domainoid to find orthology that would be missed by a full sequence approach.

Networks are a convenient graphical representation for showing a large number of functional associations between genes or proteins. They allow various analyses of graph properties, and can help visualize complex relationships. A framework for inferring comprehensive functional association networks was developed, called FunCoup. A major difference compared to other networks is FunCoup's extensive use of orthology relationships between species, which significantly boosts its coverage. Using naïve Bayesian classifiers to integrate 10 different evidence types and orthology transfer, FunCoup captures functional associations of many types, and provides comprehensive networks for 17 species across five gold-standards.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2018. p. 38
Keywords
Orthology, Functional coupling networks, Association networks, Hierarchical groups of orthologs
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-155096 (URN)978-91-7797-252-5 (ISBN)978-91-7797-253-2 (ISBN)
Public defence
2018-06-12, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 3: Manuscript.

Available from: 2018-05-18 Created: 2018-04-24 Last updated: 2018-05-15Bibliographically approved

Open Access in DiVA

No full text in DiVA

Search in DiVA

By author/editor
Kaduk, MateuszSonnhammer, Erik
By organisation
Department of Biochemistry and BiophysicsScience for Life Laboratory (SciLifeLab)
Bioinformatics (Computational Biology)

Search outside of DiVA

GoogleGoogle Scholar

urn-nbn

Altmetric score

urn-nbn
Total: 208 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf