InParanoid 7: new algorithms and tools for eukaryotic orthology analysis
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
2010 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 38, no 1, D196-D203 p. Article in journal (Refereed). Published.
Abstract [en]

The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment, but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, the response times have been improved and the site now sports a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare among these resources due to a lack of standardized source data and incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.
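As an illustration of the kind of data exchange that OrthoXML is meant to support, the following is a minimal sketch of reading ortholog groups from an OrthoXML-style document. The embedded document is a hypothetical, simplified example (no XML namespace, no scores, invented protein identifiers), and the helper names `read_groups` and `cross_species_pairs` are assumptions made for this sketch; real files should be checked against the published schema.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

# Hypothetical, simplified OrthoXML-style document. The namespace, scores and
# real identifiers are omitted; the protein IDs are invented for illustration.
ORTHOXML = """\
<orthoXML version="0.3" origin="example" originVersion="1">
  <species name="Homo sapiens" NCBITaxId="9606">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="1" protId="HUMAN_P1"/>
        <gene id="2" protId="HUMAN_P2"/>
      </genes>
    </database>
  </species>
  <species name="Mus musculus" NCBITaxId="10090">
    <database name="exampleDB" version="1">
      <genes>
        <gene id="3" protId="MOUSE_P1"/>
      </genes>
    </database>
  </species>
  <groups>
    <orthologGroup id="1">
      <geneRef id="1"/>
      <geneRef id="2"/>
      <geneRef id="3"/>
    </orthologGroup>
  </groups>
</orthoXML>
"""


def read_groups(xml_text):
    """Return {group id: [(species name, protein id), ...]}."""
    root = ET.fromstring(xml_text)
    # Map each internal gene id to its species and protein identifier.
    gene_info = {}
    for species in root.iter("species"):
        for gene in species.iter("gene"):
            gene_info[gene.get("id")] = (species.get("name"), gene.get("protId"))
    # Resolve the geneRef entries of every ortholog group.
    groups = {}
    for group in root.iter("orthologGroup"):
        groups[group.get("id")] = [
            gene_info[ref.get("id")] for ref in group.iter("geneRef")
        ]
    return groups


def cross_species_pairs(members):
    """List protein pairs within one group that come from different species."""
    return [(a[1], b[1]) for a, b in combinations(members, 2) if a[0] != b[0]]


GROUPS = read_groups(ORTHOXML)
PAIRS = cross_species_pairs(GROUPS["1"])
print(PAIRS)
```

Keeping the gene definitions separate from the group definitions, as in this sketch, means a group only has to reference genes by internal id, which is what makes the same group structure reusable across databases with different source identifiers.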

Place, publisher, year, edition, pages
2010. Vol. 38, no 1, D196-D203 p.
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
URN: urn:nbn:se:su:diva-34279
DOI: 10.1093/nar/gkp931
ISI: 000276399100030
PubMedID: 19892828
OAI: oai:DiVA.org:su-34279
DiVA: diva2:284475
Available from: 2010-01-18. Created: 2010-01-07. Last updated: 2017-12-12. Bibliographically approved.
In thesis
1. The relationship between orthology, protein domain architecture and protein function
2011 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Lacking experimental data, protein function is often predicted from evolutionary and protein structure theory. Under the 'domain grammar' hypothesis the function of a protein follows from the domains it encodes. Under the 'orthology conjecture', orthologs, related through species formation, are expected to be more functionally similar than paralogs, which are homologs in the same or different species descended from a gene duplication event. However, these assumptions have not thus far been systematically evaluated.

To test the 'domain grammar' hypothesis, we built models for predicting function from the domain combinations present in a protein, and demonstrated that multi-domain combinations imply functions that the individual domains do not. We also developed a novel gene-tree based method for reconstructing the evolutionary histories of domain architectures, to search for cases of architectures that have arisen multiple times in parallel, and found this to be more common than previously reported.

To test the 'orthology conjecture', we first benchmarked methods for homology inference under the obfuscating influence of low-complexity regions, in order to improve the InParanoid orthology inference algorithm. InParanoid was then used to test the relative conservation of functionally relevant properties between orthologs and paralogs at various evolutionary distances, including intron positions, domain architectures, and Gene Ontology functional annotations.

We found an increased conservation of domain architectures in orthologs relative to paralogs, in support of the 'orthology conjecture' and the 'domain grammar' hypotheses acting in tandem. However, equivalent analysis of Gene Ontology functional conservation yielded spurious results, which may be an artifact of species-specific annotation biases in functional annotation databases. I discuss possible ways of circumventing this bias so the 'orthology conjecture' can be tested more conclusively.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2011. 112 p.
Keyword
homology, orthology, paralogy, gene duplications, protein function prediction, low-complexity regions, protein domains, domain architecture evolution, introns, intron position conservation, orthology conjecture, domain grammar hypothesis
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-62152 (URN)
978-91-7447-350-6 (ISBN)
Public defence
2011-10-24, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 14:00 (English)
Note
At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 6: Epub ahead of print.
Available from: 2011-10-02. Created: 2011-09-09. Last updated: 2011-10-06. Bibliographically approved.
2. Data integration for robust network-based disease gene prediction
2013 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

For many complex diseases, the cause or mechanism cannot be tied to a single gene, and a systems-wide approach is needed to cope with the complexity. By combining evidence indicative of functional association, it is possible to infer networks of protein functional coupling. The reliability of these networks depends on having sufficient data and on the data being informative.

By combining evidence from multiple species, functional coupling networks can reach higher coverage and accuracy. Genes in different species derived from the same gene by a speciation event are orthologous and likely to have a conserved function. In order to enable the transfer of information across species we inferred orthology with the InParanoid algorithm and made the inferences available to the public in the associated database.

Identification of genes involved in diseases is an important biomedical goal. Based on the "guilt by association" principle, we implemented an approach, Maxlink, for identifying and prioritizing novel disease genes. By searching the FunCoup network for genes functionally coupled to cancer genes we identified some 1800 novel cancer gene candidates showing characteristics of cancer genes.
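The "guilt by association" idea can be sketched as follows. This is a simplified illustration only, not the Maxlink implementation: it assumes an unweighted network given as a list of gene pairs, scores each candidate by the number of direct links to known disease genes, and uses invented gene names. The function name `rank_candidates` is hypothetical.

```python
from collections import defaultdict


def rank_candidates(edges, known_genes):
    """Rank genes outside the known set by their number of links into it.

    edges: iterable of (gene, gene) pairs from an undirected network.
    known_genes: genes already associated with the disease.
    Returns [(gene, link count), ...], highest count first, ties by name.
    """
    # Build an undirected adjacency map.
    neighbours = defaultdict(set)
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)

    known = set(known_genes)
    scores = {}
    for gene, linked in neighbours.items():
        if gene in known:
            continue  # only genes outside the known set are candidates
        hits = len(linked & known)
        if hits:
            scores[gene] = hits
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Invented toy network: KNOWN_A and KNOWN_B stand in for known disease genes.
EDGES = [
    ("KNOWN_A", "G1"),
    ("KNOWN_B", "G1"),
    ("KNOWN_A", "G2"),
    ("G2", "G3"),
    ("KNOWN_A", "KNOWN_B"),
]
RANKED = rank_candidates(EDGES, {"KNOWN_A", "KNOWN_B"})
print(RANKED)
```

In this toy network G1 ranks first because it is linked to both known genes, G2 second with one link, and G3 is not reported because it has no direct link to a known gene.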

While proteins are the active components, mRNA is often used as a proxy due to the difficulty of measuring protein abundance. We examined the relationship between mRNA and protein, using properties of expression profiles to identify subsets of genes with higher mRNA-protein concordance.

If technical and biological differences between patient/control studies of gene expression have a large impact, the results of studies of the same disease might be inconsistent. To determine this impact we examined the consistency in differential (co)expression between different studies of cancer, as well as non-cancer studies. Such consistency could generally be found, even between studies of different diseases, but only when common pitfalls of gene expression analysis are avoided.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2013. 71 p.
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-87962 (URN)
978-91-7447-629-3 (ISBN)
Public defence
2013-04-12, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 5: Manuscript.

Available from: 2013-03-21. Created: 2013-02-27. Last updated: 2013-03-18. Bibliographically approved.
3. Inference of functional association networks and gene orthology
2013 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Most proteomics and genomics experiments are performed on a small set of well-studied model organisms and their results are generalized to other species. This is possible because all species are evolutionarily related. When transferring information across species, orthologs are the most likely candidates for functional equivalence. The InParanoid algorithm, which predicts orthology relations by sequence similarity based clustering, was improved by increasing its robustness for low complexity sequences and the corresponding database was updated to include more species.
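A common starting point for similarity-based ortholog prediction between two proteomes is to look for reciprocal best hits. The sketch below shows only that step, under stated assumptions: the score tables are invented, the function names are hypothetical, and the further steps of a full method such as InParanoid (for example adding in-paralogs to each seed pair and assigning confidence values) are not shown.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring target.

    scores: {(query, target): similarity score}. On a tie, the first
    target encountered is kept.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where a and b are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Invented similarity scores between proteome A (a1..a3) and proteome B (b1, b2).
A_VS_B = {
    ("a1", "b1"): 250.0,
    ("a1", "b2"): 90.0,
    ("a2", "b2"): 180.0,
    ("a3", "b1"): 120.0,
}
B_VS_A = {
    ("b1", "a1"): 245.0,
    ("b1", "a3"): 118.0,
    ("b2", "a2"): 175.0,
    ("b2", "a1"): 88.0,
}
SEEDS = reciprocal_best_hits(A_VS_B, B_VS_A)
print(SEEDS)
```

Here a3 finds b1 as its best hit, but b1 prefers a1, so a3 is not part of a seed pair; in a clustering method it would instead be a candidate for later inclusion as a paralog.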

A plethora of different orthology inference methods exist, each featuring different formats. We have addressed the great need for standardization this creates with the development of SeqXML and OrthoXML, two formats that standardize the input and output of ortholog inference.

Essentially all biological processes are the result of a complex interplay between different biomolecules. To fully understand the function of genes or gene products one needs to identify these relations. Integration of different types of high-throughput data allows the construction of genome-wide functional association networks that give a global picture of the relation landscape.

FunCoup is a framework that performs this integration to create functional association networks for 11 model organisms. Orthology assignments from InParanoid are used to transfer high-throughput data between species, and this transferred data contributes more than 50% of the total functional association evidence. We have developed procedures to incorporate new evidence types, improved the procedures for existing evidence types, created networks for additional species, and added significantly more data. Furthermore, the integration procedure was improved to account for data redundancy and to increase its overall robustness. Many of these changes were possible because the computational framework was re-implemented from scratch.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2013. 83 p.
Keyword
orthology, InParanoid, FunCoup, systems biology, biological networks, network inference, functional coupling, functional association
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-92682 (URN)
978-91-7447-740-5 (ISBN)
Public defence
2013-10-04, Nordenskiöldsalen, Geovetenskapens hus, Svante Arrhenius väg 12, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 4: Submitted.

Available from: 2013-09-12. Created: 2013-08-14. Last updated: 2017-08-25. Bibliographically approved.

Author/editor
Östlund, Gabriel; Schmitt, Thomas; Forslund, Kristoffer; Messina, David N.; Roopra, Sanjit; Frings, Oliver; Sonnhammer, Erik L. L.