Inference of functional association networks and gene orthology
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
2013 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Most proteomics and genomics experiments are performed on a small set of well-studied model organisms, and their results are generalized to other species. This is possible because all species are evolutionarily related. When transferring information across species, orthologs are the most likely candidates for functional equivalence. The InParanoid algorithm, which predicts orthology relations by clustering based on sequence similarity, was improved by increasing its robustness for low-complexity sequences, and the corresponding database was updated to include more species.

Many different orthology inference methods exist, each with its own data formats. This creates a great need for standardization, which we have addressed by developing SeqXML and OrthoXML, two formats that standardize the input and output of ortholog inference.

Essentially all biological processes are the result of a complex interplay between different biomolecules. To fully understand the function of genes or gene products one needs to identify these relations. Integration of different types of high-throughput data allows the construction of genome-wide functional association networks that give a global picture of the relation landscape.

FunCoup is a framework that performs this integration to create functional association networks for 11 model organisms. Orthology assignments from InParanoid are used to transfer high-throughput data between species, and this transferred data contributes more than 50% of the total functional association evidence. We have developed procedures to incorporate new evidence types, improved the procedures for existing evidence types, created networks for additional species, and added significantly more data. Furthermore, the integration procedure was improved to account for data redundancy and to increase its overall robustness. Many of these changes were possible because the computational framework was re-implemented from scratch.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2013. 83 p.
Keyword [en]
orthology, InParanoid, FunCoup, systems biology, biological networks, network inference, functional coupling, functional association
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
URN: urn:nbn:se:su:diva-92682
ISBN: 978-91-7447-740-5 (print)
OAI: oai:DiVA.org:su-92682
DiVA: diva2:644640
Public defence
2013-10-04, Nordenskiöldsalen, Geovetenskapens hus, Svante Arrhenius väg 12, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 4: Submitted.

Available from: 2013-09-12 Created: 2013-08-14 Last updated: 2017-08-25
Bibliographically approved
List of papers
1. InParanoid 7: new algorithms and tools for eukaryotic orthology analysis
2010 (English) In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 38, no 1, p. D196-D203. Article in journal (Refereed) Published
Abstract [en]

The InParanoid project gathers proteomes of completely sequenced eukaryotic species plus Escherichia coli and calculates pairwise ortholog relationships among them. The new release 7.0 of the database has grown by an order of magnitude over the previous version and now includes 100 species and their collective 1.3 million proteins organized into 42.7 million pairwise ortholog groups. The InParanoid algorithm itself has been revised and is now both more specific and sensitive. Based on results from our recent benchmarking of low-complexity filters in homology assignment, a two-pass BLAST approach was developed that makes use of high-precision compositional score matrix adjustment, but avoids the alignment truncation that sometimes follows. We have also updated the InParanoid web site (http://InParanoid.sbc.su.se). Several features have been added, the response times have been improved and the site now sports a new, clearer look. As the number of ortholog databases has grown, it has become difficult to compare among these resources due to a lack of standardized source data and incompatible representations of ortholog relationships. To facilitate data exchange and comparisons among ortholog databases, we have developed and are making available two XML schemas: SeqXML for the input sequences and OrthoXML for the output ortholog clusters.
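
Pairwise orthology detection of the kind described above typically starts from reciprocal best hits between two proteomes. The following is a minimal sketch of that seeding step only, not the InParanoid implementation: it omits in-paralog clustering, confidence values, and the two-pass BLAST procedure. The function names, protein identifiers, and scores are all hypothetical.

```python
from typing import Dict, List, Tuple

# Similarity scores keyed by (query, hit). All identifiers and values
# below are invented for illustration; real input would come from a
# sequence search such as BLAST.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query protein to its single highest-scoring hit."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, hit), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (hit, score)
    return {query: hit for query, (hit, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return pairs (a, b) where a's best hit is b and b's best hit is a."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical scores for species A searched against B, and B against A.
a_vs_b = {("a1", "b1"): 250.0, ("a1", "b2"): 90.0,
          ("a2", "b2"): 180.0, ("a3", "b2"): 120.0}
b_vs_a = {("b1", "a1"): 245.0, ("b2", "a2"): 175.0, ("b2", "a3"): 118.0}

seed_pairs = reciprocal_best_hits(a_vs_b, b_vs_a)
print(seed_pairs)
```

In this toy input, a3 also matches b2, but b2's best hit is a2, so a3 is not part of a reciprocal pair.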

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-34279 (URN)
10.1093/nar/gkp931 (DOI)
000276399100030 ()
19892828 (PubMedID)
Available from: 2010-01-18 Created: 2010-01-07 Last updated: 2017-12-12
Bibliographically approved
2. Letter to the Editor: SeqXML and OrthoXML: standards for sequence and orthology information
2011 (English) In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 12, no 5, p. 485-488. Article in journal (Refereed) Published
Abstract [en]

There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
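
To illustrate the point about structured metadata, here is a minimal sketch that reads a small SeqXML-style document with Python's standard library. The element and attribute names (`entry`, `species`, `ncbiTaxID`, `AAseq`, and so on) are assumptions modeled on the published format and should be checked against the actual schema; the sample records and the `read_entries` helper are hypothetical. The residue check is a simple stand-in for the schema validation the abstract refers to.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element and attribute names are assumed, not
# copied from the official SeqXML schema.
SEQXML_SAMPLE = """
<seqXML seqXMLversion="0.4" source="example" sourceVersion="1">
  <entry id="P1">
    <species name="Homo sapiens" ncbiTaxID="9606"/>
    <description>hypothetical example protein</description>
    <AAseq>MKTAYIAKQR</AAseq>
  </entry>
  <entry id="P2">
    <species name="Mus musculus" ncbiTaxID="10090"/>
    <AAseq>MKSAYLAKQK</AAseq>
  </entry>
</seqXML>
"""

# The 20 standard amino acid one-letter codes (a simplification).
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def read_entries(xml_text):
    """Parse entries into dicts and reject sequences with unexpected symbols."""
    root = ET.fromstring(xml_text.strip())
    entries = []
    for entry in root.findall("entry"):
        species = entry.find("species")
        sequence = entry.findtext("AAseq", default="")
        bad = set(sequence) - AMINO_ACIDS
        if bad:
            raise ValueError(f"{entry.get('id')}: invalid residues {sorted(bad)}")
        entries.append({
            "id": entry.get("id"),
            "species": species.get("name"),
            "tax_id": int(species.get("ncbiTaxID")),
            "description": entry.findtext("description"),  # None if absent
            "sequence": sequence,
        })
    return entries


entries = read_entries(SEQXML_SAMPLE)
for e in entries:
    print(e["id"], e["species"], e["tax_id"], len(e["sequence"]))
```

Because species name and taxonomy ID are separate attributes, no header-parsing convention is needed, which is the contrast with FASTA that the abstract draws.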

Keyword
OrthoXML, SeqXML, 'sequence format', 'orthology format', FASTA format, XML
National Category
Natural Sciences
Identifiers
urn:nbn:se:su:diva-67273 (URN)
10.1093/bib/bbr025 (DOI)
000295171700013 ()
Note

authorCount: 4

Available from: 2011-12-30 Created: 2011-12-27 Last updated: 2017-12-08
Bibliographically approved
3. Comparative interactomics with Funcoup 2.0
2012 (English) In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no D1, p. D821-D828. Article in journal (Refereed) Published
Abstract [en]

FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0, in which the data sources have been updated and the methodology has been refined. It contains a new data type, Genetic Interaction, and three new species: chicken, dog and zebrafish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-76759 (URN)
10.1093/nar/gkr1062 (DOI)
000298601300123 ()
Note

AuthorCount: 6

Available from: 2013-04-11 Created: 2012-05-16 Last updated: 2017-09-29
Bibliographically approved
4. FunCoup 3.0: database of genome-wide functional coupling networks
2014 (English) In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 42, no D1, p. D380-D388. Article in journal (Refereed) Published
Abstract [en]

We present an update of the FunCoup database (http://FunCoup.sbc.su.se) of functional couplings, or functional associations, between genes and gene products. Identifying these functional couplings is an important step in the understanding of higher-level mechanisms performed by complex cellular processes. FunCoup distinguishes between four classes of couplings: participation in the same signaling cascade, participation in the same metabolic process, co-membership in a protein complex and physical interaction. For each of these four classes, several types of experimental and statistical evidence are combined by Bayesian integration to predict genome-wide functional coupling networks. The FunCoup framework has been completely re-implemented to allow for more frequent future updates. It contains many improvements, such as a regularization procedure to automatically downweight redundant evidence and a novel method to incorporate phylogenetic profile similarity. Several datasets have been updated and new data have been added in FunCoup 3.0. Furthermore, we have developed a new Web site, which provides powerful tools to explore the predicted networks and to retrieve detailed information about the data underlying each prediction.
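
The idea of combining several evidence types by Bayesian integration can be sketched with a naive Bayes model, in which each evidence type contributes a log-likelihood ratio (LLR) and the LLRs are summed. This is an illustration under an independence assumption, not FunCoup's actual scoring scheme: it does not include the regularization step that downweights redundant evidence, and the prior, the evidence labels, and the LLR values are hypothetical.

```python
import math


def posterior_from_llrs(llrs, prior=0.001):
    """Combine natural-log likelihood ratios with a prior probability.

    Assumes the evidence types are conditionally independent (naive
    Bayes), so their log-likelihood ratios can simply be added to the
    prior log-odds.
    """
    log_odds = math.log(prior / (1.0 - prior)) + sum(llrs)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical LLRs for one gene pair; labels and values are invented.
evidence = {
    "co-expression": 2.1,
    "physical interaction": 3.4,
    "phylogenetic profile similarity": 0.8,
}

confidence = posterior_from_llrs(evidence.values())
print(f"confidence = {confidence:.3f}")
```

A positive LLR raises the confidence and a negative one lowers it; with no evidence at all, the result equals the prior.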

National Category
Biochemistry and Molecular Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-102096 (URN)
10.1093/nar/gkt984 (DOI)
000331139800057 ()
Funder
Swedish Research Council
Note

AuthorCount: 3

Available from: 2014-03-26 Created: 2014-03-26 Last updated: 2017-09-06
Bibliographically approved

By author/editor
Schmitt, Thomas
By organisation
Department of Biochemistry and Biophysics
Bioinformatics and Systems Biology