Letter to the Editor: SeqXML and OrthoXML: standards for sequence and orthology information
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
2011 (English). In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 12, no. 5, pp. 485-488. Article in journal (Refereed). Published
Abstract [en]

There is a great need for standards in the orthology field. Users must contend with different ortholog data representations from each provider, and the providers themselves must independently gather and parse the input sequence data. These burdensome and redundant procedures make data comparison and integration difficult. We have designed two XML-based formats, SeqXML and OrthoXML, to solve these problems. SeqXML is a lightweight format for sequence records, the input for orthology prediction. It stores the same sequence and metadata as typical FASTA format records, but overcomes common problems such as unstructured metadata in the header and erroneous sequence content. XML provides validation to prevent data integrity problems that are frequent in FASTA files. The range of applications for SeqXML is broad and not limited to ortholog prediction. We provide read/write functions for BioJava, BioPerl, and Biopython. OrthoXML was designed to represent ortholog assignments from any source in a consistent and structured way, yet cater to specific needs such as scoring schemes or meta-information. A unified format is particularly valuable for ortholog consumers that want to integrate data from numerous resources, e.g. for gene annotation projects. Reference proteomes for 61 organisms are already available in SeqXML, and 10 orthology databases have signed on to OrthoXML. Adoption by the entire field would substantially facilitate exchange and quality control of sequence and orthology information.
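The abstract contrasts SeqXML with FASTA, whose header is free text and whose sequence lines go unchecked. The Python sketch below, using only the standard library `xml.etree.ElementTree`, converts FASTA text into a SeqXML-style tree and rejects sequences containing characters outside an amino-acid alphabet. It assumes element and attribute names modelled on SeqXML (`seqXML`, `entry`, `species` with `ncbiTaxID`, `description`, `AAseq`) and assumes each FASTA header has the form "ID description"; the function names and the residue alphabet are hypothetical. The residue check is a hand-written stand-in for the schema validation the abstract describes, not a replacement for validating against the official SeqXML schema.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical residue alphabet: 20 standard amino acids, ambiguity codes
# B/Z/X, U (selenocysteine), O (pyrrolysine) and "*" for a stop.
AA_ALPHABET = re.compile(r"[ACDEFGHIKLMNPQRSTVWYBZXUO*]+")


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)  # blank lines append "" and are harmless
    if header is not None:
        yield header, "".join(chunks)


def fasta_to_seqxml(fasta_text, species_name, ncbi_tax_id):
    """Build a SeqXML-style tree; element names are assumed, not validated."""
    root = ET.Element("seqXML")
    for header, seq in parse_fasta(fasta_text):
        # Assumption: header is "ID optional free-text description".
        seq_id, _, desc = header.partition(" ")
        seq = seq.upper()
        # Reject empty sequences and any character outside the alphabet.
        if not AA_ALPHABET.fullmatch(seq):
            raise ValueError(f"{seq_id}: sequence contains invalid characters")
        entry = ET.SubElement(root, "entry", {"id": seq_id})
        ET.SubElement(entry, "species",
                      {"name": species_name, "ncbiTaxID": str(ncbi_tax_id)})
        if desc:
            ET.SubElement(entry, "description").text = desc.strip()
        ET.SubElement(entry, "AAseq").text = seq
    return root
```

`ET.tostring(fasta_to_seqxml(text, "Homo sapiens", 9606), encoding="unicode")` serializes the result. A record whose sequence contains, say, a digit raises `ValueError` instead of passing through silently, which is the kind of integrity problem the abstract attributes to FASTA files.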

Place, publisher, year, edition, pages
2011. Vol. 12, no. 5, pp. 485-488
Keywords [en]
OrthoXML, SeqXML, 'sequence format', 'orthology format', FASTA format, XML
National subject category
Natural Sciences
Identifiers
URN: urn:nbn:se:su:diva-67273
DOI: 10.1093/bib/bbr025
ISI: 000295171700013
OAI: oai:DiVA.org:su-67273
DiVA, id: diva2:470944
Note
authorCount: 4
Available from: 2011-12-30 Created: 2011-12-27 Last updated: 2022-02-24
Bibliographically approved
Part of thesis
1. Biological data exchange and the discovery of new protein families in metagenomic samples
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The rise in sequence data has brought both challenges to the way we exchange biological information and opportunities to discover new protein families, primarily through the investigation of uncultured metagenomic samples.

The Distributed Annotation System, or DAS, provided a means for exchanging protein sequence data, but there were no open-source, stand-alone DAS clients optimized for integrating and viewing these data. To address this need, we developed DASher. Complementary to visualizing DAS data with DASher, we also created and made available ten servers to offer real-time protein feature predictions via DAS. While DAS works well for genomic data, there was no such framework for exchanging orthology data in a consistent way. Consequently, we developed the first standards for orthology data exchange, SeqXML and OrthoXML. Sixty-four reference proteomes are now available in SeqXML, and 14 orthology providers have agreed to offer their predictions in OrthoXML. Besides creating a uniform representation of common data types, these standards enable direct comparison and assessment of competing methods for the first time.

A substantial percentage of newly sequenced genes are ORFans, which have no match to previously known sequences. Metagenomics samples uncover sequences from uncultivable and therefore previously unseen species, and ORFans constitute much of the metagenomics data that are completely uncharacterized. ORFans are by definition impervious to standard similarity-based methods, and the few existing metagenomics gene-finding methods performed poorly on short, error-prone next-generation sequence data. Therefore, we designed a new approach to predict protein-coding gene families from metagenomic data and applied it to 17 virally-enriched metagenomes derived from human patients. Of the 456 putative ORFan families we found in the nearly 1 billion nucleotides sequenced from these libraries, we identified 32 putative novel protein families with strong support.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2012. 124 p.
National subject category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-75108 (URN)
978-91-74474-52-7 (ISBN)
Public defence
2012-05-11, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 13:30 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 4: Manuscript.

Available from: 2012-04-19 Created: 2012-04-05 Last updated: 2022-02-24
Bibliographically approved
2. Inference of functional association networks and gene orthology
2013 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Most proteomics and genomics experiments are performed on a small set of well-studied model organisms and their results are generalized to other species. This is possible because all species are evolutionarily related. When transferring information across species, orthologs are the most likely candidates for functional equivalence. The InParanoid algorithm, which predicts orthology relations by sequence-similarity-based clustering, was improved by increasing its robustness for low-complexity sequences, and the corresponding database was updated to include more species.

A plethora of orthology inference methods exist, each using its own data formats. We addressed the resulting need for standardization by developing SeqXML and OrthoXML, two formats that standardize the input and output of ortholog inference.
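SeqXML covers the input side; on the output side, a consumer integrating several orthology resources mainly needs to turn OrthoXML groups back into member lists. The Python sketch below does that with the standard library `xml.etree.ElementTree`. The layout it expects (`gene` elements with `id` and `protId`, `orthologGroup` elements containing `geneRef` references, optionally nested inside `paralogGroup`) is modelled on OrthoXML but is an assumption here, and the function names are hypothetical. It ignores scores and other metadata and does not validate against the OrthoXML schema.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a "{namespace}" prefix so documents with or without one work."""
    return tag.rsplit("}", 1)[-1]


def ortholog_groups(xml_text):
    """Map each orthologGroup id to the protein ids of its member genes.

    Assumed OrthoXML-like layout: gene elements carry "id" and "protId";
    orthologGroup elements contain geneRef elements (possibly inside nested
    paralogGroup elements) whose "id" points back to a gene.
    """
    root = ET.fromstring(xml_text)

    # First pass: internal gene id -> external protein identifier.
    prot_of = {
        el.get("id"): el.get("protId")
        for el in root.iter()
        if local_name(el.tag) == "gene"
    }

    # Second pass: every geneRef anywhere below a group counts as a member,
    # so paralogs nested in a paralogGroup are included with their group.
    groups = {}
    for el in root.iter():
        if local_name(el.tag) == "orthologGroup":
            groups[el.get("id")] = [
                prot_of.get(ref.get("id"))
                for ref in el.iter()
                if local_name(ref.tag) == "geneRef"
            ]
    return groups
```

For a document with one `orthologGroup` holding `geneRef id="1"` plus a nested `paralogGroup` with `geneRef`s 2 and 3, the function returns a single entry listing all three protein IDs in document order. Flattening paralog subgroups this way discards the in-paralog structure; a reader that needs it would walk the group tree recursively instead.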

Essentially all biological processes are the result of a complex interplay between different biomolecules. To fully understand the function of genes or gene products one needs to identify these relations. Integration of different types of high-throughput data allows the construction of genome-wide functional association networks that give a global picture of the relation landscape.

FunCoup is a framework that performs this integration to create functional association networks for 11 model organisms. Orthology assignments from InParanoid are used to transfer high-throughput data between species, which contributes more than 50% of the total functional association evidence. We have developed procedures to incorporate new evidence types, improved the procedures of existing evidence types, created networks for additional species, and added significantly more data. Furthermore, the integration procedure was improved to account for data redundancy and to increase its overall robustness. Many of these changes were possible because the computational framework was re-implemented from scratch.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2013. 83 p.
Keywords
orthology, InParanoid, FunCoup, systems biology, biological networks, network inference, functional coupling, functional association
National subject category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
Identifiers
urn:nbn:se:su:diva-92682 (URN)
978-91-7447-740-5 (ISBN)
Public defence
2013-10-04, Nordenskiöldsalen, Geovetenskapens hus, Svante Arrhenius väg 12, Stockholm, 10:00 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 4: Submitted.

Available from: 2013-09-12 Created: 2013-08-14 Last updated: 2022-02-24
Bibliographically approved

Open Access in DiVA

Full text not available in DiVA


Authors

Schmitt, Thomas; Messina, David N.; Schreiber, Fabian; Sonnhammer, Erik L. L.
