MetaTM - a consensus method for transmembrane protein topology prediction
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. (Sonnhammer)
2009 (English)
In: BMC Bioinformatics, ISSN 1471-2105, Vol. 10, 314- p.
Article in journal (Refereed), Published
Abstract [en]

BACKGROUND: Transmembrane (TM) proteins are proteins that span a biological membrane one or more times. As their 3-D structures are hard to determine, experiments focus on identifying their topology (i.e., which parts of the amino acid sequence are buried in the membrane and which are located on either side of it), but only a few topologies are known. Consequently, various computational TM topology predictors have been developed, but their accuracies are far from perfect. The prediction quality can be improved by applying a consensus approach, which combines the results of several predictors to yield a more reliable result.

RESULTS: A novel TM consensus method, named MetaTM, is proposed in this work. MetaTM is based on support vector machine (SVM) models and combines the results of six TM topology predictors and two signal peptide predictors. On a large data set comprising 1460 sequences of TM proteins with known topologies and 2362 globular protein sequences, it correctly predicts 86.7% of all topologies.

CONCLUSION: Combining several TM predictors in a consensus prediction framework improves overall accuracy compared to any of the individual methods. Our proposed SVM-based system also has higher accuracy than a previous consensus predictor. MetaTM is made available both as downloadable source code and as a DAS server at
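
The consensus idea in the abstract can be illustrated with a much simpler stand-in: a per-residue majority vote over the outputs of several predictors. This is only a sketch and is not MetaTM's method, which the abstract describes as SVM-based. The function name `consensus_topology` and the label encoding (`i` = inside, `M` = membrane, `o` = outside) are assumptions made for this example.

```python
from collections import Counter


def consensus_topology(predictions):
    """Combine equal-length topology strings by per-residue majority vote.

    Each string is one (hypothetical) predictor's output, with one label
    per residue: 'i' inside, 'M' membrane, 'o' outside.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all predictions must cover the same residues")

    consensus = []
    for column in zip(*predictions):
        # Pick the most frequent label at this position; on a tie,
        # Counter keeps the label that was seen first.
        label, _ = Counter(column).most_common(1)[0]
        consensus.append(label)
    return "".join(consensus)


# Three made-up predictor outputs for a nine-residue toy sequence.
example = ["iiMMMMooo", "iiiMMMooo", "iiMMMMMoo"]
result = consensus_topology(example)
print(result)  # iiMMMMooo
```

A plain vote treats all predictors as equally reliable. A trained model such as an SVM can instead learn how much weight each predictor's output deserves.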

Place, publisher, year, edition, pages
2009. Vol. 10, 314- p.
URN: urn:nbn:se:su:diva-34287
DOI: 10.1186/1471-2105-10-314
ISI: 000271119400001
PubMedID: 19785723
OAI: diva2:284478
Available from: 2010-01-18. Created: 2010-01-07. Last updated: 2012-04-10. Bibliographically approved.
In thesis
1. Biological data exchange and the discovery of new protein families in metagenomic samples
2012 (English)
Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The rise in sequence data has brought both challenges to the way we exchange biological information and opportunities to discover new protein families, primarily through the investigation of uncultured metagenomic samples.

The Distributed Annotation System, or DAS, provided a means for exchanging protein sequence data, but there were no open source, stand-alone DAS clients optimized for integrating and viewing these data. To address this need, we developed DASher. Complementary to visualizing DAS data with DASher, we also created and made available ten servers to offer real-time protein feature predictions via DAS. While DAS works well for genomic data, there was no such framework for exchanging orthology data in a consistent way. Consequently, we developed the first standards for orthology data exchange, SeqXML and OrthoXML. 64 reference proteomes are now available in SeqXML, and 14 orthology providers have agreed to offer their predictions in OrthoXML. Besides creating a uniform representation of common data types, these standards enable direct comparison and assessment of competing methods for the first time.

A substantial percentage of newly sequenced genes are ORFans, which have no match to previously known sequences. Metagenomics samples uncover sequences from uncultivable and therefore previously unseen species, and ORFans constitute much of the metagenomics data that are completely uncharacterized. ORFans are by definition impervious to standard similarity-based methods, and the few existing metagenomics gene-finding methods performed poorly on short, error-prone next-generation sequence data. Therefore, we designed a new approach to predict protein-coding gene families from metagenomic data and applied it to 17 virally-enriched metagenomes derived from human patients. Of the 456 putative ORFan families we found in the nearly 1 billion nucleotides sequenced from these libraries, we identified 32 putative novel protein families with strong support.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2012. 124 p.
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry with Emphasis on Theoretical Chemistry
urn:nbn:se:su:diva-75108 (URN)
978-91-74474-52-7 (ISBN)
Public defence
2012-05-11, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 13:30 (English)

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 4: Manuscript.

Available from: 2012-04-19. Created: 2012-04-05. Last updated: 2012-04-11. Bibliographically approved.

By author/editor
Messina, David N.; Schmitt, Thomas; Sonnhammer, Erik L. L.