Improved Contact Predictions Using the Recognition of Protein Like Contact Patterns
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Aalto University, Finland.
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). University Libre Brussels, Belgium.
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
2014 (English). In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 10, no 11, e1003889. Article in journal (Refereed). Published
Abstract [en]

Given sufficiently large protein families, and using a global statistical inference approach, it is possible to predict protein residue contacts accurately enough to predict the structure of many proteins. However, these approaches do not consider the fact that the contacts in a protein are neither randomly nor independently distributed, but follow precise rules governed by the structure of the protein and are thus interdependent. Here, we present PconsC2, a novel method that uses a deep learning approach to identify protein-like contact patterns to improve contact predictions. A substantial enhancement can be seen for all contacts, independent of the number of aligned sequences, residue separation or secondary structure type, but it is largest for β-sheet-containing proteins. In addition to being superior to earlier methods based on statistical inference, PconsC2 is superior to state-of-the-art machine learning methods for families with more than 100 effective sequence homologs. The improved contact prediction enables improved structure prediction.
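
The abstract refers to "effective sequence homologs" without defining how they are counted. As a minimal sketch only, one common convention down-weights each aligned sequence by the number of near-identical sequences in the same alignment and sums the weights. The function names and the 90% identity threshold below are assumptions made for illustration; they are not taken from this record, and gap handling is ignored.

```python
from typing import Sequence


def sequence_identity(a: str, b: str) -> float:
    """Fraction of aligned columns in which two equal-length rows agree."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    if not a:
        raise ValueError("aligned sequences must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


def effective_sequence_count(
    alignment: Sequence[str],
    identity_threshold: float = 0.9,  # assumed cutoff, not stated in the source
) -> float:
    """Sum of per-sequence weights 1 / (number of near-identical rows).

    Each row counts itself as a neighbour, so the divisor is never zero.
    This is a quadratic-time illustration, not an optimized implementation.
    """
    total = 0.0
    for seq in alignment:
        neighbours = sum(
            1
            for other in alignment
            if sequence_identity(seq, other) >= identity_threshold
        )
        total += 1.0 / neighbours
    return total


# Toy alignment: two identical rows share one unit of weight, the third
# row is unrelated and contributes a full unit on its own.
toy_alignment = ["ACDE", "ACDE", "WYWY"]
toy_count = effective_sequence_count(toy_alignment)
```

Under this convention a family with many redundant sequences can have far fewer effective sequences than raw sequences, which is why thresholds such as "more than 100 effective sequence homologs" are stated in effective rather than raw counts.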

Place, publisher, year, edition, pages
2014. Vol. 10, no 11, e1003889
National Category
Biological Sciences
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-111936
DOI: 10.1371/journal.pcbi.1003889
ISI: 000345454400003
OAI: oai:DiVA.org:su-111936
DiVA: diva2:777136
Funder
Swedish Research Council
Note

AuthorCount:4;

Available from: 2015-01-08. Created: 2015-01-08. Last updated: 2017-05-08. Bibliographically approved
In thesis
1. From Sequence to Structure: Using predicted residue contacts to facilitate template-free protein structure prediction
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Despite the fundamental role of experimental protein structure determination, computational methods are essential for bridging the ever-growing gap between available protein sequence and structure data. Common structure prediction methods rely on experimental data, which is not available for about half of the known protein families.

Recent advancements in amino acid contact prediction have revolutionized the field of protein structure prediction. Contacts can be used to guide template-free structure predictions that do not rely on experimentally solved structures of homologous proteins. Such methods are now able to produce accurate models for a wide range of protein families.

We developed PconsC2, an approach that improved existing contact prediction methods by recognizing intra-molecular contact patterns and reducing noise. An inherent problem of contact prediction based on maximum entropy models is that large alignments with over 1000 effective sequences are needed to infer contacts accurately. Such alignments are, however, not available for more than 80% of all protein families that do not have a representative structure in the PDB. With PconsC3, we could extend the applicability of contact prediction to families as small as 100 effective sequences by combining global inference methods with machine learning based on local pairwise measures.

By introducing PconsFold, a pipeline for contact-based structure prediction, we could show that improvements in contact prediction accuracy translate to more accurate models. Finally, we applied a similar technique to Pfam, a comprehensive database of known protein families. In addition to using a faster folding protocol, we employed model quality assessment methods, which are crucial for estimating the confidence in the accuracy of predicted models. We propose models to be accurate for 558 families that do not have a representative known structure. Of those, over 75% have not been reported before.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2017
Keyword
protein bioinformatics, protein structure prediction, contact prediction, machine learning
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-141946
ISBN: 978-91-7649-811-8
ISBN: 978-91-7649-812-5
Public defence
2017-06-09, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 2: Submitted. Paper 4: In press.

 

Available from: 2017-05-17. Created: 2017-04-25. Last updated: 2017-05-10. Bibliographically approved

Open Access in DiVA

fulltext (1537 kB), 23 downloads
File information
File name: FULLTEXT01.pdf
File size: 1537 kB
Checksum (SHA-512): 95112e7cc6fc80085a8f6ca1998031d9161684e5cad7f961cd7bcb23c495f55c5e15704c33f89732a906703532266893771b11fca79e1bde6214ae1b0e0e4916
Type: fulltext
Mimetype: application/pdf
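
The published SHA-512 checksum can be used to confirm that a downloaded copy of the full text is intact. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha512_of_file` and `matches_record` are hypothetical, and the local path passed to them is whatever location the file was saved to.

```python
import hashlib

# Checksum as listed in the file information of this record.
EXPECTED_SHA512 = (
    "95112e7cc6fc80085a8f6ca1998031d9161684e5cad7f961cd7bcb23c495f55c"
    "5e15704c33f89732a906703532266893771b11fca79e1bde6214ae1b0e0e4916"
)


def sha512_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path: str, expected: str = EXPECTED_SHA512) -> bool:
    """Compare a local file's digest with the checksum from the record."""
    return sha512_of_file(path) == expected.strip().lower()
```

For example, `matches_record("FULLTEXT01.pdf")` would return `True` only if the local file is byte-for-byte identical to the file the checksum was computed from.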

By author/editor
Michel, Mirco; Elofsson, Arne