Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families
Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. ORCID iD: 0000-0001-7161-9028
Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. ORCID iD: 0000-0002-7115-9751
Number of Authors: 2
2021 (English)
In: PloS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 17, no 4, article id e1008798
Article in journal (Refereed) Published
Abstract [en]

Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequences contain repeated amino acid segments that give rise to structures with repeated folds or domains. Although the repeated units can often be recognised from the sequence alone, structural information is frequently missing. Here, we use contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark the methods on a dataset comprising all known repeat structures. We evaluate the contact predictions and the resulting models for different classes of repeat proteins. Further, we develop and benchmark a quality assessment (QA) method specific to repeat proteins. Finally, we apply the prediction pipeline to all PFAM repeat families without resolved structures and find that forty-one of them can be modelled with high accuracy.

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features of repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy.

Place, publisher, year, edition, pages
2021. Vol. 17, no 4, article id e1008798
National Category
Biological Sciences
Identifiers
URN: urn:nbn:se:su:diva-194356
DOI: 10.1371/journal.pcbi.1008798
ISI: 000640608100002
PubMedID: 33857128
OAI: oai:DiVA.org:su-194356
DiVA, id: diva2:1570380
Available from: 2021-06-21 Created: 2021-06-21 Last updated: 2022-02-25 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Bassot, Claudio; Elofsson, Arne
