Publications (8 of 8)
Quaglia, F., Mészáros, B., Salladini, E., Hatos, A., Pancsa, R., Chemes, L. B., . . . Piovesan, D. (2022). DisProt in 2022: improved quality and accessibility of protein intrinsic disorder annotation. Nucleic Acids Research, 50(D1), D480-D487
2022 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 50, no D1, p. D480-D487. Article in journal (Refereed), Published
Abstract [en]

The Database of Intrinsically Disordered Proteins (DisProt, URL: https://disprot.org) is the major repository of manually curated annotations of intrinsically disordered proteins and regions from the literature. We report here recent updates of DisProt version 9, including a restyled web interface, refactored Intrinsically Disordered Proteins Ontology (IDPO), improvements in the curation process and significant content growth of around 30%. Higher quality and consistency of annotations is provided by a newly implemented reviewing process and training of curators. The increased curation capacity is fostered by the integration of DisProt with APICURON, a dedicated resource for the proper attribution and recognition of biocuration efforts. Better interoperability is provided through the adoption of the Minimum Information About Disorder (MIADE) standard, an active collaboration with the Gene Ontology (GO) and Evidence and Conclusion Ontology (ECO) consortia and the support of the ELIXIR infrastructure.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-201891 (URN); 10.1093/nar/gkab1082 (DOI); 000743496700059 (); 34850135 (PubMedID); 2-s2.0-85125157608 (Scopus ID)
Available from: 2022-02-10. Created: 2022-02-10. Last updated: 2022-10-07. Bibliographically approved
Pozzati, G., Zhu, W., Bassot, C., Lamb, J., Kundrotas, P. & Elofsson, A. (2022). Limits and potential of combined folding and docking. Bioinformatics, 38(4), 954-961
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no 4, p. 954-961. Article in journal (Refereed), Published
Abstract [en]

Motivation: In the last decade, de novo protein structure prediction accuracy for individual proteins has improved significantly by utilising deep learning (DL) methods for harvesting the co-evolution information from large multiple sequence alignments (MSAs). The same approach can, in principle, also be used to extract information about evolutionary-based contacts across protein-protein interfaces. However, most earlier studies have not used the latest DL methods for inter-chain contact distance prediction. This article introduces a fold-and-dock method based on predicted residue-residue distances with trRosetta.

Results: The method can simultaneously predict the tertiary and quaternary structure of a protein pair, even when the structures of the monomers are not known. The straightforward application of this method to a standard dataset for protein-protein docking yielded limited success. However, using alternative methods for generating MSAs allowed us to accurately dock significantly more proteins. We also introduced a novel scoring function, PconsDock, that accurately separates 98% of correctly and incorrectly folded and docked proteins. The average performance of the method is comparable to that of traditional, template-based or ab initio shape-complementarity-only docking methods. Moreover, the results of conventional and fold-and-dock approaches are complementary, and thus a combined docking pipeline could increase overall docking success significantly. This methodology contributed to the best model for one of the CASP14 oligomeric targets, H1065.

National Category
Biological Sciences Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-202237 (URN); 10.1093/bioinformatics/btab760 (DOI); 000747962400010 (); 34788800 (PubMedID)
Available from: 2022-02-23. Created: 2022-02-23. Last updated: 2023-08-11. Bibliographically approved
Bassot, C. & Elofsson, A. (2021). Accurate contact-based modelling of repeat proteins predicts the structure of new repeats protein families. PloS Computational Biology, 17(4), Article ID e1008798.
2021 (English). In: PloS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 17, no 4, article id e1008798. Article in journal (Refereed), Published
Abstract [en]

Repeat proteins are widespread among organisms and particularly abundant in eukaryotic proteomes. Their primary sequences contain repetitions that give rise to structures with repeated folds/domains. Although the repeated units can often be recognised from the sequence alone, structural information is often missing. Here, we used contact prediction to predict the structure of repeat proteins directly from their primary sequences. We benchmark the methods on a dataset comprising all the known repeated structures. We evaluate the contact predictions and the obtained models for different classes of repeat proteins. Further, we develop and benchmark a quality assessment (QA) method specific for repeat proteins. Finally, we used the prediction pipeline for all PFAM repeat families without resolved structures and found that forty-one of them could be modelled with high accuracy.

Repeat proteins are abundant in eukaryotic proteomes. They are involved in many eukaryote-specific functions, including signalling. For many of these proteins, the structure is not known, as they are difficult to crystallise. Today, using direct coupling analysis and deep learning, it is often possible to predict a protein's structure. However, the unique sequence features present in repeat proteins have made it challenging to use direct coupling analysis for predicting contacts. Here, we show that deep learning-based methods (trRosetta, DeepMetaPsicov (DMP) and PconsC4) overcome this problem and can predict intra- and inter-unit contacts in repeat proteins. In a benchmark dataset of 815 repeat proteins, about 90% can be correctly modelled. Further, among 48 PFAM families lacking a protein structure, we produce models of forty-one families with estimated high accuracy.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-194356 (URN); 10.1371/journal.pcbi.1008798 (DOI); 000640608100002 (); 33857128 (PubMedID)
Available from: 2021-06-21. Created: 2021-06-21. Last updated: 2022-02-25. Bibliographically approved
Sudha, G., Bassot, C., Lamb, J., Shu, N., Huang, Y. & Elofsson, A. (2021). The evolutionary history of topological variations in the CPA/AT transporters. PloS Computational Biology, 17(8), Article ID e1009278.
2021 (English). In: PloS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 17, no 8, article id e1009278. Article in journal (Refereed), Published
Abstract [en]

CPA/AT transporters are made up of a scaffold and a core domain. The core domain contains two non-canonical helices (broken or reentrant) that mediate the transport of ions, amino acids or other charged compounds. During evolution, these transporters have undergone substantial changes in structure, topology and function. To shed light on these structural transitions, we create models for all families using an integrated topology annotation method. We find that the CPA/AT transporters can be classified into four fold-types based on their structure: (1) the CPA-broken fold-type, (2) the CPA-reentrant fold-type, (3) the BART fold-type, and (4) a previously undescribed fold-type, the Reentrant-Helix-Reentrant fold-type. Several topological transitions are identified, including the transition between a broken and reentrant helix, one transition between a loop and a reentrant helix, complete changes of orientation, and changes in the number of scaffold helices. These transitions are mainly caused by gene duplication and shuffling events. Structural models, topology information and other details are presented in a searchable database, CPAfold (cpafold.bioinfo.se).

Author summary: Experimentally solved transmembrane transport structures are sparse, and modelling is challenging as the families contain non-canonical transmembrane helices. Here, we present structural models for all families of CPA/AT transporters. These proteins are then classified into four fold-types, including one novel fold-type, the reentrant-helix-reentrant fold-type. We find extensive structural variations within the fold, with members having from three to fourteen transmembrane helices. We explore the evolutionary mechanisms that have shaped the topological variations, providing a deeper understanding of membrane protein structure and evolution. We also believe our work could serve as a model system to understand the evolution of topology variations for other membrane proteins.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-197677 (URN); 10.1371/journal.pcbi.1009278 (DOI); 000685776300001 (); 34403419 (PubMedID)
Available from: 2021-10-14. Created: 2021-10-14. Last updated: 2022-02-25. Bibliographically approved
Hatos, A., Hajdu-Soltesz, B., Monzon, A. M., Palopoli, N., Alvarez, L., Aykac-Fas, B., . . . Piovesan, D. (2020). DisProt: intrinsic protein disorder annotation in 2020. Nucleic Acids Research, 48(D1), D269-D276
2020 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 48, no D1, p. D269-D276. Article in journal (Refereed), Published
Abstract [en]

The Database of Protein Disorder (DisProt, URL: https://disprot.org) provides manually curated annotations of intrinsically disordered proteins from the literature. Here we report recent developments with DisProt (version 8), including the doubling of protein entries, a new disorder ontology, improvements of the annotation format and a completely new website. The website includes a redesigned graphical interface, a better search engine, a clearer API for programmatic access and a new annotation interface that integrates text mining technologies. The new entry format provides greater flexibility, simplifies maintenance and allows the capture of more information from the literature. The new disorder ontology has been formalized and made interoperable by adopting the OWL format, and its structure and term definitions have been improved. The new annotation interface has made the curation process faster and more effective. We recently showed that new DisProt annotations can be effectively used to train and validate disorder predictors. We believe the growth of DisProt will accelerate, contributing to the improvement of function and disorder predictors and therefore to illuminating the 'dark' proteome.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-181993 (URN); 10.1093/nar/gkz975 (DOI); 000525956700039 (); 31713636 (PubMedID)
Available from: 2020-05-26. Created: 2020-05-26. Last updated: 2022-03-23. Bibliographically approved
Basile, W., Salvatore, M., Bassot, C. & Elofsson, A. (2019). Why do eukaryotic proteins contain more intrinsically disordered regions? PloS Computational Biology, 15(7), Article ID e1007186.
2019 (English). In: PloS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 15, no 7, article id e1007186. Article in journal (Refereed), Published
Abstract [en]

Intrinsic disorder is more abundant in eukaryotic than prokaryotic proteins. Methods predicting intrinsic disorder are based on the amino acid sequence of a protein. Therefore, there must exist an underlying difference in the sequences between eukaryotic and prokaryotic proteins causing the (predicted) difference in intrinsic disorder. By comparing proteins from complete eukaryotic and prokaryotic proteomes, we show that the difference in intrinsic disorder emerges from the linker regions connecting Pfam domains. Eukaryotic proteins have more extended linker regions, and in addition, the eukaryotic linkers are significantly more disordered, 38% vs. 12-16% disordered residues. Next, we examined the underlying reason for the increase in disorder in eukaryotic linkers, and we found that the changes in abundance of only three amino acids cause the increase. Eukaryotic proteins contain 8.6% serine, while prokaryotic proteins have 6.5%; eukaryotic proteins also contain 5.4% proline and 5.3% isoleucine, compared with 4.0% proline and ≈ 7.5% isoleucine in the prokaryotes. All three of these differences contribute to the increased disorder in eukaryotic proteins. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. The differences are observed in all phyla, protein families, structural regions and types of protein but are most pronounced in disordered and linker regions. The observation that differences in the abundance of three amino acids cause the difference in disorder between eukaryotic and prokaryotic proteins raises the question: Are amino acid frequencies different in eukaryotic linkers because the linkers are more disordered, or do the differences cause the increased disorder?

National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-171444 (URN); 10.1371/journal.pcbi.1007186 (DOI); 000481577700038 ()
Available from: 2019-08-08. Created: 2019-08-08. Last updated: 2022-03-23. Bibliographically approved
Tsirigos, K. D., Govindarajan, S., Bassot, C., Västermark, Å., Lamb, J., Shu, N. & Elofsson, A. (2018). Topology of membrane proteins - predictions, limitations and variations. Current opinion in structural biology, 50, 9-17
2018 (English). In: Current opinion in structural biology, ISSN 0959-440X, E-ISSN 1879-033X, Vol. 50, p. 9-17. Article in journal (Refereed), Published
Abstract [en]

Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of cells. Membrane proteins are built up of transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many algorithms exist for predicting the topology of alpha-helical and beta-barrel transmembrane proteins. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these nonstandard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-160275 (URN); 10.1016/j.sbi.2017.10.003 (DOI); 000443661300004 (); 29100082 (PubMedID)
Available from: 2018-09-18. Created: 2018-09-18. Last updated: 2022-02-26. Bibliographically approved
Govindarajan, S., Bassot, C., Lamb, J., Shu, N., Huang, Y. & Elofsson, A. The evolutionary history of topological variations in the CPA/AT superfamily.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

CPA/AT transporters consist of two structurally and evolutionarily related inverted repeat units, each of them with one core and one scaffold subdomain. During evolution, these families have undergone substantial changes in structure, topology and function. Central to the function of the transporters is the existence of two noncanonical helices that are involved in the transport process. In different families, two different types of these helices have been identified, reentrant and broken. Here, we use an integrated topology annotation method to identify novel topologies in the families. It combines topology prediction, similarity to families with known structure, and the difference in positively charged residues present in inside and outside loops in alternative topological models. We identified families with diverse topologies containing broken or reentrant helices. We classified all families based on 3 distinct evolutionary groups that each share a structurally similar C-terminal repeat unit, newly termed “fold-types”. Using the evolutionary relationship between families, we propose topological transitions including a transition between broken and reentrant helices, complete change of orientation, changes in the number of scaffold helices and even, in some rare cases, losses of core helices. The evolutionary history of the repeat units shows that gene duplication and repeat shuffling events result in these extensive topology variations. The novel structure-based classification, together with supporting structural models and other information, is presented in a searchable database, CPAfold (cpafold.bioinfo.se). Our comprehensive study of topology variations within the CPA superfamily provides better insight into their structure and evolution.

Keywords
Contact, prediction, trRosetta, DCA, protein evolution, membrane proteins
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-191208 (URN); 10.1101/2020.12.13.422607 (DOI)
Available from: 2021-03-12. Created: 2021-03-12. Last updated: 2022-02-25. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-7161-9028
