Learning Protein Evolution and Structure
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. ORCID iD: 0000-0003-3439-1866
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

By analysing the structure of a protein it is possible to draw conclusions about its function. Obtaining the structure of a protein experimentally is, however, a time-consuming and expensive process. By using evolution it is possible to infer the structure of a protein. AlphaFold2 (AF), the latest AI technology for protein structure prediction, uses evolutionary information to obtain protein structures in minutes instead of years at a fraction of the experimental cost. Here, we develop this technology further to predict the structure of interacting proteins. We create a confidence score, pDockQ, and show that this score rivals high-throughput experiments in distinguishing true and false protein-protein interactions (PPIs). Applying AF and the pDockQ score to a set of 65,484 human PPIs, we identify 1,371 new high-confidence models. These models expand the structural knowledge of human protein complexes and can be used to, e.g., develop new drugs or evaluate biological pathways. One limitation of AF is that the accuracy decreases with the number of proteins being predicted together and that the biggest protein complexes do not fit in the memory of the latest GPUs. To circumvent these issues, we predict subcomponents of protein complexes and assemble these together with Monte Carlo tree search (MCTS). MCTS enables assembling some of the largest protein complexes using only sequence information and stoichiometry. Out of 175 protein complexes with 10-30 chains, 91 can be completely assembled with a median TM-score of 0.51. A third of these (30 complexes) are highly accurate (TM-score ≥0.8). The use of highly accurate protein structure prediction is revolutionising many fields of biological research only one year after its realisation. Likely, this is only the beginning of a new era: the era of AI.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2022, p. 44
Keywords [en]
Protein structure prediction, Evolution, AI, AlphaFold
National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-207579
ISBN: 978-91-7911-952-2 (print)
ISBN: 978-91-7911-953-9 (electronic)
OAI: oai:DiVA.org:su-207579
DiVA, id: diva2:1684968
Public defence
2022-09-26, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Opponent
Supervisors
Available from: 2022-09-01 Created: 2022-07-29 Last updated: 2025-02-07 Bibliographically approved
List of papers
1. Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning
2020 (English) In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 432, no 16, p. 4435-4446. Article in journal (Refereed) Published
Abstract [en]

How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most only result in small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains with a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT (local Distance Difference Test) score and a correlation coefficient of 0.92. Decomposing the feature importance shows that the domain length, or analogously, size, is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
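
The two figures quoted above (an average deviation in lDDT and a correlation coefficient) can be illustrated with a small sketch. It assumes that the reported "average accuracy" is a mean absolute error, which the abstract does not state explicitly, and it uses made-up lDDT values rather than data from the paper.

```python
import math

def mean_absolute_error(true_vals, pred_vals):
    """Average absolute difference between paired true and predicted values."""
    return sum(abs(t - p) for t, p in zip(true_vals, pred_vals)) / len(true_vals)

def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (norm_x * norm_y)

# Hypothetical lDDT scores for five domain pairs (illustration only,
# not values from the study).
true_lddt = [0.95, 0.88, 0.74, 0.61, 0.52]
pred_lddt = [0.93, 0.90, 0.70, 0.65, 0.50]

mae = mean_absolute_error(true_lddt, pred_lddt)
r = pearson_r(true_lddt, pred_lddt)
print(f"mean absolute error: {mae:.3f} lDDT, Pearson r: {r:.2f}")
```

A low error together with a high correlation indicates that predictions are both close to and ordered like the true structural similarities; either metric alone could be misleading.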

Keywords
protein evolution, protein structure, evolutionary distance, mutations
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-184383 (URN)
10.1016/j.jmb.2020.05.021 (DOI)
000552832700008 ()
32485208 (PubMedID)
Available from: 2020-10-05 Created: 2020-10-05 Last updated: 2022-08-24 Bibliographically approved
2. Improved prediction of protein-protein interactions using AlphaFold2
2022 (English) In: Nature Communications, E-ISSN 2041-1723, Vol. 13, no 1, article id 1265. Article in journal (Refereed) Published
Abstract [en]

Predicting the structure of interacting protein chains is a fundamental step towards understanding protein function. Unfortunately, no computational method can produce accurate structures of protein complexes. AlphaFold2 has shown unprecedented levels of accuracy in modelling single-chain protein structures. Here, we apply AlphaFold2 for the prediction of heterodimeric protein complexes. We find that the AlphaFold2 protocol, together with optimised multiple sequence alignments, generates models with acceptable quality (DockQ ≥ 0.23) for 63% of the dimers. From the predicted interfaces we create a simple function to predict the DockQ score, which distinguishes acceptable from incorrect models as well as interacting from non-interacting proteins with state-of-the-art accuracy. We find that, using the predicted DockQ scores, we can identify 51% of all interacting pairs at a 1% false positive rate (FPR). Predicting the structure of protein complexes is extremely difficult. Here, the authors apply AlphaFold2 with optimized multiple sequence alignments to model complexes of interacting proteins, enabling prediction of both whether and how proteins interact with state-of-the-art accuracy.
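
The statement that a score identifies a given share of interacting pairs at 1% FPR can be made concrete with a short sketch. This is not the authors' evaluation code: the function name `tpr_at_fpr` and all score values are hypothetical, and a pair is assumed to be called "interacting" when its score is at or above the threshold.

```python
def tpr_at_fpr(pos_scores, neg_scores, max_fpr=0.01):
    """Find the lowest threshold whose false positive rate stays <= max_fpr.

    Returns (threshold, true_positive_rate, false_positive_rate), or None
    if even the strictest threshold exceeds max_fpr.
    """
    best = None
    # Walk candidate thresholds from strictest to most permissive; both
    # rates can only grow as the threshold drops.
    for thr in sorted(set(pos_scores) | set(neg_scores), reverse=True):
        fpr = sum(s >= thr for s in neg_scores) / len(neg_scores)
        if fpr > max_fpr:
            break
        tpr = sum(s >= thr for s in pos_scores) / len(pos_scores)
        best = (thr, tpr, fpr)
    return best

# Hypothetical predicted scores (illustration only).
interacting = [0.10, 0.25, 0.40, 0.55, 0.62, 0.70, 0.81, 0.90]
# 100 non-interacting pairs: 99 low scores and one high-scoring outlier.
non_interacting = [0.002 * i for i in range(99)] + [0.60]

result = tpr_at_fpr(interacting, non_interacting, max_fpr=0.01)
print(result)
```

Fixing the FPR first and then reading off the recovered fraction of true interactions is what makes the comparison with high-throughput experiments meaningful, since both are judged at the same tolerated error rate.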

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-203709 (URN)
10.1038/s41467-022-28865-w (DOI)
000767467900005 ()
35273146 (PubMedID)
2-s2.0-85126195059 (Scopus ID)
Note

For correction, see: Bryant, P., Pozzati, G. & Elofsson, A. Author Correction: Improved prediction of protein-protein interactions using AlphaFold2. Nat Commun 13, 1694 (2022). DOI: 10.1038/s41467-022-29480-5

Available from: 2022-04-08 Created: 2022-04-08 Last updated: 2023-08-11 Bibliographically approved
3. Towards a structurally resolved human protein interaction network
2023 (English) In: Nature Structural & Molecular Biology, ISSN 1545-9993, E-ISSN 1545-9985, Vol. 30, no 2, p. 216-225. Article in journal (Refereed) Published
Abstract [en]

Cellular functions are governed by molecular machines that assemble through protein-protein interactions. Their atomic details are critical to studying their molecular mechanisms. However, fewer than 5% of hundreds of thousands of human protein interactions have been structurally characterized. Here we test the potential and limitations of recent progress in deep-learning methods using AlphaFold2 to predict structures for 65,484 human protein interactions. We show that experiments can orthogonally confirm higher-confidence models. We identify 3,137 high-confidence models, of which 1,371 have no homology to a known structure. We identify interface residues harboring disease mutations, suggesting potential mechanisms for pathogenic variants. Groups of interface phosphorylation sites show patterns of co-regulation across conditions, suggestive of coordinated tuning of multiple protein interactions as signaling responses. Finally, we provide examples of how the predicted binary complexes can be used to build larger assemblies helping to expand our understanding of human cell biology.

National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-215904 (URN)
10.1038/s41594-022-00910-8 (DOI)
000928325000001 ()
36690744 (PubMedID)
2-s2.0-85146676554 (Scopus ID)
Available from: 2023-03-29 Created: 2023-03-29 Last updated: 2025-02-07 Bibliographically approved
4. Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search
(English) Manuscript (preprint) (Other academic)
Abstract [en]

AlphaFold can predict the structure of single- and multiple-chain proteins with very high accuracy. However, the accuracy decreases with the number of chains, and the available GPU memory limits the size of protein complexes which can be predicted. Here we show that one can predict the structure of large complexes starting from predictions of subcomponents. We assemble 91 out of 175 complexes with 10-30 chains from predicted subcomponents using Monte Carlo tree search, with a median TM-score of 0.51. There are 30 highly accurate complexes (TM-score ≥0.8, 33% of complete assemblies). We create a scoring function, mpDockQ, that can distinguish if assemblies are complete and predict their accuracy. We find that complexes containing symmetry are accurately assembled, while asymmetrical complexes remain challenging. The method is freely available and accessible as a Colab notebook: https://colab.research.google.com/github/patrickbryant1/MoLPC/blob/master/MoLPC.ipynb.
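
As a quick arithmetic check of the proportions quoted above, the following sketch recomputes them from the three counts given in the abstract; the variable names are illustrative only.

```python
# Counts taken directly from the abstract.
total_complexes = 175      # complexes with 10-30 chains
complete_assemblies = 91   # complexes that could be fully assembled
highly_accurate = 30       # complete assemblies with TM-score >= 0.8

# Share of all complexes that were completely assembled.
frac_complete = complete_assemblies / total_complexes
# Share of the complete assemblies that are highly accurate.
frac_high_of_complete = highly_accurate / complete_assemblies
# Share of all complexes that are highly accurate.
frac_high_of_total = highly_accurate / total_complexes

print(f"complete: {frac_complete:.0%}")
print(f"highly accurate (of complete): {frac_high_of_complete:.0%}")
print(f"highly accurate (of all): {frac_high_of_total:.0%}")
```

The 33% figure is therefore relative to the 91 complete assemblies, not to the full set of 175 complexes.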

Keywords
Protein structure prediction, AlphaFold, Complex assembly, Monte Carlo tree search
National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-207577 (URN)
Available from: 2022-07-29 Created: 2022-07-29 Last updated: 2025-02-07

Open Access in DiVA

Learning Protein Evolution and Structure (12758 kB) 2681 downloads
File information
File name: FULLTEXT01.pdf
File size: 12758 kB
Checksum (SHA-512): fda93118fa1fb7343a10e0a7b19e3d4f88fd513682a60b6101d3c4c59c18af3b59835811519ac0ab53fa3861ad4e71d7ea4ad288f54326136fbf1e48a1a2f3dd
Type: fulltext
Mimetype: application/pdf

By author/editor
Bryant, Patrick