Decipher protein complex structures from sequence
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. ORCID iD: 0000-0002-5032-3727
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are essential constituents of biological systems, and a thorough understanding of protein structure is key to unraveling the mechanisms of biological processes. The recent development of computational methods based on AI is revolutionizing structural biology. Accurate predictions of three-dimensional protein structures can now be generated from protein sequences, enabling rapid insights into protein interactions and functions. This thesis investigates the application of several cutting-edge methods to protein complex structure prediction. We first explore using trRosetta for dimeric protein complexes, and the study shows that this single-chain protein structure predictor can also be applied to protein complexes. In light of the success of AlphaFold2, we use the FoldDock pipeline, an adaptation of AlphaFold2 to protein complexes, to model protein-protein interactions (PPIs) from two human interactome datasets and construct a PPI network. Next, we conduct a benchmark study of AlphaFold-Multimer on multi-chain protein complexes with 2 to 6 chains and examine how different evaluation scores affect the prediction assessment. In the last paper, we predict the structures of large protein complexes starting from subcomponents, using AlphaFold2 and a Monte Carlo tree search algorithm. The studies in this thesis show that deep learning approaches can yield reliable results in predicting protein complex structures, and that there is ample potential for further improvement.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023. p. 64
Keywords [en]
Protein complex structure prediction, protein interaction, AI, AlphaFold
National Category
Bioinformatics (Computational Biology); Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-219975
ISBN: 978-91-8014-414-8 (print)
ISBN: 978-91-8014-415-5 (electronic)
OAI: oai:DiVA.org:su-219975
DiVA, id: diva2:1786847
Public defence
2023-09-25, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Opponent
Supervisors
Available from: 2023-08-31. Created: 2023-08-10. Last updated: 2025-02-05. Bibliographically approved
List of papers
1. Limits and potential of combined folding and docking
2022 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 38, no 4, p. 954-961. Article in journal (Refereed) Published
Abstract [en]

Motivation: In the last decade, de novo protein structure prediction accuracy for individual proteins has improved significantly by utilising deep learning (DL) methods for harvesting the co-evolution information from large multiple sequence alignments (MSAs). The same approach can, in principle, also be used to extract information about evolutionary-based contacts across protein-protein interfaces. However, most earlier studies have not used the latest DL methods for inter-chain contact distance prediction. This article introduces a fold-and-dock method based on predicted residue-residue distances with trRosetta.

Results: The method can simultaneously predict the tertiary and quaternary structure of a protein pair, even when the structures of the monomers are not known. The straightforward application of this method to a standard dataset for protein-protein docking yielded limited success. However, using alternative methods for generating MSAs allowed us to accurately dock significantly more proteins. We also introduced a novel scoring function, PconsDock, that accurately separates 98% of correctly and incorrectly folded and docked proteins. The average performance of the method is comparable to the use of traditional, template-based or ab initio shape-complementarity-only docking methods. Moreover, the results of conventional and fold-and-dock approaches are complementary, and thus a combined docking pipeline could increase overall docking success significantly. This methodology contributed to the best model for one of the CASP14 oligomeric targets, H1065.

National Category
Biological Sciences; Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-202237 (URN); 10.1093/bioinformatics/btab760 (DOI); 000747962400010 (); 34788800 (PubMedID)
Available from: 2022-02-23. Created: 2022-02-23. Last updated: 2023-08-11. Bibliographically approved
2. Towards a structurally resolved human protein interaction network
2023 (English). In: Nature Structural & Molecular Biology, ISSN 1545-9993, E-ISSN 1545-9985, Vol. 30, no 2, p. 216-225. Article in journal (Refereed) Published
Abstract [en]

Cellular functions are governed by molecular machines that assemble through protein-protein interactions. Their atomic details are critical to studying their molecular mechanisms. However, fewer than 5% of hundreds of thousands of human protein interactions have been structurally characterized. Here we test the potential and limitations of recent progress in deep-learning methods using AlphaFold2 to predict structures for 65,484 human protein interactions. We show that experiments can orthogonally confirm higher-confidence models. We identify 3,137 high-confidence models, of which 1,371 have no homology to a known structure. We identify interface residues harboring disease mutations, suggesting potential mechanisms for pathogenic variants. Groups of interface phosphorylation sites show patterns of co-regulation across conditions, suggestive of coordinated tuning of multiple protein interactions as signaling responses. Finally, we provide examples of how the predicted binary complexes can be used to build larger assemblies helping to expand our understanding of human cell biology.

National Category
Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-215904 (URN); 10.1038/s41594-022-00910-8 (DOI); 000928325000001 (); 36690744 (PubMedID); 2-s2.0-85146676554 (Scopus ID)
Available from: 2023-03-29. Created: 2023-03-29. Last updated: 2025-02-07. Bibliographically approved
3. Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes
2023 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 39, no 7, article id btad424. Article in journal (Refereed) Published
Abstract [en]

Motivation: Despite near-experimental accuracy on single-chain predictions, there is still scope for improvement among multimeric predictions. Methods like AlphaFold-Multimer and FoldDock can accurately model dimers. However, how well these methods fare on larger complexes is still unclear. Further, evaluation methods of the quality of multimeric complexes are not well established.

Results: We analysed the performance of AlphaFold-Multimer on a homology-reduced dataset of homo- and heteromeric protein complexes. We highlight the differences between the pairwise and multi-interface evaluation of chains within a multimer. We describe why certain complexes perform well on one metric (e.g. TM-score) but poorly on another (e.g. DockQ). We propose a new score, Predicted DockQ version 2 (pDockQ2), to estimate the quality of each interface in a multimer. Finally, we modelled protein complexes (from CORUM) and identified two highly confident structures that do not have sequence homology to any existing structures.

Availability and implementation: All scripts, models, and data used to perform the analysis in this study are freely available at https://gitlab.com/ElofssonLab/afm-benchmark.

National Category
Bioinformatics (Computational Biology); Bioinformatics and Computational Biology
Identifiers
urn:nbn:se:su:diva-219972 (URN); 10.1093/bioinformatics/btad424 (DOI); 001030747300005 (); 2-s2.0-85166268973 (Scopus ID)
Funder
Swedish Research Council, 2021-03979; Knut and Alice Wallenberg Foundation
Available from: 2023-08-10. Created: 2023-08-10. Last updated: 2025-02-05. Bibliographically approved
4. Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search
2022 (English). In: Nature Communications, E-ISSN 2041-1723, Vol. 13, no 1, article id 6028. Article in journal (Refereed) Published
Abstract [en]

AlphaFold can predict the structure of single- and multiple-chain proteins with very high accuracy. However, the accuracy decreases with the number of chains, and the available GPU memory limits the size of protein complexes that can be predicted. Here we show that one can predict the structure of large complexes starting from predictions of subcomponents. We assemble 91 out of 175 complexes with 10–30 chains from predicted subcomponents using Monte Carlo tree search, with a median TM-score of 0.51. There are 30 highly accurate complexes (TM-score ≥0.8, 33% of complete assemblies). We create a scoring function, mpDockQ, that can distinguish if assemblies are complete and predict their accuracy. We find that complexes containing symmetry are accurately assembled, while asymmetrical complexes remain challenging. The method is freely available and accessible as a Colab notebook: https://colab.research.google.com/github/patrickbryant1/MoLPC/blob/master/MoLPC.ipynb.
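
The percentages in this abstract use two different denominators, which is easy to misread. The short check below recomputes them from the counts quoted above; the variable names are illustrative and are not taken from the paper or its code.

```python
# Counts quoted in the abstract above (variable names are illustrative).
total_complexes = 175      # complexes with 10-30 chains that were attempted
complete_assemblies = 91   # complexes that were assembled
highly_accurate = 30       # complete assemblies with TM-score >= 0.8

# Share of attempted complexes that were assembled.
completion_rate = complete_assemblies / total_complexes

# Share of complete assemblies that are highly accurate; this is the
# "33% of complete assemblies" figure, relative to 91 rather than 175.
accurate_share = highly_accurate / complete_assemblies

print(f"Assembled: {completion_rate:.0%} of attempted complexes")
print(f"Highly accurate: {accurate_share:.0%} of complete assemblies")
```

So roughly half of the attempted complexes were assembled, and about a third of those assemblies reach a TM-score of at least 0.8.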

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-211010 (URN); 10.1038/s41467-022-33729-4 (DOI); 000867312100019 (); 36224222 (PubMedID); 2-s2.0-85139763194 (Scopus ID)
Available from: 2022-11-09. Created: 2022-11-09. Last updated: 2023-08-10. Bibliographically approved

Open Access in DiVA

Decipher protein complex structures from sequence (7940 kB), 603 downloads
File information
File name: FULLTEXT03.pdf
File size: 7940 kB
Checksum (SHA-512): 7bd857dd99c2015f7778bb10ce4e503f38a2a94da08151f1c515d2ba8c8bffc29c341b74b1768b028dda5cb52fa2037bdf63ac23b2dc2fb74366fb93227510b0
Type: fulltext
Mimetype: application/pdf
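
A downloaded copy of the full text can be compared against the SHA-512 checksum listed in the file information. The following is a minimal sketch using Python's standard `hashlib` module. The helper names and the local path in the usage note are assumptions for illustration and are not part of the DiVA record.

```python
import hashlib
from pathlib import Path

# SHA-512 checksum as listed in the file information of this record.
EXPECTED_SHA512 = (
    "7bd857dd99c2015f7778bb10ce4e503f38a2a94da08151f1c515d2ba8c8bffc2"
    "9c341b74b1768b028dda5cb52fa2037bdf63ac23b2dc2fb74366fb93227510b0"
)


def sha512_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of a file, read in chunks."""
    digest = hashlib.sha512()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks so large PDFs are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_record(path, expected=EXPECTED_SHA512):
    """True if the file's digest equals the expected checksum (case-insensitive)."""
    return sha512_of_file(path) == expected.lower()
```

For example, `matches_record("FULLTEXT03.pdf")` should return `True` for an intact copy, assuming the file was saved under that name in the working directory.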

Authority records

Zhu, Wensi

Total: 606 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 851 hits