Towards a structurally resolved human protein interaction network
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0003-3439-1866
Number of Authors: 16
2023 (English). In: Nature Structural & Molecular Biology, ISSN 1545-9993, E-ISSN 1545-9985, Vol. 30, no 2, p. 216-225. Article in journal (Refereed). Published
Abstract [en]

Cellular functions are governed by molecular machines that assemble through protein-protein interactions. Their atomic details are critical to studying their molecular mechanisms. However, fewer than 5% of hundreds of thousands of human protein interactions have been structurally characterized. Here we test the potential and limitations of recent progress in deep-learning methods using AlphaFold2 to predict structures for 65,484 human protein interactions. We show that experiments can orthogonally confirm higher-confidence models. We identify 3,137 high-confidence models, of which 1,371 have no homology to a known structure. We identify interface residues harboring disease mutations, suggesting potential mechanisms for pathogenic variants. Groups of interface phosphorylation sites show patterns of co-regulation across conditions, suggestive of coordinated tuning of multiple protein interactions as signaling responses. Finally, we provide examples of how the predicted binary complexes can be used to build larger assemblies helping to expand our understanding of human cell biology.

Place, publisher, year, edition, pages
2023. Vol. 30, no 2, p. 216-225
National Category
Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:su:diva-215904
DOI: 10.1038/s41594-022-00910-8
ISI: 000928325000001
PubMedID: 36690744
Scopus ID: 2-s2.0-85146676554
OAI: oai:DiVA.org:su-215904
DiVA, id: diva2:1746786
Available from: 2023-03-29 Created: 2023-03-29 Last updated: 2025-02-07 Bibliographically approved
In thesis
1. Learning Protein Evolution and Structure
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

By analysing the structure of a protein it is possible to draw conclusions about its function. Obtaining the structure of a protein experimentally is, however, a time-consuming and expensive process. By using evolution it is possible to infer the structure of a protein. AlphaFold2 (AF), the latest AI technology for protein structure prediction, uses evolutionary information to obtain protein structures in minutes instead of years at a fraction of the experimental cost. Here, we develop this technology further to predict the structure of interacting proteins. We create a confidence score, pDockQ, and show that this score rivals high-throughput experiments in distinguishing true and false protein-protein interactions (PPIs). Applying AF and the pDockQ score to a set of 65,484 human PPIs, we identify 1,371 new high-confidence models. These models expand the structural knowledge of human protein complexes and can be used, for example, to develop new drugs or evaluate biological pathways. One limitation of AF is that the accuracy decreases with the number of proteins being predicted together and that the biggest protein complexes do not fit in the memory of the latest GPUs. To circumvent these issues, we predict subcomponents of protein complexes and assemble these with Monte Carlo Tree Search (MCTS). MCTS enables assembling some of the largest protein complexes using only sequence information and stoichiometry. Out of 175 protein complexes with 10-30 chains, 91 can be completely assembled with a median TM-score of 0.51. A third of these (30 complexes) are highly accurate (TM-score ≥0.8). The use of highly accurate protein structure prediction is revolutionising many fields of biological research only one year after its realisation. Likely, this is only the beginning of a new era: the era of AI.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2022. p. 44
Keywords
Protein structure prediction, Evolution, AI, AlphaFold
National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-207579 (URN)
978-91-7911-952-2 (ISBN)
978-91-7911-953-9 (ISBN)
Public defence
2022-09-26, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Available from: 2022-09-01 Created: 2022-07-29 Last updated: 2025-02-07 Bibliographically approved
2. Decipher protein complex structures from sequence
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are essential constituents of biological systems. A profound understanding of protein structure is significant for unraveling the intricate mechanisms of biological processes. The recent development of computational methods using AI technology is revolutionizing the structural biology field. Accurate predictions of three-dimensional protein structures can be generated from protein sequences, enabling rapid and accurate insights into protein interactions and functions. This thesis aims to investigate the applications of various cutting-edge methods in protein complex structure prediction. We first explore using trRosetta for dimeric protein complexes, and the study shows that the single-chain protein structure predictor is feasible for protein complexes. In light of the success of AlphaFold2, we use the pipeline FoldDock, which is an adaptation of AlphaFold2 for protein complexes, for protein-protein interactions (PPIs) of two human interactome datasets and construct a PPI network. Next, we conduct a benchmark study of AlphaFold-Multimer on multi-chain protein complexes with 2 to 6 chains and examine how different evaluation scores affect the prediction assessment. In the last paper, we predict large protein complexes starting from subcomponents using AlphaFold2 and a Monte Carlo Tree Search algorithm. The studies in this thesis show that deep learning approaches can yield reliable results in predicting protein complex structures, and there is ample potential for further improvement.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023. p. 64
Keywords
Protein complex structure prediction, protein interaction, AI, AlphaFold
National Category
Bioinformatics (Computational Biology)
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-219975 (URN)
978-91-8014-414-8 (ISBN)
978-91-8014-415-5 (ISBN)
Public defence
2023-09-25, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Available from: 2023-08-31 Created: 2023-08-10 Last updated: 2025-02-05 Bibliographically approved
3. Deep learning solutions to protein quaternary structure
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Interactions between proteins are directly involved in most biological processes and are essential for the correct functioning of every form of life. The nature of protein-protein interactions allows functional assemblies of hundreds of protein chains. Given the enormous complexity and the pivotal role of protein interactions in life’s mechanics, the need to fully comprehend such mechanisms is just as great as the challenge of achieving that knowledge. In the last few decades, experimental procedures have constantly improved, dramatically increasing the available structural data for protein interactions. Unfortunately, experimental methods require a lot of time and resources and cannot always be applied with the same degree of success. Several computational methods have been developed in parallel with experimental procedures to overcome such limitations. Therefore, this thesis focused on screening existing computational methods and adapting them to improve the overall accuracy in solving structures of protein complexes. In the first paper, I propose a simple rigid-body docking framework to test several interface predictors and their ability to drive a protein-protein docking procedure. Next, in the second paper, I present a method to adapt the trRosetta deep neural network to predict inter-residue distances and dihedral angle constraints for full protein complexes. The same concept is then improved in the third paper with FoldDock, an adaptation of AlphaFold2 to work on multiple protein sequences and produce the corresponding complex. Finally, in the fourth paper, the FoldDock pipeline is applied to a large dataset of pairwise protein interactions derived from the hu.MAP and HuRI datasets, resulting in the characterization of more than 3000 high-confidence structural models.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023. p. 78
Keywords
protein interactions, interface prediction, structure prediction, docking, deep learning
National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-219990 (URN)
978-91-8014-450-6 (ISBN)
978-91-8014-451-3 (ISBN)
Public defence
2023-10-06, Gamma2 - Air&Fire - G2690, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Available from: 2023-09-13 Created: 2023-08-11 Last updated: 2025-02-07 Bibliographically approved
4. Unlocking protein sequences: Advances in protein structure and ligand-binding site prediction
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The protein sequence determines how it will fold into its unique three-dimensional structure. Once folded, proteins perform their functions by interacting with other proteins or molecules called ligands within the cell. Experimental determination of protein structure and function is tedious. Computational approaches aim to accurately predict the properties of proteins to complement experimental efforts of understanding biochemical mechanisms within the cell. This thesis introduces computational techniques that predict the structure of protein complexes and identify protein residues involved in interactions with common biomolecules, such as metal ions and nucleic acids, based on sequence information. 

AlphaFold, a method that predicts protein structure from sequence information with almost experimental accuracy, was a critical breakthrough that shaped the field of protein structure prediction. Subsequently, approaches such as FoldDock adapted the AlphaFold pipeline for dimer complexes. Paper I applies the FoldDock protocol to understand toxin-antitoxin systems. These protein complexes are highly evolutionarily conserved, and high-confidence dimer predictions were generated. Paper II applies the FoldDock protocol to study protein-protein interactions in the human proteome. To verify the reliability of machine-learning-based computational methods, they must be tested on independent data different from the data used to train the method. Paper III involves generating and using a homology-reduced independent test set to benchmark the performance of protein complex structure predictors, including the recent AlphaFold release adapted for multi-chain proteins, AlphaFold-Multimer. A confidence score (pDockQ2) was proposed to estimate the quality of the interfaces within multimers. Paper I, Paper II and Paper III are associated with predicting and evaluating protein-protein interactions.

Representation learning involves finding effective representations of input data to maximise available information, making the data easier to understand and process for downstream prediction tasks. A recent advance in protein representation learning is protein language models (pLMs), where large language models are trained on a massive corpus of protein sequences. Highly contextualised and informative vector representations contained in the last hidden layer of the model have been used to predict numerous properties, such as ligand binding sites, subcellular localisation, and post-translational modifications, among others. Paper IV uses residue-level embeddings to predict whether a protein binds to one or more of the ten most common ions. It also predicts residue-level binding probabilities for multiple ions simultaneously. Paper V expands this approach beyond metals. It explores the impact of structure-informed features alongside sequence embeddings to predict whether a residue binds to nucleic acids, small molecules or metals. Paper IV and Paper V are associated with developing machine learning methods to predict and evaluate protein-ligand interactions.

In summary, the research conducted within this thesis offers valuable insights into three crucial levers to systematically harness the potential of machine learning for protein bioinformatics. These are (1) construction of homology-reduced non-redundant datasets, (2) finding optimal protein representations, and (3) rigorous evaluation and inference. 

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2024. p. 55
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-224344 (URN)
978-91-8014-613-5 (ISBN)
978-91-8014-614-2 (ISBN)
Public defence
2024-01-26, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 09:00 (English)
Available from: 2024-01-02 Created: 2023-12-07 Last updated: 2023-12-20 Bibliographically approved


Authority records

Bryant, Patrick; Pozzati, Gabriele; Shenoy, Aditi; Zhu, Wensi; Kundrotas, Petras; Elofsson, Arne
