Evaluation of AlphaFold-Multimer prediction on multi-chain protein complexes
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0002-5032-3727
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0001-7748-2501
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). The University of Kansas, Lawrence, United States. ORCID iD: 0000-0001-5080-1664
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0002-7115-9751
2023 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 39, no. 7, article id btad424. Article in journal (Refereed). Published
Abstract [en]

Motivation: Despite near-experimental accuracy on single-chain predictions, there is still scope for improvement among multimeric predictions. Methods like AlphaFold-Multimer and FoldDock can accurately model dimers. However, how well these methods fare on larger complexes is still unclear. Further, evaluation methods of the quality of multimeric complexes are not well established.

Results: We analysed the performance of AlphaFold-Multimer on a homology-reduced dataset of homo- and heteromeric protein complexes. We highlight the differences between the pairwise and multi-interface evaluation of chains within a multimer. We describe why certain complexes perform well on one metric (e.g. TM-score) but poorly on another (e.g. DockQ). We propose a new score, Predicted DockQ version 2 (pDockQ2), to estimate the quality of each interface in a multimer. Finally, we modelled protein complexes (from CORUM) and identified two highly confident structures that do not have sequence homology to any existing structures.

Availability and implementation: All scripts, models, and data used to perform the analysis in this study are freely available at https://gitlab.com/ElofssonLab/afm-benchmark.
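
The distinction drawn in the Results between pairwise and multi-interface evaluation can be illustrated with a minimal sketch. It is not taken from the authors' scripts: the function name `find_interfaces`, the 8 Å contact cutoff, the use of one representative coordinate per residue, and the toy coordinates are all assumptions made for illustration. The sketch only enumerates which chain pairs in a multimer are in contact, that is, which interfaces a per-interface score would need to cover.

```python
from itertools import combinations
from math import dist


def find_interfaces(chains, cutoff=8.0):
    """Return the chain pairs that have at least one residue pair within `cutoff` (in Å).

    `chains` maps a chain ID to a list of (x, y, z) coordinates, one point per
    residue. The cutoff and the one-point-per-residue simplification are
    illustrative assumptions, not the definition used in the paper.
    """
    interfaces = []
    # Every unordered pair of chains is a candidate interface.
    for (id_a, coords_a), (id_b, coords_b) in combinations(sorted(chains.items()), 2):
        if any(dist(p, q) <= cutoff for p in coords_a for q in coords_b):
            interfaces.append((id_a, id_b))
    return interfaces


# Hypothetical three-chain complex: A touches B, while C is far from both.
toy_complex = {
    "A": [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)],
    "B": [(7.0, 0.0, 0.0), (10.8, 0.0, 0.0)],
    "C": [(30.0, 0.0, 0.0), (33.8, 0.0, 0.0)],
}

interfaces = find_interfaces(toy_complex)
print(interfaces)  # [('A', 'B')]
```

In this toy case a three-chain complex has three possible chain pairs but only one actual interface, which is one reason a whole-complex score and a per-interface score can disagree.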

Place, publisher, year, edition, pages
2023. Vol. 39, no. 7, article id btad424
National subject category
Bioinformatics (Computational Biology); Bioinformatics and Computational Biology
Identifiers
URN: urn:nbn:se:su:diva-219972; DOI: 10.1093/bioinformatics/btad424; ISI: 001030747300005; Scopus ID: 2-s2.0-85166268973; OAI: oai:DiVA.org:su-219972; DiVA, id: diva2:1786787
Funder
Vetenskapsrådet (Swedish Research Council), 2021-03979; Knut och Alice Wallenbergs Stiftelse
Available from: 2023-08-10. Created: 2023-08-10. Last updated: 2025-02-05. Bibliographically approved
Part of thesis
1. Decipher protein complex structures from sequence
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are essential constituents of biological systems. A profound understanding of protein structure is significant for unraveling the intricate mechanisms of biological processes. The recent development of computational methods using AI technology is revolutionizing the structural biology field. Accurate predictions of three-dimensional protein structures can be generated from protein sequences, enabling rapid and accurate insights into protein interactions and functions. This thesis aims to investigate the applications of various cutting-edge methods in protein complex structure prediction. We first explore using trRosetta for dimeric protein complexes, and the study shows that the single-chain protein structure predictor is feasible for protein complexes. In light of the success of AlphaFold2, we use the pipeline FoldDock, which is an adaptation of AlphaFold2 to protein complexes, to predict protein-protein interactions (PPIs) in two human interactome datasets and construct a PPI network. Next, we conduct a benchmark study of AlphaFold-Multimer on multi-chain protein complexes with 2 to 6 chains and examine how different evaluation scores affect the prediction assessment. In the last paper, we predict large protein complexes starting from subcomponents using AlphaFold2 and a Monte Carlo Tree Search algorithm. The studies in this thesis show that deep learning approaches can yield reliable results in predicting protein complex structures, and there is ample potential for further improvement.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023. p. 64
Keywords
Protein complex structure prediction, protein interaction, AI, AlphaFold
National subject category
Bioinformatics (Computational Biology); Bioinformatics and Computational Biology
Research subject
Biochemistry with a specialisation in bioinformatics
Identifiers
urn:nbn:se:su:diva-219975 (URN); 978-91-8014-414-8 (ISBN); 978-91-8014-415-5 (ISBN)
Public defence
2023-09-25, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Opponent
Supervisors
Available from: 2023-08-31. Created: 2023-08-10. Last updated: 2025-02-05. Bibliographically approved
2. Unlocking protein sequences: Advances in protein structure and ligand-binding site prediction
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The protein sequence determines how it will fold into its unique three-dimensional structure. Once folded, proteins perform their functions by interacting with other proteins or molecules called ligands within the cell. Experimental determination of protein structure and function is tedious. Computational approaches aim to accurately predict the properties of proteins to complement experimental efforts of understanding biochemical mechanisms within the cell. This thesis introduces computational techniques that predict the structure of protein complexes and identify protein residues involved in interactions with common biomolecules, such as metal ions and nucleic acids, based on sequence information. 

AlphaFold, a method that predicted protein structure using sequence information with almost experimental accuracy, was a critical breakthrough that shaped the field of protein structure prediction. Subsequently, approaches such as FoldDock adapted the AlphaFold pipeline for dimer complexes. Paper I applies the FoldDock protocol to understand toxin-antitoxin systems. These protein complexes are highly evolutionarily conserved, and high-confidence dimer predictions were generated. Paper II applies the FoldDock protocol to study protein-protein interactions in the human proteome. To verify the reliability of machine-learning-based computational methods, they must be tested on independent data different from the data used to train the method. Paper III involves generating and using a homology-reduced independent test set to benchmark the performance of protein complex structure predictors, including AlphaFold-Multimer, the recent AlphaFold release adapted for multi-chain proteins. A confidence score (pDockQ2) was proposed to estimate the quality of the interfaces within multimers. Paper I, Paper II and Paper III are associated with predicting and evaluating protein-protein interactions.

Representation learning involves finding effective representations of input data to maximise available information, making the data easier to understand and process for downstream prediction tasks. A recent advance in protein representation learning is protein language models (pLMs), where large language models are trained on a massive corpus of protein sequences. Highly contextualised and informative vector representations contained in the last hidden layer of the model have been used to predict numerous properties, such as ligand binding sites, subcellular localisation, and post-translational modifications, among others. Paper IV uses residue-level embeddings to predict whether a protein binds to one or more of the ten most common ions. It also predicts residue-level binding probabilities for multiple ions simultaneously. Paper V expands this approach beyond metals. It explores the impact of structure-informed features alongside sequence embeddings to predict whether a residue binds to nucleic acids, small molecules or metals. Paper IV and Paper V are associated with developing machine learning methods to predict and evaluate protein-ligand interactions.

In summary, the research conducted within this thesis offers valuable insights into three crucial levers to systematically harness the potential of machine learning for protein bioinformatics. These are (1) construction of homology-reduced non-redundant datasets, (2) finding optimal protein representations, and (3) rigorous evaluation and inference. 

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2024. p. 55
National subject category
Bioinformatics (Computational Biology)
Research subject
Biochemistry with a specialisation in bioinformatics
Identifiers
urn:nbn:se:su:diva-224344 (URN); 978-91-8014-613-5 (ISBN); 978-91-8014-614-2 (ISBN)
Public defence
2024-01-26, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 09:00 (English)
Opponent
Supervisors
Available from: 2024-01-02. Created: 2023-12-07. Last updated: 2023-12-20. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authors

Zhu, Wensi; Shenoy, Aditi; Kundrotas, Petras; Elofsson, Arne