Impact of joint structure and sequence representations for ligand binding site prediction.
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0001-7748-2501
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0002-7115-9751
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Summary: Accurate ligand-binding site prediction provides insights into molecular interactions, drug discovery and design. Most computational methods can identify residues binding to a single ligand but cannot simultaneously predict binding sites for multiple ligands. Sequence-based methods that predict multiple ligands are fast but perform poorly. Structure-based methods, which primarily use protein surface properties to predict ligand binding sites, are accurate but slow. We studied the impact of combining structure-informed representations with sequence embeddings to generate a quick yet accurate predictor. Although the protein binding surface interacting with each ligand is unique, we find that structure-informed representations do not significantly improve prediction performance.

Availability and Implementation: Source code is available at https://github.com/aditishenoy/ligandbinding

Identifiers
URN: urn:nbn:se:su:diva-224343
OAI: oai:DiVA.org:su-224343
DiVA, id: diva2:1817739
Funder
Knut and Alice Wallenberg Foundation
Available from: 2023-12-07 Created: 2023-12-07 Last updated: 2023-12-07
Part of thesis
1. Unlocking protein sequences: Advances in protein structure and ligand-binding site prediction
2024 (English) Doctoral thesis, comprising papers (Other academic)
Abstract [en]

A protein's sequence determines how it will fold into its unique three-dimensional structure. Once folded, proteins perform their functions by interacting with other proteins or with molecules called ligands within the cell. Experimental determination of protein structure and function is tedious. Computational approaches aim to accurately predict the properties of proteins to complement experimental efforts to understand biochemical mechanisms within the cell. This thesis introduces computational techniques that predict the structure of protein complexes and identify protein residues involved in interactions with common biomolecules, such as metal ions and nucleic acids, based on sequence information.

AlphaFold, a method that predicted protein structure from sequence information with almost experimental accuracy, was a critical breakthrough that shaped the field of protein structure prediction. Subsequently, approaches such as FoldDock adapted the AlphaFold pipeline for dimer complexes. Paper I applies the FoldDock protocol to understand toxin-antitoxin systems. These protein complexes are highly evolutionarily conserved, and high-confidence dimer predictions were generated for them. Paper II applies the FoldDock protocol to study protein-protein interactions in the human proteome. To verify the reliability of machine-learning-based computational methods, they must be tested on independent data that differ from the data used to train the method. Paper III involves generating and using a homology-reduced independent test set to benchmark the performance of protein complex structure predictors, including AlphaFold-Multimer, the recent AlphaFold release adapted for multi-chain proteins. A confidence score (pDockQ2) was proposed to estimate the quality of the interfaces within multimers. Paper I, Paper II and Paper III are associated with predicting and evaluating protein-protein interactions.

Representation learning involves finding effective representations of input data that maximise the available information, making the data easier to understand and process for downstream prediction tasks. A recent advance in protein representation learning is protein language models (pLMs), in which large language models are trained on a massive corpus of protein sequences. The highly contextualised and informative vector representations contained in the last hidden layer of the model have been used to predict numerous properties, such as ligand binding sites, subcellular localisation, and post-translational modifications, among others. Paper IV uses residue-level embeddings to predict whether a protein binds to one or more of the ten most common ions. It also predicts residue-level binding probabilities for multiple ions simultaneously. Paper V expands this approach beyond metals. It explores the impact of structure-informed features alongside sequence embeddings to predict whether a residue binds to nucleic acids, small molecules or metals. Paper IV and Paper V are associated with developing machine learning methods to predict and evaluate protein-ligand interactions.

In summary, the research conducted within this thesis offers valuable insights into three crucial levers to systematically harness the potential of machine learning for protein bioinformatics. These are (1) construction of homology-reduced non-redundant datasets, (2) finding optimal protein representations, and (3) rigorous evaluation and inference. 
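
As an illustration only, the idea of pairing sequence embeddings with structure-informed features for per-residue, multi-ligand prediction (as described for Paper V) could be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names, feature dimensions, random inputs and the single linear layer are all hypothetical, and real pLM embeddings and trained weights would replace the random numbers.

```python
import math
import random


def sigmoid(x):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_binding(seq_emb, struct_feat, weights, bias):
    """Return one probability per residue and ligand class.

    seq_emb:     per-residue sequence embedding vectors (hypothetical pLM output)
    struct_feat: per-residue structure-informed feature vectors (hypothetical)
    weights:     one weight vector per ligand class, over the joint features
    bias:        one bias value per ligand class
    """
    probs = []
    for seq_vec, struct_vec in zip(seq_emb, struct_feat):
        joint = seq_vec + struct_vec  # list concatenation: joint representation
        row = [
            sigmoid(sum(x * w for x, w in zip(joint, class_w)) + b)
            for class_w, b in zip(weights, bias)
        ]
        probs.append(row)
    return probs


# Hypothetical sizes and ligand classes, chosen only for illustration.
LIGAND_CLASSES = ["nucleic acid", "small molecule", "metal"]
N_RESIDUES, D_SEQ, D_STRUCT = 20, 8, 3

rng = random.Random(0)
seq_emb = [[rng.gauss(0, 1) for _ in range(D_SEQ)] for _ in range(N_RESIDUES)]
struct_feat = [[rng.gauss(0, 1) for _ in range(D_STRUCT)] for _ in range(N_RESIDUES)]
weights = [[rng.gauss(0, 0.1) for _ in range(D_SEQ + D_STRUCT)] for _ in LIGAND_CLASSES]
bias = [0.0 for _ in LIGAND_CLASSES]

probs = predict_binding(seq_emb, struct_feat, weights, bias)

# A simple protein-level summary: the highest residue probability per class.
protein_level = [max(row[k] for row in probs) for k in range(len(LIGAND_CLASSES))]
```

Each class gets its own independent sigmoid output, so a residue can be flagged for several ligand types at once, which is what distinguishes multi-label from single-ligand prediction. Dropping `struct_feat` from the concatenation gives the sequence-only baseline against which a joint representation would be compared.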

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2024. p. 55
Research subject
Biochemistry with specialisation in bioinformatics
Identifiers
urn:nbn:se:su:diva-224344 (URN)
978-91-8014-613-5 (ISBN)
978-91-8014-614-2 (ISBN)
Public defence
2024-01-26, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 09:00 (English)
Available from: 2024-01-02 Created: 2023-12-07 Last updated: 2023-12-20 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authors

Shenoy, Aditi; Elofsson, Arne
