Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0003-3439-1866
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Number of authors: 2. 2020 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 432, no. 16, pp. 4435-4446. Journal article (Refereed). Published
Abstract [en]

How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most result in only small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains with a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT (local Distance Difference Test) score and a correlation coefficient of 0.92. Decomposing the feature importance shows that the domain length, or analogously, size, is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
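
As a rough illustration of the two evaluation figures quoted in the abstract, the sketch below computes a mean absolute error and a Pearson correlation coefficient between observed and predicted similarity scores. Reading the "average accuracy" as a mean absolute error is an assumption, and the function names and the five score pairs are hypothetical; none of this is taken from the paper.

```python
import math


def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired scores."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical lDDT scores for five domain pairs (illustrative only,
# not data from the study).
observed_lddt = [0.90, 0.75, 0.60, 0.82, 0.55]
predicted_lddt = [0.88, 0.78, 0.63, 0.80, 0.50]

mae = mean_absolute_error(observed_lddt, predicted_lddt)
r = pearson_r(observed_lddt, predicted_lddt)
print(f"mean absolute error: {mae:.3f} lDDT, Pearson r: {r:.2f}")
```

A low mean absolute error together with a high correlation indicates that predictions are both close to and ordered like the observed scores; the two numbers are reported together because either one alone can be misleading.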

Place, publisher, year, edition, pages
2020. Vol. 432, no. 16, pp. 4435-4446
Keywords [en]
protein evolution, protein structure, evolutionary distance, mutations
National subject category
Biological Sciences
Identifiers
URN: urn:nbn:se:su:diva-184383, DOI: 10.1016/j.jmb.2020.05.021, ISI: 000552832700008, PubMedID: 32485208, OAI: oai:DiVA.org:su-184383, DiVA, id: diva2:1473247
Available from: 2020-10-05. Created: 2020-10-05. Last updated: 2022-08-24. Bibliographically approved
Part of thesis
1. Learning Protein Evolution and Structure
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

By analysing the structure of a protein it is possible to draw conclusions about its function. Obtaining the structure of a protein experimentally is, however, a time-consuming and expensive process. By using evolution it is possible to infer the structure of a protein. AlphaFold2 (AF), the latest AI technology for protein structure prediction, uses evolutionary information to obtain protein structures in minutes instead of years, at a fraction of the experimental cost. Here, we develop this technology further to predict the structure of interacting proteins. We create a confidence score, pDockQ, and show that this score rivals high-throughput experiments in distinguishing true and false protein-protein interactions (PPIs). Applying AF and the pDockQ score to a set of 65484 human PPIs, we identify 1371 new high-confidence models. These models expand the structural knowledge of human protein complexes and can be used, for example, to develop new drugs or evaluate biological pathways. One limitation of AF is that the accuracy decreases with the number of proteins being predicted together, and that the biggest protein complexes do not fit in the memory of the latest GPUs. To circumvent these issues, we predict subcomponents of protein complexes and assemble these with Monte Carlo tree search (MCTS). MCTS enables assembling some of the largest protein complexes using only sequence information and stoichiometry. Out of 175 protein complexes with 10-30 chains, 91 can be completely assembled, with a median TM-score of 0.51. A third of these (30 complexes) are highly accurate (TM-score ≥0.8). The use of highly accurate protein structure prediction is revolutionising many fields of biological research only one year after its realisation. Likely, this is only the beginning of a new era: the era of AI.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2022. p. 44
Keywords
Protein structure prediction, Evolution, AI, AlphaFold
National subject category
Bioinformatics and Computational Biology
Research subject
Biochemistry with a focus on bioinformatics
Identifiers
urn:nbn:se:su:diva-207579 (URN), 978-91-7911-952-2 (ISBN), 978-91-7911-953-9 (ISBN)
Public defence
2022-09-26, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Opponent
Supervisors
Available from: 2022-09-01. Created: 2022-07-29. Last updated: 2025-02-07. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authors

Bryant, Patrick; Elofsson, Arne
