Decomposing Structural Response Due to Sequence Changes in Protein Domains with Machine Learning
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0003-3439-1866
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Number of authors: 2. 2020 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 432, no. 16, p. 4435-4446. Article in journal (Refereed). Published
Abstract [en]

How protein domain structure changes in response to mutations is not well understood. Some mutations change the structure drastically, while most only result in small changes. To gain an understanding of this, we decompose the relationship between changes in domain sequence and structure using machine learning. We select pairs of evolutionarily related domains with a broad range of evolutionary distances. In contrast to earlier studies, we do not find a strictly linear relationship between sequence and structural changes. We train a random forest regressor that predicts the structural similarity between pairs with an average accuracy of 0.029 lDDT (local Distance Difference Test) score and a correlation coefficient of 0.92. Decomposing the feature importance shows that the domain length, or analogously, size is the most important feature. Our model enables assessing deviations in relative structural response, and thus prediction of evolutionary trajectories, in protein domains across evolution.
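The abstract summarises the regressor with two figures: an average deviation in lDDT score and a correlation coefficient. As a minimal sketch of how such figures are typically computed, the code below assumes the 0.029 value is a mean absolute difference between predicted and reference lDDT scores and that the correlation is Pearson's r; neither assumption is stated in this record. The function names and the five score pairs are hypothetical and are not data from the study.

```python
import math
from statistics import mean


def mean_absolute_error(predicted, reference):
    """Average absolute difference between predicted and reference lDDT scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(p - r) for p, r in zip(predicted, reference))


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical lDDT scores for five domain pairs (illustrative only, not study data).
reference = [0.92, 0.81, 0.64, 0.47, 0.35]
predicted = [0.90, 0.84, 0.60, 0.50, 0.33]

print(f"MAE: {mean_absolute_error(predicted, reference):.3f}")
print(f"Pearson r: {pearson_r(predicted, reference):.3f}")
```

A low mean absolute error and a high correlation measure different things: the first says how far individual predictions are from the reference scores on average, the second how well the predictions preserve the ranking and spread of the reference scores.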

Place, publisher, year, edition, pages
2020. Vol. 432, no. 16, p. 4435-4446
Keywords [en]
protein evolution, protein structure, evolutionary distance, mutations
Identifiers
URN: urn:nbn:se:su:diva-184383
DOI: 10.1016/j.jmb.2020.05.021
ISI: 000552832700008
PubMedID: 32485208
OAI: oai:DiVA.org:su-184383
DiVA, id: diva2:1473247
Available from: 2020-10-05 Created: 2020-10-05 Last updated: 2022-08-24. Bibliographically approved
Part of thesis
1. Learning Protein Evolution and Structure
2022 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

By analysing the structure of a protein it is possible to draw conclusions about its function. Obtaining the structure of a protein experimentally is, however, a time-consuming and expensive process. By using evolutionary information it is possible to infer the structure of a protein. AlphaFold2 (AF), the latest AI technology for protein structure prediction, uses evolutionary information to obtain protein structures in minutes instead of years at a fraction of the experimental cost. Here, we develop this technology further to predict the structure of interacting proteins. We create a confidence score, pDockQ, and show that this score rivals high-throughput experiments in distinguishing true and false protein-protein interactions (PPIs). Applying AF and the pDockQ score to a set of 65484 human PPIs, we identify 1371 new high-confidence models. These models expand the structural knowledge of human protein complexes and can be used to, e.g., develop new drugs or evaluate biological pathways. One limitation of AF is that its accuracy decreases with the number of proteins being predicted together, and that the biggest protein complexes do not fit in the memory of the latest GPUs. To circumvent these issues, we predict subcomponents of protein complexes and assemble them with Monte Carlo tree search (MCTS). MCTS enables assembling some of the largest protein complexes using only sequence information and stoichiometry. Out of 175 protein complexes with 10-30 chains, 91 can be completely assembled with a median TM-score of 0.51. A third of these (30 complexes) are highly accurate (TM-score ≥0.8). Highly accurate protein structure prediction is revolutionising many fields of biological research only one year after its realisation. This is likely only the beginning of a new era: the era of AI.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2022. p. 44
Keywords
Protein structure prediction, Evolution, AI, AlphaFold
Research subject
Biochemistry with specialisation in bioinformatics
Identifiers
urn:nbn:se:su:diva-207579 (URN)
978-91-7911-952-2 (ISBN)
978-91-7911-953-9 (ISBN)
Public defence
2022-09-26, Air & Fire, SciLifeLab, Tomtebodavägen 23A, Solna, 14:00 (English)
Available from: 2022-09-01 Created: 2022-07-29 Last updated: 2025-02-07. Bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text, PubMed

Person

Bryant, Patrick; Elofsson, Arne
