pyconsFold: A fast and easy tool for modelling and docking using distance predictions
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory, Sweden. ORCID iD: 0000-0003-0568-8281
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory, Sweden. ORCID iD: 0000-0002-7115-9751
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Motivation Contact prediction within a protein has recently become a viable method for accurate prediction of protein structure. Using predicted distance distributions has been shown in many cases to be superior to using only a binary contact annotation. Predicted inter-protein distances have also been shown to be sufficient to dock some protein dimers.

Results Here we present pyconsFold. Using CNS as its underlying folding mechanism and predicted contact distances, it outperforms regular contact-prediction-based modelling on our dataset of 210 proteins. It performs marginally worse than the state-of-the-art pyRosetta folding pipeline but is on average about 20 times faster per model. More importantly, pyconsFold can also be used as a fold-and-dock protocol by using predicted inter-protein contacts to simultaneously fold and dock two protein chains.

Availability and implementation pyconsFold is implemented in Python 3 with a strong focus on using as few dependencies as possible for longevity. It is available both as a pip package in Python 3 and as source code on GitHub and is published under the GPLv3 license.

Contact arne{at}bioinfo.se

Supplemental material Install instructions, examples and parameters can be found in the supplemental notes.

Availability of data The data underlying this article, together with the source code, are available on GitHub at https://github.com/johnlamb/pyconsfold.

Keywords [en]
Protein structure prediction, Protein interactions, Protein distance predictions, trRosetta, CNS
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-191207
DOI: 10.1101/2021.02.08.430195
OAI: oai:DiVA.org:su-191207
DiVA, id: diva2:1536951
Available from: 2021-03-12 Created: 2021-03-12 Last updated: 2022-02-25 Bibliographically approved
In thesis
1. Transmembrane Proteins and Protein Structure Prediction: What we can learn from Computational Methods
2021 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A protein’s 3D-structure is essential to understand how proteins function and interact and how biochemical processes proceed in organic life. Despite advances in experimental methods, it remains expensive and time-consuming to determine protein structure experimentally. There have been significant advances in machine learning and computational methods where, in many cases, models of protein structure can be determined to a high level of quality. Computational methods help predict protein 3D-structure and are often used as a complement to experimental methods to give better insight into and understanding of biological processes.

This thesis presents studies focusing on the simplicity and transparency of the 3D-structure pipeline. This is done with a new interactive database with full access to the pipeline’s data and code together with tools to analyse and compare models and structures. 

I present a new module for the last step in this pipeline, the final folding of the protein chain, which both simplifies the current pipeline and uses new input data based on current research. This module predicts better models than its predecessor and produces models more than an order of magnitude faster than the current state-of-the-art tools. This module also contains a novel way of both folding and docking dimers in a single step.

There are many examples of how machine learning models contain biases that originate in biased training data, translating into models that do not generalise well. I present a study where experts collaborate to create a high-quality database of Intrinsically Disordered Proteins. Through manual annotation and quality protocols, high-quality training data has been produced that is well suited for machine learning tasks and protein disorder analysis. In this thesis, I also present computational methods pertaining to transmembrane proteins and how they can increase our insight into membrane protein structure. In one study, we use computational methods together with experimental methods to investigate how oppositely charged residue pairs that form salt bridges inside the membrane change the insertion potential of membrane proteins. We show that amino acid pairs that form salt bridges in this setting contribute 0.5-0.7 kcal/mol to the apparent free energy of membrane insertion. This gives new insight into how we calculate insertion and can lead to better membrane protein topology predictors. In the final study, we investigate the CPA/AT-transporter family of transmembrane proteins and create a new integrated topology annotation method and structural classification, resulting in new insight into how this family evolved through time. The entire pipeline is published as an interactive database with complete transparency for both the method and data used. The study shows how this family has evolved by duplicating internal regions and how this has caused a structural symmetry in the family.

This thesis, therefore, contributes to a more accessible and more transparent path of using computational methods to give a more extensive insight into protein structure prediction and how these structures pertain to biochemical processes.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2021. p. 57
Keywords
protein structure prediction, contact prediction, transmembrane protein, topology prediction, machine learning
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-191211 (URN)
978-91-7911-456-5 (ISBN)
978-91-7911-457-2 (ISBN)
Public defence
2021-04-30, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, and online via Zoom, public link is available at the department website, Stockholm, 10:00 (English)
Available from: 2021-04-07 Created: 2021-03-12 Last updated: 2022-02-25 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Lamb, John; Elofsson, Arne
