A novel training procedure to train deep networks in the assessment of the quality of protein models
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0003-3534-2986
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0002-7115-9751
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Motivation: Proteins fold into complex structures that are crucial for their biological functions. Experimental determination of protein structures is costly and therefore limited to a small fraction of all known proteins. Hence, computational structure prediction methods are necessary for modelling the vast majority of proteins. In most structure prediction pipelines, the last step is to select the best available model and to estimate its accuracy. This model quality estimation problem has grown in importance during the last decade, and progress on it is believed to be important for large-scale modelling of proteins. Current machine learning models trained to estimate protein model quality suffer from biases in the training set: multiple models of only a few targets, generated by a few methods.
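The selection step described above can be illustrated with a minimal sketch. It assumes each candidate model already has a predicted quality score and, for evaluation only, a known true score. The record layout, the function name `select_best_models`, and the "first-rank loss" measure (true score of the best model minus true score of the selected one) are illustrative assumptions, not taken from the manuscript or from the ProQ4 code.

```python
from collections import defaultdict


def select_best_models(records):
    """Pick, for each target, the model with the highest predicted score.

    records: iterable of (target_id, model_id, predicted_score, true_score).
    The layout is hypothetical; true_score is used only to evaluate the choice.
    """
    by_target = defaultdict(list)
    for target, model, predicted, true in records:
        by_target[target].append((model, predicted, true))

    selection = {}
    for target, models in by_target.items():
        chosen = max(models, key=lambda m: m[1])  # highest predicted score
        best_true = max(m[2] for m in models)     # best model actually available
        selection[target] = {
            "selected": chosen[0],
            "estimated_quality": chosen[1],
            # How much true quality was lost by not picking the best model.
            "first_rank_loss": best_true - chosen[2],
        }
    return selection


# Toy data: two targets with two candidate models each.
records = [
    ("T1", "m1", 0.70, 0.65),
    ("T1", "m2", 0.55, 0.80),
    ("T2", "m1", 0.40, 0.35),
    ("T2", "m2", 0.60, 0.62),
]
result = select_best_models(records)
for target, info in sorted(result.items()):
    print(target, info["selected"], round(info["first_rank_loss"], 2))
```

Grouping by target before selecting matters here because predicted scores are only compared among models of the same protein, which is also where the training-set redundancy mentioned above arises.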

Results: We propose a new methodology to train deep networks that leverages the structure of the problem and takes advantage of some of these redundancies. We demonstrate its viability by reaching results comparable with another state-of-the-art method, ProQ3D, trained and evaluated on the same datasets, but employing only a small subset of the input features. The proposed training strategy is applicable to other input features and datasets, and thus can be applied to other programs.

Availability: The code is freely available for download at github.com/ElofssonLab/ProQ4 and runs with minimal requirements: it needs only one multiple sequence alignment and a collection of models, and depends only on Python 3, HDF5, a deep learning framework compatible with Keras, and DSSP.

Contact: arne@bioinfo.se

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-172393
OAI: oai:DiVA.org:su-172393
DiVA, id: diva2:1346724
Available from: 2019-08-28 Created: 2019-08-28 Last updated: 2019-12-12 Bibliographically approved
In thesis
1. Structured Learning for Structural Bioinformatics: Applications of Deep Learning to Protein Structure Prediction
2019 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are the basic molecular machines of the cell, performing a broad range of tasks, from structural support to catalysis of chemical reactions. Their function is determined by their 3D structure, which in turn is dictated by the order of their components, the amino acids.

This thesis is dedicated to applications of machine learning to the problems of contact prediction, ab-initio structure prediction, and model quality assessment. In particular, my research has focused on developing methods that are both effective and easy to use.

In the first paper, we improved the already state-of-the-art model quality assessment (MQA) program ProQ3 by replacing its underlying machine learning algorithm, a support vector machine (SVM), with deep learning; the resulting program is named ProQ3D. The correlation between predicted and true scores improved from 0.85 to 0.90, using the same training data and features.

The second paper joined several programs into a single pipeline for ab-initio structure prediction: contact prediction, folding, and model selection. We attempted to predict the structures of all 6379 PFAM families with unknown structure, of which we believe 558 to be accurate. Of these, 415 had not been reported before.

The third paper uses advances in machine learning to build a contact predictor, PconsC4, that is fast and easy to deploy in large-scale studies, since it requires only a single Multiple Sequence Alignment (MSA) and has no external dependencies. The predictions are state-of-the-art: PconsC4 yields a 12% improvement in precision over PconsC3 and runs 244 times faster.

With ProQ4, in the fourth paper, we introduce a novel way of training deep networks for MQA that minimises the bias of the training data and emphasises model ranking, and we demonstrate its viability with a minimal description of the protein. The ranking correlation improved from 0.82 for ProQ3D to 0.90.
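A ranking correlation of this kind can be computed in several ways. The sketch below is one plausible reading: a Spearman rank correlation between predicted and true scores, computed separately for each target and then averaged. The abstract does not say which rank statistic was used or how it was aggregated, so the choice of Spearman, the per-target averaging, and all function names here are assumptions for illustration only.

```python
from collections import defaultdict
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; requires non-constant inputs of equal length."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation, i.e. Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def per_target_ranking_correlation(records):
    """records: iterable of (target_id, predicted_score, true_score)."""
    by_target = defaultdict(lambda: ([], []))
    for target, predicted, true in records:
        by_target[target][0].append(predicted)
        by_target[target][1].append(true)
    per_target = {t: spearman(p, q) for t, (p, q) in by_target.items()}
    return sum(per_target.values()) / len(per_target), per_target


# Toy data: models of T1 are ranked perfectly, models of T2 in reverse order.
records = [
    ("T1", 0.2, 0.1), ("T1", 0.5, 0.4), ("T1", 0.9, 0.8),
    ("T2", 0.3, 0.9), ("T2", 0.6, 0.5), ("T2", 0.7, 0.2),
]
mean_rho, rho_by_target = per_target_ranking_correlation(records)
print(mean_rho, rho_by_target)
```

Computing the correlation within each target keeps easy and hard targets from dominating the statistic, so it measures how well models of the same protein are ordered, which is what model selection needs.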

Lastly, in the fifth paper, we show the results of ProQ3D and ProQ4 in a completely blind test: CASP13.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2019. p. 63
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-172395 (URN)
978-91-7797-797-1 (ISBN)
978-91-7797-798-8 (ISBN)
Public defence
2019-10-11, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 13:00 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 4: Manuscript.

Available from: 2019-09-18 Created: 2019-08-28 Last updated: 2019-09-12 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Search in DiVA

By author/editor
Menéndez Hurtado, David; Uziela, Karolis; Elofsson, Arne
By organisation
Department of Biochemistry and Biophysics; Science for Life Laboratory (SciLifeLab)
Bioinformatics and Systems Biology
