Protein Model Quality Assessment: A Machine Learning Approach
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. (Arne Elofsson) ORCID iD: 0000-0003-2232-3006
2017 (English). Doctoral thesis, with papers (Other academic)
Abstract [en]

Many protein structure prediction programs exist, and they can efficiently generate a number of protein models of varying quality. One problem is that it is difficult to know which model is the best one for a given target sequence. Selecting the best model is one of the major tasks of Model Quality Assessment Programs (MQAPs). These programs predict model accuracy before the native structure is determined. The accuracy estimation can be divided into two parts: global (the accuracy of the whole model) and local (the accuracy of each residue). ProQ2 is one of the most successful MQAPs for predicting both local and global model accuracy and is based on a machine learning approach.

In this thesis, I present my own contribution to Model Quality Assessment (MQA) and the newest developments in the ProQ program series. First, I describe a new ProQ2 implementation in the protein modelling software package Rosetta. This new implementation allows ProQ2 to be used as a scoring function for conformational sampling inside Rosetta, which was not possible before. Moreover, I present two new methods, ProQ3 and ProQ3D, that both outperform their predecessor. ProQ3 introduces new training features that are calculated from Rosetta energy functions, and ProQ3D introduces a new machine learning approach based on deep learning. The ProQ3 program participated in the 12th Community Wide Experiment on the Critical Assessment of Techniques for Protein Structure Prediction (CASP12) and was one of the best methods in the MQA category. Finally, an important issue in model quality assessment is how to select the target function that the predictor is trying to learn. In the fourth manuscript, I show that MQA results can be improved by selecting a contact-based target function instead of the more conventional superposition-based functions.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2017, p. 46
Keywords [en]
Protein Model Quality Assessment, structural bioinformatics, machine learning, deep learning, support vector machine, ProQ, Artificial Neural Network, protein structure prediction
National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-137695
ISBN: 978-91-7649-633-6 (print)
ISBN: 978-91-7649-634-3 (print)
OAI: oai:DiVA.org:su-137695
DiVA, id: diva2:1063493
Public defence
2017-02-10, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 14:00 (English)
Opponent
Supervisor
Funder
Swedish Research Council, VR-NT 2012-5046
Note

At the time of the doctoral defense, the following paper was unpublished and had the following status: Paper 3: Manuscript.

Available from: 2017-01-18 Created: 2017-01-10 Last updated: 2025-02-07 Bibliographically approved
List of papers
1. ProQ2: estimation of model accuracy implemented in Rosetta
2016 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no. 9, p. 1411-1413. Article in journal (Refereed). Published
Abstract [en]

Motivation: Model quality assessment programs are used to predict the quality of modeled protein structures. They can be divided into two groups depending on the information they use: ensemble methods, which use the consensus of many alternative models, and methods that use only a single model to make their prediction. The consensus methods excel in achieving high correlations between predicted and true quality measures. However, they frequently fail to pick out the best possible model, and they cannot be used to generate and score new structures. Single-model methods, on the other hand, do not have these inherent shortcomings and can be used both to sample new structures and to improve existing consensus methods. Results: Here, we present an implementation of the ProQ2 program to estimate both local and global model accuracy as part of the Rosetta modeling suite. The current implementation not only makes it possible to run large batch runs locally, but also opens up a whole new arena for conformational sampling using machine-learned scoring functions and for incorporating model accuracy estimation into various existing modeling schemes. ProQ2 participated in CASP11, and results from CASP11 are used to benchmark the current implementation. Based on results from CASP11 and CAMEO-QE, a continuous benchmark of quality estimation methods, it is clear that ProQ2 is the single-model method that performs best in both local and global model accuracy.

National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
urn:nbn:se:su:diva-131551 (URN), 10.1093/bioinformatics/btv767 (DOI), 000376106100020 (), 26733453 (PubMedID)
Available from: 2016-06-30 Created: 2016-06-21 Last updated: 2022-03-23 Bibliographically approved
2. ProQ3: Improved model quality assessments using Rosetta energy terms
2016 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 6, article id 33509. Article in journal (Refereed). Published
Abstract [en]

Quality assessment of protein models using no other information than the structure of the model itself has been shown to be useful for structure prediction. Here, we introduce two novel methods, ProQRosFA and ProQRosCen, inspired by the state-of-the-art method ProQ2, but using a completely different description of a protein model. ProQ2 uses contacts and other features calculated from a model, while the new predictors are based on Rosetta energies: ProQRosFA uses the full-atom energy function that takes into account all atoms, while ProQRosCen uses the coarse-grained centroid energy function. The two new predictors also include residue conservation and terms corresponding to the agreement of a model with predicted secondary structure and surface area, as in ProQ2. We show that the performance of these predictors is on par with ProQ2 and significantly better than all other model quality assessment programs. Furthermore, we show that when the input features from all three predictors are combined, the resulting predictor, ProQ3, performs better than any of the individual methods. ProQ3, ProQRosFA and ProQRosCen are freely available both as a webserver and as stand-alone programs at http://proq3.bioinfo.se/.

National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
urn:nbn:se:su:diva-135223 (URN), 10.1038/srep33509 (DOI), 000384595800001 ()
Available from: 2016-11-14 Created: 2016-11-01 Last updated: 2022-09-15 Bibliographically approved
3. Improved protein model quality assessments by changing the target function
2018 (English). In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 86, no. 6, p. 654-663. Article in journal (Refereed). Published
Abstract [en]

Protein model quality assessment is an important part of protein structure prediction. For more than a decade we have developed a set of methods for this problem, using various types of protein description and different machine learning methodologies. Common to all these methods, however, has been the target function used for training. The target function in ProQ describes the local quality of a residue in a protein model. In all versions of ProQ the target function has been the S-score. However, other quality estimation functions also exist, which can be divided into superposition-based and contact-based methods. The superposition-based methods, such as the S-score, are based on a rigid-body superposition of a protein model and the native structure, while the contact-based methods compare the local environment of each residue. Here, we examine the effects of retraining our latest predictor, ProQ3D, using identical inputs but different target functions. We find that the contact-based methods are easier to predict and that predictors trained on these measures provide some advantages when it comes to identifying the best model. One possible reason for this is that contact-based methods are better at estimating the quality of multi-domain targets. However, training on the S-score gives the best correlation with the GDT_TS score, which is commonly used in CASP to score the global model quality. To take advantage of both of these features, we provide an updated version of ProQ3D that predicts local and global model quality estimates based on different quality estimates.
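
As a rough illustration of the distinction drawn in this abstract, the sketch below contrasts a superposition-based per-residue score with a simple contact-based one. The S-score form 1 / (1 + (d / d0)^2) is the commonly cited definition; the value d0 = 3.0 Å, the 8 Å contact cutoff, the use of one coordinate per residue, and all function names are assumptions made for this example and are not taken from the record. The contact score is a simplified stand-in, not the measure evaluated in the paper.

```python
import math

D0 = 3.0              # assumed distance scale in Å (illustrative)
CONTACT_CUTOFF = 8.0  # assumed contact cutoff in Å (illustrative)


def s_score(deviation, d0=D0):
    """Superposition-based local score for one residue.

    `deviation` is the distance in Å between the residue in the model and
    in the native structure after a rigid-body superposition.
    """
    return 1.0 / (1.0 + (deviation / d0) ** 2)


def contacts(coords, i, cutoff=CONTACT_CUTOFF):
    """Indices of residues whose coordinates lie within `cutoff` of residue i."""
    return {
        j
        for j, c in enumerate(coords)
        if j != i and math.dist(coords[i], c) <= cutoff
    }


def contact_score(model, native, i):
    """Fraction of the native contacts of residue i that the model preserves.

    Needs no superposition: only each residue's local environment is compared.
    """
    native_contacts = contacts(native, i)
    if not native_contacts:
        return 1.0  # nothing to preserve; treated as fully correct here
    preserved = native_contacts & contacts(model, i)
    return len(preserved) / len(native_contacts)


def global_score(local_scores):
    """Global estimate taken as the mean of the per-residue scores."""
    return sum(local_scores) / len(local_scores)
```

Because the contact score depends only on local environments, a model whose domains are individually correct but misoriented relative to each other can still score well per residue, which is one way to read the abstract's remark about multi-domain targets.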

Keywords
CASP, deep learning, estimation of model accuracy, model quality assessments, protein structure prediction
National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
urn:nbn:se:su:diva-156779 (URN), 10.1002/prot.25492 (DOI), 000431734800006 (), 29524250 (PubMedID)
Available from: 2018-06-04 Created: 2018-06-04 Last updated: 2022-02-26 Bibliographically approved
4. ProQ3D: improved model quality assessments using deep learning
2017 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no. 10, p. 1578-1580. Article in journal (Refereed). Published
Abstract [en]

Protein quality assessment is a long-standing problem in bioinformatics. For more than a decade we have developed state-of-the-art predictors by carefully selecting and optimising inputs to a machine learning method. The correlation has increased from 0.60 in ProQ to 0.81 in ProQ2 and 0.85 in ProQ3, mainly by adding a large set of carefully tuned descriptions of a protein. Here, we show that a substantial improvement can be obtained by using exactly the same inputs as in ProQ2 or ProQ3 but replacing the support vector machine with a deep neural network. This improves the Pearson correlation to 0.90 (0.85 using ProQ2 input features).
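
The correlations quoted in this abstract are Pearson correlations between predicted and true quality scores. The following is a minimal sketch of that evaluation metric using only the standard library; the function name and the example values in the usage note are made up for illustration and do not come from the paper.

```python
import math


def pearson(predicted, observed):
    """Pearson correlation coefficient between two equally long score lists."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Covariance and standard deviations, each without the 1/n factor,
    # which cancels in the ratio.
    cov = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    sd_p = math.sqrt(sum((p - mean_p) ** 2 for p in predicted))
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in observed))
    if sd_p == 0.0 or sd_o == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_p * sd_o)
```

For instance, `pearson([0.2, 0.5, 0.9], [0.1, 0.6, 0.8])` compares three hypothetical predicted local scores with their true values; a result close to 1 means the predictor ranks and scales residues much like the true measure.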

Keywords
Model Quality Assessment, Protein Bioinformatics, Machine Learning, Deep Learning, Neural Networks, Multi Layer Perceptron, Deep neural networks
National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
urn:nbn:se:su:diva-137679 (URN), 10.1093/bioinformatics/btw819 (DOI), 000402130700023 ()
Funder
Swedish Research Council, VR-NT 2012-5046
Swedish Research Council, VR-NT 2012-5270
Swedish e-Science Research Center
Available from: 2017-01-09 Created: 2017-01-09 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

Protein Model Quality Assessment (872 kB), 834 downloads
File information
File name: FULLTEXT02.pdf. File size: 872 kB. Checksum: SHA-512
66ffbc06c3437fab0a715019d34d21d979a24777a7a0434d2a0752230dc07d5d0ec9001464f406918c37f2fc24e76c465073eb598de39546ea73c058e86a2865
Type: fulltext. Mimetype: application/pdf

Person

Uziela, Karolis

Total: 839 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 2408 hits