Menéndez Hurtado, David (ORCID iD: orcid.org/0000-0003-3534-2986)
Publications (10 of 11)
Baldassarre, F., Menéndez Hurtado, D., Elofsson, A. & Azizpour, H. (2021). GraphQA: protein model quality assessment using graph convolutional networks. Bioinformatics, 37(3), 360-366
2021 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 37, no 3, p. 360-366. Article in journal (Refereed). Published
Abstract [en]

Motivation: Proteins are ubiquitous molecules whose function in biological processes is determined by their 3D structure. Experimental identification of a protein's structure can be time-consuming, prohibitively expensive and not always possible. Alternatively, protein folding can be modeled using computational methods, which however are not guaranteed to always produce optimal results. GraphQA is a graph-based method to estimate the quality of protein models, that possesses favorable properties such as representation learning, explicit modeling of both sequential and 3D structure, geometric invariance and computational efficiency.

Results: GraphQA performs similarly to state-of-the-art methods despite using a relatively low number of input features. In addition, the graph network structure provides an improvement over the architecture used in ProQ4 operating on the same input features. Finally, the individual contributions of GraphQA components are carefully evaluated.

National Category
Biological Sciences; Computer and Information Sciences
Identifiers
urn:nbn:se:su:diva-196420 (URN); 10.1093/bioinformatics/btaa714 (DOI); 000667755400010 (); 32780838 (PubMedID)
Available from: 2021-09-08; Created: 2021-09-08; Last updated: 2022-02-25; Bibliographically approved
Cheng, J., Choe, M., Elofsson, A., Han, K.-S., Hou, J., Maghrabi, A. H. A., . . . Wallner, B. (2019). Estimation of model accuracy in CASP13. Proteins: Structure, Function, and Bioinformatics, 87(12), 1361-1377
2019 (English). In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 87, no 12, p. 1361-1377. Article in journal (Refereed). Published
Abstract [en]

Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue‐residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus‐based methods.

National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-172394 (URN); 10.1002/prot.25767 (DOI); 000476102200001 ()
Available from: 2019-08-28; Created: 2019-08-28; Last updated: 2025-02-07; Bibliographically approved
Michel, M., Menéndez Hurtado, D. & Elofsson, A. (2019). PconsC4: fast, accurate and hassle-free contact predictions. Bioinformatics, 35(15), 2677-2679
2019 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 15, p. 2677-2679. Article in journal (Refereed). Published
Abstract [en]

Motivation

Residue contact prediction was revolutionized recently by the introduction of direct coupling analysis (DCA). Further improvements, in particular for small families, have been obtained by the combination of DCA and deep learning methods. However, existing deep learning contact prediction methods often rely on a number of external programs and are therefore computationally expensive.

Results

Here, we introduce a novel contact predictor, PconsC4, which performs on par with state of the art methods. PconsC4 is heavily optimized, does not use any external programs and therefore is significantly faster and easier to use than other methods.

Availability and implementation

PconsC4 is freely available under the GPL license from https://github.com/ElofssonLab/PconsC4. Installation is easy using the pip command and works on any system with Python 3.5 or later and a GCC compiler. It does not require a GPU nor special hardware.

Supplementary information

Supplementary data are available at Bioinformatics online.

National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-172392 (URN); 10.1093/bioinformatics/bty1036 (DOI); 000484378200024 ()
Available from: 2019-08-28; Created: 2019-08-28; Last updated: 2025-02-07; Bibliographically approved
Lamb, J., Jarmolinska, A., Michel, M., Menéndez-Hurtado, D., Sulkowska, J. & Elofsson, A. (2019). PconsFam: An Interactive Database of Structure Predictions of Pfam Families. Journal of Molecular Biology, 431(13), 2442-2448
2019 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 431, no 13, p. 2442-2448. Article in journal (Refereed). Published
Abstract [en]

At present, about half of the protein domain families lack a structural representative. However, in the last decade, predicting contact maps and using these to model the tertiary structure for these protein families have become an alternative approach to gain structural insight. At present, reliable models for several hundreds of protein families have been created using this approach. To increase the use of this approach, we present PconsFam, which is an intuitive and interactive database for predicted contact maps and tertiary structure models of the entire Pfam database. By modeling all possible families, both with and without a representative structure, using the PconsFold2 pipeline, and running quality assessment estimator on the models, we predict an estimation for how confident the contact maps and structures are for each family.

Keywords
contact maps, structure prediction, folding pipeline, Pfam
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-172037 (URN); 10.1016/j.jmb.2019.01.047 (DOI); 000474675300006 (); 30796988 (PubMedID)
Available from: 2019-08-27; Created: 2019-08-27; Last updated: 2022-02-26; Bibliographically approved
Menéndez Hurtado, D. (2019). Structured Learning for Structural Bioinformatics: Applications of Deep Learning to Protein Structure Prediction. (Doctoral dissertation). Stockholm: Department of Biochemistry and Biophysics, Stockholm University
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are the basic molecular machines of the cell, performing a broad range of tasks, from structural support to catalysis of chemical reactions. Their function is determined by their 3D structure, which in turn is dictated by the order of their components, the amino acids.

This thesis is dedicated to applications of machine learning to the problems of contact prediction, ab-initio, and model quality assessment. In particular, my research has been focused on developing methods that are both effective, and easy to use.

In the first paper, we improved the already state-of-the-art model quality assessment (MQA) program ProQ3 by replacing the underlying machine learning algorithm, a support vector machine (SVM), with deep learning; the resulting method was named ProQ3D. The correlation between predicted and true scores improved from 0.85 to 0.90, using the same training data and features.

The second paper joined several programs into a single pipeline for ab-initio structure prediction: contact prediction, folding, and model selection. We attempted to predict the structures of all 6379 PFAM families with unknown structure, of which 558 we believe to be accurate. Of these, 415 had not been reported before.

The third paper uses advances in machine learning to build a contact predictor, PconsC4, that is fast and easy to deploy in large-scale studies, since it requires a single Multiple Sequence Alignment (MSA) and no external dependencies. The predictions are state-of-the-art, yielding a 12% improvement in precision over PconsC3 while running 244 times faster.

With ProQ4, in the fourth paper, we introduce a novel way of training deep networks for MQA in a way that minimises the bias of the training data, and emphasises model ranking, and demonstrate its viability with a minimal description of the protein. The ranking correlation was improved with respect to ProQ3D from 0.82 to 0.90.

Lastly, in the fifth paper, we show the results of ProQ3D and ProQ4 in a completely blind test: CASP13.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2019. p. 63
National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-172395 (URN); 978-91-7797-797-1 (ISBN); 978-91-7797-798-8 (ISBN)
Public defence
2019-10-11, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 13:00 (English)
Note

At the time of the doctoral defense, the following paper was unpublished and had a status as follows: Paper 4: Manuscript.

Available from: 2019-09-18; Created: 2019-08-28; Last updated: 2025-02-07; Bibliographically approved
Uziela, K., Menéndez Hurtado, D., Shu, N., Wallner, B. & Elofsson, A. (2018). Improved protein model quality assessments by changing the target function. Proteins: Structure, Function, and Bioinformatics, 86(6), 654-663
2018 (English). In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 86, no 6, p. 654-663. Article in journal (Refereed). Published
Abstract [en]

Protein modeling quality is an important part of protein structure prediction. We have for more than a decade developed a set of methods for this problem. We have used various types of description of the protein and different machine learning methodologies. However, common to all these methods has been the target function used for training. The target function in ProQ describes the local quality of a residue in a protein model. In all versions of ProQ the target function has been the S-score. However, other quality estimation functions also exist, which can be divided into superposition- and contact-based methods. The superposition-based methods, such as S-score, are based on a rigid body superposition of a protein model and the native structure, while the contact-based methods compare the local environment of each residue. Here, we examine the effects of retraining our latest predictor, ProQ3D, using identical inputs but different target functions. We find that the contact-based methods are easier to predict and that predictors trained on these measures provide some advantages when it comes to identifying the best model. One possible reason for this is that contact based methods are better at estimating the quality of multi-domain targets. However, training on the S-score gives the best correlation with the GDT_TS score, which is commonly used in CASP to score the global model quality. To take the advantage of both of these features we provide an updated version of ProQ3D that predicts local and global model quality estimates based on different quality estimates.

Keywords
CASP, deep learning, estimation of model accuracy, model quality assessments, protein structure prediction
National Category
Biological Sciences
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-156779 (URN); 10.1002/prot.25492 (DOI); 000431734800006 (); 29524250 (PubMedID)
Available from: 2018-06-04; Created: 2018-06-04; Last updated: 2022-02-26; Bibliographically approved
Elofsson, A., Joo, K., Keasar, C., Lee, J., Maghrabi, A. H. A., Manavalan, B., . . . Wallner, B. (2018). Methods for estimation of model accuracy in CASP12. Proteins: Structure, Function, and Bioinformatics, 86(S1), 361-373
2018 (English). In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 86, no S1, p. 361-373. Article in journal (Refereed). Published
Abstract [en]

Methods to reliably estimate the quality of 3D models of proteins are essential drivers for the wide adoption and serious acceptance of protein structure predictions by life scientists. In this article, the most successful groups in CASP12 describe their latest methods for estimates of model accuracy (EMA). We show that pure single model accuracy estimation methods have shown clear progress since CASP11; the 3 top methods (MESHI, ProQ3, SVMQA) all perform better than the top method of CASP11 (ProQ2). Although the pure single model accuracy estimation methods outperform quasi-single (ModFOLD6 variations) and consensus methods (Pcons, ModFOLDclust2, Pcomb-domain, and Wallner) in model selection, they are still not as good as those methods in absolute model quality estimation and predictions of local quality. Finally, we show that when using contact-based model quality measures (CAD, lDDT) the single model quality methods perform relatively better.

Keywords
CASP, consensus predictions, estimates of model accuracy, machine learning, protein structure prediction, quality assessment
National Category
Biological Sciences
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-154838 (URN); 10.1002/prot.25395 (DOI); 000425523000031 (); 28975666 (PubMedID)
Available from: 2018-04-10; Created: 2018-04-10; Last updated: 2022-02-26; Bibliographically approved
Michel, M., Menéndez Hurtado, D., Uziela, K. & Elofsson, A. (2017). Large-scale structure prediction by improved contact predictions and model quality assessment. Bioinformatics, 33(14), 123-129
2017 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 14, p. 123-129. Article in journal (Refereed). Published
Abstract [en]

Motivation: Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very big protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known.

Results: We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that reliably can be identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Out of these 415 have not been reported before.

Availability: Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net. All programs used here are freely available.

National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-141945 (URN); 10.1093/bioinformatics/btx239 (DOI); 000405289100005 ()
Available from: 2017-04-21; Created: 2017-04-21; Last updated: 2022-03-23; Bibliographically approved
Michel, M., Skwark, M. J., Menéndez Hurtado, D., Ekeberg, M. & Elofsson, A. (2017). Predicting accurate contacts in thousands of Pfam domain families using PconsC3. Bioinformatics, 33(18), 2859-2866
2017 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 18, p. 2859-2866. Article in journal (Refereed). Published
Abstract [en]

Motivation: A few years ago it was shown that by using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment it is possible to significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, for about half of all Pfam domain families no structure is known, but unfortunately most of these families have at most a few hundred members, i.e. are too small for such contact prediction methods.

Results: To extend accurate contact predictions to the thousands of smaller protein families we present PconsC3, a fast and improved method for protein contact predictions that can be used for families with even 100 effective sequence members. PconsC3 outperforms direct coupling analysis (DCA) methods significantly independent on family size, secondary structure content, contact range, or the number of selected contacts.

Availability and implementation: PconsC3 is available as a web server and downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and licensed under the GNU General Public License, version 2. At this site contact predictions for most Pfam families are also available. We do estimate that more than 4000 contact maps for Pfam families of unknown structure have more than 50% of the top-ranked contacts predicted correctly.

Contact: arne@bioinfo.se

Supplementary information: Supplementary data are available at Bioinformatics online.

National Category
Biological Sciences; Environmental Biotechnology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-147917 (URN); 10.1093/bioinformatics/btx332 (DOI); 000409541400009 (); 28535189 (PubMedID)
Available from: 2017-10-17; Created: 2017-10-17; Last updated: 2022-02-28; Bibliographically approved
Uziela, K., Menéndez Hurtado, D., Shu, N., Wallner, B. & Elofsson, A. (2017). ProQ3D: improved model quality assessments using deep learning. Bioinformatics, 33(10), 1578-1580
2017 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 10, p. 1578-1580. Article in journal (Refereed). Published
Abstract [en]

Protein quality assessment is a long-standing problem in bioinformatics. For more than a decade we have developed state-of-art predictors by carefully selecting and optimising inputs to a machine learning method. The correlation has increased from 0.60 in ProQ to 0.81 in ProQ2 and 0.85 in ProQ3 mainly by adding a large set of carefully tuned descriptions of a protein. Here, we show that a substantial improvement can be obtained using exactly the same inputs as in ProQ2 or ProQ3 but replacing the support vector machine by a deep neural network. This improves the Pearson correlation to 0.90 (0.85 using ProQ2 input features).

Keywords
Model Quality Assessment, Protein Bioinformatics, Machine Learning, Deep Learning, Neural Networks, Multi Layer Perceptron, Deep neural networks
National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-137679 (URN); 10.1093/bioinformatics/btw819 (DOI); 000402130700023 ()
Funder
Swedish Research Council, VR-NT 2012-5046; Swedish Research Council, VR-NT 2012-5270; Swedish e‐Science Research Center
Available from: 2017-01-09; Created: 2017-01-09; Last updated: 2025-02-07; Bibliographically approved