Shu, Nanjiang
Publications (10 of 17)
Uziela, K., Menéndez Hurtado, D., Shu, N., Wallner, B. & Elofsson, A. (2018). Improved protein model quality assessments by changing the target function. Proteins: Structure, Function, and Bioinformatics, 86(6), 654-663
2018 (English). In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 86, no 6, p. 654-663. Article in journal (Refereed). Published
Abstract [en]

Protein modeling quality is an important part of protein structure prediction. We have for more than a decade developed a set of methods for this problem. We have used various types of description of the protein and different machine learning methodologies. However, common to all these methods has been the target function used for training. The target function in ProQ describes the local quality of a residue in a protein model. In all versions of ProQ the target function has been the S-score. However, other quality estimation functions also exist, which can be divided into superposition- and contact-based methods. The superposition-based methods, such as S-score, are based on a rigid body superposition of a protein model and the native structure, while the contact-based methods compare the local environment of each residue. Here, we examine the effects of retraining our latest predictor, ProQ3D, using identical inputs but different target functions. We find that the contact-based methods are easier to predict and that predictors trained on these measures provide some advantages when it comes to identifying the best model. One possible reason for this is that contact-based methods are better at estimating the quality of multi-domain targets. However, training on the S-score gives the best correlation with the GDT_TS score, which is commonly used in CASP to score the global model quality. To take advantage of both of these features we provide an updated version of ProQ3D that predicts local and global model quality estimates based on different quality estimates.

Keywords
CASP, deep learning, estimation of model accuracy, model quality assessments, protein structure prediction
National Category
Biological Sciences
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-156779 (URN); 10.1002/prot.25492 (DOI); 000431734800006 (); 29524250 (PubMedID)
Available from: 2018-06-04; Created: 2018-06-04; Last updated: 2022-02-26; Bibliographically approved
Salvatore, M., Shu, N. & Elofsson, A. (2018). The SubCons webserver: A user friendly web interface for state-of-the-art subcellular localization prediction. Protein Science, 27(1), 195-201
2018 (English). In: Protein Science, ISSN 0961-8368, E-ISSN 1469-896X, Vol. 27, no 1, p. 195-201. Article in journal (Refereed). Published
Abstract [en]

SubCons is a recently developed method that predicts the subcellular localization of a protein. It combines predictions from four predictors using a Random Forest classifier. Here, we present the user-friendly web-interface implementation of SubCons. Starting from a protein sequence, the server rapidly predicts the subcellular localizations of an individual protein. In addition, the server accepts the submission of sets of proteins either by uploading the files or programmatically by using command line WSDL API scripts. This makes SubCons ideal for proteome-wide analyses, allowing the user to scan a whole proteome in a few days. From the web page, it is also possible to download precalculated predictions for several eukaryotic organisms. To evaluate the performance of SubCons we present a benchmark of LocTree3 and SubCons using two recent mass-spectrometry-based datasets of mouse and Drosophila proteins. The server is available at http://subcons.bioinfo.se/

Keywords
subcellular localization, sequence analysis, machine learning
National Category
Biochemistry Molecular Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-152492 (URN); 10.1002/pro.3297 (DOI); 000418254300019 (); 28901589 (PubMedID)
Available from: 2018-02-07; Created: 2018-02-07; Last updated: 2025-02-20; Bibliographically approved
Tsirigos, K. D., Govindarajan, S., Bassot, C., Västermark, Å., Lamb, J., Shu, N. & Elofsson, A. (2018). Topology of membrane proteins - predictions, limitations and variations. Current opinion in structural biology, 50, 9-17
2018 (English). In: Current opinion in structural biology, ISSN 0959-440X, E-ISSN 1879-033X, Vol. 50, p. 9-17. Article in journal (Refereed). Published
Abstract [en]

Transmembrane proteins perform a variety of important biological functions necessary for the survival and growth of the cells. Membrane proteins are built up by transmembrane segments that span the lipid bilayer. The segments can either be in the form of hydrophobic alpha-helices or beta-sheets which create a barrel. A fundamental aspect of the structure of transmembrane proteins is the membrane topology, that is, the number of transmembrane segments, their position in the protein sequence and their orientation in the membrane. Along these lines, many predictive algorithms for the prediction of the topology of alpha-helical and beta-barrel transmembrane proteins exist. The newest algorithms obtain an accuracy close to 80% both for alpha-helical and beta-barrel transmembrane proteins. However, lately it has been shown that the simplified picture presented when describing a protein family by its topology is limited. To demonstrate this, we highlight examples where the topology is either not conserved in a protein superfamily or where the structure cannot be described solely by the topology of a protein. The prediction of these nonstandard features from sequence alone was not successful until the recent revolutionary progress in 3D-structure prediction of proteins.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-160275 (URN); 10.1016/j.sbi.2017.10.003 (DOI); 000443661300004 (); 29100082 (PubMedID)
Available from: 2018-09-18; Created: 2018-09-18; Last updated: 2022-02-26; Bibliographically approved
Uziela, K., Menéndez Hurtado, D., Shu, N., Wallner, B. & Elofsson, A. (2017). ProQ3D: improved model quality assessments using deep learning. Bioinformatics, 33(10), 1578-1580
2017 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 10, p. 1578-1580. Article in journal (Refereed). Published
Abstract [en]

Protein quality assessment is a long-standing problem in bioinformatics. For more than a decade we have developed state-of-the-art predictors by carefully selecting and optimising inputs to a machine learning method. The correlation has increased from 0.60 in ProQ to 0.81 in ProQ2 and 0.85 in ProQ3, mainly by adding a large set of carefully tuned descriptions of a protein. Here, we show that a substantial improvement can be obtained using exactly the same inputs as in ProQ2 or ProQ3 but replacing the support vector machine by a deep neural network. This improves the Pearson correlation to 0.90 (0.85 using ProQ2 input features).

Keywords
Model Quality Assessment, Protein Bioinformatics, Machine Learning, Deep Learning, Neural Networks, Multi Layer Perceptron, Deep neural networks
National Category
Bioinformatics and Computational Biology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-137679 (URN); 10.1093/bioinformatics/btw819 (DOI); 000402130700023 ()
Funder
Swedish Research Council, VR-NT 2012-5046; Swedish Research Council, VR-NT 2012-5270; Swedish e-Science Research Center
Available from: 2017-01-09; Created: 2017-01-09; Last updated: 2025-02-07; Bibliographically approved
Salvatore, M., Warholm, P., Shu, N., Basile, W. & Elofsson, A. (2017). SubCons: a new ensemble method for improved human subcellular localization predictions. Bioinformatics, 33(16), 2464-2470
2017 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 16, p. 2464-2470. Article in journal (Refereed). Published
Abstract [en]

Motivation: Knowledge of the correct protein subcellular localization is necessary for understanding the function of a protein. Unfortunately, large-scale experimental studies are limited in their accuracy. Therefore, the development of prediction methods has been limited by the amount of accurate experimental data. However, recently large-scale experimental studies have provided new data that can be used to evaluate the accuracy of subcellular predictions in human cells. Using this data we examined the performance of state-of-the-art methods and developed SubCons, an ensemble method that combines four predictors using a Random Forest classifier. Results: SubCons outperforms earlier methods in a dataset of proteins where two independent methods confirm the subcellular localization. Given nine subcellular localizations, SubCons achieves an F1-Score of 0.79 compared to 0.70 of the second best method. Furthermore, at a false positive rate (FPR) of 1% the true positive rate (TPR) is over 58% for SubCons compared to less than 50% for the best individual predictor.

National Category
Biological Sciences Environmental Biotechnology
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-147084 (URN); 10.1093/bioinformatics/btx219 (DOI); 000407139800005 (); 28407043 (PubMedID)
Available from: 2017-10-16; Created: 2017-10-16; Last updated: 2022-02-28; Bibliographically approved
Peters, C., Tsirigos, K. D., Shu, N. & Elofsson, A. (2016). Improved topology prediction using the terminal hydrophobic helices rule. Bioinformatics, 32(8), 1158-1162
2016 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 8, p. 1158-1162. Article in journal (Refereed). Published
Abstract [en]

Motivation: The translocon recognizes sufficiently hydrophobic regions of a protein and inserts them into the membrane. Computational methods try to determine what hydrophobic regions are recognized by the translocon. Although these predictions are quite accurate, many methods still fail to distinguish marginally hydrophobic transmembrane (TM) helices and equally hydrophobic regions in soluble protein domains. In vivo, this problem is most likely avoided by targeting of the TM-proteins, so that non-TM proteins never see the translocon. Proteins are targeted to the translocon by an N-terminal signal peptide. The targeting is also aided by the fact that the N-terminal helix is more hydrophobic than other TM-helices. In addition, we also recently found that the C-terminal helix is more hydrophobic than central helices. This information has not been used in earlier topology predictors.

Results: Here, we use the fact that the N- and C-terminal helices are more hydrophobic to develop a new version of the first-principle-based topology predictor, SCAMPI. The new predictor has two main advantages: first, it can be used to efficiently separate membrane and non-membrane proteins directly without the use of an extra prefilter, and second, it shows improved performance for predicting the topology of membrane proteins that contain large non-membrane domains.

Availability and implementation: The predictor, a web server and all datasets are available at http://scampi.bioinfo.se/.

National Category
Biological Sciences Bioinformatics (Computational Biology)
Research subject
Biochemistry; Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-129060 (URN); 10.1093/bioinformatics/btv709 (DOI); 000374476800006 ()
Available from: 2016-04-13; Created: 2016-04-13; Last updated: 2022-02-23; Bibliographically approved
Hayat, S., Peters, C., Shu, N., Tsirigos, K. D. & Elofsson, A. (2016). Inclusion of dyad-repeat pattern improves topology prediction of transmembrane beta-barrel proteins. Bioinformatics, 32(10), 1571-1573
2016 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 10, p. 1571-1573. Article in journal (Refereed). Published
Abstract [en]

Accurate topology prediction of transmembrane beta-barrels is still an open question. Here, we present BOCTOPUS2, an improved topology prediction method for transmembrane beta-barrels that can also identify the barrel domain, predict the topology and identify the orientation of residues in transmembrane beta-strands. The major novelty of BOCTOPUS2 is the use of the dyad-repeat pattern of lipid and pore facing residues observed in transmembrane beta-barrels. In a cross-validation test on a benchmark set of 42 proteins, BOCTOPUS2 predicts the correct topology in 69% of the proteins, an improvement of more than 10% over the best earlier method (BOCTOPUS) and in addition, it produces significantly fewer erroneous predictions on non-transmembrane beta-barrel proteins.

National Category
Biochemistry Molecular Biology Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-131992 (URN); 10.1093/bioinformatics/btw025 (DOI); 000376656900022 (); 26794316 (PubMedID)
Available from: 2016-08-15; Created: 2016-07-05; Last updated: 2025-02-20; Bibliographically approved
Uziela, K., Shu, N., Wallner, B. & Elofsson, A. (2016). ProQ3: Improved model quality assessments using Rosetta energy terms. Scientific Reports, 6, Article ID 33509.
2016 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 6, article id 33509. Article in journal (Refereed). Published
Abstract [en]

Quality assessment of protein models using no other information than the structure of the model itself has been shown to be useful for structure prediction. Here, we introduce two novel methods, ProQRosFA and ProQRosCen, inspired by the state-of-the-art method ProQ2, but using a completely different description of a protein model. ProQ2 uses contacts and other features calculated from a model, while the new predictors are based on Rosetta energies: ProQRosFA uses the full-atom energy function that takes into account all atoms, while ProQRosCen uses the coarse-grained centroid energy function. The two new predictors also include residue conservation and terms corresponding to the agreement of a model with predicted secondary structure and surface area, as in ProQ2. We show that the performance of these predictors is on par with ProQ2 and significantly better than all other model quality assessment programs. Furthermore, we show that, when the input features from all three predictors are combined, the resulting predictor ProQ3 performs better than any of the individual methods. ProQ3, ProQRosFA and ProQRosCen are freely available both as a webserver and stand-alone programs at http://proq3.bioinfo.se/.

National Category
Biological Sciences
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-135223 (URN); 10.1038/srep33509 (DOI); 000384595800001 ()
Available from: 2016-11-14; Created: 2016-11-01; Last updated: 2022-09-15; Bibliographically approved
Tsirigos, K. D., Peters, C., Shu, N., Käll, L. & Elofsson, A. (2015). The TOPCONS web server for consensus prediction of membrane protein topology and signal peptides. Nucleic Acids Research, 43(W1), W401-W407
2015 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 43, no W1, p. W401-W407. Article in journal (Refereed). Published
Abstract [en]

TOPCONS (http://topcons.net/) is a widely used web server for consensus prediction of membrane protein topology. We hereby present a major update to the server, with some substantial improvements, including the following: (i) TOPCONS can now efficiently separate signal peptides from transmembrane regions. (ii) The server can now differentiate more successfully between globular and membrane proteins. (iii) The server now is even slightly faster, although a much larger database is used to generate the multiple sequence alignments. For most proteins, the final prediction is produced in a matter of seconds. (iv) The user-friendly interface is retained, with the additional feature of submitting batch files and accessing the server programmatically using standard interfaces, making it thus ideal for proteome-wide analyses. Indicatively, the user can now scan the entire human proteome in a few days. (v) For proteins with homology to a known 3D structure, the homology-inferred topology is also displayed. (vi) Finally, the combination of methods currently implemented achieves an overall increase in performance by 4% as compared to the currently available best-scoring methods and TOPCONS is the only method that can identify signal peptides and still maintain a state-of-the-art performance in topology predictions.

National Category
Biological Sciences
Research subject
Biochemistry; Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-120710 (URN); 10.1093/nar/gkv485 (DOI); 000359772700063 ()
Funder
Swedish Research Council
Available from: 2015-09-16; Created: 2015-09-15; Last updated: 2022-03-23; Bibliographically approved
Virkki, M., Boekel, C., Illergård, K., Peters, C., Shu, N., Tsirigos, K. D., . . . Nilsson, I. (2014). Large Tilts in Transmembrane Helices Can Be Induced during Tertiary Structure Formation. Journal of Molecular Biology, 426(13), 2529-2538
2014 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 426, no 13, p. 2529-2538. Article in journal (Refereed). Published
Abstract [en]

While early structural models of helix-bundle integral membrane proteins posited that the transmembrane α-helices [transmembrane helices (TMHs)] were orientated more or less perpendicular to the membrane plane, there is now ample evidence from high-resolution structures that many TMHs have significant tilt angles relative to the membrane. Here, we address the question of whether the tilt is an intrinsic property of the TMH in question or if it is imparted on the TMH during folding of the protein. Using a glycosylation mapping technique, we show that four highly tilted helices found in multi-spanning membrane proteins all have much shorter membrane-embedded segments when inserted by themselves into the membrane than seen in the high-resolution structures. This suggests that tilting can be induced by tertiary packing interactions within the protein, subsequent to the initial membrane-insertion step.

Keywords
transmembrane helix, membrane protein folding, translocon
National Category
Molecular Biology
Identifiers
urn:nbn:se:su:diva-106063 (URN); 10.1016/j.jmb.2014.04.020 (DOI); 000337780200009 ()
Note

AuthorCount:9;

Available from: 2014-07-31; Created: 2014-07-21; Last updated: 2026-03-06; Bibliographically approved