A Novel Method for Accurate One-dimensional Protein Structure Prediction Based on Fragment Matching
2010 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 26, no 4, p. 470-477. Article in journal (Refereed), Published.
Motivation: The precise prediction of one-dimensional (1D) protein structure, as represented by the protein secondary structure and by a 1D string of discrete dihedral-angle states (i.e. Shape Strings), is a prerequisite for the successful prediction of three-dimensional (3D) structure as well as of protein-protein interactions. We have developed a novel 1D structure prediction method, called Frag1D, based on a straightforward fragment matching algorithm, and demonstrated its success in predicting three sets of 1D structural alphabets: the classical three-state secondary structure, three-state Shape Strings and eight-state Shape Strings.
Results: By exploiting the vast protein sequence and protein structure data available, we have brought secondary structure prediction closer to the expected theoretical limit. When tested by leave-one-out cross-validation on a non-redundant subset of the PDB, cut at 30% sequence identity and containing 5860 protein chains, the overall per-residue accuracy for secondary structure prediction (Q3) is 82.9%. The overall per-residue accuracies for three-state and eight-state Shape Strings are 85.1% and 71.5%, respectively. We have also benchmarked our program against the latest version of PSIPRED for secondary structure prediction; our program scored 0.3% higher in Q3 when tested on 2241 chains with the same training set. For Shape Strings, we compared our method with a recently published method, using the same dataset and definition as that method. Our program achieved 2.2% higher accuracy for three-state Shape Strings. By quantitatively investigating the effect of database size on 1D structure prediction, we show that the accuracy increases by about 1% with every doubling of the database size.
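The two quantities in the Results can be illustrated with a minimal Python sketch. It is not the Frag1D implementation: the function names `per_residue_accuracy` and `accuracy_gain_from_growth` are hypothetical, the example strings are made up, and the second function simply assumes that the reported trend of about 1% per doubling holds as a log2 relationship over the range of interest.

```python
import math


def per_residue_accuracy(predicted: str, observed: str) -> float:
    """Return the percentage of positions where the predicted state
    matches the observed state (Q3 when the alphabet has three states)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


def accuracy_gain_from_growth(size_ratio: float,
                              gain_per_doubling: float = 1.0) -> float:
    """Approximate accuracy gain (in percent) when the database grows
    by a factor of size_ratio, ASSUMING a constant gain per doubling."""
    if size_ratio <= 0:
        raise ValueError("size_ratio must be positive")
    return gain_per_doubling * math.log2(size_ratio)


if __name__ == "__main__":
    # Illustrative strings only (H = helix, E = strand, C = coil).
    print(per_residue_accuracy("HHHEECCC", "HHHEEECC"))
    # Growth by a factor of four corresponds to two doublings.
    print(accuracy_gain_from_growth(4.0))
```

Under this assumption, each gain of one percent requires the database to double again, so the absolute amount of new data needed per percent grows as the database does.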
Keywords: protein secondary structure, shape strings, profile-profile, fragment matching
Biochemistry and Molecular Biology; Structural Biology; Bioinformatics (Computational Biology)
Research subject: Biochemistry; Molecular Biology
Identifiers
URN: urn:nbn:se:su:diva-32781
DOI: 10.1093/bioinformatics/btp679
OAI: oai:DiVA.org:su-32781
DiVA: diva2:281530
Projects: protein structure prediction