Prediction of zinc-binding sites in proteins and efficient protein structure description and comparison
2008 (English). Licentiate thesis, comprehensive summary (Other academic)
A large number of proteins require certain metals to stabilize their structures or to function properly. About one third of all proteins in the Protein Data Bank (PDB) contain metals and it is estimated that approximately the same proportion of all proteins are metalloproteins.
Zinc, the second most abundant transition metal found in eukaryotic organisms, plays key roles, mainly structural and catalytic, in many biological functions. Predicting whether a protein binds zinc, and ideally the exact location of its binding sites, is important when investigating the function of an experimentally uncharacterized protein.
Describing and comparing protein structures both efficiently and accurately is essential for the systematic annotation of functional properties of proteins, whether for individual proteins or on a genome scale. Dozens of structure comparison methods have been developed in the past decades. In recent years, several research groups have worked on developing methods for fast comparison of protein structures that represent three-dimensional (3D) protein structures as one-dimensional (1D) geometrical strings. Each symbol in such a string denotes the clustered region of φ/ψ torsion angle space in which a residue's backbone conformation falls. These 1D geometrical strings, called shape strings, are as compact as 1D secondary structures but carry more detailed structural information in loop regions, and are thus more suitable for fast structure database searching, classification of loop regions and evaluation of model structures.
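To make the idea of a 1D geometrical string concrete, the following is a minimal sketch of how a list of backbone φ/ψ angle pairs could be turned into a string of region symbols and compared position by position. The three symbols (A, B, L), the region boundaries, and the function names are assumptions chosen for illustration only; they are not the shape-string alphabet or clustering used by the methods discussed in the thesis.

```python
from typing import Iterable, Tuple


def region_symbol(phi: float, psi: float) -> str:
    """Map one (phi, psi) pair in degrees to a coarse, hypothetical region symbol."""
    if phi < 0:
        if -120 <= psi < 50:
            return "A"  # roughly the right-handed helical region
        return "B"      # roughly the extended (beta) region
    return "L"          # positive phi, e.g. the left-handed helical region


def to_geometry_string(angles: Iterable[Tuple[float, float]]) -> str:
    """Encode a sequence of (phi, psi) pairs as a 1D string, one symbol per residue."""
    return "".join(region_symbol(phi, psi) for phi, psi in angles)


def string_identity(a: str, b: str) -> float:
    """Fraction of positions with the same symbol in two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    if not a:
        raise ValueError("strings must not be empty")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Illustrative angles, not taken from any real structure.
example = to_geometry_string([(-60.0, -45.0), (-120.0, 130.0), (60.0, 45.0)])
```

Because each structure is reduced to a short string, a comparison of this kind needs only one pass over the residues, which is what makes string-based representations attractive for database searching. Real methods use a finer partition of φ/ψ space and allow for alignment gaps.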
In this thesis, a new method for predicting zinc-binding sites in proteins from amino acid sequences is described. The method predicts zinc-binding Cys, His, Asp and Glu (the four most common zinc-binding residues) with 75% precision (86% for Cys and His only) at 50% recall, according to a rigorous 5-fold cross-validation on a non-redundant set of 2727 unique PDB chains, of which 235 bind zinc. Compared with a previously published method, it predicts zinc-binding Cys and His with about 10% higher precision at different recall levels. In addition, different methods for describing and comparing protein structures are reviewed. Some recently developed methods based on 1D geometrical representations of backbone structures are emphasized and analyzed in detail.
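For readers unfamiliar with the "precision at a given recall" measure quoted above, the sketch below shows one common way to compute it from per-residue prediction scores. The function name, the tie handling, and the toy scores and labels are assumptions for illustration; they are not data or code from the thesis, and the thesis may define the operating point differently.

```python
from typing import Sequence


def precision_at_recall(
    scores: Sequence[float],
    labels: Sequence[int],
    target_recall: float = 0.5,
) -> float:
    """Precision at the shortest ranked prefix whose recall reaches target_recall.

    labels holds 1 for a true zinc-binding residue and 0 otherwise.
    Ties between scores are broken by input order (an assumption of this sketch).
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive example is required")

    # Rank residues from most to least confident prediction.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_positives = 0
    for n_predicted, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        if true_positives / total_positives >= target_recall:
            return true_positives / n_predicted
    raise RuntimeError("unreachable when target_recall <= 1")


# Toy example: 6 residues, 4 of which are true binders.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
toy_labels = [1, 0, 1, 0, 1, 1]
toy_precision = precision_at_recall(toy_scores, toy_labels, target_recall=0.5)
```

In the toy example, half of the four true binders are recovered after the top three predictions, two of which are correct, so the precision at 50% recall is 2/3. In a cross-validation setting, the scores for each fold would come from a model trained on the remaining folds.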
Place, publisher, year, edition, pages
2008. 42 p.
zinc-binding, shape strings, protein structures, secondary structures, machine learning
Biochemistry and Molecular Biology; Bioinformatics and Systems Biology; Structural Biology
Research subject Biochemistry; Structural Biology; Molecular Biology
Identifiers: URN: urn:nbn:se:su:diva-32783; OAI: oai:DiVA.org:su-32783; DiVA: diva2:281539
2008-04-18, Magnélisalen, kemiska övningslaboratoriet, Svante Arrhenius väg 12, Frescati, Magnélisalen, 10:00 (English)
Sauer-Eriksson, Elisabeth, Professor; Luybartsev, Alexander, Professor
Hovmöller, Sven, Professor
Projects: protein structure prediction