Variation in length of proteins by repeats and disorder regions
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. (Arne Elofsson)
2013 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Protein-coding genes evolve together with their genome and acquire changes, some of which affect the length of their protein products. This explains why equivalent proteins from different species can exhibit length differences. Variation in length of proteins during evolution arguably presents a large number of possibilities for improvement and innovation of protein structure and function. In order to contribute to an increased understanding of this process, we have studied variation caused by tandem domain duplications and insertions or deletions of intrinsically disordered residues.

The study of two proteins, Nebulin and Filamin, together with a broader study of long repeat proteins (>10 domain repeats), first confirmed that tandem domains evolve by internal duplications. We then show that vertebrate Nebulins evolved by duplications of a seven-domain unit, yet the most recent duplications used different gene parts as duplication units. Filamin, in contrast, exhibits a checkered duplication pattern. This indicates that duplications were followed by erosion of sequence similarity, and that this erosion was hindered at particular domains by the presence of equivalent binding motifs. For long repeat proteins, we found that human segmental duplications are over-represented in long repeat genes. Additionally, domains that have formed long repeats did so primarily by duplicating two or more domains at a time.

The study of homologous protein pairs from well-characterized eukaryotes (nematode, fruit fly and several fungi) demonstrated a link between variation in length and variation in the number of intrinsically disordered residues. Next, insertions and deletions (indels) estimated from HMM-HMM pairwise alignments showed that disordered residues are clearly more frequent among indel residues than among non-indel residues. Additionally, a study of raw length differences showed that more than half of the length variation in fungal proteins is composed of disordered residues. Finally, a model of indels and their immediate surroundings suggested that disordered indels occur in already disordered regions rather than in ordered regions.
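The indel versus non-indel comparison can be illustrated with a minimal Python sketch. It assumes that a gapped pairwise alignment (for example, one derived from an HMM-HMM alignment) and per-residue disorder predictions are already available; the function name and the input format (sets of 0-based positions in the ungapped sequences) are hypothetical, and this is not the pipeline used in the thesis.

```python
def disorder_in_indels_vs_aligned(seq_a, seq_b, disordered_a, disordered_b):
    """Return (fraction of indel residues that are disordered,
               fraction of aligned residues that are disordered).

    seq_a, seq_b: the two rows of a gapped pairwise alignment ('-' = gap).
    disordered_a, disordered_b: sets of 0-based positions in the ungapped
    sequences flagged as disordered (hypothetical input format).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("alignment rows must have equal length")
    counts = {"indel": [0, 0], "aligned": [0, 0]}  # [residues, disordered]
    ia = ib = 0  # current residue index in each ungapped sequence
    for ca, cb in zip(seq_a, seq_b):
        gap_a, gap_b = ca == "-", cb == "-"
        if not gap_a and not gap_b:
            # Match column: both residues count as aligned (non-indel) residues.
            for idx, dis in ((ia, disordered_a), (ib, disordered_b)):
                counts["aligned"][0] += 1
                counts["aligned"][1] += int(idx in dis)
        elif not gap_a:
            # Residue in A opposite a gap in B: an indel residue of A.
            counts["indel"][0] += 1
            counts["indel"][1] += int(ia in disordered_a)
        elif not gap_b:
            # Residue in B opposite a gap in A: an indel residue of B.
            counts["indel"][0] += 1
            counts["indel"][1] += int(ib in disordered_b)
        ia += int(not gap_a)
        ib += int(not gap_b)

    def frac(c):
        # NaN signals that the category contained no residues at all.
        return c[1] / c[0] if c[0] else float("nan")

    return frac(counts["indel"]), frac(counts["aligned"])
```

For the toy alignment `ACDEFG` / `AC--FG`, with positions 2, 3 and 5 of the first sequence and position 3 of the second flagged as disordered, both indel residues are disordered while 2 of the 8 aligned residues are, giving `(1.0, 0.25)`.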

Place, publisher, year, edition, pages
Stockholm, Sweden: Department of Biochemistry and Biophysics, Stockholm University, 2013. 32 p.
Keyword [en]
protein length, repeats, domain repeats, protein evolution, duplication, tandem duplication, intrinsic disorder, intrinsically disordered, variation in length, insertion, deletion, recombination, expansion, contraction
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry
Identifiers
URN: urn:nbn:se:su:diva-88553
ISBN: 978-91-7447-670-5 (print)
OAI: oai:DiVA.org:su-88553
DiVA: diva2:612043
Public defence
2013-04-26, Högbomsalen, Geovetenskapens hus, Svante Arrhenius väg 12, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 2: In press. Paper 4: Manuscript.

Available from: 2013-04-04. Created: 2013-03-19. Last updated: 2013-03-29. Bibliographically approved.
List of papers
1. Nebulin: A Study of Protein Repeat Evolution
2010 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 402, no. 1, pp. 38-51. Article in journal (Refereed), Published.
Abstract [en]

Protein domain repeats are common in proteins that are central to the organization of a cell, in particular in eukaryotes. They are known to evolve through internal tandem duplications. However, the understanding of the underlying mechanisms is incomplete. To shed light on repeat expansion mechanisms, we have studied the evolution of the muscle protein Nebulin, a protein that contains a large number of actin-binding nebulin domains. Nebulin proteins have evolved from an invertebrate precursor containing two nebulin domains. Repeat regions have expanded through duplications of single domains, as well as duplications of a super repeat (SR) consisting of seven nebulins. We show that the SR has evolved independently into large regions in at least three instances: twice in the invertebrate Branchiostoma floridae and once in vertebrates. In-depth analysis reveals several recent tandem duplications in the Nebulin gene. The events involve both single-domain and multidomain SR units or several SR units. There are single events, but frequently the same unit is duplicated multiple times. For instance, an ancestor of human and chimpanzee underwent two tandem duplications. The duplication junction coincides with an Alu transposon, thus suggesting duplication through Alu-mediated homologous recombination. Duplications in the SR region consistently involve multiples of seven domains. However, the exact unit that is duplicated varies both between species and within species. Thus, multiple tandem duplications of the same motif did not create the large Nebulin protein. Finally, analysis of segmental duplications in the human genome reveals that duplications are more common in genes containing domain repeats than in those coding for nonrepeated proteins. In fact, segmental duplications are found three to six times more often in long repeated genes than expected by chance. 

Keyword
protein domain repeat, evolution, repeat duplication, segmental duplication, Nebulin
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry
Identifiers
urn:nbn:se:su:diva-50177 (URN)
10.1016/j.jmb.2010.07.011 (DOI)
000282074500005
20643138 (PubMedID)
Note

Author count: 4

Available from: 2010-12-29. Created: 2010-12-21. Last updated: 2017-12-11. Bibliographically approved.
2. Long indels are disordered: A study of disorder and indels in homologous eukaryotic proteins
Open this publication in new window or tab >>Long indels are disordered: A study of disorder and indels in homologous eukaryotic proteins
2013 (English). In: Biochimica et Biophysica Acta - Proteins and Proteomics, ISSN 1570-9639, E-ISSN 1878-1454, Vol. 1834, no. 5, pp. 890-897. Article in journal (Refereed), Published.
Abstract [en]

Proteins evolve through point mutations as well as by insertions and deletions (indels). During the last decade it has become apparent that protein regions that do not fold into three-dimensional structures, i.e. intrinsically disordered regions, are quite common. Here, we have studied the relationship between protein disorder and indels using HMM-HMM pairwise alignments in two sets of orthologous eukaryotic protein pairs. First, we show that disordered residues are much more frequent among indel residues than among aligned residues, and that they are also more prevalent among indels than in coils. Second, we observe that disordered residues are particularly common in longer indels. Disordered indels of short-to-medium size are prevalent in the non-terminal regions of proteins, while the longest indels, ordered and disordered alike, occur toward the termini of the proteins, where new structural units are comparatively well tolerated. Finally, while disordered regions often evolve faster than ordered regions and disorder is common in indels, there are some previously recognized protein families in which the disordered region is more conserved than the ordered region. We find that these rare proteins are often involved in information processes, such as RNA processing and translation. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.
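The observation that disordered residues are more common in longer indels can be illustrated with a short Python sketch. As a simplifying assumption, an indel is taken to be a maximal run of alignment columns in which one sequence has residues opposite gaps in the other, and indels are binned by length with illustrative cut-offs (5 and 20 residues) that are not taken from the paper. Function names and input formats are hypothetical.

```python
import bisect


def indel_segments(seq_a, seq_b, disordered_a, disordered_b):
    """Return (length, n_disordered) for each indel in a gapped pairwise alignment.

    An indel is a maximal run of consecutive columns in which the same
    sequence has residues and the other has gaps (an assumption of this sketch).
    disordered_a / disordered_b: sets of 0-based ungapped positions flagged disordered.
    """
    segments = []
    ia = ib = 0  # current residue index in each ungapped sequence
    run_side, run_len, run_dis = None, 0, 0
    for ca, cb in zip(seq_a, seq_b):
        if ca != "-" and cb == "-":
            side, is_dis = "a", ia in disordered_a
        elif ca == "-" and cb != "-":
            side, is_dis = "b", ib in disordered_b
        else:
            side, is_dis = None, False  # match column (or an all-gap column)
        if side != run_side and run_len:
            segments.append((run_len, run_dis))  # close the previous indel
            run_len = run_dis = 0
        run_side = side
        if side is not None:
            run_len += 1
            run_dis += int(is_dis)
        ia += int(ca != "-")
        ib += int(cb != "-")
    if run_len:
        segments.append((run_len, run_dis))  # indel running to the alignment end
    return segments


def disorder_by_length_bin(segments, upper_bounds=(5, 20)):
    """Fraction of disordered residues among indels, per length bin.

    With the default (illustrative) bounds the bins are 1-5, 6-20 and >20
    residues. Returns None for bins that contain no indels.
    """
    totals = [[0, 0] for _ in range(len(upper_bounds) + 1)]  # [residues, disordered]
    for length, n_dis in segments:
        b = bisect.bisect_left(upper_bounds, length)  # first bound >= length
        totals[b][0] += length
        totals[b][1] += n_dis
    return [d / r if r else None for r, d in totals]
```

Pooling residues per bin, rather than averaging per-indel fractions, weights each bin by the number of indel residues it contains; either summary is a reasonable choice for a sketch like this.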

Keyword
Intrinsically disordered, protein, Indel, Protein evolution, protein structure, Sequence alignment
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry
Identifiers
urn:nbn:se:su:diva-88549 (URN)
10.1016/j.bbapap.2013.01.002 (DOI)
000318388300009
Projects
GeneFun Project, contract no: LSHG-CT-2004-503567
EDICT project, contract no: FP7-HEALTH-F4-2007-201924
Funder
Swedish Research Council, VR-NT 2009-5072
Swedish Research Council, VR-M 2010-3555
EU, FP7, Seventh Framework Programme, 201924
Available from: 2013-03-19. Created: 2013-03-19. Last updated: 2017-12-06. Bibliographically approved.
3. The evolution of filamin - A protein domain repeat perspective
2012 (English). In: Journal of Structural Biology, ISSN 1047-8477, E-ISSN 1095-8657, Vol. 179, no. 3, pp. 289-298. Article in journal (Refereed), Published.
Abstract [en]

Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide a better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications, while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern in which every other domain shows high sequence similarity. This pattern may result from tandem duplications, may serve to avert aggregation between adjacent domains, or may reflect functional constraints. In filamin, our studies confirm the presence of interspersed integrin-binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin-binding domains. The most notable case is leech filamin, which contains a 20-repeat expansion and exhibits a unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin beta 3, it is possible that the distance between integrin-binding domains is less crucial for invertebrate filamins than for vertebrate ones.

Keyword
Filamin, Protein domain repeats, Integrin, Protein domain evolution, Aggregation, Tandem duplication
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry
Identifiers
urn:nbn:se:su:diva-81232 (URN)
10.1016/j.jsb.2012.02.010 (DOI)
000308268200006
Note

Author count: 5

Available from: 2012-10-18. Created: 2012-10-15. Last updated: 2017-12-07. Bibliographically approved.
4. Protein expansion is primarily due to indels in intrinsically disordered regions
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Proteins evolve not only through point mutations but also by insertion and deletion events, which affect the length of the protein. It is well known that such indel events most frequently occur in surface-exposed loops. However, detailed analysis of indel events in distantly related proteins is hampered by the difficulty of correctly aligning such sequences. Here, we circumvent this problem by analyzing homologous proteins based on length variation rather than pairwise alignments. We find a surprisingly strong relationship between difference in length and difference in the number of intrinsically disordered residues: more than half of the length variation can be explained by changes in the number of intrinsically disordered residues. A more detailed analysis reveals that indel events do not induce disorder; rather, already disordered regions accrue indels, suggesting that selective pressure against indels is significantly lower within intrinsically disordered regions.
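One simple way to summarise how much of the length difference between homologues consists of disordered residues is the least-squares slope of the disorder difference ΔD on the length difference ΔL, fitted through the origin. The Python sketch below assumes that protein lengths and counts of predicted disordered residues are already known for each homologous pair; the function name and tuple layout are hypothetical, and this is an illustration, not necessarily the statistic used in the manuscript.

```python
def disorder_share_of_length_difference(pairs):
    """Least-squares slope, fitted through the origin, of dD on dL.

    pairs: iterable of (len_a, disordered_a, len_b, disordered_b) tuples for
    homologous protein pairs, where disordered_* is the number of residues
    predicted to be intrinsically disordered (hypothetical input layout).
    """
    num = den = 0.0
    for len_a, dis_a, len_b, dis_b in pairs:
        d_len = len_a - len_b  # length difference within the pair
        d_dis = dis_a - dis_b  # disorder difference within the pair
        num += d_len * d_dis
        den += d_len * d_len
    if den == 0:
        raise ValueError("all pairs have equal length; slope undefined")
    return num / den
```

Fitting through the origin is a deliberate choice: swapping the two proteins of a pair flips the signs of both ΔL and ΔD, so the slope does not depend on pair order. A slope near 0.5 would correspond to roughly half of the extra residues in the longer protein being disordered.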

National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry
Identifiers
urn:nbn:se:su:diva-88551 (URN)
Projects
GeneFun project, contract No: LSHG-CT-2004-503567
EDICT project, contract No: FP7-HEALTH-F4-2007-201924
Funder
Swedish Research Council, VR-NT 2009-5072
Swedish Research Council, VR-M 2010-3555
Swedish Research Council, VR-NT 2012-5046
EU, FP7, Seventh Framework Programme, 201924
Available from: 2013-03-19. Created: 2013-03-19. Last updated: 2014-11-10. Bibliographically approved.

Open Access in DiVA

Full text: FULLTEXT02.pdf (349 kB)
Checksum (SHA-512):
623e721e5155cc10db70f702c357db889a6a94fb1e347e73d1c7699d66ad28e59719b828a064c37ca657405cbd4588f7c441539e128aa46f52c32f4cd8b309b9
Type: full text. MIME type: application/pdf

Author
Sagit, Rauan
