Creation of new proteins - domain rearrangements and tandem duplications
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. (Arne Elofsson)
2010 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are modular entities with domains as their building blocks. Domains are recurrent protein fragments with a distinct structure, function and evolutionary history. During evolution, proteins with new functions have been invented through rearrangements as well as differentiation of domains. The focus of this thesis is to gain a better understanding of the processes that govern domain rearrangements. In particular, the rearrangements that create long protein domain repeats have been investigated in detail.

We estimate that about 65% of eukaryotic and 40% of prokaryotic proteins are multidomain proteins. Further, we find that eukaryotic multidomain proteins are mainly created through insertion of a single domain at the N- or C-terminus. Domain repeats, however, differ from other domain rearrangements in that they are created through internal tandem duplications. We show that such duplications often involve several domains simultaneously, and that different repeated domain families show distinct evolutionary patterns. Finally, we have investigated how large repeat regions are created using a specific example: the actin-binding nebulin domain. The analysis reveals several tandem duplications, of both single nebulin domains and super repeats of seven nebulin domains, in a number of vertebrates. We see that the duplication breakpoints vary between species and that multiple duplications of the same region are common.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2010. 58 p.
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry
Identifiers
URN: urn:nbn:se:su:diva-37906; ISBN: 978-91-7447-032-1 (print); OAI: oai:DiVA.org:su-37906; DiVA: diva2:305436
Public defence
2010-04-23, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 14:00 (English)
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 4: Manuscript.

Available from: 2010-03-30; Created: 2010-03-23; Last updated: 2014-11-10; Bibliographically approved
List of papers
1. Multi-domain Proteins in the Three Kingdoms of Life: Orphan Domains and Other Unassigned Regions
2005 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 348, no. 1, pp. 241-253. Article in journal (Refereed), published.
Abstract [en]

Comparative studies of the proteomes of different organisms have provided valuable information about the distribution of protein domains in the kingdoms of life. Earlier studies have been limited by the fact that only about 50% of each proteome could be matched to a domain. Here, we have extended these studies by including less well-defined domain definitions (Pfam-B and clustered domains, MAS) in addition to Pfam-A and SCOP domains. A significant fraction of these domain families was found to be homologous to Pfam-A or SCOP domains. Further, we show that all regions that do not match a Pfam-A or SCOP domain contain a significantly higher fraction of disordered structure. These unstructured regions may be contained within orphan domains or function as linkers between structured domains. Using several different definitions, we have re-estimated the number of multi-domain proteins in different organisms and found that several methods all predict that approximately 65% of eukaryotic proteins are multi-domain proteins, compared with approximately 40% of prokaryotic proteins. However, these numbers are strongly dependent on the exact choice of cut-off for domains in unassigned regions. In conclusion, all eukaryotes have similar fractions of multi-domain proteins and disorder, whereas a high fraction of repeating domains is found only in multicellular eukaryotes. This implies a role for repeats in cell-cell contacts, while the other two features are important for intracellular functions.
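Why the multi-domain estimate depends on a cut-off for unassigned regions can be illustrated with a small counting sketch. It is hypothetical and is not the method of the paper. It assumes each protein is given as a length plus a list of assigned domain spans, and that any unassigned stretch of at least `min_orphan_len` residues counts as one putative orphan domain. The names `Protein`, `unassigned_regions`, `count_domains` and `multidomain_fraction` are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Protein:
    length: int
    # Assigned domain spans as (start, end), 1-based and inclusive (assumed format).
    domains: List[Tuple[int, int]]


def unassigned_regions(p: Protein) -> List[Tuple[int, int]]:
    """Return the stretches of the sequence not covered by any assigned domain."""
    regions = []
    pos = 1  # first residue not yet covered
    for start, end in sorted(p.domains):
        if start > pos:
            regions.append((pos, start - 1))
        pos = max(pos, end + 1)  # max() tolerates overlapping domain spans
    if pos <= p.length:
        regions.append((pos, p.length))
    return regions


def count_domains(p: Protein, min_orphan_len: int) -> int:
    """Assigned domains plus long unassigned stretches (hypothetical orphan domains)."""
    orphans = sum(
        1 for s, e in unassigned_regions(p) if e - s + 1 >= min_orphan_len
    )
    return len(p.domains) + orphans


def multidomain_fraction(proteins: List[Protein], min_orphan_len: int) -> float:
    """Fraction of proteins counted as having two or more domains."""
    if not proteins:
        return 0.0
    multi = sum(1 for p in proteins if count_domains(p, min_orphan_len) >= 2)
    return multi / len(proteins)
```

Raising `min_orphan_len` means fewer long unassigned stretches are counted as domains, so the same set of proteins yields a lower multi-domain fraction. This is the kind of sensitivity the abstract describes.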

Keyword
protein domains; multi-domain protein; comparative genomics; kingdoms of life; proteome
Identifiers
urn:nbn:se:su:diva-25575 (URN); 10.1016/j.jmb.2005.02.007 (DOI)
Note
Part of urn:nbn:se:su:diva-8295
Available from: 2008-11-06; Created: 2008-10-27; Last updated: 2017-12-13; Bibliographically approved
2. Domain Rearrangements in Protein Evolution
2005 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 353, no. 4, pp. 911-923. Article in journal (Refereed), published.
Abstract [en]

Most eukaryotic proteins are multi-domain proteins that are created from gene fusions, deletions and internal repetitions. Investigating such evolutionary events requires a method to find the domain architecture from which each protein originates. Therefore, we defined a novel measure, domain distance, which is calculated as the number of domains that differ between two domain architectures. Using this measure, we studied the evolutionary events that distinguish a protein from its closest ancestor and found that indels are more common than internal repetition and that the exchange of a domain is rare. Indels and repetitions are common at both the N- and C-termini, while they are rare between domains. The evolution of the majority of multi-domain proteins can be explained by stepwise insertions of single domains, with the exception of repeats, which are sometimes duplicated in tandem several domains at a time. We show that domain distances agree with sequence similarity and with semantic similarity based on Gene Ontology annotations. In addition, we demonstrate the use of the domain distance measure to build evolutionary trees. Finally, the evolution of multi-domain proteins is exemplified by a closer study of two protein families: non-receptor tyrosine kinases and RhoGEFs.
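The domain distance measure described above can be sketched in a few lines of Python. This is a minimal sketch under an assumption the abstract does not settle. Architectures are compared as multisets, so domain order is ignored while repeated copies of a domain are counted individually. Under this reading, the indel of one domain gives a distance of 1, and the exchange of one domain for another gives 2. The function names and the Pfam-style labels in the tests are illustrative.

```python
from collections import Counter
from typing import List, Sequence


def domain_distance(arch_a: Sequence[str], arch_b: Sequence[str]) -> int:
    """Number of domains present in one architecture but not the other.

    Assumption: architectures are treated as multisets (order ignored,
    repeated copies of a domain counted separately).
    """
    a, b = Counter(arch_a), Counter(arch_b)
    # Counter subtraction keeps only positive counts, so (a - b) holds the
    # domain copies unique to arch_a and (b - a) those unique to arch_b.
    return sum((a - b).values()) + sum((b - a).values())


def closest_architecture(
    query: Sequence[str], candidates: List[Sequence[str]]
) -> Sequence[str]:
    """Candidate with the smallest domain distance to the query.

    Ties go to the earliest candidate; an empty list raises ValueError.
    """
    return min(candidates, key=lambda c: domain_distance(query, c))
```

Searching with `closest_architecture` over all architectures in a database is one hypothetical way to pick the "closest ancestor" the abstract refers to. A real analysis would also need a rule for deciding which architecture is ancestral.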

Keyword
protein evolution; multi-domain proteins; proteome; GOGraph; Pfam
Identifiers
urn:nbn:se:su:diva-25576 (URN); 10.1016/j.jmb.2005.08.067 (DOI)
Note
Part of urn:nbn:se:su:diva-8295
Available from: 2008-11-06; Created: 2008-10-27; Last updated: 2017-12-13; Bibliographically approved
3. Expansion of Protein Domain Repeats
2006 (English). In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 2, no. 8, pp. 959-970. Article in journal (Refereed), published.
Abstract [en]

Many proteins, especially in eukaryotes, contain tandem repeats of several domains from the same family. These repeats have a variety of binding properties and are involved in protein-protein interactions as well as in binding to other ligands such as DNA and RNA. The rapid expansion of protein domain repeats is assumed to have occurred through internal tandem duplications. However, the exact mechanisms behind these tandem duplications are not well understood. Here, we have studied the evolution, function, protein structure, gene structure and phylogenetic distribution of domain repeats. For this purpose, we have assigned Pfam-A domain families to 24 proteomes, using more sensitive domain assignments in the repeat regions. These assignments confirmed previous findings that eukaryotes, and in particular vertebrates, contain a much higher fraction of proteins with repeats than prokaryotes. The internal sequence similarity in each protein revealed that domain repeats are often expanded through duplications of several domains at a time, while the duplication of a single domain is less common. Many of the repeats appear to have been duplicated in the middle of the repeat region. This is in strong contrast to the evolution of other proteins, which mainly proceeds through additions of single domains at either terminus. Further, we found that some domain families show distinct duplication patterns; e.g., nebulin domains have mainly been expanded in units of seven domains at a time, while duplications in other domain families involve varying numbers of domains. Finally, no common mechanism for the expansion of all repeats could be detected. We found that the duplication patterns show no dependence on the size of the domains. Repeat expansion in some families can possibly be explained by exon shuffling; however, exon shuffling could not have created all repeats.
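The idea that a repeat region was expanded in units of several domains can be illustrated with a toy offset analysis. This sketch is an assumption for illustration, not the procedure used in the paper. If blocks of k domains were duplicated in tandem, domain i should tend to resemble domain i + k. The offset with the highest mean pairwise similarity therefore suggests the duplication unit; for nebulin, the abstract reports mainly seven. Python's `difflib.SequenceMatcher.ratio()` stands in for a proper alignment score, and `offset_profile` and `likely_unit_size` are hypothetical helper names.

```python
from difflib import SequenceMatcher
from statistics import mean
from typing import Dict, List, Optional


def similarity(a: str, b: str) -> float:
    """Crude stand-in for an alignment identity score, in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def offset_profile(domains: List[str], min_pairs: int = 2) -> Dict[int, float]:
    """Mean similarity between domain i and domain i + k, for each offset k.

    Offsets with fewer than `min_pairs` domain pairs are skipped, because a
    single pair gives an unreliable average.
    """
    n = len(domains)
    profile: Dict[int, float] = {}
    for k in range(1, n):
        if n - k < min_pairs:  # pair count only shrinks as k grows
            break
        profile[k] = mean(similarity(domains[i], domains[i + k]) for i in range(n - k))
    return profile


def likely_unit_size(domains: List[str], min_pairs: int = 2) -> Optional[int]:
    """Smallest offset with the highest mean similarity, or None if undefined."""
    profile = offset_profile(domains, min_pairs)
    best_k: Optional[int] = None
    best = -1.0
    for k in sorted(profile):
        if profile[k] > best:  # strict '>' keeps the smallest tied offset
            best_k, best = k, profile[k]
    return best_k
```

On real domain sequences, an alignment-based identity score would replace `similarity`, and the profile would be much noisier than for toy inputs such as three identical copies of a three-domain block.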

National Category
Medical Biotechnology (with a focus on Cell Biology (including Stem Cell Biology), Molecular Biology, Microbiology, Biochemistry or Biopharmacy)
Identifiers
urn:nbn:se:su:diva-25577 (URN); 10.1371/journal.pcbi.0020114 (DOI)
Note

Part of urn:nbn:se:su:diva-8295

Available from: 2008-11-06; Created: 2008-10-27; Last updated: 2017-12-13; Bibliographically approved
4. Nebulin: A Study of Protein Repeat Evolution
2010 (English). In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 402, no. 1, pp. 38-51. Article in journal (Refereed), published.
Abstract [en]

Protein domain repeats are common in proteins that are central to the organization of a cell, in particular in eukaryotes. They are known to evolve through internal tandem duplications. However, the underlying mechanisms are incompletely understood. To shed light on repeat expansion mechanisms, we have studied the evolution of the muscle protein Nebulin, which contains a large number of actin-binding nebulin domains. Nebulin proteins have evolved from an invertebrate precursor containing two nebulin domains. Repeat regions have expanded through duplications of single domains, as well as through duplications of a super repeat (SR) consisting of seven nebulin domains. We show that the SR has evolved independently into large regions in at least three instances: twice in the invertebrate Branchiostoma floridae and once in vertebrates. In-depth analysis reveals several recent tandem duplications in the Nebulin gene. The events involve both single-domain units and multidomain units consisting of one or several SRs. Some are single events, but the same unit is frequently duplicated multiple times. For instance, an ancestor of human and chimpanzee underwent two tandem duplications. The duplication junction coincides with an Alu transposon, suggesting duplication through Alu-mediated homologous recombination. Duplications in the SR region consistently involve multiples of seven domains. However, the exact unit that is duplicated varies both between and within species. Thus, the large Nebulin protein was not created by multiple tandem duplications of the same motif. Finally, analysis of segmental duplications in the human genome reveals that duplications are more common in genes containing domain repeats than in genes coding for nonrepeated proteins. In fact, segmental duplications are found three to six times more often in long repeated genes than expected by chance.

Keyword
protein domain repeat, evolution, repeat duplication, segmental duplication, Nebulin
National Category
Bioinformatics and Systems Biology
Research subject
Biochemistry
Identifiers
urn:nbn:se:su:diva-50177 (URN); 10.1016/j.jmb.2010.07.011 (DOI); 000282074500005; 20643138 (PubMedID)
Note

authorCount: 4

Available from: 2010-12-29; Created: 2010-12-21; Last updated: 2017-12-11; Bibliographically approved

Author: Björklund, Åsa
