High GC content causes orphan proteins to be intrinsically disordered
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Linköping University, Sweden.
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Kungliga Tekniska Högskolan, Sweden. ORCID iD: 0000-0002-7115-9751
Number of Authors: 4
2017 (English). In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 13, no 3, article id e1005375. Article in journal (Refereed). Published
Abstract [en]

De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should, for instance, not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account, we note that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.
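
The link between GC level and disorder-promoting codons stated in the abstract can be illustrated with a minimal sketch. It is not the authors' method. It assumes that each nucleotide of a random ORF is drawn independently with P(G) = P(C) = gc/2, and it assumes a commonly cited set of disorder-promoting residues (A, R, G, Q, S, P, E, K); the paper may use a different disorder measure. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Assumption: a commonly cited set of disorder-promoting residues,
# not necessarily the classification used in the paper.
DISORDER_PROMOTING = set("ARGQSPEK")


def gc_content(seq):
    """Fraction of G and C nucleotides in a DNA sequence."""
    seq = seq.upper()
    return sum(base in "GC" for base in seq) / len(seq)


def expected_disorder_fraction(gc):
    """Expected share of disorder-promoting residues among sense codons
    when nucleotides are drawn independently at the given GC level."""
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    promoting = 0.0
    sense = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":  # stop codons do not encode a residue
            continue
        p = base_prob[codon[0]] * base_prob[codon[1]] * base_prob[codon[2]]
        sense += p
        if aa in DISORDER_PROMOTING:
            promoting += p
    return promoting / sense


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC={gc:.1f}  disorder-promoting fraction={expected_disorder_fraction(gc):.3f}")
```

Under these assumptions the expected share of disorder-promoting residues rises with GC level, which is the direction of the trend the abstract describes for low-GC yeast and high-GC Drosophila.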

Place, publisher, year, edition, pages
2017. Vol. 13, no 3, article id e1005375
National Category
Biological Sciences; Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-142711
DOI: 10.1371/journal.pcbi.1005375
ISI: 000398031900014
PubMedID: 28355220
OAI: oai:DiVA.org:su-142711
DiVA, id: diva2:1093169
Available from: 2017-05-05. Created: 2017-05-05. Last updated: 2023-10-19. Bibliographically approved
In thesis
1. Orphan Genes Bioinformatics: Identification and properties of de novo created genes
2017 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Even today, many genes are without any known homolog. These "orphans" are found in all species, from Viruses to Prokaryotes and Eukaryotes. For a portion of these genes, we might simply not have enough data to find homologs yet. Some of them are imported from taxonomically distant organisms via lateral transfer; others have homologs, but mutated beyond the point of recognition.

However, a sizeable fraction of orphan genes is unambiguously created via "de novo" mechanisms. The study of such novel genes can contribute to our understanding of the emergence of functional novelty and the adaptation of species to new ecological niches.

In this work, we first survey the field of orphan studies and illustrate some of the common issues. Next, we analyze some of the intrinsic properties of orphan proteins, including secondary structure elements and intrinsic structural disorder; specifically, we observe that in young proteins the relationship between these properties and the G+C content of their coding sequence is stronger than in older proteins.

We then tackle some of the methodological problems often found in orphan studies. We find that using evolutionarily close species and sensitive, state-of-the-art homology recognition methods is instrumental to the identification of a set of orphans enriched in de novo created ones.

Finally, we compare how intrinsic disorder is distributed in bacteria versus eukaryota. Eukaryotic proteins are longer and more disordered; the difference is to be attributed primarily to eukaryotic-specific domains and linker regions. In these sections of the proteins, a higher frequency of the disorder-promoting amino acid Serine can be observed in Eukaryotes.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2017. p. 46
Keywords
bioinformatics, de novo, orphans, evolutionary genetics
National Category
Biological Sciences
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-149168 (URN)
978-91-7797-085-9 (ISBN)
978-91-7797-086-6 (ISBN)
Public defence
2018-01-12, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 13:00 (English)
Note

At the time of the doctoral defense, the following papers were unpublished and had a status as follows: Paper 3: Submitted. Paper 4: Manuscript.

Available from: 2017-12-20. Created: 2017-11-20. Last updated: 2022-02-28. Bibliographically approved
2. Intrinsic disorder and tandem repeats - match made in evolution: Computational studies of molecular evolution
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are both the building blocks and workers of the cell, carrying out most of the important functions. For a long time, their structure has been regarded as the primary factor for their function, but intrinsically disordered proteins demonstrate an alternative to this paradigm. Disordered proteins can temporarily assume different forms based on their interactions with other molecules and play critical roles in several biological processes, including cell signaling and regulation of gene expression.

Tandem repeats are repeated patterns in genetic sequence. The role of tandem repeats in many protein structures is well documented today, but their role in disordered proteins is not entirely clear. This thesis aims to shed light on the mechanisms by which protein disorder and tandem repeats are linked.

As described in Paper III, only 2.5% of residues in all known protein sequences lie in regions where tandem repeats and protein disorder overlap, but many of these proteins have crucial functions and are linked to human diseases. Short tandem repeats emerge in this study as most frequently occurring in disordered regions. Genetic variation in disordered proteins accounts for length differences in eukaryotic genes (Paper I), and many orphan, recently evolved proteins are disordered due to high GC content (Paper II).

A medical application of this research is illustrated in the thesis with examples of variations in short tandem repeats (STRs) and their role in human diseases. Paper IV presents a comprehensive resource of human STR variation and Paper V illustrates how it can be used to identify specific STRs of interest, such as in the case of colorectal cancer where variations in certain STRs lead to altered gene expression patterns in tumors.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023. p. 49
Keywords
Protein evolution, intrinsically disordered proteins (IDPs), tandem repeats, short tandem repeats (STRs), genetic variation, orphan proteins, GC content, human STR variation, colorectal cancer, gene expression
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-223099 (URN)
978-91-8014-559-6 (ISBN)
978-91-8014-560-2 (ISBN)
Public defence
2023-12-11, hörsal 7, hus D, Universitetsvägen 10 D and online via Zoom, public link is available at the department website, Stockholm, 09:00 (English)
Available from: 2023-11-16. Created: 2023-10-19. Last updated: 2023-12-06. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Basile, Walter; Sachenkova, Oxana; Light, Sara; Elofsson, Arne
