Orphan Genes Bioinformatics: Identification and properties of de novo created genes
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
2017 (English). Doctoral thesis, comprising papers (Other academic)
Abstract [en]

Even today, many genes have no known homolog. These "orphans" are found in all species, from viruses to prokaryotes and eukaryotes. For some of these genes, we may simply not have enough data to find homologs yet. Some are imported from taxonomically distant organisms via lateral transfer; others have homologs but have mutated beyond recognition.

However, a sizeable fraction of orphan genes is unambiguously created via "de novo" mechanisms. The study of such novel genes can contribute to our understanding of the emergence of functional novelty and the adaptation of species to new ecological niches.

In this work, we first survey the field of orphan studies and illustrate some of its common issues. Next, we analyze some of the intrinsic properties of orphan proteins, including secondary structure elements and intrinsic structural disorder; specifically, we observe that in young proteins the relationship between these properties and the G+C content of their coding sequence is stronger than in older proteins.

We then tackle some of the methodological problems often found in orphan studies. We find that using evolutionarily close species and sensitive, state-of-the-art homology detection methods is instrumental in identifying a set of orphans enriched in de novo created genes.

Finally, we compare how intrinsic disorder is distributed in bacteria and eukaryotes. Eukaryotic proteins are longer and more disordered; the difference can be attributed primarily to eukaryote-specific domains and linker regions. In these regions of the proteins, a higher frequency of the disorder-promoting amino acid serine is observed in eukaryotes.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2017.
Keywords [en]
bioinformatics, de novo, orphans, evolutionary genetics
National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-149168
ISBN: 978-91-7797-085-9 (print)
ISBN: 978-91-7797-086-6 (digital)
OAI: oai:DiVA.org:su-149168
DiVA, id: diva2:1158369
Public defence
2018-01-12, Magnélisalen, Kemiska övningslaboratoriet, Svante Arrhenius väg 16 B, Stockholm, 13:00 (English)
Opponent
Supervisor
Note

At the time of the doctoral defence, the following papers were unpublished and had the following status: Paper 3: Submitted. Paper 4: Manuscript.

Available from: 2017-12-20 Created: 2017-11-20 Last updated: 2017-12-20. Bibliographically approved
List of papers
1. Orphans and new gene origination, a structural and evolutionary perspective
2014 (English). In: Current Opinion in Structural Biology, ISSN 0959-440X, E-ISSN 1879-033X, Vol. 26, p. 73-83. Article in journal (Refereed). Published
Abstract [en]

The frequency of de novo creation of proteins has been debated. Early on, it was assumed that de novo creation should be extremely rare and that the vast majority of all protein-coding genes were created in the early history of life. However, the early genomics era led to the insight that protein-coding genes do appear to be lineage-specific. Today, with thousands of completely sequenced genomes, this impression remains. It has even been proposed that the creation of novel genes, a continuous process in which most de novo genes are short-lived, is as frequent as gene duplication. There are reports with strongly indicative evidence for de novo gene emergence in many organisms, ranging from bacteria, where new genes are sometimes generated through bacteriophages, to humans, where orphans appear to be overexpressed in brain and testis. In contrast, research on protein evolution indicates that many very distantly related proteins appear to share partial homology. Here, we discuss recent results on de novo gene emergence, as well as important technical challenges that limit our ability to determine the extent of de novo protein creation definitively.

National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
urn:nbn:se:su:diva-107638 (URN), 10.1016/j.sbi.2014.05.006 (DOI), 000340852000012
Note

Author count: 3

Available from: 2014-09-22 Created: 2014-09-22 Last updated: 2017-11-20. Bibliographically approved
2. High GC content causes orphan proteins to be intrinsically disordered
2017 (English). In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 13, no. 3, article id e1005375. Article in journal (Refereed). Published
Abstract [en]

De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins must, at a bare minimum, not cause serious harm to the organism; for instance, they should not aggregate. Therefore, although the creation of short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in properties between proteins of different ages. However, when we take GC content into account, we note that it can explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for the fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.
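The GC argument in this abstract can be made concrete with a small calculation. The Python sketch below assumes that each nucleotide of a random ORF is drawn independently, with G and C each at probability GC/2, and it assumes A, R, G, Q, S, P, E, K as the disorder-promoting residues, a commonly used classification that may differ from the one used in the paper. All names are hypothetical. It computes the expected fraction of disorder-promoting residues in a random ORF at a given GC level, excluding stop codons.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, TTA, TTG, TCT, ..., GGG.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Assumption: one commonly used set of disorder-promoting residues
# (not necessarily the classification used in the paper).
DISORDER_PROMOTING = set("ARGQSPEK")


def codon_probability(codon, gc):
    """Probability of a codon when each position is drawn independently,
    with G and C each at gc/2 and A and T each at (1 - gc)/2."""
    base_prob = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    prob = 1.0
    for base in codon:
        prob *= base_prob[base]
    return prob


def expected_disorder_fraction(gc):
    """Expected fraction of disorder-promoting residues in a random ORF
    at the given GC content, conditioning on sense (non-stop) codons."""
    sense = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}
    total = sum(codon_probability(codon, gc) for codon in sense)
    promoting = sum(
        codon_probability(codon, gc)
        for codon, aa in sense.items()
        if aa in DISORDER_PROMOTING
    )
    return promoting / total


if __name__ == "__main__":
    for gc in (0.4, 0.5, 0.6):
        print(gc, round(expected_disorder_fraction(gc), 2))
    # 0.4 0.41
    # 0.5 0.49
    # 0.6 0.58
```

Under these assumptions the fraction rises with GC content because the GC-rich codon families GCN, GGN, CCN and CGN encode Ala, Gly, Pro and Arg, whereas AT-rich codons mostly encode order-promoting residues such as Phe, Ile, Tyr and Asn.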

National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
urn:nbn:se:su:diva-142711 (URN), 10.1371/journal.pcbi.1005375 (DOI), 000398031900014, 28355220 (PubMedID)
Available from: 2017-05-05 Created: 2017-05-05 Last updated: 2018-01-13. Bibliographically approved
3. The classification of orphans is improved by combining searches in both proteomes and genomes
2017 (English). In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203. Article in journal (Refereed). Submitted
Abstract [en]

The detection of genes without homologs in other species (“orphans”) is important, as it provides a glimpse into the evolutionary processes that create novel genes. However, for an unbiased view of such de novo gene creation, the detection of these genes needs to be accurate. Estimating the conservation, and more generally the age, of any gene depends on two factors: (i) a method to detect homologs in a genome and (ii) a set of related genomes. Here, we investigate how the detection of orphans is influenced by these factors. We show that when multiple genomes and six-frame translations of complete genomes are used, the number of orphans is significantly reduced compared with earlier studies. Under these conditions we obtain a strict set of 34 orphan Saccharomyces cerevisiae genes, and show that the number of orphans in Drosophila melanogaster and Drosophila pseudoobscura can be reduced to only 30 and 17, respectively.
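Six-frame translation, which this paper applies to complete genomes, translates the forward strand and its reverse complement in each of the three reading frames, so that homology searches can also hit coding-capable regions that are absent from annotated proteomes. The following is a minimal Python sketch assuming the standard genetic code; the function names and the min_length cut-off are hypothetical, and this is not the pipeline used in the paper.

```python
from itertools import product

# Standard genetic code; codons enumerated TTT, TTC, ..., GGG ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
# Complement A/C/G/T/N; any other character is left unchanged.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate codon by codon, dropping a trailing partial codon.
    Codons containing ambiguous bases are translated as 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(seq):
    """Frames 0-2 of the forward strand, then frames 0-2 of the reverse complement."""
    strands = (seq.upper(), reverse_complement(seq))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


def stop_free_stretches(frames, min_length=10):
    """Split each translated frame at stop codons and keep stretches
    of at least min_length residues (hypothetical cut-off)."""
    return [piece for frame in frames for piece in frame.split("*") if len(piece) >= min_length]


if __name__ == "__main__":
    frames = six_frame_translation("ATGAAATTTTAA")
    print(frames)  # ['MKF*', '*NF', 'EIL', 'LKFH', '*NF', 'KIS']
    print(stop_free_stretches(frames, min_length=3))
```

The stop-free stretches could then be searched with a sensitive homology-detection method alongside the annotated proteome of each related species.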

National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
urn:nbn:se:su:diva-149079 (URN)
Available from: 2017-11-20 Created: 2017-11-20 Last updated: 2017-11-29. Bibliographically approved
4. Difference in disorder between eukaryotes and prokaryotes is largely due to Serine in linker regions
(English). Manuscript (preprint) (Other academic)
Abstract [en]

In this study, we ask which molecular properties make eukaryotic proteins more disordered than prokaryotic ones. First, we show that, on average, eukaryotic proteins contain more disorder-promoting amino acids. In particular, serine accounts for close to 8% of all residues in eukaryotes and less than 6% in prokaryotes. Second, we show that domains unique to eukaryotes and linker regions in eukaryotes are both more disordered and more abundant than the corresponding regions in prokaryotic proteins. Serine is an important residue for post-translational modification and regulatory mechanisms. We therefore conclude that it is plausible that both the need for regulation in a complex eukaryotic cell and the increased number of longer multi-domain proteins contribute to the higher intrinsic structural disorder in eukaryotic proteins.
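The serine comparison in this manuscript rests on two simple measurements: the fraction of one residue among all residues pooled over a set of proteins, and the same count restricted to linker regions. The Python sketch below takes "linker" to mean any stretch outside annotated domains and assumes domain boundaries are already available as 0-based, end-exclusive intervals, for example from a domain-annotation tool. The function names are hypothetical and this is not the authors' code.

```python
from collections import Counter


def residue_fraction(sequences, residue="S"):
    """Fraction of `residue` among all residues pooled across `sequences`."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    return counts[residue] / total if total else 0.0


def linker_regions(seq, domains):
    """Return the stretches of `seq` not covered by any domain.

    `domains` is a list of (start, end) pairs, 0-based and end-exclusive
    (an assumed annotation format).
    """
    covered = [False] * len(seq)
    for start, end in domains:
        for i in range(max(start, 0), min(end, len(seq))):
            covered[i] = True
    linkers, current = [], []
    for residue, in_domain in zip(seq, covered):
        if in_domain:
            if current:  # a linker stretch ends where a domain begins
                linkers.append("".join(current))
                current = []
        else:
            current.append(residue)
    if current:
        linkers.append("".join(current))
    return linkers


if __name__ == "__main__":
    protein = "SSLLLLSG"  # toy sequence with one domain at positions 2-5
    linkers = linker_regions(protein, [(2, 6)])
    print(linkers)                    # ['SS', 'SG']
    print(residue_fraction(linkers))  # 0.75
```

Pooling residues across all sequences, rather than averaging per-protein fractions, matches the phrasing "of all residues" in the abstract; a per-protein average would weight short proteins more heavily.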

National subject category
Research subject
Biochemistry with Emphasis on Bioinformatics
Identifiers
urn:nbn:se:su:diva-149167 (URN)
Available from: 2017-11-20 Created: 2017-11-20 Last updated: 2017-11-20. Bibliographically approved

Open Access in DiVA

Full text (960 kB), 48 downloads
File information
File: FULLTEXT01.pdf, File size: 960 kB, Checksum: SHA-512
846da634fc2309fe008ef438caecfaabb543bfac859ec6759d3acf7e70fd1c4ed3b1cd0d1148266d403de8931a71316a3e1fbe1d0d9ef37b7b58b75deff20341
Type: fulltext, MIME type: application/pdf

Author/editor: Basile, Walter
