Intrinsic disorder and tandem repeats - match made in evolution: Computational studies of molecular evolution
Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
2023 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Proteins are both the building blocks and the workers of the cell, carrying out most of its essential functions. For a long time, a protein's structure was regarded as the primary determinant of its function, but intrinsically disordered proteins offer an alternative to this paradigm. Disordered proteins can transiently adopt different conformations depending on their interactions with other molecules, and they play critical roles in several biological processes, including cell signaling and the regulation of gene expression.

Tandem repeats are patterns that recur back to back in a genetic or protein sequence. The role of tandem repeats in many protein structures is well documented today, but their role in disordered proteins is not entirely clear. This thesis aims to shed light on the mechanisms that link protein disorder and tandem repeats.

As described in Paper III, only 2.5% of residues in all known protein sequences lie where tandem repeats and protein disorder overlap, but many of the proteins involved have crucial functions and are linked to human diseases. Short tandem repeats were found to occur most frequently in disordered regions. Genetic variation in disordered regions accounts for length differences between eukaryotic genes (Paper I), and many orphan, recently evolved proteins are disordered because of high GC content (Paper II).

The thesis illustrates a medical application of this research with examples of variation in short tandem repeats (STRs) and its role in human disease. Paper IV presents WebSTR, a comprehensive resource of human STR variation, and Paper V illustrates how it can be used to identify specific STRs of interest, as in colorectal cancer, where length variation in certain STRs is associated with altered gene expression in tumors.

Place, publisher, year, edition, pages
Stockholm: Department of Biochemistry and Biophysics, Stockholm University, 2023, p. 49
Keywords [en]
Protein evolution, intrinsically disordered proteins (IDPs), tandem repeats, short tandem repeats (STRs), genetic variation, orphan proteins, GC content, human STR variation, colorectal cancer, gene expression
National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
URN: urn:nbn:se:su:diva-223099
ISBN: 978-91-8014-559-6 (print)
ISBN: 978-91-8014-560-2 (electronic)
OAI: oai:DiVA.org:su-223099
DiVA, id: diva2:1806027
Public defence
2023-12-11, 09:00, G-salen, Arrheniuslaboratorierna hus F, Svante Arrhenius väg 20 C, Stockholm, and online via Zoom (a public link is available on the department website). Language: English.
Available from: 2023-11-16 Created: 2023-10-19 Last updated: 2023-11-07; Bibliographically approved
List of papers
1. Protein Expansion Is Primarily due to Indels in Intrinsically Disordered Regions
2013 (English) In: Molecular Biology and Evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 30, no. 12, p. 2645-2653. Article in journal (Refereed), Published
Abstract [en]

Proteins evolve not only through point mutations but also through insertion and deletion events, which affect the length of the protein. It is well known that such indel events most frequently occur in surface-exposed loops. However, detailed analysis of indel events in distantly related and fast-evolving proteins is hampered by the difficulty of correctly aligning such sequences. Here, we circumvent this problem by first analyzing homologous proteins on the basis of length variation alone rather than pairwise alignments. Using this approach, we find a surprisingly strong relationship between the difference in length and the difference in the number of intrinsically disordered residues: up to three quarters of the length variation can be explained by changes in the number of intrinsically disordered residues. Further, we find that disorder is common in both insertions and deletions. A more detailed analysis reveals that indel events do not induce disorder; rather, already disordered regions accrue indels, suggesting that selective pressure against indels is lower within intrinsically disordered regions.
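The claim that up to three quarters of the length variation "can be explained" by changes in disordered residues can be read as a fraction of variance explained (R²). The sketch below shows that calculation under stated assumptions: each homologous pair has already been reduced to a length difference and a difference in predicted disordered residues, and a simple least-squares line is fitted. The function name and the toy numbers are hypothetical and are not data or code from Paper I.

```python
from statistics import fmean


def variance_explained(length_diffs, disorder_diffs):
    """R^2 of a least-squares line relating disorder and length differences.

    For simple linear regression, R^2 equals the squared Pearson correlation
    between x (difference in disordered residues) and y (difference in length).
    """
    if len(length_diffs) != len(disorder_diffs) or len(length_diffs) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mx, my = fmean(disorder_diffs), fmean(length_diffs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(disorder_diffs, length_diffs))
    sxx = sum((x - mx) ** 2 for x in disorder_diffs)
    syy = sum((y - my) ** 2 for y in length_diffs)
    if sxx == 0 or syy == 0:
        raise ValueError("both sequences must vary")
    return sxy * sxy / (sxx * syy)


# Hypothetical per-pair differences, for illustration only.
print(variance_explained([2, 1, 4, 3], [1, 2, 3, 4]))
```

A value of 0.75 from this function would correspond to "three quarters of the length variation" being accounted for by a linear relationship with disorder.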

Keywords
disordered proteins, insertions and deletions, indels, protein evolution, protein structure
National Category
Biochemistry and Molecular Biology
Identifiers
urn:nbn:se:su:diva-98080 (URN); 10.1093/molbev/mst157 (DOI); 000327793000010 (ISI)
Funder
Swedish Research Council, VR-NT 2009-5072; Swedish Research Council, VR-M 2010-3555; Swedish Research Council, VR-NT 2012-5046
Note
AuthorCount: 5
Funding agencies: SSF; Foundation for Strategic Research, Science for Life Laboratory; EU LSHG-CT-2004-503567 FP7-HEALTH-F4-2007-201924
Available from: 2013-12-27 Created: 2013-12-27 Last updated: 2023-10-19; Bibliographically approved
2. High GC content causes orphan proteins to be intrinsically disordered
2017 (English) In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 13, no. 3, article id e1005375. Article in journal (Refereed), Published
Abstract [en]

De novo creation of protein-coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins must, at a minimum, not cause serious harm to the organism, meaning, for instance, that they should not aggregate. Therefore, although the creation of short ORFs could be truly random, their fixation should be subject to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge, no valid explanation for this difference has been proposed. To solve this riddle, we studied the structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in properties between proteins of different ages. However, when we take GC content into account, we note that it can explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with the frequency of codons encoding disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for the fixation of orphan proteins. Instead, these proteins largely resemble random proteins given a particular GC level. During evolution, the properties of a protein change faster than the GC level, causing the relationship between disorder and GC content to gradually weaken.
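The argument that orphans "largely resemble random proteins given a particular GC level" can be made concrete with a small calculation: draw codons at random from a genome with a given GC content and ask what fraction of the encoded residues are disorder-promoting. This is a hypothetical sketch, not the analysis performed in Paper II. It assumes the standard genetic code, independent bases within a codon, and one commonly cited set of disorder-promoting residues (A, R, G, Q, S, P, E, K); the function and variable names are illustrative.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order at each position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for (i, b1), (j, b2), (k, b3) in product(enumerate(BASES), repeat=3)
}

# Assumption: a commonly cited set of disorder-promoting residues.
DISORDER_PROMOTING = set("ARGQSPEK")


def disorder_promoting_fraction(gc: float) -> float:
    """Expected fraction of disorder-promoting residues in a random ORF.

    Each base is drawn independently: G and C each with probability gc/2,
    A and T each with probability (1 - gc)/2. Stop codons are excluded and
    the remaining (sense) codon probabilities are renormalised.
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must lie in [0, 1]")
    p = {"G": gc / 2, "C": gc / 2, "A": (1 - gc) / 2, "T": (1 - gc) / 2}
    sense = disordered = 0.0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # stop codons do not encode a residue
        prob = p[codon[0]] * p[codon[1]] * p[codon[2]]
        sense += prob
        if aa in DISORDER_PROMOTING:
            disordered += prob
    return disordered / sense


for gc in (0.3, 0.5, 0.7):
    print(f"GC={gc:.1f}: {disorder_promoting_fraction(gc):.3f}")
```

At the extremes the effect is stark under these assumptions: in a purely GC genome every sense codon (GCN, GGN, CCN, CGN, with N restricted to G or C) encodes Ala, Gly, Pro or Arg, all in the assumed disorder-promoting set, whereas in a purely AT genome only Lys is.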

National Category
Biological Sciences; Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-142711 (URN); 10.1371/journal.pcbi.1005375 (DOI); 000398031900014 (ISI); 28355220 (PubMedID)
Available from: 2017-05-05 Created: 2017-05-05 Last updated: 2023-10-19; Bibliographically approved
3. A New Census of Protein Tandem Repeats and Their Relationship with Intrinsic Disorder
2020 (English) In: Genes, ISSN 2073-4425, E-ISSN 2073-4425, Vol. 11, no. 4, article id 407. Article in journal (Refereed), Published
Abstract [en]

Protein tandem repeats (TRs) are often associated with immunity-related functions and diseases. Since the last census of protein TRs in 1999, the number of curated proteins has increased more than sevenfold and new TR prediction methods have been published. TRs appear to be enriched in intrinsic disorder and vice versa, but the significance of and biological reasons for this association are unknown. Here, we characterize protein TRs across all kingdoms of life and their overlap with intrinsic disorder in unprecedented detail. Using state-of-the-art prediction methods, we estimate that 50.9% of proteins contain at least one TR, often located at the sequence flanks. A positive linear correlation between the proportion of TRs and protein length was observed universally; eukaryotes in general have more TRs, but once the difference in protein length is taken into account, the difference is quite small. TRs were enriched in disorder-promoting amino acids and were found inside intrinsically disordered regions. Many such TRs were homorepeats. Our results support the view that TRs mostly originate by duplication and are involved in essential functions such as transcription processes, structural organization, electron transport and iron binding. In viruses, TRs are found in proteins essential for virulence.

Keywords
tandem repeat, homorepeat, domain repeat, protein repeat, repeat prediction, intrinsic disorder, protein function, Swiss-Prot
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-183161 (URN); 10.3390/genes11040407 (DOI); 000537224600101 (ISI); 32283633 (PubMedID)
Available from: 2020-07-01 Created: 2020-07-01 Last updated: 2023-10-20; Bibliographically approved
4. WebSTR: A Population-wide Database of Short Tandem Repeat Variation in Humans
2023 (English) In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 435, no. 20, article id 168260. Article in journal (Refereed), Published
Abstract [en]

Short tandem repeats (STRs) are consecutive repetitions of one- to six-nucleotide motifs. They are hypervariable because of the high prevalence of repeat-unit insertions or deletions, primarily caused by polymerase slippage during replication. Genetic variation at STRs has been shown to influence a range of traits in humans, including gene expression, cancer risk, and autism. Until recently, STRs were poorly studied because they pose significant challenges to bioinformatics analyses. Moreover, genome-wide analysis of STR variation in population-scale cohorts requires large amounts of data and computational resources. However, the recent advent of genome-wide analysis tools has resulted in multiple large datasets of STR variation spanning nearly two million genomic loci in thousands of individuals from diverse populations.

Here we present WebSTR, a database of genetic variation and other characteristics of genome-wide STRs across human populations. WebSTR is based on reference panels of more than 1.7 million human STRs created with state-of-the-art repeat annotation methods and can easily be extended to include additional cohorts or species. It currently contains data based on STR genotypes for individuals from the 1000 Genomes Project, H3Africa, the Genotype-Tissue Expression (GTEx) Project and colorectal cancer patients from the TCGA dataset.

WebSTR is implemented as a relational database with programmatic access available through an API and a web portal for browsing data. The web portal is publicly available at https://webstr.ucsd.edu.
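The abstract defines an STR as consecutive repetitions of a one- to six-nucleotide motif. The sketch below turns that definition into code; it is a hypothetical illustration, not the annotation method used to build the WebSTR reference panels. It finds only perfect, uninterrupted repeats, and both the `min_copies=3` threshold and the greedy left-to-right scan are assumptions made for the example. Production repeat annotation methods typically also handle interrupted (impure) repeats.

```python
def find_perfect_strs(seq, max_unit=6, min_copies=3):
    """Greedy scan for perfect (uninterrupted) tandem repeats.

    Returns a list of (start, unit, copies) tuples with 0-based starts.
    At each position the unit size giving the longest repeat span is kept;
    ties go to the shortest unit, so "TTTTTT" is reported as 6 x "T",
    not 3 x "TT". The scan then jumps past the reported repeat.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        best = None  # (span, unit, copies)
        for k in range(1, max_unit + 1):
            unit = seq[i:i + k]
            if len(unit) < k:
                break  # not enough sequence left for this unit size
            copies = 1
            while seq[i + copies * k:i + (copies + 1) * k] == unit:
                copies += 1
            if copies >= min_copies and (best is None or copies * k > best[0]):
                best = (copies * k, unit, copies)
        if best:
            hits.append((i, best[1], best[2]))
            i += best[0]
        else:
            i += 1
    return hits


print(find_perfect_strs("GGCAGCAGCAGTTTTTTA"))  # [(1, 'GCA', 3), (11, 'T', 6)]
```

Note that the same trinucleotide run could equally be reported in the `CAG` phase starting at position 2; a greedy scan simply keeps the first phase it encounters.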

National Category
Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics; Genetics
Identifiers
urn:nbn:se:su:diva-223097 (URN); 10.1016/j.jmb.2023.168260 (DOI); 37678708 (PubMedID); 2-s2.0-85171645901 (Scopus ID)
Available from: 2023-10-19 Created: 2023-10-19 Last updated: 2023-11-21; Bibliographically approved
5. Short tandem repeat mutations regulate gene expression in colorectal cancer
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Short tandem repeat (STR) mutations are prevalent in colorectal cancer (CRC), especially in tumours with the microsatellite instability (MSI) phenotype. While STR length variations are known to regulate gene expression under physiological conditions, the functional impact of STR mutations in CRC remains unclear. Here, we integrate STR mutation data with clinical information and gene expression levels to study the gene regulatory effects of STR mutations in CRC. We confirm that STR mutability in CRC depends strongly on MSI status, repeat unit size, and repeat length. Furthermore, we present a set of 1244 putative expression STRs (eSTRs) for which STR length is associated with gene expression levels in CRC tumours. The length of 73 eSTRs is associated with the expression levels of cancer-related genes, nine of which are CRC-specific genes. We show that linear models describing eSTR-gene expression relationships allow for predictions of gene expression changes in response to eSTR mutations. Moreover, we find increased mutability of eSTRs in MSI tumours. Our evidence of gene regulatory roles for eSTRs in CRC highlights a mostly overlooked way through which tumours may modulate their phenotypes. The increased mutability of eSTRs in MSI tumours may be an early indication that eSTR mutations can confer a selective advantage to tumours. Future extensions of our findings into larger cohorts could uncover new STR-based targets in the treatment of cancer.
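The abstract states that linear models of eSTR length against gene expression allow predictions of expression changes after an eSTR mutation. A minimal sketch of that idea follows, assuming a single eSTR-gene pair and an ordinary least-squares line with no covariates; the models in the manuscript may include additional terms, and the function names and toy values are hypothetical, not data from this study.

```python
from statistics import fmean


def fit_estr_model(repeat_lengths, expression):
    """Ordinary least-squares fit: expression ~ intercept + slope * length.

    Returns (intercept, slope).
    """
    if len(repeat_lengths) != len(expression) or len(repeat_lengths) < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mx, my = fmean(repeat_lengths), fmean(expression)
    sxx = sum((x - mx) ** 2 for x in repeat_lengths)
    if sxx == 0:
        raise ValueError("repeat lengths must vary to estimate a slope")
    sxy = sum((x - mx) * (y - my) for x, y in zip(repeat_lengths, expression))
    slope = sxy / sxx
    return my - slope * mx, slope


def predicted_expression_change(slope, length_change):
    """Under a linear model, a change of d repeat units shifts expression by slope * d."""
    return slope * length_change


# Hypothetical repeat lengths (in repeat units) and expression values.
intercept, slope = fit_estr_model([10, 11, 12, 13], [2.0, 2.5, 3.0, 3.5])
print(intercept, slope, predicted_expression_change(slope, -2))
```

With these toy values the fitted slope is 0.5, so a two-unit contraction of the repeat is predicted to lower expression by 1.0 in the same units. The design choice here is deliberate simplicity: a single slope makes the predicted effect of a mutation proportional to the change in repeat length.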

National Category
Medical Genetics; Bioinformatics (Computational Biology)
Research subject
Biochemistry towards Bioinformatics
Identifiers
urn:nbn:se:su:diva-223098 (URN)
Available from: 2023-10-19 Created: 2023-10-19 Last updated: 2023-11-27

Open Access in DiVA

Intrinsic disorder and tandem repeats - match made in evolution: Computational studies of molecular evolution (8370 kB)
File information
File name: FULLTEXT01.pdf; File size: 8370 kB; Checksum (SHA-512):
949102663738a8b5a3536688c8a243e11b9e9eb7960df55f67280b08c2772b3317eed5d5981d4f938de88f210b714e1351f6bdd5e6ea468ac5095f316d3f3e99
Type: fulltext; Mimetype: application/pdf

Authority records

Lundström, Oxana
