  • 1.
    Alexeyenko, Andrey
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Schmitt, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dmitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Comparative interactomics with Funcoup 2.0. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no D1, p. D821-D828. Article in journal (Refereed)
    Abstract [en]

    FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.

  • 2. Berglund, Ann-Charlotte
    et al.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Östlund, Gabriel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    InParanoid 6: eukaryotic ortholog clusters with inparalogs. 2008. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 36, p. D263-D266. Article in journal (Refereed)
    Abstract [en]

    The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.
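
    The figure of 595 species pairs quoted above follows from choosing 2 of the 35 species without regard to order. A minimal sketch of that arithmetic is given here; the placeholder species names are hypothetical and stand in for the real proteomes, which the abstract does not list.

```python
from itertools import combinations
from math import comb

# Number of species in InParanoid 6, as stated in the abstract.
n_species = 35

# Unordered pairs of distinct species: C(35, 2) = 35 * 34 / 2.
n_pairs = comb(n_species, 2)

# The same count obtained by explicit enumeration.
# The names below are hypothetical placeholders, not real identifiers.
species = [f"species_{i}" for i in range(n_species)]
pairs = list(combinations(species, 2))

print(n_pairs, len(pairs))
```

    Both values agree with the 595 pairwise comparisons reported in the abstract.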

  • 3.
    Björklund, Åsa
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Creation of new proteins - domain rearrangements and tandem duplications. 2010. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Proteins are modular entities with domains as their building blocks. The domains are recurrent protein fragments with a distinct structure, function and evolutionary history. During evolution, proteins with new functions have been invented through rearrangements as well as differentiation of domains. The focus of this thesis is to gain better understanding of the processes that govern domain rearrangements. In particular, the rearrangements that create long protein domain repeats have been investigated in detail.

    We estimate that about 65% of the eukaryotic and 40% of the prokaryotic proteins are of the multidomain type. Further, we find that the eukaryotic multidomain proteins are mainly created through insertion of a single domain at the N- or C-terminus. However, domain repeats differ from other domain rearrangements in that they are created from internal tandem duplications. We show that such duplications often involve several domains simultaneously, and that different repeated domain families show distinct evolutionary patterns. Finally, we have investigated how large repeat regions are created using a specific example: the actin-binding nebulin domain. The analysis reveals several tandem duplications of both single nebulin domains and super repeats of seven nebulins in a number of vertebrates. We see that the duplication breakpoints vary between the species and that multiple duplications of the same region are common.

  • 4.
    Björklund, Åsa K.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Light, Sara
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sagit, Rauan
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Nebulin: A Study of Protein Repeat Evolution. 2010. In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 402, no 1, p. 38-51. Article in journal (Refereed)
    Abstract [en]

    Protein domain repeats are common in proteins that are central to the organization of a cell, in particular in eukaryotes. They are known to evolve through internal tandem duplications. However, the understanding of the underlying mechanisms is incomplete. To shed light on repeat expansion mechanisms, we have studied the evolution of the muscle protein Nebulin, a protein that contains a large number of actin-binding nebulin domains. Nebulin proteins have evolved from an invertebrate precursor containing two nebulin domains. Repeat regions have expanded through duplications of single domains, as well as duplications of a super repeat (SR) consisting of seven nebulins. We show that the SR has evolved independently into large regions in at least three instances: twice in the invertebrate Branchiostoma floridae and once in vertebrates. In-depth analysis reveals several recent tandem duplications in the Nebulin gene. The events involve both single-domain and multidomain SR units or several SR units. There are single events, but frequently the same unit is duplicated multiple times. For instance, an ancestor of human and chimpanzee underwent two tandem duplications. The duplication junction coincides with an Alu transposon, thus suggesting duplication through Alu-mediated homologous recombination. Duplications in the SR region consistently involve multiples of seven domains. However, the exact unit that is duplicated varies both between species and within species. Thus, multiple tandem duplications of the same motif did not create the large Nebulin protein. Finally, analysis of segmental duplications in the human genome reveals that duplications are more common in genes containing domain repeats than in those coding for nonrepeated proteins. In fact, segmental duplications are found three to six times more often in long repeated genes than expected by chance. 

  • 5.
    Celorio-Mancera, Maria de la Paz
    et al.
    Max Planck Society.
    Heckel, David G.
    Vogel, Heiko
    Transcriptional analysis of physiological pathways in a generalist herbivore: responses to different host plants and plant structures by the cotton bollworm, Helicoverpa armigera. 2012. In: Entomologia Experimentalis et Applicata, ISSN 0013-8703, E-ISSN 1570-7458, Vol. 144, no 1, p. 123-133. Article in journal (Refereed)
    Abstract [en]

    The generalist cotton bollworm, Helicoverpa armigera (Hübner) (Lepidoptera: Noctuidae), can consume host plants in more than 40 families, and often utilizes several tissues of a single plant. It is believed that generalists owe their success to the deployment of various members of multigene families of detoxification and digestive enzymes, a strategy that may also be responsible for rapid evolution of insecticide resistance. However, studies of generalist adaptations have been limited to specific genes or gene families, and an overview of how these adaptations are orchestrated at the transcriptional level is lacking. We used Drosophila melanogaster Meigen gene homology to H. armigera-expressed sequence tags to identify key groups of genes and pathways differentially regulated in the gut of fifth instars after 2 days of feeding on a variety of food sources. A series of microarray hybridizations was performed following two alternating loop designs, one comparing the gut gene expression upon feeding on various hosts (cotton, bean, tobacco, and chickpea) and two artificial diets (pinto bean and wheat germ-based), whereas the second design compared the gut expression toward feeding on various plant structures within cotton (leaf, square, and boll). The transcriptional responses toward bean and tobacco feeding treatments were more closely related in comparison with the rest of the diets, whereas the gene expression profiles toward cotton leaf and square-feeding were highly similar. We furthermore found significant changes in several pathways not directly responsible for detoxification mechanisms. Genes involved in primary and secondary metabolism, environmental information processing, and cellular processes were found to be differentially expressed. In addition, regulation of xenobiotic metabolism and the extracellular matrix-receptor pathways appeared differentially regulated across feeding treatments. 
    Three cytochrome P450 genes (CYP6AE17, CYP6B6, and CYP9A17), grouped as part of a xenobiotic metabolism pathway, were up-regulated in the bean-feeding treatment, and down-regulated in both tobacco and cotton-feeding treatments. CYP4L11, CYP4L5, and CYP4S13 were differentially expressed upon feeding on different cotton plant structures. The present work provides host plant and plant structure-specific transcriptional responses in a lepidopteran herbivore, including pathways and gene candidates for future studies of H. armigera physiology under a more integrative ecologically meaningful framework.

  • 6. Cheng, Jianlin
    et al.
    Choe, Myong‐Ho
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Han, Kun-Sop
    Hou, Jie
    Maghrabi, Ali H. A.
    McGuffin, Liam J.
    Menéndez-Hurtado, David
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Olechnovič, Kliment
    Schwede, Torsten
    Studer, Gabriel
    Uziela, Karolis
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Venclovas, Česlovas
    Wallner, Björn
    Estimation of model accuracy in CASP13. 2019. In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134. Article in journal (Refereed)
    Abstract [en]

    Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue‐residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus‐based methods.

  • 7. Chicharro, Daniel
    et al.
    Ledberg, Anders
    Stockholm University, Faculty of Social Sciences, Centre for Social Research on Alcohol and Drugs (SoRAD).
    Framework to study dynamic dependencies in networks of interacting processes. 2012. In: Physical Review E. Statistical, Nonlinear, and Soft Matter Physics, ISSN 1539-3755, E-ISSN 1550-2376, Vol. 86, no 4. Article in journal (Refereed)
    Abstract [en]

    The analysis of dynamic dependencies in complex systems such as the brain helps to understand how emerging properties arise from interactions. Here we propose an information-theoretic framework to analyze the dynamic dependencies in multivariate time-evolving systems. This framework constitutes a fully multivariate extension and unification of previous approaches based on bivariate or conditional mutual information and Granger causality or transfer entropy. We define multi-information measures that allow us to study the global statistical structure of the system as a whole, the total dependence between subsystems, and the temporal statistical structure of each subsystem. We develop a stationary and a nonstationary formulation of the framework. We then examine different decompositions of these multi-information measures. The transfer entropy naturally appears as a term in some of these decompositions. This allows us to examine its properties not as an isolated measure of interdependence but in the context of the complete framework. More generally we use causal graphs to study the specificity and sensitivity of all the measures appearing in these decompositions to different sources of statistical dependence arising from the causal connections between the subsystems. We illustrate that there is no straightforward relation between the strength of specific connections and specific terms in the decompositions. Furthermore, causal and noncausal statistical dependencies are not separable. In particular, the transfer entropy can be nonmonotonic in dependence on the connectivity strength between subsystems and is also sensitive to internal changes of the subsystems, so it should not be interpreted as a measure of connectivity strength. 
    Altogether, in comparison to an analysis based on single isolated measures of interdependence, this framework is more powerful to analyze emergent properties in multivariate systems and to characterize functionally relevant changes in the dynamics.

  • 8.
    Colding, Johan
    Stockholm University.
    Local institutions, biological conservation and management of ecosystem dynamics. 2001. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis analyzes local institutions and management practices related to natural resources and ecosystem dynamics, with an emphasis on "traditional ecological knowledge" systems. Papers I, II and III analyze ‘resource and habitat taboos’ (RHTs) with the objective to synthesize knowledge about informal institutions behind resource management. Papers IV and V focus on resource management practices and social mechanisms with a capacity to confer resilience in ecosystems. Ecological resilience is the buffering capacity of ecosystems to incorporate disturbance and yet continue to provide biodiversity and ecological services critical to societal development. Cases for the synthesis were mainly derived from the literature. Examples of RHTs could be grouped in six different categories depending on their potential management and conservation functions. These included both use-taboos and non-use taboos. The former regulates access to, and methods and withdrawal of subsistence resources. These appear to be closely related to traditional ecological knowledge, as it is defined in this thesis. The latter prohibits human use of species and habitats, and is closely related to religious and cosmological belief systems. As discussed, both groups of taboos can be comparable to ethics of academic conservation biology, although rationales behind such ethics differ. RHTs have effects that may contribute to the conservation of habitats, local subsistence resources, and ‘threatened’, ‘endemic’ and ‘keystone’ species, although some may run contrary to conservation and notions of sustainability. It is asserted that under certain circumstances, RHTs, and possibly other types of informal institutions may offer advantages relative to formal measures of conservation. These benefits include non-costly, voluntary compliance features. Results of papers IV and V revealed that there exists a diversity of traditional practices for ecosystem management.
    These include multiple species management, resource rotation, ecological monitoring, succession management, landscape patchiness management, and practices of responding to and managing pulses and ecological surprises. Social mechanisms behind these practices included a number of adaptations for the generation, accumulation, and transmission of knowledge; dynamics of institutions; mechanisms for cultural internalization of traditional practices; and the development of appropriate world views and cultural values. These traditional systems had certain similarities to adaptive management with its emphasis on feedback learning, and its treatment of uncertainty and unpredictability to ecosystems. Furthermore, there existed practices that seem to reduce social-ecological crises in the events of large-scale natural disturbance. These included practices that create small-scale ecosystem renewal cycles, practices that spread risks, and practices for nurturing sources of ecosystem renewal. These practices are linked to social mechanisms such as flexible user rights and land tenure. It is concluded that ecological monitoring appears to be a key element in the development of many of the practices. Management practices in local communities are framed by a social context, with informal institutions and other social mechanisms, and supported by a worldview that does not de-couple people from their dependence on natural systems. Since management of ecosystems is associated with uncertainty about their spatial and temporal dynamics and due to incomplete knowledge about such dynamics, these practices may provide useful ‘rules of thumb’ for resource management with an ability to confer resilience and tighten environmental feedbacks of resource exploitation to local levels. To link local institutions in cross-scale polycentric co-management arrangements may be a viable option for improving current resource management systems.

  • 9. Corcoran, Martin M.
    et al.
    Phad, Ganesh E.
    Bernat, Nestor Vazquez
    Stahl-Hennig, Christiane
    Sumida, Noriyuki
    Persson, Mats A. A.
    Martin, Marcel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Hedestam, Gunilla B. Karlsson
    Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity. 2016. In: Nature Communications, ISSN 2041-1723, Vol. 7, article id 13642. Article in journal (Refereed)
    Abstract [en]

    Comprehensive knowledge of immunoglobulin genetics is required to advance our understanding of B cell biology. Validated immunoglobulin variable (V) gene databases are close to completion only for human and mouse. We present a novel computational approach, IgDiscover, that identifies germline V genes from expressed repertoires to a specificity of 100%. IgDiscover uses a cluster identification process to produce candidate sequences that, once filtered, results in individualized germline V gene databases. IgDiscover was tested in multiple species, validated by genomic cloning and cross library comparisons and produces comprehensive gene databases even where limited genomic sequence is available. IgDiscover analysis of the allelic content of the Indian and Chinese-origin rhesus macaques reveals high levels of immunoglobulin gene diversity in this species. Further, we describe a novel human IGHV3-21 allele and confirm significant gene differences between Balb/c and C57BL6 mouse strains, demonstrating the power of IgDiscover as a germline V gene discovery tool.

  • 10.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    The relationship between orthology, protein domain architecture and protein function. 2011. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Lacking experimental data, protein function is often predicted from evolutionary and protein structure theory. Under the 'domain grammar' hypothesis the function of a protein follows from the domains it encodes. Under the 'orthology conjecture', orthologs, related through species formation, are expected to be more functionally similar than paralogs, which are homologs in the same or different species descended from a gene duplication event. However, these assumptions have not thus far been systematically evaluated.

    To test the 'domain grammar' hypothesis, we built models for predicting function from the domain combinations present in a protein, and demonstrated that multi-domain combinations imply functions that the individual domains do not. We also developed a novel gene-tree based method for reconstructing the evolutionary histories of domain architectures, to search for cases of architectures that have arisen multiple times in parallel, and found this to be more common than previously reported.

    To test the 'orthology conjecture', we first benchmarked methods for homology inference under the obfuscating influence of low-complexity regions, in order to improve the InParanoid orthology inference algorithm. InParanoid was then used to test the relative conservation of functionally relevant properties between orthologs and paralogs at various evolutionary distances, including intron positions, domain architectures, and Gene Ontology functional annotations.

    We found an increased conservation of domain architectures in orthologs relative to paralogs, in support of the 'orthology conjecture' and the 'domain grammar' hypotheses acting in tandem. However, equivalent analysis of Gene Ontology functional conservation yielded spurious results, which may be an artifact of species-specific annotation biases in functional annotation databases. I discuss possible ways of circumventing this bias so the 'orthology conjecture' can be tested more conclusively.

  • 11.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Henricson, Anna
    Hollich, Volker
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain tree-based analysis of protein architecture evolution. 2008. In: Molecular Biology and Evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 25, no 2, p. 254-264. Article in journal (Refereed)
    Abstract [en]

    Understanding the dynamics behind domain architecture evolution is of great importance to unravel the functions of proteins. Complex architectures have been created throughout evolution by rearrangement and duplication events. An interesting question is how many times a particular architecture has been created, a form of convergent evolution or domain architecture reinvention. Previous studies have approached this issue by comparing architectures found in different species. We wanted to achieve a finer-grained analysis by reconstructing protein architectures on complete domain trees. The prevalence of domain architecture reinvention in 96 genomes was investigated with a novel domain tree-based method that uses maximum parsimony for inferring ancestral protein architectures. Domain architectures were taken from Pfam. To ensure robustness, we applied the method to bootstrap trees and only considered results with strong statistical support. We detected multiple origins for 12.4% of the scored architectures. In a much smaller data set, the subset of completely domain-assigned proteins, the figure was 5.6%. These results indicate that domain architecture reinvention is a much more common phenomenon than previously thought. We also determined which domains are most frequent in multiply created architectures and assessed whether specific functions could be attributed to them. However, no strong functional bias was found in architectures with multiple origins.
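
    The abstract above describes using maximum parsimony on domain trees to infer ancestral architectures and to count independent origins. As a rough illustration only, the sketch below applies the textbook Fitch small-parsimony pass to a toy binary tree; it is not the authors' method, and the tree, the protein names (`p1` to `p5`) and the architecture labels (`A`, `A-B`) are hypothetical.

```python
def fitch(tree, leaf_states):
    """Bottom-up Fitch pass.

    `tree` is either a leaf name (str) or a pair (left, right).
    Returns (candidate_states, min_changes) for the subtree.
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left_set, left_cost = fitch(tree[0], leaf_states)
    right_set, right_cost = fitch(tree[1], leaf_states)
    common = left_set & right_set
    if common:
        # Children agree on at least one state: no change needed here.
        return common, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical toy tree: the two-domain architecture "A-B" sits on two
# separate branches, surrounded by single-domain "A" proteins.
tree = ((("p1", "p2"), "p3"), ("p4", "p5"))
leaf_states = {"p1": "A-B", "p2": "A", "p3": "A", "p4": "A-B", "p5": "A"}

root_set, changes = fitch(tree, leaf_states)
print(root_set, changes)
```

    In this toy case the most parsimonious root state is `A` with two changes, which corresponds to architecture `A-B` arising twice independently.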

  • 12.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Pekkari, Isabella
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain architecture conservation in orthologs. 2011. In: BMC Bioinformatics, ISSN 1471-2105, E-ISSN 1471-2105, Vol. 12, p. 326-. Article in journal (Refereed)
    Abstract [en]

    Background. As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.

    Results. The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.

    Conclusions. On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.

  • 13.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Swedish e-Science Research Center.
    Evolution of Protein Domain Architectures. 2012. In: Evolutionary Genomics: Statistical and Computational Methods, Vol 2 / [ed] Anisimova, M, Totowa, NJ: Humana Press, 2012, p. 187-216. Chapter in book (Refereed)
    Abstract [en]

    This chapter reviews the current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly impacts which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multidomain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly).

  • 14.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting protein function from domain content. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 15, p. 1681-1687. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions.

    RESULTS: Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.
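
    The idea of a rule-based mapping from domain sets to annotation terms can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `strict_rules`, the parameters `max_size` and `min_support`, and the toy domain and term identifiers are all assumptions made for illustration. A rule is kept here only when every training protein containing the domain combination carries the term.

```python
from itertools import combinations


def strict_rules(training, max_size=2, min_support=2):
    """Derive domain-combination -> term rules from annotated proteins.

    training: iterable of (domain_set, term_set) pairs (hypothetical data).
    A combination implies a term only if ALL proteins containing that
    combination are annotated with the term.
    """
    support = {}  # domain combination -> list of term sets of matching proteins
    for domains, terms in training:
        for size in range(1, max_size + 1):
            for combo in combinations(sorted(domains), size):
                support.setdefault(combo, []).append(set(terms))

    rules = {}
    for combo, term_sets in support.items():
        if len(term_sets) < min_support:
            continue  # too few examples to trust the rule
        implied = set.intersection(*term_sets)
        if implied:
            rules[combo] = implied
    return rules


# Toy example with made-up identifiers (not real Pfam or GO entries).
training = [
    ({"D1", "D2"}, {"T_x", "T_y"}),
    ({"D1", "D2"}, {"T_x"}),
    ({"D1"}, {"T_z"}),
    ({"D2"}, {"T_w"}),
]
rules = strict_rules(training)
```

    In this toy data neither single domain implies any term on its own, while the pair does, which mirrors the abstract's point that domain combinations can predict terms that do not follow from the individual domains.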

    AVAILABILITY: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar

  • 15.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    MGclus: network clustering employing shared neighbors. 2013. In: Molecular BioSystems, ISSN 1742-206X, Vol. 9, no 7, p. 1670-1675. Article in journal (Refereed)
    Abstract [en]

    Network analysis is an important tool for functional annotation of genes and proteins. A common approach to discern structure in a global network is to infer network clusters, or modules, and assume a functional coherence within each module, which may represent a complex or a pathway. It is however not trivial to define optimal modules. Although many methods have been proposed, it is unclear which methods perform best in general. It seems that most methods produce far from optimal results but in different ways. MGclus is a new algorithm designed to detect modules with a strongly interconnected neighborhood in large scale biological interaction networks. In our benchmarks we found MGclus to outperform other methods when applied to random graphs with varying degree of noise, and to perform equally or better when applied to biological protein interaction networks. MGclus is implemented in Java and utilizes the JGraphT graph library. It has an easy to use command-line interface and is available for download from http://sonnhammer.sbc.su.se/download/software/MGclus/.

  • 16.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Mank, Judith E.
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Network Analysis of Functional Genomics Data: Application to Avian Sex-Biased Gene Expression. 2012. In: Scientific World Journal, ISSN 1537-744X, E-ISSN 1537-744X, p. 130491. Article in journal (Refereed)
    Abstract [en]

    Gene expression analysis is often used to investigate the molecular and functional underpinnings of a phenotype. However, differential expression of individual genes is limited in that it does not consider how the genes interact with each other in networks. To address this shortcoming we propose a number of network-based analyses that give additional functional insights into the studied process. These were applied to a dataset of sex-specific gene expression in the chicken gonad and brain at different developmental stages. We first constructed a global chicken interaction network. Combining the network with the expression data showed that most sex-biased genes tend to have lower network connectivity, that is, act within local network environments, although some interesting exceptions were found. Genes of the same sex bias were generally more strongly connected with each other than expected. We further studied the fates of duplicated sex-biased genes and found that there is a significant trend to keep the same pattern of sex bias after duplication. We also identified sex-biased modules in the network, which reveal pathways or complexes involved in sex-specific processes. Altogether, this work integrates evolutionary genomics with systems biology in a novel way, offering new insights into the modular nature of sex-biased genes.

  • 17.
    Guala, Dimitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm, Bioinformatics Center, Science for Life Laboratory.
    Functional association networks for disease gene prediction. 2017. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Mapping of the human genome has been instrumental in understanding diseases caused by changes in single genes. However, disease mechanisms involving multiple genes have proven to be much more elusive. Their complexity emerges from interactions of intracellular molecules and makes them immune to the traditional reductionist approach. Only by modelling this complex interaction pattern using networks is it possible to understand the emergent properties that give rise to diseases.

    The overarching term used to describe both physical and indirect interactions involved in the same functions is functional association. FunCoup is one of the most comprehensive networks of functional association. It uses a naïve Bayesian approach to integrate high-throughput experimental evidence of intracellular interactions in humans and multiple model organisms. In the first update, both the coverage and the quality of the interactions were increased and a feature for comparing interactions across species was added. The latest update involved a complete overhaul of all data sources, including a refinement of the training data and addition of new class and sources of interactions as well as six new species.

    Disease-specific changes in genes can be identified using high-throughput genome-wide studies of patients and healthy individuals. To understand the underlying mechanisms that produce these changes, they can be mapped to collections of genes with known functions, such as pathways. BinoX was developed to map altered genes to pathways using the topology of FunCoup. This approach combined with a new random model for comparison enables BinoX to outperform traditional gene-overlap-based methods and other network-based techniques.

    Results from high-throughput experiments are challenged by noise and biases, resulting in many false positives. Statistical attempts to correct for these challenges have led to a reduction in coverage. Both limitations can be remedied using prioritisation tools such as MaxLink, which ranks genes using guilt by association in the context of a functional association network. MaxLink’s algorithm was generalised to work with any disease phenotype and its statistical foundation was strengthened. MaxLink’s predictions were validated experimentally using FRET.

    The availability of prioritisation tools without an appropriate way to compare them makes it difficult to select the correct tool for a problem domain. A benchmark to assess performance of prioritisation tools in terms of their ability to generalise to new data was developed. FunCoup was used for prioritisation while testing was done using cross-validation of terms derived from Gene Ontology. This resulted in a robust and unbiased benchmark for evaluation of current and future prioritisation tools. Surprisingly, previously superior tools based on global network structure were shown to be inferior to a local network-based tool when performance was analysed on the most relevant part of the output, i.e. the top ranked genes.

    This thesis demonstrates how a network that models the intricate biology of the cell can contribute with valuable insights for researchers that study diseases with complex genetic origins. The developed tools will help the research community to understand the underlying causes of such diseases and discover new treatment targets. The robust way to benchmark such tools will help researchers to select the proper tool for their problem domain.

  • 18.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Bernhem, Kristoffer
    Ait Blal, Hammou
    Lundberg, Emma
    Brismar, Hjalmar
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Experimental validation of predicted cancer genes using FRET. Manuscript (preprint) (Other academic)
  • 19.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden; Swedish eScience Research Center, Sweden.
    MaxLink: network-based prioritization of genes tightly linked to a disease seed set. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 18, p. 2689-2690. Article in journal (Refereed)
    Abstract [en]

    Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to other areas than cancer. The algorithm has also been made faster by re-implementation in C++, and the Web site uses FunCoup 3.0 as the underlying network.
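
    The guilt-by-association idea of ranking candidates by how tightly they are linked to a query list can be sketched as follows. This is a simplified illustration, not the MaxLink algorithm itself: it only counts direct links to the seed set and omits the statistical candidate selection the abstract mentions. The function name `rank_by_seed_links`, the `min_links` parameter, and the placeholder gene names are assumptions.

```python
from collections import defaultdict


def rank_by_seed_links(edges, seeds, min_links=1):
    """Rank non-seed genes by their number of direct links to seed genes.

    edges: iterable of (gene_a, gene_b) pairs from an undirected network.
    seeds: query genes with known association to the phenotype.
    Returns (gene, link_count) pairs, most links first, ties by name.
    """
    seeds = set(seeds)
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-links
            neighbors[a].add(b)
            neighbors[b].add(a)

    scores = {}
    for gene, linked in neighbors.items():
        if gene in seeds:
            continue
        links = len(linked & seeds)
        if links >= min_links:
            scores[gene] = links
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Toy network with placeholder names (not real interaction data).
edges = [
    ("S1", "G1"), ("S1", "G2"),
    ("S2", "G1"), ("S2", "G2"),
    ("G1", "G3"), ("G4", "G5"),
]
ranking = rank_by_seed_links(edges, seeds=["S1", "S2"])
```

    Here `G1` and `G2` each link to both seeds and are ranked first, while `G3`, `G4` and `G5` have no direct seed links and are dropped.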

  • 20.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A large-scale benchmark of gene prioritization methods. 2017. In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 7, article id 46598. Article in journal (Refereed)
    Abstract [en]

    In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
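
    For readers unfamiliar with network diffusion, the generic textbook form of Random Walk with Restart can be sketched as below. It is not one of the benchmarked implementations; the function name, the `restart` value of 0.3, the convergence settings, and the toy path graph are assumptions, and the sketch assumes an undirected network in which every node has at least one neighbour.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- restart * p0 + (1 - restart) * W p until convergence.

    adjacency: dict mapping each node to a non-empty set of neighbours.
    seeds: nodes where the walker starts and to which it restarts.
    Returns a dict of stationary visiting probabilities per node.
    """
    nodes = sorted(adjacency)
    seeds = set(seeds)
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    prob = dict(start)
    for _ in range(max_iter):
        # Restart component: jump back to the seed distribution.
        updated = {n: restart * start[n] for n in nodes}
        # Walk component: spread each node's mass evenly over its neighbours.
        for node in nodes:
            share = (1.0 - restart) * prob[node] / len(adjacency[node])
            for neighbour in adjacency[node]:
                updated[neighbour] += share
        change = sum(abs(updated[n] - prob[n]) for n in nodes)
        prob = updated
        if change < tol:
            break
    return prob


# Toy path graph a - b - c - d with a single seed at "a".
adjacency = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
scores = random_walk_with_restart(adjacency, seeds=["a"])
```

    Candidates are then ranked by their score, so nodes closer to the seed set in the network receive higher priority than distant ones.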

  • 21.
    Hennerdal, Aron
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Rapid membrane protein topology prediction. 2011. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no 9, p. 1322-1323. Article in journal (Refereed)
    Abstract [en]

    State-of-the-art methods for topology of α-helical membrane proteins are based on the use of time-consuming multiple sequence alignments obtained from PSI-BLAST or other sources. Here, we examine if it is possible to use the consensus of topology prediction methods that are based on single sequences to obtain a similar accuracy as the more accurate multiple sequence-based methods. Here, we show that TOPCONS-single performs better than any of the other topology prediction methods tested here, but ~6% worse than the best method that is utilizing multiple sequence alignments. AVAILABILITY AND IMPLEMENTATION: TOPCONS-single is available as a web server from http://single.topcons.net/ and is also included for local installation from the web site. In addition, consensus-based topology predictions for the entire international protein index (IPI) is available from the web server and will be updated at regular intervals.

  • 22.
    Hennerdal, Aron
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Tsirigos, Konstantinos
    A guideline to α-helical membrane protein topology prediction. Manuscript (preprint) (Other academic)
    Abstract [en]

    All living organisms have a “membrane proteome” that mainly consists of α-helical membrane proteins containing one or more TM-helices. Prediction methods have been extensively used to identify as well as to classify the topology of these proteins. For current state-of-the-art methods, the prediction of correct topology of membrane proteins has been reported to be above 80%. However, this performance has only been observed in small and possibly biased datasets. Here, we add four “genome-scale” datasets, including a recent large set of experimentally validated membrane proteins with glycosylation sites. This set is also used to examine whether the qualities of topology predictions hold and if any prediction methods perform consistently better than others. We find that methods utilizing multiple sequence alignments are overall superior to methods that do not. The best performance is obtained by TOPCONS, a consensus method which combines several of the other prediction methods. Further, we show that the accuracy is most likely lower in eukaryotes than for prokaryotic proteins as the agreement between the predictors is significantly lower there. Finally, we show that three related methods, Phobius, Philius and PolyPhobius, that incorporate a specific signal peptide module are superior to all other methods at the task of distinguishing between membrane and non-membrane proteins.

  • 23.
    Henricson, Anna
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Orthology confers intron position conservation. 2010. In: BMC Genomics, ISSN 1471-2164, E-ISSN 1471-2164, Vol. 11:412. Article in journal (Refereed)
    Abstract [en]

    Background: With the wealth of genomic data available it has become increasingly important to assign putative protein function through functional transfer between orthologs. Therefore, correct elucidation of the evolutionary relationships among genes is a critical task, and attempts should be made to further improve the phylogenetic inference by adding relevant discriminating features. It has been shown that introns can maintain their position over long evolutionary timescales. For this reason, it could be possible to use conservation of intron positions as a discriminating factor when assigning orthology. Therefore, we wanted to investigate whether orthologs have a higher degree of intron position conservation (IPC) compared to non-orthologous sequences that are equally similar in sequence.

    Results: To this end, we developed a new score for IPC and applied it to ortholog groups between human and six other species. For comparison, we also gathered the closest non-orthologs, meaning sequences close in sequence space, yet falling just outside the ortholog cluster. We found that ortholog-ortholog gene pairs on average have a significantly higher degree of IPC compared to ortholog-closest non-ortholog pairs. Also pairs of inparalogs were found to have a higher IPC score than inparalog-closest non-inparalog pairs. We verified that these differences can not simply be attributed to the generally higher sequence identity of the ortholog-ortholog and the inparalog-inparalog pairs. Furthermore, we analyzed the agreement between IPC score and the ortholog score assigned by the InParanoid algorithm, and found that it was consistently high for all species comparisons. In a minority of cases, the IPC and InParanoid score ranked inparalogs differently. These represent cases where sequence and intron position divergence are discordant. We further analyzed the discordant clusters to identify any possible preference for protein functions by looking for enriched GO terms and Pfam protein domains. They were enriched for functions important for multicellularity, which implies a connection between shifts in intronic structure and the origin of multicellularity.

    Conclusions: We conclude that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes. As a consequence, our IPC score is useful as an additional discriminating factor when assigning orthology.

  • 24.
    Herman, Pawel Andrzej
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Lundqvist, Mikael
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Nested theta to gamma oscillations and precise spatiotemporal firing during memory retrieval in a simulated attractor network. 2013. In: Brain Research, ISSN 0006-8993, E-ISSN 1872-6240, Vol. 1536, no S1, p. 68-87. Article in journal (Refereed)
    Abstract [en]

    Nested oscillations, where the phase of the underlying slow rhythm modulates the power of faster oscillations, have recently attracted considerable research attention as the increased phase-coupling of cross-frequency oscillations has been shown to relate to memory processes. Here we investigate the hypothesis that reactivations of memory patterns, induced by either external stimuli or internal dynamics, are manifested as distributed cell assemblies oscillating at gamma-like frequencies with life-times on a theta scale. For this purpose, we study the spatiotemporal oscillatory dynamics of a previously developed meso-scale attractor network model as a correlate of its memory function. The focus is on a hierarchical nested organization of neural oscillations in delta/theta (2–5 Hz) and gamma frequency bands (25–35 Hz), and in some conditions even in lower alpha band (8–12 Hz), which emerge in the synthesized field potentials during attractor memory retrieval. We also examine spiking behavior of the network in close relation to oscillations. Despite highly irregular firing during memory retrieval and random connectivity within each cell assembly, we observe precise spatiotemporal firing patterns that repeat across memory activations at a rate higher than expected from random firing. In contrast to earlier studies aimed at modeling neural oscillations, our attractor memory network allows us to elaborate on the functional context of emerging rhythms and discuss their relevance. We provide support for the hypothesis that the dynamics of coherent delta/theta oscillations constitute an important aspect of the formation and replay of neuronal assemblies.

  • 25. Ke, Rongqin
    et al.
    Mignardi, Marco
    Hauling, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nilsson, Mats
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Fourth Generation of Next-Generation Sequencing Technologies: Promise and Consequences. 2016. In: Human Mutation, ISSN 1059-7794, E-ISSN 1098-1004, Vol. 37, no 12, p. 1363-1367. Article, review/survey (Refereed)
    Abstract [en]

    In this review, we discuss the emergence of the fourth-generation sequencing technologies that preserve the spatial coordinates of RNA and DNA sequences with up to subcellular resolution, thus enabling back mapping of sequencing reads to the original histological context. This information is used, for example, in two current large-scale projects that aim to unravel the function of the brain. Also in cancer research, fourth-generation sequencing has the potential to revolutionize the field. Cancer Research UK has named "Mapping the molecular and cellular tumor microenvironment in order to define new targets for therapy and prognosis" one of the grand challenges in tumor biology. We discuss the advantages of sequencing nucleic acids directly in fixed cells over traditional next-generation sequencing (NGS) methods, the limitations and challenges that these new methods have to face to become broadly applicable, and the impact that the information generated by the combination of in situ sequencing and NGS methods will have in research and diagnostics.

  • 26. Kenah, Eben
    et al.
    Britton, Tom
    Stockholm University, Faculty of Science, Department of Mathematics.
    Halloran, M. Elizabeth
    Longini, Ira M.
    Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees. 2016. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 12, no 4. Article in journal (Refereed)
    Abstract [en]

    Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show these can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which is often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology.

  • 27. Kurrikoff, Kaido
    et al.
    Veiman, Kadi-Liis
    Künnapuu, Kadri
    Peets, Elin Madli
    Lehto, Tõnis
    Stockholm University, Faculty of Science, Department of Neurochemistry.
    Pärnaste, Ly
    Arukuusk, Piret
    Langel, Ülo
    Stockholm University, Faculty of Science, Department of Neurochemistry. University of Tartu, Estonia.
    Effective in vivo gene delivery with reduced toxicity, achieved by charge and fatty acid -modified cell penetrating peptide. 2017. In: Scientific Reports, ISSN 2045-2322, E-ISSN 2045-2322, Vol. 7, article id 17056. Article in journal (Refereed)
    Abstract [en]

    Non-viral gene delivery systems have gained considerable attention as a promising alternative to viral delivery to treat diseases associated with aberrant gene expression. However, regardless of extensive research, only a little is known about the parameters that underline in vivo use of the nanoparticle-based delivery vectors. The modest efficacy and low safety of non-viral delivery are the two central issues that need to be addressed. We have previously characterized an efficient cell penetrating peptide, PF14, for in vivo applications. In the current work, we first develop an optimized formulation of PF14/pDNA nanocomplexes, which allows removal of the side-effects without compromising the bioefficacy in vivo. Secondly, based on the physicochemical complex formation studies and biological efficacy assessments, we develop a series of PF14 modifications with altered charge and fatty acid content. We show that with an optimal combination of overall charge and hydrophobicity in the peptide backbone, in vivo gene delivery can be augmented. Further combined with the safe formulation, systemic gene delivery lacking any side effects can be achieved.

  • 28.
    Larsson, Per
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Prediction, modeling, and refinement of protein structure. 2010. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Accurate predictions of protein structure are important for understanding many processes in cells. The interactions that govern protein folding and structure are complex, and still far from completely understood. However, progress is being made in many areas. Here, efforts to improve the overall quality of protein structure models are described. From a pure evolutionary perspective, in which proteins are viewed in the light of gradually accumulated mutations on the sequence level, it is shown how information from multiple sources helps to create more accurate models. A very simple but surprisingly accurate method for assigning confidence measures for protein structures is also tested. In contrast to models based on evolution, physics based methods view protein structures as the result of physical interactions between atoms. Newly implemented methods are described that both increase the time-scales accessible for molecular dynamics simulations almost 10-fold, and that to some extent might be able to refine protein structures. Finally, I compare the efficiency and properties of different techniques for protein structure refinement.

  • 29.
    Larsson, Per
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Skwark, Marcin J.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Wallner, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Assessment of global and local model quality in CASP8 using Pcons and ProQ. 2009. In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 77, no 9, p. 167-172. Article in journal (Refereed)
    Abstract [en]

    Model Quality Assessment Programs (MQAPs) are programs developed to rank protein models. These methods can be trained to predict the overall global quality of a model or what local regions in a model that are likely to be incorrect. In CASP8, we participated with two predictors that predict both global and local quality using either consensus information, Pcons, or purely structural information, ProQ. Consistently with results in previous CASPs, the best performance in CASP8 was obtained using the Pcons method. Furthermore, the results show that the modification introduced into Pcons for CASP8 improved the predictions against GDT_TS and now a correlation coefficient above 0.9 is achieved, whereas the correlation for ProQ is about 0.7. The correlation is better for the easier than for the harder targets, but it is not below 0.5 for a single target and below 0.7 only for three targets. The correlation coefficient for the best local quality MQAP is 0.68 showing that there is still clear room for improvement within this area. We also detect that Pcons still is not always able to identify the best model. However, we show that using a linear combination of Pcons and ProQ it is possible to select models that are better than the models from the best single server. In particular, the average quality over the hard targets increases by about 6% compared with using Pcons alone.

  • 30.
    Larsson, Per
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Skwark, Marcin J.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Wallner, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Improved predictions by Pcons.net using multiple templates. 2011. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no 3, p. 426-427. Article in journal (Refereed)
    Abstract [en]

    Multiple templates can often be used to build more accurate homology models than models built from a single template. Here we introduce PconsM, an automated protocol that uses multiple templates to build protein models. PconsM has been among the top-performing methods in the recent CASP experiments and consistently performs better than the single template models used in Pcons.net. In particular for the easier targets with many alternative templates with a high degree of sequence identity, quality is readily improved by a few percent over the highest ranked model built on a single template. PconsM is available as an additional pipeline within the Pcons.net protein structure prediction server.

  • 31.
    Light, Sara
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sagit, Rauan
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Ithychanda, Sujay S.
    Qin, Jun
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    The evolution of filamin - A protein domain repeat perspective. 2012. In: Journal of Structural Biology, ISSN 1047-8477, E-ISSN 1095-8657, Vol. 179, no 3, p. 289-298. Article in journal (Refereed)
    Abstract [en]

    Particularly in higher eukaryotes, some protein domains are found in tandem repeats, performing broad functions often related to cellular organization. For instance, the eukaryotic protein filamin interacts with many proteins and is crucial for the cytoskeleton. The functional properties of long repeat domains are governed by the specific properties of each individual domain as well as by the repeat copy number. To provide better understanding of the evolutionary and functional history of repeating domains, we investigated the mode of evolution of the filamin domain in some detail. Among the domains that are common in long repeat proteins, sushi and spectrin domains evolve primarily through cassette tandem duplications while scavenger and immunoglobulin repeats appear to evolve through clustered tandem duplications. Additionally, immunoglobulin and filamin repeats exhibit a unique pattern where every other domain shows high sequence similarity. This pattern may be the result of tandem duplications, may serve to avert aggregation between adjacent domains, or may be the result of functional constraints. In filamin, our studies confirm the presence of interspersed integrin-binding domains in vertebrates, while invertebrates exhibit more varied patterns, including more clustered integrin-binding domains. The most notable case is leech filamin, which contains a 20-repeat expansion and exhibits a unique dimerization topology. Clearly, invertebrate filamins are varied and contain examples of similar adjacent integrin-binding domains. Given that invertebrate integrin shows more similarity to the weaker filamin binder, integrin beta 3, it is possible that the distance between integrin-binding domains is not as crucial for invertebrate filamins as for vertebrates.

  • 32.
    Low, Yen S.
    et al.
    Caster, Ola
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Uppsala Monitoring Center, Sweden.
    Bergvall, Tomas
    Fourches, Denis
    Zang, Xiaoling
    Norén, G. Niklas
    Stockholm University, Faculty of Science, Department of Mathematics. Uppsala Monitoring Center, Sweden.
    Rusyn, Ivan
    Edwards, Ralph
    Tropsha, Alexander
    Cheminformatics-aided pharmacovigilance: application to Stevens-Johnson Syndrome. 2016. In: JAMIA Journal of the American Medical Informatics Association, ISSN 1067-5027, E-ISSN 1527-974X, Vol. 23, no 5, p. 968-978. Article in journal (Refereed)
    Abstract [en]

    Objective: Quantitative Structure-Activity Relationship (QSAR) models can predict adverse drug reactions (ADRs), and thus provide early warnings of potential hazards. Timely identification of potential safety concerns could protect patients and aid early diagnosis of ADRs among the exposed. Our objective was to determine whether global spontaneous reporting patterns might allow chemical substructures associated with Stevens-Johnson Syndrome (SJS) to be identified and utilized for ADR prediction by QSAR models.

    Materials and Methods: Using a reference set of 364 drugs having positive or negative reporting correlations with SJS in the VigiBase global repository of individual case safety reports (Uppsala Monitoring Center, Uppsala, Sweden), chemical descriptors were computed from drug molecular structures. Random Forest and Support Vector Machines methods were used to develop QSAR models, which were validated by external 5-fold cross validation. Models were employed for virtual screening of DrugBank to predict SJS actives and inactives, which were corroborated using knowledge bases like VigiBase, ChemoText, and MicroMedex (Truven Health Analytics Inc, Ann Arbor, Michigan).

    Results: We developed QSAR models that could accurately predict if drugs were associated with SJS (area under the curve of 75%-81%). Our 10 most active and inactive predictions were substantiated by SJS reports (or lack thereof) in the literature.

    Discussion: Interpretation of QSAR models in terms of significant chemical descriptors suggested novel SJS structural alerts.

    Conclusions: We have demonstrated that QSAR models can accurately identify SJS active and inactive drugs. Requiring chemical structures only, QSAR models provide effective computational means to flag potentially harmful drugs for subsequent targeted surveillance and pharmacoepidemiologic investigations.

  • 33.
    McCormack, Theodore
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Statistical Assessment of Crosstalk Enrichment between Gene Groups in Biological Networks. 2013. In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 8, no 1, p. e54945. Article in journal (Refereed)
    Abstract [en]

    Motivation: Analyzing groups of functionally coupled genes or proteins in the context of global interaction networks has become an important aspect of bioinformatic investigations. Assessing the statistical significance of crosstalk enrichment between or within groups of genes can be a valuable tool for functional annotation of experimental gene sets.

    Results: Here we present CrossTalkZ, a statistical method and software to assess the significance of crosstalk enrichment between pairs of gene or protein groups in large biological networks. We demonstrate that the standard z-score is generally an appropriate and unbiased statistic. We further evaluate the ability of four different methods to reliably recover crosstalk within known biological pathways. We conclude that the methods preserving the second-order topological network properties perform best. Finally, we show how CrossTalkZ can be used to annotate experimental gene sets using known pathway annotations and that its performance at this task is superior to gene enrichment analysis (GEA).

    Availability and Implementation: CrossTalkZ (available at http://sonnhammer.sbc.su.se/download/software/CrossTalkZ/) is implemented in C++, easy to use, fast, accepts various input file formats, and produces a number of statistics. These include z-score, p-value, false discovery rate, and a test of normality for the null distributions.
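
    The z-score statistic for crosstalk enrichment can be illustrated with a small Python sketch. This is not the CrossTalkZ implementation (which is in C++): the null model here is a plain node-label permutation, swapped in for the network randomization methods the paper evaluates, and all function names and the toy network are hypothetical.

```python
import random
import statistics


def count_crosstalk(edges, group_a, group_b):
    """Count edges with one endpoint in group_a and the other in group_b."""
    return sum(
        1
        for u, v in edges
        if (u in group_a and v in group_b) or (u in group_b and v in group_a)
    )


def crosstalk_z(edges, group_a, group_b, n_random=500, seed=0):
    """z-score of the observed crosstalk against a label-permutation null.

    Assumption: node labels are shuffled over a fixed topology; the paper's
    randomization schemes are more elaborate.
    """
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    observed = count_crosstalk(edges, group_a, group_b)
    null_counts = []
    for _ in range(n_random):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        relabel = dict(zip(nodes, shuffled))
        permuted = [(relabel[u], relabel[v]) for u, v in edges]
        null_counts.append(count_crosstalk(permuted, group_a, group_b))
    mean = statistics.mean(null_counts)
    sd = statistics.pstdev(null_counts)
    z = (observed - mean) / sd if sd > 0 else float("nan")
    return observed, mean, sd, z


# Toy network: two densely connected groups plus an unrelated chain.
group_a = {"a1", "a2", "a3"}
group_b = {"b1", "b2", "b3"}
edges = [(a, b) for a in sorted(group_a) for b in sorted(group_b)]
edges += [("a1", "c1"), ("c1", "c2"), ("c2", "c3"),
          ("c3", "c4"), ("c4", "c5"), ("c5", "c6")]

observed, null_mean, null_sd, z = crosstalk_z(edges, group_a, group_b)
print(f"observed={observed} null_mean={null_mean:.2f} z={z:.2f}")
```

    A large positive z-score indicates more links between the two groups than expected under the chosen null model; the choice of null model is the part of the method that the paper examines most closely.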

  • 34.
    Menéndez Hurtado, David
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Structured Learning for Structural Bioinformatics: Applications of Deep Learning to Protein Structure Prediction. 2019. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Proteins are the basic molecular machines of the cell, performing a broad range of tasks, from structural support to catalysis of chemical reactions. Their function is determined by their 3D structure, which in turn is dictated by the order of their components, the amino acids.

    This thesis is dedicated to applications of machine learning to the problems of contact prediction, ab-initio structure prediction, and model quality assessment. In particular, my research has been focused on developing methods that are both effective and easy to use.

    In the first paper, we improved the already state-of-the-art model quality assessment (MQA) program ProQ3 by replacing the underlying machine learning algorithm, moving from an SVM to deep learning; the new method is named ProQ3D. The correlation between predicted and true scores was improved from 0.85 to 0.90, using the same training data and features.

    The second paper joined several programs into a single pipeline for ab-initio structure prediction: contact prediction, folding, and model selection. We attempted to predict the structures of all 6379 PFAM families with unknown structure, of which we believe 558 to be accurate. Of these, 415 had not been reported before.

    The third paper uses advances in machine learning to build a contact predictor, PconsC4, that is fast and easy to deploy in large-scale studies, since it requires a single Multiple Sequence Alignment (MSA) and no external dependencies. The predictions are state-of-the-art, yielding a 12% improvement in precision over PconsC3 while running 244 times faster.

    With ProQ4, in the fourth paper, we introduce a novel way of training deep networks for MQA that minimises the bias of the training data and emphasises model ranking, and we demonstrate its viability with a minimal description of the protein. The ranking correlation was improved with respect to ProQ3D from 0.82 to 0.90.

    Lastly, in the fifth paper, we show the results of ProQ3D and ProQ4 in a completely blind test: CASP13.

  • 35.
    Menéndez Hurtado, David
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Uziela, Karolis
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A novel training procedure to train deep networks in the assessment of the quality of protein models. Manuscript (preprint) (Other academic)
    Abstract [en]

    Motivation: Proteins fold into complex structures that are crucial for their biological functions. Experimental determination of protein structures is costly and therefore limited to a small fraction of all known proteins. Hence, different computational structure prediction methods are necessary for the modelling of the vast majority of all proteins. In most structure prediction pipelines, the last step is to select the best available model and to estimate its accuracy. This model quality estimation problem has been growing in importance during the last decade, and progress is believed to be important for large scale modelling of proteins. Current machine learning models trained to estimate the protein model quality suffer from biases in the training set: multiple models of only a few targets, generated by a few methods.

    Results: We propose a new methodology to train deep networks that leverages the structure of the problem and takes advantage of some of these redundancies. We demonstrate its viability by reaching results comparable with another state-of-the-art method, ProQ3D, trained and evaluated on the same datasets, but employing only a small subset of the input features. The proposed training strategy is applicable to other input features and datasets, and thus can be applied to other programs.

    Availability: The code is freely available for download at github.com/ElofssonLab/ProQ4 and runs with minimal requirements: it requires only one multiple sequence alignment and a collection of models, and depends only on Python 3, hdf5, a deep learning framework compatible with Keras, and dssp.

    Contact: arne@bioinfo.se

  • 36.
    Messina, David
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Lysholm, Fredrik
    Department of Cell and Molecular Biology, Karolinska Institutet.
    Allander, Tobias
    Department of Microbiology, Tumor- and Cell Biology, Karolinska Institutet.
    Andersson, Björn
    Department of Cell and Molecular Biology, Karolinska Institutet.
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Discovery of novel protein families in metagenomic samples. Manuscript (preprint) (Other academic)
    Abstract [en]

    Despite the steady rise in gene sequence information, there is a persistent, significant fraction of genes which do not match any previously known sequence. These genes are called ORFans, and metagenomic samples, where DNA is extracted from a mixed population of unknown and often uncultivable species, are a rich source of ORFans. Viral infections cause significant morbidity and mortality, and identifying ORFan viral gene families from human metagenomic samples represents a route to understanding molecular processes that affect human health. Few methods exist for metagenomic gene-finding, and most of them rely on sequence similarity, which cannot be used to detect ORFans. Furthermore, nonsimilarity-based methods are hard to apply to the complex mixture of short, high-error-rate sequence fragments which are typical of metagenomic projects. Here we present an approach to detect ORFan protein families in short-read data, and apply it to 937 Mbp (megabase pairs) of sequence from 17 virus-enriched libraries made from human nasopharyngeal aspirates, serum, feces, and cerebrospinal fluid samples. After isolating approximately 450 putative ORFan families from clusters of sequence contigs, we applied RNAcode, a gene finder developed for use on high-quality genome sequences, and calibrated it for error-prone short sequence reads. Additional predictive measures such as sequence complexity and length were then used to rank and filter candidates into a high-quality set of 32 putative novel gene families, only two of which show significant similarity to known genes.

  • 37.
    Messina, David N.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Biological data exchange and the discovery of new protein families in metagenomic samples. 2012. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The rise in sequence data has brought both challenges to the way we exchange biological information and opportunities to discover new protein families, primarily through the investigation of uncultured metagenomic samples.

    The Distributed Annotation System, or DAS, provided a means for exchanging protein sequence data, but there were no open source, stand-alone DAS clients optimized for integrating and viewing these data. To address this need, we developed DASher. Complementary to visualizing DAS data with DASher, we also created and made available ten servers to offer real-time protein feature predictions via DAS. While DAS works well for genomic data, there was no such framework for exchanging orthology data in a consistent way. Consequently, we developed the first standards for orthology data exchange, SeqXML and OrthoXML. 64 reference proteomes are now available in SeqXML, and 14 orthology providers have agreed to offer their predictions in OrthoXML. Besides creating a uniform representation of common data types, these standards enable direct comparison and assessment of competing methods for the first time.

    A substantial percentage of newly sequenced genes are ORFans, which have no match to previously known sequences. Metagenomics samples uncover sequences from uncultivable and therefore previously unseen species, and ORFans constitute much of the metagenomics data that are completely uncharacterized. ORFans are by definition impervious to standard similarity-based methods, and the few existing metagenomics gene-finding methods performed poorly on short, error-prone next-generation sequence data. Therefore, we designed a new approach to predict protein-coding gene families from metagenomic data and applied it to 17 virally-enriched metagenomes derived from human patients. Of the 456 putative ORFan families we found in the nearly 1 billion nucleotides sequenced from these libraries, we identified 32 putative novel protein families with strong support.

  • 38.
    Michel, Mirco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Menéndez Hurtado, David
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    PconsC4: fast, accurate and hassle-free contact predictions. 2019. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 15, p. 2677-2679. Article in journal (Refereed)
    Abstract [en]

    Motivation

    Residue contact prediction was revolutionized recently by the introduction of direct coupling analysis (DCA). Further improvements, in particular for small families, have been obtained by the combination of DCA and deep learning methods. However, existing deep learning contact prediction methods often rely on a number of external programs and are therefore computationally expensive.

    Results

    Here, we introduce a novel contact predictor, PconsC4, which performs on par with state of the art methods. PconsC4 is heavily optimized, does not use any external programs and therefore is significantly faster and easier to use than other methods.

    Availability and implementation

    PconsC4 is freely available under the GPL license from https://github.com/ElofssonLab/PconsC4. Installation is easy using the pip command and works on any system with Python 3.5 or later and a GCC compiler. It does not require a GPU or other special hardware.

    Supplementary information

    Supplementary data are available at Bioinformatics online.

  • 39.
    Morgan, Daniel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. SciLifeLab.
    Towards Reliable Gene Regulatory Network Inference. 2019. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Phenotypic traits are now known to stem from the interplay between genetic variables across many if not every level of biology. The field of gene regulatory network (GRN) inference is concerned with understanding the regulatory interactions between genes in a cell, in order to build a model that captures the behaviour of the system. Perturbation biology, whereby genes or RNAs are targeted and their activity altered, is of great value for the GRN field. By first systematically perturbing the system and then reading the system's reaction as a whole, we can feed this data into various methods to reverse engineer the key agents of change.

    The initial study sets the groundwork for the rest, and deals with finding common ground among the sundry methods in order to compare and rank performance in an unbiased setting. The GeneSPIDER (GS) MATLAB package is an inference benchmarking platform whereby methods can be added via a wrapper for testing in competition with one another. Synthetic datasets and networks spanning a wide range of conditions can be created for this purpose. The evaluation of methods across various conditions in the benchmark therein demonstrates which properties influence the accuracy of which methods, and thus which are more suitable for use under a given characterized condition.

    The second study introduces a novel framework, NestBoot, for increasing inference accuracy within the GS environment by independent, nested bootstraps, i.e. repeated inference trials. Under low to medium noise levels, this allows support to be gathered for links occurring most often while spurious links are discarded through comparison to an estimated null distribution of shuffled links. While noise continues to plague every method, nested bootstrapping in this way is shown to increase the accuracy of several different methods.

    The third study applies NestBoot on real data to infer a reliable GRN from a small interfering RNA (siRNA) perturbation dataset covering 40 genes known or suspected to have a role in human cancers. Methods were developed to benchmark the accuracy of an inferred GRN in the absence of a true known GRN, by assessing how well it fits the data compared to a null model of shuffled topologies. A network of high confidence was recovered containing many regulatory links known in the literature, as well as a slew of novel links.

    The fourth study seeks to infer reliable networks on a large scale, utilizing the high dimensional biological datasets of the LINCS L1000 project. This dataset has too much noise for accurate GRN inference as a whole, hence we developed a method to select a subset that is sufficiently informative to accurately infer GRNs. This is a first step in the direction of identifying probable submodules within a greater genome-scale GRN yet to be uncovered.

  • 40.
    Morgan, Daniel
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Studham, Matthew
    Tjärnberg, Andreas
    Weishaupt, Holger
    Swartling, Fredrik
    Nordling, Torbjörn
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Perturbation-based gene regulatory network inference to unravel oncogenic mechanisms. Manuscript (preprint) (Other academic)
    Abstract [en]

    Motivation: Cancer is known to stem from multiple, independent mutations, the effects of which aggregate to drive the cell into a cancerous state. To understand the complex interplay between affected genes, their gene regulatory network (GRN) needs to be uncovered, revealing detailed insights into regulatory mechanisms. We therefore decided to infer a reliable GRN from perturbation responses of 40 genes known or suspected to have a role in human cancers yet whose regulatory interactions are poorly known.

    Results: siRNA knockdown experiments of each gene were done in a human squamous carcinoma cell line, after which the transcriptomic response was measured. From these data GRNs were inferred using several methods, and the false discovery rate was controlled by the NestBoot framework. The best GRN was shown to be significantly more predictive than the null model, both in cross-validated benchmarks and for an independent dataset of the same genes but subjected to double perturbations. It agrees with many known links in addition to predicting a large number of novel interactions, a subset of which were experimentally validated. The inferred GRN captures regulatory interactions central to cancer-relevant processes and thus provides mechanistic insights that are useful for future cancer research.

  • 41.
    Morgan, Daniel
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Nordling, Torbjörn E. M.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A generalized framework for controlling FDR in gene regulatory network inference. 2019. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 35, no 6, p. 1026-1032. Article in journal (Refereed)
    Abstract [en]

    Motivation: Inference of gene regulatory networks (GRNs) from perturbation data can give detailed mechanistic insights of a biological system. Many inference methods exist, but the resulting GRN is generally sensitive to the choice of method-specific parameters. Even though the inferred GRN is optimal given the parameters, many links may be wrong or missing if the data is not informative. To make GRN inference reliable, a method is needed to estimate the support of each predicted link as the method parameters are varied.

    Results: To achieve this we have developed a method called nested bootstrapping, which applies a bootstrapping protocol to GRN inference, and by repeated bootstrap runs assesses the stability of the estimated support values. To translate bootstrap support values to false discovery rates we run the same pipeline with shuffled data as input. This provides a general method to control the false discovery rate of GRN inference that can be applied to any setting of inference parameters, noise level, or data properties. We evaluated nested bootstrapping on a simulated dataset spanning a range of such properties, using the LASSO, Least Squares, RNI, GENIE3 and CLR inference methods. An improved inference accuracy was observed in almost all situations. Nested bootstrapping was incorporated into the GeneSPIDER package, which was also used for generating the simulated networks and data, as well as running and analyzing the inferences.
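
    The step of translating bootstrap support values into a false discovery rate by comparison with shuffled data can be sketched as follows. This is a simplified Python illustration, not the GeneSPIDER (MATLAB) implementation: the FDR estimate used here (fraction of shuffled-data links reaching a support level divided by the fraction of real-data links reaching it), the function names and the example values are all assumptions.

```python
from collections import Counter


def link_support(runs):
    """Fraction of bootstrap runs in which each link was inferred."""
    counts = Counter(link for run in runs for link in run)
    return {link: c / len(runs) for link, c in counts.items()}


def estimated_fdr(real_support, null_support, threshold):
    """Assumed estimate: null fraction over real fraction at this threshold."""
    n_real = sum(s >= threshold for s in real_support)
    n_null = sum(s >= threshold for s in null_support)
    if n_real == 0:
        return 0.0
    ratio = (n_null / len(null_support)) / (n_real / len(real_support))
    return min(1.0, ratio)


def support_cutoff(real_support, null_support, target_fdr=0.05):
    """Smallest support threshold whose estimated FDR is within the target."""
    for t in sorted(set(real_support)):
        if estimated_fdr(real_support, null_support, t) <= target_fdr:
            return t
    return None


# Hypothetical links inferred in four bootstrap runs.
runs = [
    {("A", "B"), ("B", "C")},
    {("A", "B")},
    {("A", "B"), ("C", "A")},
    {("A", "B"), ("B", "C")},
]
support = link_support(runs)

# Hypothetical support values from real data and from shuffled data.
real = [1.0, 0.95, 0.9, 0.6, 0.5, 0.3, 0.2, 0.1]
null = [0.6, 0.5, 0.4, 0.3, 0.3, 0.2, 0.1, 0.1]
cutoff = support_cutoff(real, null)
print(support, cutoff)
```

    Links whose support reaches the returned cutoff would be kept; the nesting described in the abstract (repeating the bootstrap to assess the stability of the support values themselves) is omitted here for brevity.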

  • 42.
    Moruz, Luminita
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Chromatographic retention time prediction and its applications in mass spectrometry-based proteomics. 2013. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Mass spectrometry-based methods are among the most commonly used techniques to characterize proteins in biological samples. With rapid technological developments allowing increasing throughput, thousands of proteins can now be monitored in a matter of hours. However, these advances brought a whole new set of analytical challenges. At the moment, it is no longer possible to rely on human experts to process the data. Instead, accurate computational tools are required.

    In line with these observations, my research work has involved development of computational methods to facilitate the analysis of mass spectrometry-based experiments. In particular, the projects included in this thesis revolve around the chromatography step of such experiments, where peptides are separated according to their hydrophobicity.

    The first part of the thesis describes an algorithm to predict retention time from peptide sequences. The method provides more accurate predictions compared to previous approaches, while being easily transferable to other chromatography setups. In addition, it gives equally good predictions for peptides carrying arbitrary posttranslational modifications as for unmodified peptides.

    The second part of the thesis includes two applications of retention time predictions in the context of mass spectrometry-based proteomics experiments. First, we show how theoretical calculations of masses and retention times can be used to infer proteins in shotgun proteomics experiments. Secondly, we illustrate the use of retention time predictions to calculate optimized gradient functions for reversed-phase liquid chromatography.

  • 43.
    Norinder, Ulf
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institutet, Sweden.
    Svensson, Fredrik
    Multitask Modeling with Confidence Using Matrix Factorization and Conformal Prediction. 2019. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 59, no 4, p. 1598-1604. Article in journal (Refereed)
    Abstract [en]

    Multitask prediction of bioactivities is often faced with challenges relating to the sparsity of data and imbalance between different labels. We propose class conditional (Mondrian) conformal predictors using underlying Macau models as a novel approach for large scale bioactivity prediction. This approach handles both high degrees of missing data and label imbalances while still producing high quality predictive models. When applied to ten assay end points from PubChem, the approach generated valid models with an efficiency of 74.0-80.1% at the 80% confidence level, with similar performance for both the minority and the majority class. Also, when deleting progressively larger portions of the available data (0-80%), the performance of the models remained robust with only minor deterioration (reduction in efficiency between 5 and 10%). Compared to using Macau without conformal prediction, the method presented here significantly improves the performance on imbalanced data sets.
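
    A class conditional (Mondrian) conformal predictor can be sketched in a few lines of Python. The sketch assumes that some underlying model already supplies class probabilities (the Macau models of the paper are not reproduced), that the nonconformity score is one minus the predicted probability of the class in question, and that the calibration scores below are hypothetical.

```python
def mondrian_p_value(cal_scores, test_score):
    """p-value of a test score against the calibration scores of ONE class."""
    n_ge = sum(1 for s in cal_scores if s >= test_score)
    return (n_ge + 1) / (len(cal_scores) + 1)


def predict_set(cal_by_class, test_probs, confidence=0.8):
    """Labels that cannot be rejected at the given confidence level.

    cal_by_class: nonconformity scores of calibration examples, grouped by
                  their true class (this grouping is the Mondrian part).
    test_probs:   predicted probability per class for the test example.
    """
    significance = 1.0 - confidence
    labels = set()
    for label, prob in test_probs.items():
        score = 1.0 - prob  # assumed nonconformity measure
        if mondrian_p_value(cal_by_class[label], score) > significance:
            labels.add(label)
    return labels


# Hypothetical calibration scores: few actives (minority), more inactives.
cal_by_class = {
    "active": [0.1, 0.3, 0.5, 0.7],
    "inactive": [0.05, 0.1, 0.15, 0.2, 0.25, 0.3, 0.35, 0.5, 0.55],
}

single = predict_set(cal_by_class, {"active": 0.6, "inactive": 0.4})
both = predict_set(cal_by_class, {"active": 0.45, "inactive": 0.55})
print(single, both)
```

    Because each class is calibrated only against its own examples, the error rate is controlled separately for the minority and the majority class. Efficiency, as used in the abstract, is commonly taken to be the fraction of predictions that contain exactly one label.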

  • 44.
    Ogris, Christoph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Global functional association network inference and crosstalk analysis for pathway annotation. 2017. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Cell functions are steered by complex interactions of gene products, like forming a temporary or stable complex, altering gene expression or catalyzing a reaction. Mapping these interactions is the key to understanding biological processes and therefore is the focus of numerous experiments and studies. Small-scale experiments deliver high quality data but lack coverage, whereas high-throughput techniques cover thousands of interactions but can be error-prone. Unfortunately, all of these approaches can only focus on one type of interaction at a time. This makes experimental mapping of the genome-wide network a cost- and time-intensive procedure. To overcome these problems, different computational approaches have been suggested that integrate multiple data sets and/or different evidence types. This widens the stringent definition of an interaction and introduces a more general term: functional association.

    FunCoup is a database for genome-wide functional association networks of Homo sapiens and 16 model organisms. FunCoup distinguishes between five different functional associations: co-membership in a protein complex, physical interaction, participation in the same signaling cascade, participation in the same metabolic process and, for prokaryotic species, co-occurrence in the same operon. For each class, FunCoup applies naive Bayesian integration of ten different evidence types of data to predict novel interactions. It further uses orthologs to transfer interaction evidence between species. This considerably increases coverage, and allows inference of comprehensive networks even for organisms that are not well studied.
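
    Naive Bayesian integration of several evidence types can be illustrated generically: under the independence assumption, the likelihood ratios of the individual evidence types multiply, so their logarithms add. The sketch below is a textbook formulation, not FunCoup's actual scoring scheme; the prior and the likelihood ratios are hypothetical.

```python
import math


def integrate_evidence(likelihood_ratios, prior=0.01):
    """Posterior probability of a functional association under naive Bayes.

    likelihood_ratios: for each evidence type, P(evidence | associated)
                       divided by P(evidence | not associated).
    prior:             assumed prior probability that a gene pair is associated.
    """
    log_odds = math.log(prior / (1.0 - prior))
    log_odds += sum(math.log(lr) for lr in likelihood_ratios)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical ratios for three evidence types supporting one gene pair.
posterior = integrate_evidence([4.0, 2.5, 10.0])
no_evidence = integrate_evidence([])
print(round(posterior, 4), no_evidence)
```

    With no evidence the posterior equals the prior, and each additional evidence type shifts the log-odds by the logarithm of its likelihood ratio.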

    BinoX is a novel method for pathway analysis and determining the relation between gene sets, using functional association networks. Traditionally, pathway annotation has been done using gene overlap only, but these methods only get a small part of the whole picture. Placing the gene sets in context of a network provides additional evidence for pathway analysis, revealing a global picture based on the whole genome.

    PathwAX is a web server based on the BinoX algorithm. A user can input a gene set and get online network crosstalk based pathway annotation. PathwAX uses the FunCoup networks and 280 pre-defined pathways. Most runs take just a few seconds and the results are summarized in an interactive chart the user can manipulate to gain further insights into the gene set's pathway associations.

  • 45.
    Pascarelli, Stefano
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Tsirigos, Konstantinos
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Peters, Christoph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    PRODRES: Fast protein searches using a protein domain-reduced database. Manuscript (preprint) (Other academic)
    Abstract [en]

    Motivation: Detection of homologous sequences is the basis for many bioinformatics applications. Position-Specific Scoring Matrices (PSSMs) or Hidden Markov Models (HMMs) are often created from the detected homologous sequences. These are then widely used in many bioinformatics programs in order to incorporate evolutionary information in the prediction process. However, due to the increase in the size of reference databases, there is a continuous decrease in speed of homology detection even with faster computers.

    Results: By using PRODRES, we save on average X percent of the search time. This pipeline has been exploited in our widely used topology prediction software, TOPCONS. In total, more than 5 million PSSMs have been generated, with an average running time of about 1 minute. This corresponds to an approximate 10 times speed-up of the whole process.

    Availability and implementation: A standalone version of PRODRES can be found in the GitHub repository https://github.com/ElofssonLab/PRODRES, while a web-server implementing the method is available for academic users at http://PRODRES.bioinfo.se/

  • 46.
    Sagit, Rauan
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Variation in length of proteins by repeats and disorder regions
    2013. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Protein-coding genes evolve together with their genome and acquire changes, some of which affect the length of their protein products. This explains why equivalent proteins from different species can exhibit length differences. Variation in length of proteins during evolution arguably presents a large number of possibilities for improvement and innovation of protein structure and function. In order to contribute to an increased understanding of this process, we have studied variation caused by tandem domain duplications and insertions or deletions of intrinsically disordered residues.

    The study of two proteins, Nebulin and Filamin, together with a broader study of long repeat proteins (>10 domain repeats), began by confirming that tandem domains evolve by internal duplications. Next, we show that vertebrate Nebulins evolved by duplications of a seven-domain unit, yet the most recent duplications utilized different gene parts as duplication units. However, Filamin exhibits a checkered duplication pattern, indicating that duplications were followed by similarity erosions that were hindered at particular domains due to the presence of equivalent binding motifs. For long repeat proteins, we found that human segmental duplications are over-represented in long repeat genes. Additionally, domains that have formed long repeats achieved this primarily by duplications of two or more domains at a time.

    The study of homologous protein pairs from the well-characterized eukaryotes nematode, fruit fly and several fungi demonstrated a link between variation in length and variation in the number of intrinsically disordered residues. Next, insertions and deletions (indels) estimated from HMM-HMM pairwise alignments showed that disordered residues are clearly more frequent among indel than non-indel residues. Additionally, a study of raw length differences showed that more than half of the variation in fungal proteins is composed of disordered residues. Finally, a model of indels and their immediate surroundings suggested that disordered indels occur in already disordered regions rather than in ordered regions.

  • 47.
    Sagit, Rauan
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Light, Sara
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Ekman, Diana
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Protein expansion is primarily due to indels in intrinsically disordered regions
    Manuscript (preprint) (Other academic)
    Abstract [en]

    Proteins evolve not only through point mutations but also by insertion and deletion events, which affect the length of the protein. It is well known that such indel events most frequently occur in surface exposed loops. However, detailed analysis of indel events in distantly related proteins is hampered by the difficulty involved in correctly aligning such sequences. Here, we circumvent this problem by analyzing homologous proteins based on length variation rather than pairwise alignments. We find a surprisingly strong relationship between difference in length and difference in the number of intrinsically disordered residues, where more than half of the length variation can be explained by changes in the number of intrinsically disordered residues. A more detailed analysis reveals that indel events do not induce disorder but rather that already disordered regions accrue indels, suggesting that there is a significantly lowered selective pressure for indels to occur within intrinsically disordered regions.

  • 48.
    Sahlin, Kristoffer
    et al.
    Street, Nathaniel
    Lundeberg, Joakim
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Improved gap size estimation for scaffolding algorithms
    2012. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 17, p. 2215-2222
    Article in journal (Refereed)
    Abstract [en]

    Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subsequent analysis, it is important to provide unbiased estimation of contig distance.

    Results: In this article, we show that state-of-the-art programs for scaffolding are using an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe the different cases of bias that arise. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, structural variation detection, and library insert-size estimation as commonly performed by read aligners.
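
    The bias described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' model or code: it assumes normally distributed insert sizes, two contigs longer than any insert, and a toy rule that a read pair "spans" the gap when its left read lies entirely on the first contig and its right read entirely on the second. All function names and parameter values are hypothetical. Under these assumptions, long inserts are more likely to span the gap, so the naive estimate (library mean minus mean observation) comes out too small, while maximizing a likelihood that accounts for the spanning condition recovers the gap.

```python
import math
import random


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def simulate_observations(gap, mu, sigma, read_len, n, rng):
    """Return n observations o = x - gap from read pairs that span the gap.

    Contig 1 ends at coordinate 0 and contig 2 starts at coordinate `gap`.
    Fragments are placed by rejection sampling (toy model, an assumption).
    """
    window = mu + 10 * sigma  # range of candidate fragment starts on contig 1
    obs = []
    while len(obs) < n:
        x = rng.gauss(mu, sigma)          # insert size of this fragment
        s = rng.uniform(-window, 0.0)     # fragment start coordinate
        left_ok = s + read_len <= 0.0     # left read entirely on contig 1
        right_ok = s + x - read_len >= gap  # right read entirely on contig 2
        if left_ok and right_ok:
            obs.append(x - gap)
    return obs


def naive_gap(obs, mu):
    """Naive estimate: library mean minus mean observation."""
    return mu - sum(obs) / len(obs)


def log_likelihood(d, obs, mu, sigma, read_len):
    """Log-likelihood of gap d, up to terms that do not depend on d."""
    a = d + 2 * read_len  # shortest insert that can span a gap of size d
    z = (mu - a) / sigma
    # Normalizing constant E[max(0, X - a)] for X ~ Normal(mu, sigma).
    norm = (mu - a) * normal_cdf(z) + sigma * normal_pdf(z)
    fit = sum((o + d - mu) ** 2 for o in obs) / (2.0 * sigma ** 2)
    return -fit - len(obs) * math.log(norm)


def ml_gap(obs, mu, sigma, read_len, max_gap):
    """Maximum likelihood gap by grid search over integer gap sizes."""
    return max(
        range(0, max_gap + 1),
        key=lambda d: log_likelihood(d, obs, mu, sigma, read_len),
    )


# Hypothetical library and gap parameters, chosen only for illustration.
TRUE_GAP, MU, SIGMA, READ_LEN = 300, 500.0, 100.0, 50
rng = random.Random(0)
obs = simulate_observations(TRUE_GAP, MU, SIGMA, READ_LEN, 2000, rng)
naive = naive_gap(obs, MU)
ml = ml_gap(obs, MU, SIGMA, READ_LEN, max_gap=500)
print(f"true gap: {TRUE_GAP}, naive estimate: {naive:.1f}, ML estimate: {ml}")
```

    The grid search keeps the sketch dependency-free; a root finder applied to the derivative of the log-likelihood would work equally well under the same assumptions.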

  • 49.
    Salgado, Marco
    Stockholm University, Faculty of Science, Department of Ecology, Environment and Plant Sciences. Stockholm University.
    Comparative Analysis of the Nodule Transcriptomes of Ceanothus thyrsiflorus (Rhamnaceae, Rosales) and Datisca glomerata (Datiscaceae, Cucurbitales)
    2018. In: Frontiers in Plant Sciences, Vol. 9, no 1629
    Article in journal (Refereed)
    Abstract [en]

    Two types of nitrogen-fixing root nodule symbioses are known, rhizobial and actinorhizal symbioses. The latter involve plants of three orders, Fagales, Rosales, and Cucurbitales. To understand the diversity of plant symbiotic adaptation, we compared the nodule transcriptomes of Datisca glomerata (Datiscaceae, Cucurbitales) and Ceanothus thyrsiflorus (Rhamnaceae, Rosales); both species are nodulated by members of the uncultured Frankia clade, cluster II. The analysis focused on various features. In both species, the expression of orthologs of legume Nod factor receptor genes was elevated in nodules compared to roots. Since arginine has been postulated as export form of fixed nitrogen from symbiotic Frankia in nodules of D. glomerata, the question was whether the nitrogen metabolism was similar in nodules of C. thyrsiflorus. Analysis of the expression levels of key genes encoding enzymes involved in arginine metabolism revealed up-regulation of arginine catabolism, but no up-regulation of arginine biosynthesis, in nodules compared to roots of D. glomerata, while arginine degradation was not upregulated in nodules of C. thyrsiflorus. This new information corroborated an arginine-based metabolic exchange between host and microsymbiont for D. glomerata, but not for C. thyrsiflorus. Oxygen protection systems for nitrogenase differ dramatically between both species. Analysis of the antioxidant system suggested that the system in the nodules of D. glomerata leads to greater oxidative stress than the one in the nodules of C. thyrsiflorus, while no differences were found for the defense against nitrosative stress. However, induction of nitrite reductase in nodules of C. thyrsiflorus indicated that here, nitrite produced from nitric oxide had to be detoxified. Additional shared features were identified: genes encoding enzymes involved in thiamine biosynthesis were found to be upregulated in the nodules of both species. Orthologous nodule-specific subtilisin-like proteases that have been linked to the infection process in actinorhizal Fagales, were also upregulated in the nodules of D. glomerata and C. thyrsiflorus. Nodule-specific defensin genes known from actinorhizal Fagales and Cucurbitales, were also found in C. thyrsiflorus. In summary, the results underline the variability of nodule metabolism in different groups of symbiotic plants while pointing at conserved features involved in the infection process.

  • 50.
    Sand, M.
    et al.
    Bechara, F. G.
    Gambichler, T.
    Sand, D.
    Friedländer, Marc R.
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Bromba, M.
    Schnabel, R.
    Hessam, S.
    Next-generation sequencing of the basal cell carcinoma miRNome and a description of novel microRNA candidates under neoadjuvant vismodegib therapy: an integrative molecular and surgical case study
    2016. In: Annals of Oncology, ISSN 0923-7534, E-ISSN 1569-8041, Vol. 27, no 2, p. 332-338
    Article in journal (Refereed)
    Abstract [en]

    Background: MicroRNAs (miRNAs) have been identified as key players in posttranscriptional gene regulation and have a significant impact on basal cell carcinoma (BCC) development. The Sonic hedgehog pathway inhibitor vismodegib has been approved for oral therapy of metastatic or advanced BCC. Here, a high-throughput miRNA sequencing analysis was carried out to identify differentially expressed miRNAs and possible novel miRNA candidates in vismodegib-treated BCC tissue. Additionally, we described our surgical experience with neoadjuvant oral vismodegib therapy.

    Patients and methods: A punch biopsy (4 mm) from a patient with an extensive cranial BCC under oral vismodegib therapy and a corresponding nonlesional epithelial skin biopsy were harvested. Total RNA was isolated, after which a sequencing cDNA library was prepared, and cluster generation was carried out, which was followed by an ultra-high-throughput miRNA sequencing analysis to indicate the read number of miRNA expression based on miRBase 21. In addition to the identification of differentially expressed miRNAs from RNA sequencing data, additional novel miRNA candidates were determined with a tool for identifying new miRNA sequences (miRDeep2).

    Results: We identified 33 up-regulated miRNAs (fold change >= 2) and 39 potentially new miRNA candidates (miRDeep scores 0-43.6). A manual sequence analysis of the miRNA candidates on the genomic locus of chromosome 1 with provisional IDs of chr1_1913 and chr1_421 was further carried out and rated as promising (chr1_1913) and borderline (chr1_421). Histopathology revealed skip lesions in clinically healthy appearing skin at the tumor margins, which were the cause of seven re-excisions by micrographic controlled surgery to achieve tumor-free margins.

    Conclusion: miRNA sequencing revealed novel miRNA candidates that need to be further confirmed in functional Dicer knockout studies. Clinically, on the basis of our surgical experience described here, neoadjuvant vismodegib therapy in BCC appears to impede histopathologic evaluations with effects on surgical therapy. Thus, larger studies are necessary, but are not preferable at this time if other options are available.
