  • 1. Abraham, Mark
    et al.
    Apostolov, Rossen
    Barnoud, Jonathan
    Bauer, Paul
    Blau, Christian
    Bonvin, Alexandre M. J. J.
    Chavent, Matthieu
    Chodera, John
    Condic-Jurkic, Karmen
    Delemotte, Lucie
    Grubmueller, Helmut
    Howard, Rebecca J.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Jordan, E. Joseph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Lindahl, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). KTH Royal Institute of Technology, Sweden.
    Ollila, O. H. Samuli
    Selent, Jana
    Smith, Daniel G. A.
    Stansfeld, Phillip J.
    Tiemann, Johanna K. S.
    Trellet, Mikael
    Woods, Christopher
    Zhmurov, Artem
    Sharing Data from Molecular Simulations. 2019. In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 59, no 10, p. 4093-4099. Article in journal (Refereed)
    Abstract [en]

    Given the need for modern researchers to produce open, reproducible scientific output, the lack of standards and best practices for sharing data and workflows used to produce and analyze molecular dynamics (MD) simulations has become an important issue in the field. There are now multiple well-established packages to perform molecular dynamics simulations, often highly tuned for exploiting specific classes of hardware, each with strong communities surrounding them, but with very limited interoperability/transferability options. Thus, the choice of the software package often dictates the workflow for both simulation production and analysis. The level of detail in documenting the workflows and analysis code varies greatly in published work, hindering reproducibility of the reported results and the ability for other researchers to build on these studies. An increasing number of researchers are motivated to make their data available, but many challenges remain in order to effectively share and reuse simulation data. To discuss these and other issues related to best practices in the field in general, we organized a workshop in November 2018 (https://bioexcel.eu/events/workshop-on-sharing-data-from-molecular-simulations/). Here, we present a brief overview of this workshop and topics discussed. We hope this effort will spark further conversation in the MD community to pave the way toward more open, interoperable, and reproducible outputs coming from research studies using MD simulations.

  • 2.
    Alexeyenko, Andrey
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Schmitt, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tjärnberg, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dmitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Comparative interactomics with Funcoup 2.0. 2012. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 40, no D1, p. D821-D828. Article in journal (Refereed)
    Abstract [en]

    FunCoup (http://FunCoup.sbc.su.se) is a database that maintains and visualizes global gene/protein networks of functional coupling that have been constructed by Bayesian integration of diverse high-throughput data. FunCoup achieves high coverage by orthology-based integration of data sources from different model organisms and from different platforms. We here present release 2.0 in which the data sources have been updated and the methodology has been refined. It contains a new data type Genetic Interaction, and three new species: chicken, dog and zebra fish. As FunCoup extensively transfers functional coupling information between species, the new input datasets have considerably improved both coverage and quality of the networks. The number of high-confidence network links has increased dramatically. For instance, the human network has more than eight times as many links above confidence 0.5 as the previous release. FunCoup provides facilities for analysing the conservation of subnetworks in multiple species. We here explain how to do comparative interactomics on the FunCoup website.

  • 3. Allison, Timothy M.
    et al.
    Degiacomi, Matteo T.
    Marklund, Erik G.
    Jovine, Luca
    Elofsson, Arne
    Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Benesch, Justin L. P.
    Landreh, Michael
    Complementing machine learning-based structure predictions with native mass spectrometry. 2022. In: Protein Science, ISSN 0961-8368, E-ISSN 1469-896X, Vol. 31, no 6, article id e4333. Article in journal (Refereed)
    Abstract [en]

    The advent of machine learning-based structure prediction algorithms such as AlphaFold2 (AF2) and RoseTTAFold has moved the generation of accurate structural models for the entire cellular protein machinery into the reach of the scientific community. However, structure predictions of protein complexes are based on user-provided input and may require experimental validation. Mass spectrometry (MS) is a versatile, time-effective tool that provides information on post-translational modifications, ligand interactions, conformational changes, and higher-order oligomerization. Using three protein systems, we show that native MS experiments can uncover structural features of ligand interactions, homology models, and point mutations that are undetectable by AF2 alone. We conclude that machine learning can be complemented with MS to yield more accurate structural models on a small and large scale.

  • 4. Berglund, Ann-Charlotte
    et al.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Östlund, Gabriel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    InParanoid 6: eukaryotic ortholog clusters with inparalogs. 2008. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 36, p. D263-D266. Article in journal (Refereed)
    Abstract [en]

    The InParanoid eukaryotic ortholog database (http://InParanoid.sbc.su.se/) has been updated to version 6 and is now based on 35 species. We collected all available 'complete' eukaryotic proteomes and Escherichia coli, and calculated ortholog groups for all 595 species pairs using the InParanoid program. This resulted in 2 642 187 pairwise ortholog groups in total. The orthology-based species relations are presented in an orthophylogram. InParanoid clusters contain one or more orthologs from each of the two species. Multiple orthologs in the same species, i.e. inparalogs, result from gene duplications after the species divergence. A new InParanoid website has been developed which is optimized for speed both for users and for updating the system. The XML output format has been improved for efficient processing of the InParanoid ortholog clusters.

  • 5. Bidkhori, Gholamreza
    et al.
    Narimani, Zahra
    Hosseini Ashtiani, Saman
    Moeini, Ali
    Nowzari-Dalini, Abbas
    Masoudi-Nejad, Ali
    Reconstruction of an Integrated Genome-Scale Co-Expression Network Reveals Key Modules Involved in Lung Adenocarcinoma. 2013. In: PLOS ONE, E-ISSN 1932-6203, Vol. 8, no 7, p. e67552-e67552. Article in journal (Other academic)
  • 6.
    Björklund, Åsa
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Creation of new proteins - domain rearrangements and tandem duplications. 2010. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Proteins are modular entities with domains as their building blocks. The domains are recurrent protein fragments with a distinct structure, function and evolutionary history. During evolution, proteins with new functions have been invented through rearrangements as well as differentiation of domains. The focus of this thesis is to gain better understanding of the processes that govern domain rearrangements. In particular, the rearrangements that create long protein domain repeats have been investigated in detail.

    We estimate that about 65% of the eukaryotic and 40% of the prokaryotic proteins are of the multidomain type. Further, we find that the eukaryotic multidomain proteins are mainly created through insertion of a single domain at the N- or C-terminus. However, domain repeats differ from other domain rearrangements in that they are created from internal tandem duplications. We show that such duplications often involve several domains simultaneously, and that different repeated domain families show distinct evolutionary patterns. Finally, we have investigated how large repeat regions are created using a specific example: the actin-binding nebulin domain. The analysis reveals several tandem duplications of both single nebulin domains and super repeats of seven nebulins in a number of vertebrates. We see that the duplication breakpoints vary between the species and that multiple duplications of the same region are common.

  • 7.
    Björklund, Åsa K.
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Light, Sara
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sagit, Rauan
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Nebulin: A Study of Protein Repeat Evolution. 2010. In: Journal of Molecular Biology, ISSN 0022-2836, E-ISSN 1089-8638, Vol. 402, no 1, p. 38-51. Article in journal (Refereed)
    Abstract [en]

    Protein domain repeats are common in proteins that are central to the organization of a cell, in particular in eukaryotes. They are known to evolve through internal tandem duplications. However, the understanding of the underlying mechanisms is incomplete. To shed light on repeat expansion mechanisms, we have studied the evolution of the muscle protein Nebulin, a protein that contains a large number of actin-binding nebulin domains. Nebulin proteins have evolved from an invertebrate precursor containing two nebulin domains. Repeat regions have expanded through duplications of single domains, as well as duplications of a super repeat (SR) consisting of seven nebulins. We show that the SR has evolved independently into large regions in at least three instances: twice in the invertebrate Branchiostoma floridae and once in vertebrates. In-depth analysis reveals several recent tandem duplications in the Nebulin gene. The events involve both single-domain and multidomain SR units or several SR units. There are single events, but frequently the same unit is duplicated multiple times. For instance, an ancestor of human and chimpanzee underwent two tandem duplications. The duplication junction coincides with an Alu transposon, thus suggesting duplication through Alu-mediated homologous recombination. Duplications in the SR region consistently involve multiples of seven domains. However, the exact unit that is duplicated varies both between species and within species. Thus, multiple tandem duplications of the same motif did not create the large Nebulin protein. Finally, analysis of segmental duplications in the human genome reveals that duplications are more common in genes containing domain repeats than in those coding for nonrepeated proteins. In fact, segmental duplications are found three to six times more often in long repeated genes than expected by chance. 

  • 8.
    Bryant, Patrick
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Learning Protein Evolution and Structure. 2022. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    By analysing the structure of a protein it is possible to draw conclusions about its function. Obtaining the structure of a protein experimentally is, however, a time-consuming and expensive process. By using evolution it is possible to infer the structure of a protein. AlphaFold2 (AF), the latest AI technology for protein structure prediction, uses evolutionary information to obtain protein structures in minutes instead of years at a fraction of the experimental cost. Here, we develop this technology further to predict the structure of interacting proteins. We create a confidence score, pDockQ, and show that this score rivals high-throughput experiments in distinguishing true and false protein-protein interactions (PPIs). Applying AF and the pDockQ score to a set of 65484 human PPIs we identify 1371 new high-confidence models. These models expand the structural knowledge of human protein complexes and can be used to e.g. develop new drugs or evaluate biological pathways. One limitation of AF is that the accuracy decreases with the number of proteins being predicted together and that the biggest protein complexes do not fit in the memory of the latest GPUs. To circumvent these issues, we predict subcomponents of protein complexes and assemble these together with Monte Carlo Tree search (MCTS). MCTS enables assembling some of the largest protein complexes using only sequence information and stoichiometry. Out of 175 protein complexes with 10-30 chains, 91 can be completely assembled with a median TM-score of 0.51. A third of these (30 complexes) are highly accurate (TM-score ≥0.8). The use of highly accurate protein structure prediction is revolutionising many fields of biological research only one year after its realisation. Likely, this is only the beginning of a new era; the era of AI.

  • 9.
    Bryant, Patrick
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting the structure of large protein complexes using AlphaFold and Monte Carlo tree search. Manuscript (preprint) (Other academic)
    Abstract [en]

    AlphaFold can predict the structure of single- and multiple-chain proteins with very high accuracy. However, the accuracy decreases with the number of chains, and the available GPU memory limits the size of protein complexes which can be predicted. Here we show that one can predict the structure of large complexes starting from predictions of subcomponents. We assemble 91 out of 175 complexes with 10-30 chains from predicted subcomponents using Monte Carlo tree search, with a median TM-score of 0.51. There are 30 highly accurate complexes (TM-score ≥0.8, 33% of complete assemblies). We create a scoring function, mpDockQ, that can distinguish if assemblies are complete and predict their accuracy. We find that complexes containing symmetry are accurately assembled, while asymmetrical complexes remain challenging. The method is freely available and accessible as a Colab notebook: https://colab.research.google.com/github/patrickbryant1/MoLPC/blob/master/MoLPC.ipynb.

  • 10. Burke, David F.
    et al.
    Bryant, Patrick
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Barrio-Hernandez, Inigo
    Memon, Danish
    Pozzati, Gabriele
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Shenoy, Aditi
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Zhu, Wensi
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Dunham, Alistair S.
    Albanese, Pascal
    Keller, Andrew
    Scheltema, Richard A.
    Bruce, James E.
    Leitner, Alexander
    Kundrotas, Petras
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). The University of Kansas, Lawrence, USA.
    Beltrao, Pedro
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Towards a structurally resolved human protein interaction network. 2023. In: Nature Structural & Molecular Biology, ISSN 1545-9993, E-ISSN 1545-9985, Vol. 30, no 2, p. 216-225. Article in journal (Refereed)
    Abstract [en]

    Cellular functions are governed by molecular machines that assemble through protein-protein interactions. Their atomic details are critical to studying their molecular mechanisms. However, fewer than 5% of hundreds of thousands of human protein interactions have been structurally characterized. Here we test the potential and limitations of recent progress in deep-learning methods using AlphaFold2 to predict structures for 65,484 human protein interactions. We show that experiments can orthogonally confirm higher-confidence models. We identify 3,137 high-confidence models, of which 1,371 have no homology to a known structure. We identify interface residues harboring disease mutations, suggesting potential mechanisms for pathogenic variants. Groups of interface phosphorylation sites show patterns of co-regulation across conditions, suggestive of coordinated tuning of multiple protein interactions as signaling responses. Finally, we provide examples of how the predicted binary complexes can be used to build larger assemblies helping to expand our understanding of human cell biology.

  • 11.
    Castresana Aguirre, Miguel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    From networks to pathway analysis. 2021. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Biological mechanisms stem from complex intracellular interactions spanning across different levels of regulation. Mapping these interactions is fundamental for the understanding of all types of biological conditions, including complex diseases. Each experimental approach carries its own bias and noise. Combining heterogeneous data sources reduces noise and gives a broader sense of the interactions between genes known as functional association, where both direct and indirect interactions are captured.

    FunCoup is one of the most comprehensive functional association databases, providing networks for 22 organisms in all domains of life. FunCoup uses a naïve Bayesian integration approach to combine 11 different data types and increases the coverage by transferring associations between species via orthologs. Additional insights into the mechanisms of a gene network are provided through tissue specificity filtering and directed regulatory links.

    Even though FunCoup provides a comprehensive map of the intracellular machinery, gaining insights into conditions such as diseases requires a functional level analysis rather than a gene level analysis. Thus, studying genes that are involved in a condition from a functional perspective requires the usage of pathway enrichment analysis. Several approaches exist, from basic gene overlap to more elaborate analyses that use functional association networks. ANUBIX is a novel network-based analysis (NBA) method that overcomes the high false positive rate issue that previous state-of-the-art NBA approaches have. Additionally, even with accurate methods, a commonly ignored problem is that gene sets derived from experiments are often noisy or contain multiple mechanisms, mixing different pathways which weakens their association to the condition under study. To increase the sensitivity of pathway analysis, we developed a pipeline to cluster gene sets into more homogeneous parts with the aim of unraveling all the mechanisms activated in the studied condition. To facilitate the usage of these tools, we built a web server called PathBIX, a user-friendly platform that allows interactive analysis of all species in FunCoup against multiple pathway databases.

  • 12.
    Castresana Aguirre, Miguel
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Guala, Dimitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Clustered Pathway Analysis. Manuscript (preprint) (Other academic)
    Abstract [en]

    Motivation: Functional analysis of gene sets derived from experiments is typically done by pathway annotation. Although many algorithms exist for analyzing the association between a gene set and a pathway, an issue which is generally ignored is that gene sets often represent multiple pathways. In such cases an association to a pathway is weakened by the presence of genes associated with other pathways. A way to counteract this is to cluster the gene set into more homogenous parts before performing pathway analysis on each cluster.

    Results: We explored whether network-based pre-clustering of a query gene set can improve pathway analysis. The methods MCL, Infomap, and MGclus were used to cluster the gene set projected onto the FunCoup network. We characterized how well these methods are able to detect individual pathways in multi-pathway gene sets, and applied each of the clustering methods in combination with four pathway analysis methods: Gene Enrichment Analysis, BinoX, NEAT, and ANUBIX. Using benchmarks constructed from the KEGG pathway database we found that clustering substantially increased the sensitivity of pathway analysis methods. For ANUBIX this came with almost no loss of specificity, while for BinoX and NEAT the specificity decreased roughly as much as the sensitivity increased. GEA had very low sensitivity both before and after clustering. The choice of clustering method only had a minor effect on the results. We conclude that clustering can improve overall pathway annotation performance, but only if the used enrichment method has a low false positive rate. 

    Availability and Implementation: https://bitbucket.org/sonnhammergroup/clustering-and-pathway-enrichment/

  • 13.
    Castresana-Aguirre, Miguel
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Persson, Emma
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    PathBIX—a web server for network-based pathway annotation with adaptive null models. 2021. In: Bioinformatics Advances, E-ISSN 2635-0041, Vol. 1, no 1, article id vbab010. Article in journal (Refereed)
    Abstract [en]

    Motivation: Pathway annotation is a vital tool for interpreting and giving meaning to experimental data in life sciences. Numerous tools exist for this task, where the most recent generation of pathway enrichment analysis tools, network-based methods, utilize biological networks to gain a richer source of information as a basis of the analysis than merely the gene content. Network-based methods use the network crosstalk between the query gene set and the genes in known pathways, and compare this to a null model of random expectation.

    Results: We developed PathBIX, a novel web application for network-based pathway analysis, based on the recently published ANUBIX algorithm which has been shown to be more accurate than previous network-based methods. The PathBIX website performs pathway annotation for 21 species, and utilizes prefetched and preprocessed network data from FunCoup 5.0 networks and pathway data from three databases: KEGG, Reactome, and WikiPathways.

  • 14. Cheng, Jianlin
    et al.
    Choe, Myong‐Ho
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Han, Kun-Sop
    Hou, Jie
    Maghrabi, Ali H. A.
    McGuffin, Liam J.
    Menéndez-Hurtado, David
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Olechnovič, Kliment
    Schwede, Torsten
    Studer, Gabriel
    Uziela, Karolis
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Venclovas, Česlovas
    Wallner, Björn
    Estimation of model accuracy in CASP13. 2019. In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 87, no 12, p. 1361-1377. Article in journal (Refereed)
    Abstract [en]

    Methods to reliably estimate the accuracy of 3D models of proteins are both a fundamental part of most protein folding pipelines and important for reliable identification of the best models when multiple pipelines are used. Here, we describe the progress made from CASP12 to CASP13 in the field of estimation of model accuracy (EMA) as seen from the progress of the most successful methods in CASP13. We show small but clear progress, that is, several methods perform better than the best methods from CASP12 when tested on CASP13 EMA targets. Some progress is driven by applying deep learning and residue‐residue contacts to model accuracy prediction. We show that the best EMA methods select better models than the best servers in CASP13, but that there exists a great potential to improve this further. Also, according to the evaluation criteria based on local similarities, such as lDDT and CAD, it is now clear that single model accuracy methods perform relatively better than consensus‐based methods.

  • 15. Chicharro, Daniel
    et al.
    Ledberg, Anders
    Stockholm University, Faculty of Social Sciences, Centre for Social Research on Alcohol and Drugs (SoRAD). Universitat Pompeu Fabra, Spain.
    Framework to study dynamic dependencies in networks of interacting processes. 2012. In: Physical Review E. Statistical, Nonlinear, and Soft Matter Physics, ISSN 1539-3755, E-ISSN 1550-2376, Vol. 86, no 4, article id 041901. Article in journal (Refereed)
    Abstract [en]

    The analysis of dynamic dependencies in complex systems such as the brain helps to understand how emerging properties arise from interactions. Here we propose an information-theoretic framework to analyze the dynamic dependencies in multivariate time-evolving systems. This framework constitutes a fully multivariate extension and unification of previous approaches based on bivariate or conditional mutual information and Granger causality or transfer entropy. We define multi-information measures that allow us to study the global statistical structure of the system as a whole, the total dependence between subsystems, and the temporal statistical structure of each subsystem. We develop a stationary and a nonstationary formulation of the framework. We then examine different decompositions of these multi-information measures. The transfer entropy naturally appears as a term in some of these decompositions. This allows us to examine its properties not as an isolated measure of interdependence but in the context of the complete framework. More generally we use causal graphs to study the specificity and sensitivity of all the measures appearing in these decompositions to different sources of statistical dependence arising from the causal connections between the subsystems. We illustrate that there is no straightforward relation between the strength of specific connections and specific terms in the decompositions. Furthermore, causal and noncausal statistical dependencies are not separable. In particular, the transfer entropy can be nonmonotonic in dependence on the connectivity strength between subsystems and is also sensitive to internal changes of the subsystems, so it should not be interpreted as a measure of connectivity strength. 
    Altogether, in comparison to an analysis based on single isolated measures of interdependence, this framework is more powerful to analyze emergent properties in multivariate systems and to characterize functionally relevant changes in the dynamics.

  • 16.
    Colding, Johan
    Stockholm University.
    Local institutions, biological conservation and management of ecosystem dynamics. 2001. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    This thesis analyzes local institutions and management practices related to natural resources and ecosystem dynamics, with an emphasis on "traditional ecological knowledge" systems. Papers I, II and III analyze ‘resource and habitat taboos’ (RHTs) with the objective to synthesize knowledge about informal institutions behind resource management. Papers IV and V focus on resource management practices and social mechanisms with a capacity to confer resilience in ecosystems. Ecological resilience is the buffering capacity of ecosystems to incorporate disturbance and yet continue to provide biodiversity and ecological services critical to societal development. Cases for the synthesis were mainly derived from the literature. Examples of RHTs could be grouped in six different categories depending on their potential management and conservation functions. These included both use-taboos and non-use taboos. The former regulates access to, and methods and withdrawal of subsistence resources. These appear to be closely related to traditional ecological knowledge, as it is defined in this thesis. The latter prohibits human use of species and habitats, and is closely related to religious and cosmological belief systems. As discussed, both groups of taboos can be comparable to ethics of academic conservation biology, although rationales behind such ethics differ. RHTs have effects that may contribute to the conservation of habitats, local subsistence resources, and ‘threatened’, ‘endemic’ and ‘keystone’ species, although some may run contrary to conservation and notions of sustainability. It is asserted that under certain circumstances, RHTs, and possibly other types of informal institutions may offer advantages relative to formal measures of conservation. These benefits include non-costly, voluntary compliance features. Results of papers IV and V revealed that there exists a diversity of traditional practices for ecosystem management.
    These include multiple species management, resource rotation, ecological monitoring, succession management, landscape patchiness management, and practices of responding to and managing pulses and ecological surprises. Social mechanisms behind these practices included a number of adaptations for the generation, accumulation, and transmission of knowledge; dynamics of institutions; mechanisms for cultural internalization of traditional practices; and the development of appropriate world views and cultural values. These traditional systems had certain similarities to adaptive management with its emphasis on feedback learning, and its treatment of uncertainty and unpredictability in ecosystems. Furthermore, there existed practices that seem to reduce social-ecological crises in the event of large-scale natural disturbance. These included practices that create small-scale ecosystem renewal cycles, practices that spread risks, and practices for nurturing sources of ecosystem renewal. These practices are linked to social mechanisms such as flexible user rights and land tenure. It is concluded that ecological monitoring appears to be a key element in the development of many of the practices. Management practices in local communities are framed by a social context, with informal institutions and other social mechanisms, and supported by a worldview that does not de-couple people from their dependence on natural systems. Since management of ecosystems is associated with uncertainty about their spatial and temporal dynamics and due to incomplete knowledge about such dynamics, these practices may provide useful ‘rules of thumb’ for resource management with an ability to confer resilience and tighten environmental feedbacks of resource exploitation to local levels. To link local institutions in cross-scale polycentric co-management arrangements may be a viable option for improving current resource management systems.

  • 17. Corcoran, Martin M.
    et al.
    Phad, Ganesh E.
    Bernat, Nestor Vazquez
    Stahl-Hennig, Christiane
    Sumida, Noriyuki
    Persson, Mats A. A.
    Martin, Marcel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Hedestam, Gunilla B. Karlsson
    Production of individualized V gene databases reveals high levels of immunoglobulin genetic diversity. 2016. In: Nature Communications, ISSN 2041-1723, Vol. 7, article id 13642. Article in journal (Refereed)
    Abstract [en]

    Comprehensive knowledge of immunoglobulin genetics is required to advance our understanding of B cell biology. Validated immunoglobulin variable (V) gene databases are close to completion only for human and mouse. We present a novel computational approach, IgDiscover, that identifies germline V genes from expressed repertoires to a specificity of 100%. IgDiscover uses a cluster identification process to produce candidate sequences that, once filtered, result in individualized germline V gene databases. IgDiscover was tested in multiple species, validated by genomic cloning and cross-library comparisons, and produces comprehensive gene databases even where limited genomic sequence is available. IgDiscover analysis of the allelic content of Indian and Chinese-origin rhesus macaques reveals high levels of immunoglobulin gene diversity in this species. Further, we describe a novel human IGHV3-21 allele and confirm significant gene differences between Balb/c and C57BL6 mouse strains, demonstrating the power of IgDiscover as a germline V gene discovery tool.

  • 18.
    Daume, Stefan
    et al.
    Stockholm University, Faculty of Science, Stockholm Resilience Centre. Georg-August-University Göttingen, Germany; Swedish Museum of Natural History, Sweden.
    Galaz, Victor
    Stockholm University, Faculty of Science, Stockholm Resilience Centre.
    Anyone Know What Species This Is? - Twitter Conversations as Embryonic Citizen Science Communities. 2016. In: PLOS ONE, E-ISSN 1932-6203, Vol. 11, no 3, article id e0151387. Article in journal (Refereed)
    Abstract [en]

    Social media like blogs, micro-blogs or social networks are increasingly being investigated and employed not only to detect and predict trends in social and physical phenomena, but also to capture environmental information. Here we argue that opportunistic biodiversity observations published through Twitter represent one promising and until now unexplored example of such data mining. As we elaborate, it can contribute real-time information to traditional ecological monitoring programmes, including those sourced via citizen science activities. Using Twitter data collected for a generic assessment of social media data in ecological monitoring, we investigated a sample of what we denote biodiversity observations with species determination requests (N = 191). These entail images posted as messages on the micro-blog service Twitter. As we show, these frequently trigger conversations leading to taxonomic determinations of those observations. All analysed Tweets were posted with species determination requests, which generated replies for 64% of Tweets; 86% of those contained at least one suggested determination, of which 76% were assessed as correct. All posted observations included or linked to images, with the overall image quality categorised as satisfactory or better for 81% of the sample and leading to taxonomic determinations at the species level in 71% of provided determinations. We claim that the original message authors and conversation participants can be viewed as implicit or embryonic citizen science communities which can offer valuable contributions both as an opportunistic data source in ecological monitoring and as potential active contributors to citizen science programmes.

  • 19. Didion, John P.
    et al.
    Martin, Marcel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Collins, Francis S.
    Atropos: specific, sensitive, and speedy trimming of sequencing reads. 2017. In: PeerJ, E-ISSN 2167-8359, Vol. 5, article id e3720. Article in journal (Refereed)
    Abstract [en]

    A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.
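
    The two operations named in this abstract, trimming low-quality bases and trimming adapter sequence, can be illustrated with a minimal Python sketch. This is not Atropos's algorithm, which the abstract does not describe and which tolerates sequencing errors; the function names, the Phred threshold of 20, and the minimum overlap of 3 are assumptions chosen only for illustration, and only exact matches are handled.

```python
def quality_trim(seq, quals, threshold=20):
    """Trim trailing bases whose Phred quality is below `threshold` (assumed cutoff)."""
    end = len(seq)
    while end > 0 and quals[end - 1] < threshold:
        end -= 1
    return seq[:end], quals[:end]


def adapter_trim(seq, adapter, min_overlap=3):
    """Remove a 3' adapter: a full exact occurrence, or a partial prefix at the read end."""
    pos = seq.find(adapter)
    if pos != -1:
        return seq[:pos]
    # Look for a prefix of the adapter hanging off the 3' end, longest first.
    for overlap in range(min(len(adapter), len(seq)) - 1, min_overlap - 1, -1):
        if seq.endswith(adapter[:overlap]):
            return seq[:len(seq) - overlap]
    return seq
```

    A real trimmer would additionally allow mismatches and indels in the adapter match, which is where sensitivity and specificity trade off.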

  • 20. Ekim, Baris
    et al.
    Sahlin, Kristoffer
    Stockholm University, Faculty of Science, Department of Mathematics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Medvedev, Paul
    Berger, Bonnie
    Chikhi, Rayan
    Efficient mapping of accurate long reads in minimizer space with mapquik. 2023. In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 33, no 7, p. 1188-1197. Article in journal (Refereed)
    Abstract [en]

    DNA sequencing data continue to progress toward longer reads with increasingly lower sequencing error rates. We focus on the critical problem of mapping, or aligning, low-divergence sequences from long reads (e.g., Pacific Biosciences [PacBio] HiFi) to a reference genome, which poses challenges in terms of accuracy and computational resources when using cutting-edge read mapping approaches that are designed for all types of alignments. A natural idea would be to optimize efficiency with longer seeds to reduce the probability of extraneous matches; however, contiguous exact seeds quickly reach a sensitivity limit. We introduce mapquik, a novel strategy that creates accurate longer seeds by anchoring alignments through matches of k consecutively sampled minimizers (k-min-mers) and only indexing k-min-mers that occur once in the reference genome, thereby unlocking ultrafast mapping while retaining high sensitivity. We show that mapquik significantly accelerates the seeding and chaining steps (fundamental bottlenecks to read mapping) for both the human and maize genomes with >96% sensitivity and near-perfect specificity. On the human genome, for both real and simulated reads, mapquik achieves a 37x speedup over the state-of-the-art tool minimap2, and on the maize genome, mapquik achieves a 410x speedup over minimap2, making mapquik the fastest mapper to date. These accelerations are enabled not only by minimizer-space seeding but also by a novel heuristic O(n) pseudochaining algorithm, which improves upon the long-standing O(n log n) bound. Minimizer-space computation builds the foundation for achieving real-time analysis of long-read sequencing data.
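
    The seeding idea in this abstract (k-min-mers, indexed only when they occur once in the reference) can be sketched in Python as follows. This is a simplified illustration, not mapquik's implementation: it assumes lexicographic window minimizers, and the parameter names l, w and k and their small default values are hypothetical.

```python
from collections import Counter


def minimizers(seq, l=3, w=4):
    """Return (position, l-mer) minimizers: the smallest l-mer in each window of w l-mers."""
    lmers = [seq[i:i + l] for i in range(len(seq) - l + 1)]
    picked = []
    for start in range(len(lmers) - w + 1):
        window = lmers[start:start + w]
        offset = min(range(w), key=lambda j: window[j])  # leftmost smallest l-mer
        pos = start + offset
        # Consecutive windows often share a minimizer; record each position once.
        if not picked or picked[-1][0] != pos:
            picked.append((pos, lmers[pos]))
    return picked


def kminmers(seq, k=2, l=3, w=4):
    """Return (start position, tuple of k consecutive minimizers) for the sequence."""
    mins = minimizers(seq, l, w)
    return [(mins[i][0], tuple(m for _, m in mins[i:i + k]))
            for i in range(len(mins) - k + 1)]


def unique_index(reference, k=2, l=3, w=4):
    """Index only the k-min-mers that occur exactly once in the reference."""
    kms = kminmers(reference, k, l, w)
    counts = Counter(km for _, km in kms)
    return {km: pos for pos, km in kms if counts[km] == 1}
```

    Under these assumptions, a read would be seeded by looking up each of its k-min-mers in the dictionary returned by unique_index; repetitive k-min-mers are simply absent, so every hit points to a single reference position.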

  • 21.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    The relationship between orthology, protein domain architecture and protein function. 2011. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Lacking experimental data, protein function is often predicted from evolutionary and protein structure theory. Under the 'domain grammar' hypothesis the function of a protein follows from the domains it encodes. Under the 'orthology conjecture', orthologs, related through species formation, are expected to be more functionally similar than paralogs, which are homologs in the same or different species descended from a gene duplication event. However, these assumptions have not thus far been systematically evaluated.

    To test the 'domain grammar' hypothesis, we built models for predicting function from the domain combinations present in a protein, and demonstrated that multi-domain combinations imply functions that the individual domains do not. We also developed a novel gene-tree based method for reconstructing the evolutionary histories of domain architectures, to search for cases of architectures that have arisen multiple times in parallel, and found this to be more common than previously reported.

    To test the 'orthology conjecture', we first benchmarked methods for homology inference under the obfuscating influence of low-complexity regions, in order to improve the InParanoid orthology inference algorithm. InParanoid was then used to test the relative conservation of functionally relevant properties between orthologs and paralogs at various evolutionary distances, including intron positions, domain architectures, and Gene Ontology functional annotations.

    We found an increased conservation of domain architectures in orthologs relative to paralogs, in support of the 'orthology conjecture' and the 'domain grammar' hypotheses acting in tandem. However, equivalent analysis of Gene Ontology functional conservation yielded spurious results, which may be an artifact of species-specific annotation biases in functional annotation databases. I discuss possible ways of circumventing this bias so the 'orthology conjecture' can be tested more conclusively.

  • 22.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Henricson, Anna
    Hollich, Volker
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain tree-based analysis of protein architecture evolution. 2008. In: Molecular Biology and Evolution, ISSN 0737-4038, E-ISSN 1537-1719, Vol. 25, no 2, p. 254-264. Article in journal (Refereed)
    Abstract [en]

    Understanding the dynamics behind domain architecture evolution is of great importance to unravel the functions of proteins. Complex architectures have been created throughout evolution by rearrangement and duplication events. An interesting question is how many times a particular architecture has been created, a form of convergent evolution or domain architecture reinvention. Previous studies have approached this issue by comparing architectures found in different species. We wanted to achieve a finer-grained analysis by reconstructing protein architectures on complete domain trees. The prevalence of domain architecture reinvention in 96 genomes was investigated with a novel domain tree-based method that uses maximum parsimony for inferring ancestral protein architectures. Domain architectures were taken from Pfam. To ensure robustness, we applied the method to bootstrap trees and only considered results with strong statistical support. We detected multiple origins for 12.4% of the scored architectures. In a much smaller data set, the subset of completely domain-assigned proteins, the figure was 5.6%. These results indicate that domain architecture reinvention is a much more common phenomenon than previously thought. We also determined which domains are most frequent in multiply created architectures and assessed whether specific functions could be attributed to them. However, no strong functional bias was found in architectures with multiple origins.

  • 23.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Pekkari, Isabella
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain architecture conservation in orthologs. 2011. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 12, p. 326. Article in journal (Refereed)
    Abstract [en]

    Background. As orthologous proteins are expected to retain function more often than other homologs, they are often used for functional annotation transfer between species. However, ortholog identification methods do not take into account changes in domain architecture, which are likely to modify a protein's function. By domain architecture we refer to the sequential arrangement of domains along a protein sequence. To assess the level of domain architecture conservation among orthologs, we carried out a large-scale study of such events between human and 40 other species spanning the entire evolutionary range. We designed a score to measure domain architecture similarity and used it to analyze differences in domain architecture conservation between orthologs and paralogs relative to the conservation of primary sequence. We also statistically characterized the extents of different types of domain swapping events across pairs of orthologs and paralogs.

    Results. The analysis shows that orthologs exhibit greater domain architecture conservation than paralogous homologs, even when differences in average sequence divergence are compensated for, for homologs that have diverged beyond a certain threshold. We interpret this as an indication of a stronger selective pressure on orthologs than paralogs to retain the domain architecture required for the proteins to perform a specific function. In general, orthologs as well as the closest paralogous homologs have very similar domain architectures, even at large evolutionary separation. The most common domain architecture changes observed in both ortholog and paralog pairs involved insertion/deletion of new domains, while domain shuffling and segment duplication/deletion were very infrequent.

    Conclusions. On the whole, our results support the hypothesis that function conservation between orthologs demands higher domain architecture conservation than other types of homologs, relative to primary sequence conservation. This supports the notion that orthologs are functionally more similar than other types of homologs at the same evolutionary distance.

  • 24.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Swedish e-Science Research Center.
    Evolution of Protein Domain Architectures. 2012. In: Evolutionary Genomics: Statistical and Computational Methods, Vol 2 / [ed] Anisimova, M, Totowa, NJ: Humana Press, 2012, p. 187-216. Chapter in book (Refereed)
    Abstract [en]

    This chapter reviews the current research on how protein domain architectures evolve. We begin by summarizing work on the phylogenetic distribution of proteins, as this directly impacts which domain architectures can be formed in different species. Studies relating domain family size to occurrence have shown that they generally follow power law distributions, both within genomes and larger evolutionary groups. These findings were subsequently extended to multidomain architectures. Genome evolution models that have been suggested to explain the shape of these distributions are reviewed, as well as evidence for selective pressure to expand certain domain families more than others. Each domain has an intrinsic combinatorial propensity, and the effects of this have been studied using measures of domain versatility or promiscuity. Next, we study the principles of protein domain architecture evolution and how these have been inferred from distributions of extant domain arrangements. Following this, we review inferences of ancestral domain architecture and the conclusions concerning domain architecture evolution mechanisms that can be drawn from these. Finally, we examine whether all known cases of a given domain architecture can be assumed to have a single common origin (monophyly) or have evolved convergently (polyphyly).

  • 25.
    Forslund, Kristoffer
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L.L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting protein function from domain content. 2008. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 24, no 15, p. 1681-1687. Article in journal (Refereed)
    Abstract [en]

    MOTIVATION: Computational assignment of protein function may be the single most vital application of bioinformatics in the post-genome era. These assignments are made based on various protein features, where one is the presence of identifiable domains. The relationship between protein domain content and function is important to investigate, to understand how domain combinations encode complex functions.

    RESULTS: Two different models are presented on how protein domain combinations yield specific functions: one rule-based and one probabilistic. We demonstrate how these are useful for Gene Ontology annotation transfer. The first is an intuitive generalization of the Pfam2GO mapping, and detects cases of strict functional implications of sets of domains. The second uses a probabilistic model to represent the relationship between domain content and annotation terms, and was found to be better suited for incomplete training sets. We implemented these models as predictors of Gene Ontology functional annotation terms. Both predictors were more accurate than conventional best BLAST-hit annotation transfer and more sensitive than a single-domain model on a large-scale dataset. We present a number of cases where combinations of Pfam-A protein domains predict functional terms that do not follow from the individual domains.

    AVAILABILITY: Scripts and documentation are available for download at http://sonnhammer.sbc.su.se/multipfam2go_source_docs.tar
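
    The rule-based model described under RESULTS, in which a set of domains strictly implies a functional term, can be illustrated with a short Python sketch. It is not the authors' code: the function name is hypothetical, and the domain and term identifiers below are placeholders rather than real Pfam or Gene Ontology entries.

```python
def predict_terms(protein_domains, rules):
    """Return the terms of every rule whose required domain set is present in the protein."""
    present = frozenset(protein_domains)
    predicted = set()
    for required_domains, go_terms in rules.items():
        if required_domains <= present:  # all required domains occur in the protein
            predicted |= go_terms
    return predicted


# Hypothetical rules; identifiers are placeholders, not real mappings.
RULES = {
    frozenset({"DomA"}): {"GO:term1"},
    frozenset({"DomB"}): {"GO:term2"},
    frozenset({"DomA", "DomB"}): {"GO:term3"},  # implied only by the combination
}
```

    With single-domain rules only, this reduces to a one-domain-to-term lookup; the multi-domain rule shows how a combination can predict a term that neither domain implies alone.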

  • 26.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    MGclus: network clustering employing shared neighbors. 2013. In: Molecular BioSystems, ISSN 1742-206X, Vol. 9, no 7, p. 1670-1675. Article in journal (Refereed)
    Abstract [en]

    Network analysis is an important tool for functional annotation of genes and proteins. A common approach to discern structure in a global network is to infer network clusters, or modules, and assume a functional coherence within each module, which may represent a complex or a pathway. It is however not trivial to define optimal modules. Although many methods have been proposed, it is unclear which methods perform best in general. It seems that most methods produce far from optimal results but in different ways. MGclus is a new algorithm designed to detect modules with a strongly interconnected neighborhood in large scale biological interaction networks. In our benchmarks we found MGclus to outperform other methods when applied to random graphs with varying degree of noise, and to perform equally or better when applied to biological protein interaction networks. MGclus is implemented in Java and utilizes the JGraphT graph library. It has an easy to use command-line interface and is available for download from http://sonnhammer.sbc.su.se/download/software/MGclus/.
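
    To give a feel for what a "strongly interconnected neighborhood" means, the following Python sketch scores each edge by how much the neighborhoods of its two endpoints overlap. This is not the MGclus scoring function, which the abstract does not specify; the Jaccard overlap used here and the function name are assumptions made for illustration.

```python
def shared_neighbor_scores(edges):
    """Map each edge (u, v) to the Jaccard overlap of the endpoints' neighborhoods."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    scores = {}
    for u, v in edges:
        # Each node is counted as part of its own neighborhood, so that two
        # linked nodes inside a clique reach the maximum score of 1.0.
        nu, nv = neighbors[u] | {u}, neighbors[v] | {v}
        scores[(u, v)] = len(nu & nv) / len(nu | nv)
    return scores
```

    Edges inside a densely connected module score high, while edges that bridge to the rest of the network score lower, which is the signal a shared-neighbor clustering method can exploit.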

  • 27.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Mank, Judith E.
    Alexeyenko, Andrey
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Network Analysis of Functional Genomics Data: Application to Avian Sex-Biased Gene Expression. 2012. In: Scientific World Journal, E-ISSN 1537-744X, p. 130491. Article in journal (Refereed)
    Abstract [en]

    Gene expression analysis is often used to investigate the molecular and functional underpinnings of a phenotype. However, differential expression of individual genes is limited in that it does not consider how the genes interact with each other in networks. To address this shortcoming we propose a number of network-based analyses that give additional functional insights into the studied process. These were applied to a dataset of sex-specific gene expression in the chicken gonad and brain at different developmental stages. We first constructed a global chicken interaction network. Combining the network with the expression data showed that most sex-biased genes tend to have lower network connectivity, that is, act within local network environments, although some interesting exceptions were found. Genes of the same sex bias were generally more strongly connected with each other than expected. We further studied the fates of duplicated sex-biased genes and found that there is a significant trend to keep the same pattern of sex bias after duplication. We also identified sex-biased modules in the network, which reveal pathways or complexes involved in sex-specific processes. Altogether, this work integrates evolutionary genomics with systems biology in a novel way, offering new insights into the modular nature of sex-biased genes.

  • 28.
    Guala, Dimitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm, Bioinformatics Center, Science for Life Laboratory.
    Functional association networks for disease gene prediction. 2017. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Mapping of the human genome has been instrumental in understanding diseases caused by changes in single genes. However, disease mechanisms involving multiple genes have proven to be much more elusive. Their complexity emerges from interactions of intracellular molecules and makes them immune to the traditional reductionist approach. Only by modelling this complex interaction pattern using networks is it possible to understand the emergent properties that give rise to diseases.

    The overarching term used to describe both physical and indirect interactions involved in the same functions is functional association. FunCoup is one of the most comprehensive networks of functional association. It uses a naïve Bayesian approach to integrate high-throughput experimental evidence of intracellular interactions in humans and multiple model organisms. In the first update, both the coverage and the quality of the interactions were increased and a feature for comparing interactions across species was added. The latest update involved a complete overhaul of all data sources, including a refinement of the training data and addition of new classes and sources of interactions as well as six new species.

    Disease-specific changes in genes can be identified using high-throughput genome-wide studies of patients and healthy individuals. To understand the underlying mechanisms that produce these changes, they can be mapped to collections of genes with known functions, such as pathways. BinoX was developed to map altered genes to pathways using the topology of FunCoup. This approach combined with a new random model for comparison enables BinoX to outperform traditional gene-overlap-based methods and other network-based techniques.

    Results from high-throughput experiments are challenged by noise and biases, resulting in many false positives. Statistical attempts to correct for these challenges have led to a reduction in coverage. Both limitations can be remedied using prioritisation tools such as MaxLink, which ranks genes using guilt by association in the context of a functional association network. MaxLink’s algorithm was generalised to work with any disease phenotype and its statistical foundation was strengthened. MaxLink’s predictions were validated experimentally using FRET.

    The availability of prioritisation tools without an appropriate way to compare them makes it difficult to select the correct tool for a problem domain. A benchmark to assess performance of prioritisation tools in terms of their ability to generalise to new data was developed. FunCoup was used for prioritisation while testing was done using cross-validation of terms derived from Gene Ontology. This resulted in a robust and unbiased benchmark for evaluation of current and future prioritisation tools. Surprisingly, previously superior tools based on global network structure were shown to be inferior to a local network-based tool when performance was analysed on the most relevant part of the output, i.e. the top ranked genes.

    This thesis demonstrates how a network that models the intricate biology of the cell can contribute valuable insights for researchers who study diseases with complex genetic origins. The developed tools will help the research community to understand the underlying causes of such diseases and discover new treatment targets. The robust way to benchmark such tools will help researchers to select the proper tool for their problem domain.

  • 29.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Bernhem, Kristoffer
    Ait Blal, Hammou
    Lundberg, Emma
    Brismar, Hjalmar
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Experimental validation of predicted cancer genes using FRET. Manuscript (preprint) (Other academic)
  • 30.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm Bioinformatics Centre, Sweden; Swedish eScience Research Center, Sweden.
    MaxLink: network-based prioritization of genes tightly linked to a disease seed set. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 18, p. 2689-2690. Article in journal (Refereed)
    Abstract [en]

    Summary: MaxLink, a guilt-by-association network search algorithm, has been made available as a web resource and a stand-alone version. Based on a user-supplied list of query genes, MaxLink identifies and ranks genes that are tightly linked to the query list. This functionality can be used to predict potential disease genes from an initial set of genes with known association to a disease. The original algorithm, used to identify and rank novel genes potentially involved in cancer, has been updated to use a more statistically sound method for selection of candidate genes and made applicable to other areas than cancer. The algorithm has also been made faster by re-implementation in C++, and the Web site uses FunCoup 3.0 as the underlying network.
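
    The guilt-by-association idea in this summary can be sketched in Python by ranking candidate genes on the number of query genes they are directly linked to. This is a simplified illustration under stated assumptions, not the MaxLink algorithm: the statistical selection of candidates mentioned in the summary is omitted, and the function name is hypothetical.

```python
from collections import Counter


def rank_by_query_links(edges, query_genes):
    """Rank non-query genes by how many distinct query genes they are linked to."""
    query = set(query_genes)
    links = Counter()
    seen = set()
    for u, v in edges:
        for gene, partner in ((u, v), (v, u)):
            if gene not in query and partner in query and (gene, partner) not in seen:
                seen.add((gene, partner))  # count each gene-query pair once
                links[gene] += 1
    # Sort by descending link count, then by name for a stable order.
    return sorted(links.items(), key=lambda item: (-item[1], item[0]))
```

    Genes with no direct link to any query gene do not appear in the ranking at all, which reflects the local, neighborhood-based nature of this kind of prioritization.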

  • 31.
    Guala, Dimitri
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    A large-scale benchmark of gene prioritization methods. 2017. In: Scientific Reports, E-ISSN 2045-2322, Vol. 7, article id 46598. Article in journal (Refereed)
    Abstract [en]

    In order to maximize the use of results from high-throughput experimental studies, e.g. GWAS, for identification and diagnostics of new disease-associated genes, it is important to have properly analyzed and benchmarked gene prioritization tools. While prospective benchmarks are underpowered to provide statistically significant results in their attempt to differentiate the performance of gene prioritization tools, a strategy for retrospective benchmarking has been missing, and new tools usually only provide internal validations. The Gene Ontology (GO) contains genes clustered around annotation terms. This intrinsic property of GO can be utilized in construction of robust benchmarks, objective to the problem domain. We demonstrate how this can be achieved for network-based gene prioritization tools, utilizing the FunCoup network. We use cross-validation and a set of appropriate performance measures to compare state-of-the-art gene prioritization algorithms: three based on network diffusion, NetRank and two implementations of Random Walk with Restart, and MaxLink that utilizes network neighborhood. Our benchmark suite provides a systematic and objective way to compare the multitude of available and future gene prioritization tools, enabling researchers to select the best gene prioritization tool for the task at hand, and helping to guide the development of more accurate methods.
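
    Random Walk with Restart, one of the network-diffusion methods compared in this benchmark, can be sketched in pure Python as below. This is a generic textbook formulation, not either of the benchmarked implementations; the restart probability of 0.3, the fixed iteration count, and the function name are assumptions, and all seed genes are assumed to be present in the network.

```python
def random_walk_with_restart(edges, seeds, restart=0.3, iterations=100):
    """Return approximate steady-state probabilities for a walker restarting at the seeds."""
    seeds = set(seeds)
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    # Restart distribution: uniform over the seed genes (assumed to be in the network).
    start = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in neighbors}
    prob = dict(start)
    for _ in range(iterations):
        nxt = {n: restart * start[n] for n in neighbors}
        for n, p in prob.items():
            # The walker moves to each neighbor with equal probability.
            share = (1.0 - restart) * p / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        prob = nxt
    return prob
```

    Ranking the non-seed genes by descending probability gives a global, diffusion-based prioritization, in contrast to neighborhood-based ranking that only considers direct links to the seed set.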

  • 32.
    Hedlund, Johanna
    et al.
    Stockholm University, Faculty of Science, Stockholm Resilience Centre.
    Bodin, Örjan
    Stockholm University, Faculty of Science, Stockholm Resilience Centre.
    Nohrstedt, Daniel
    Policy issue interdependency and the formation of collaborative networks. 2021. In: People and Nature, E-ISSN 2575-8314, Vol. 3, no 1, p. 236-250. Article in journal (Refereed)
    Abstract [en]

    1. Environmental problems often span a set of challenges that each may engage different policy actors across different policy domains. These challenges, or policy issues, nonetheless exhibit interdependencies that may constrain the ability of actors to work together towards joint solutions.

    2. Still, we have limited knowledge about whether and how policy issue interdependencies actually shape how actors collaborate.

    3. Using data derived from two venues for collaborative water governance in the Norrstrom basin, Sweden, we investigate whether and how policy issues and policy issue interdependencies influence actors' selection of collaborative partners. We test two alternative sets of propositions; one set assumes that partner selection is driven by actors' engagement in policy issues and their interdependencies, while the other set emphasises social positions and actor attributes.

    4. Our results show that in one venue, actors' choices of collaborative partner were associated with factors from both sets, but not with policy issue interdependencies specifically. In the other venue, only actor and relational attributes shaped social tie formation. These results suggest that how actors interact does not necessarily align with the policy issues and the policy issue interdependencies defined by the environmental problem they are to address.

    5. Our results provide an important step towards arriving at evidence-based recommendations for more effective collaborative efforts in addressing complex environmental problems that no actor can address alone.

  • 33.
    Hennerdal, Aron
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Rapid membrane protein topology prediction. 2011. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 27, no 9, p. 1322-1323. Article in journal (Refereed)
    Abstract [en]

    State-of-the-art methods for predicting the topology of α-helical membrane proteins are based on time-consuming multiple sequence alignments obtained from PSI-BLAST or other sources. Here, we examine whether a consensus of topology prediction methods based on single sequences can reach an accuracy similar to that of the more accurate multiple sequence-based methods. We show that TOPCONS-single performs better than any of the other topology prediction methods tested here, but ~6% worse than the best method that uses multiple sequence alignments. Availability and implementation: TOPCONS-single is available as a web server from http://single.topcons.net/ and is also included for local installation from the web site. In addition, consensus-based topology predictions for the entire international protein index (IPI) are available from the web server and will be updated at regular intervals.

  • 34.
    Hennerdal, Aron
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Tsirigos, Konstantinos
    A guideline to α-helical membrane protein topology prediction. Manuscript (preprint) (Other academic)
    Abstract [en]

    All living organisms have a “membrane proteome” that mainly consists of α-helical membrane proteins containing one or more TM-helices. Prediction methods have been extensively used to identify as well as to classify the topology of these proteins. For current state-of-the-art methods, the prediction of correct topology of membrane proteins has been reported to be above 80%. However, this performance has only been observed in small and possibly biased datasets. Here, we add four “genome-scale” datasets, including a recent large set of experimentally validated membrane proteins with glycosylation sites. This set is also used to examine whether the qualities of topology predictions hold and if any prediction methods perform consistently better than others. We find that methods utilizing multiple sequence alignments are overall superior to methods that do not. The best performance is obtained by TOPCONS, a consensus method which combines several of the other prediction methods. Further, we show that the accuracy is most likely lower in eukaryotes than for prokaryotic proteins as the agreement between the predictors is significantly lower there. Finally, we show that three related methods, Phobius, Philius and PolyPhobius, that incorporate a specific signal peptide module are superior to all other methods at the task of distinguishing between membrane and non-membrane proteins.

  • 35.
    Henricson, Anna
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Forslund, Kristoffer
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Orthology confers intron position conservation. 2010. In: BMC Genomics, E-ISSN 1471-2164, Vol. 11:412. Article in journal (Refereed)
    Abstract [en]

    Background: With the wealth of genomic data available it has become increasingly important to assign putative protein function through functional transfer between orthologs. Therefore, correct elucidation of the evolutionary relationships among genes is a critical task, and attempts should be made to further improve the phylogenetic inference by adding relevant discriminating features. It has been shown that introns can maintain their position over long evolutionary timescales. For this reason, it could be possible to use conservation of intron positions as a discriminating factor when assigning orthology. Therefore, we wanted to investigate whether orthologs have a higher degree of intron position conservation (IPC) compared to non-orthologous sequences that are equally similar in sequence.

    Results: To this end, we developed a new score for IPC and applied it to ortholog groups between human and six other species. For comparison, we also gathered the closest non-orthologs, meaning sequences close in sequence space, yet falling just outside the ortholog cluster. We found that ortholog-ortholog gene pairs on average have a significantly higher degree of IPC compared to ortholog-closest non-ortholog pairs. Also pairs of inparalogs were found to have a higher IPC score than inparalog-closest non-inparalog pairs. We verified that these differences cannot simply be attributed to the generally higher sequence identity of the ortholog-ortholog and the inparalog-inparalog pairs. Furthermore, we analyzed the agreement between IPC score and the ortholog score assigned by the InParanoid algorithm, and found that it was consistently high for all species comparisons. In a minority of cases, the IPC and InParanoid score ranked inparalogs differently. These represent cases where sequence and intron position divergence are discordant. We further analyzed the discordant clusters to identify any possible preference for protein functions by looking for enriched GO terms and Pfam protein domains. They were enriched for functions important for multicellularity, which implies a connection between shifts in intronic structure and the origin of multicellularity.

    Conclusions: We conclude that orthologous genes tend to have more conserved intron positions compared to non-orthologous genes. As a consequence, our IPC score is useful as an additional discriminating factor when assigning orthology.

  • 36.
    Herman, Pawel Andrzej
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Lundqvist, Mikael
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Nested theta to gamma oscillations and precise spatiotemporal firing during memory retrieval in a simulated attractor network. 2013. In: Brain Research, ISSN 0006-8993, E-ISSN 1872-6240, Vol. 1536, no S1, p. 68-87. Article in journal (Refereed)
    Abstract [en]

    Nested oscillations, where the phase of the underlying slow rhythm modulates the power of faster oscillations, have recently attracted considerable research attention as the increased phase-coupling of cross-frequency oscillations has been shown to relate to memory processes. Here we investigate the hypothesis that reactivations of memory patterns, induced by either external stimuli or internal dynamics, are manifested as distributed cell assemblies oscillating at gamma-like frequencies with life-times on a theta scale. For this purpose, we study the spatiotemporal oscillatory dynamics of a previously developed meso-scale attractor network model as a correlate of its memory function. The focus is on a hierarchical nested organization of neural oscillations in delta/theta (2–5 Hz) and gamma frequency bands (25–35 Hz), and in some conditions even in lower alpha band (8–12 Hz), which emerge in the synthesized field potentials during attractor memory retrieval. We also examine spiking behavior of the network in close relation to oscillations. Despite highly irregular firing during memory retrieval and random connectivity within each cell assembly, we observe precise spatiotemporal firing patterns that repeat across memory activations at a rate higher than expected from random firing. In contrast to earlier studies aimed at modeling neural oscillations, our attractor memory network allows us to elaborate on the functional context of emerging rhythms and discuss their relevance. We provide support for the hypothesis that the dynamics of coherent delta/theta oscillations constitute an important aspect of the formation and replay of neuronal assemblies.

  • 37.
    Hillerton, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    In silico modelling for refining gene regulatory network inference. 2023. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Gene regulation is at the centre of all cellular functions, regulating the cell's healthy and pathological responses. The interconnected system of regulatory interactions is known as the gene regulatory network (GRN), where genes influence each other to maintain strict and robust control. Today a large number of methods exist for inferring GRNs, which necessitates benchmarking to determine which method is most suitable for a specific goal. Paper I presents such a benchmark focusing on the effect of using known perturbations to infer GRNs. 

    A further challenge when studying GRNs is that experimental data contains high levels of noise and that artefacts may be introduced by the experiment itself. The LSCON method was developed in paper II to reduce the effect of one such artefact that can occur if the expression of a gene shows no or minimal change across most or all experiments. 

     With few fully determined biological GRNs available, it is problematic to use these to evaluate an inference method's correctness. Instead, the GRN field relies on simulated data, using a known GRN and generating the corresponding data. When simulating GRNs, capturing the topological properties of the biological GRN is vital. The FFLatt algorithm was developed in paper III to create scale-free, feed-forward loop motif-enriched GRNs, capturing two of the most prominent topological features in biological GRNs. 

     Once a high-quality GRN is obtained, the next step is to simulate gene expression data corresponding to the GRN. In paper IV, building on the FFLatt method, an open-source Python simulation tool called GeneSNAKE was developed to generate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties and improves on previous tools by featuring a variety of perturbation schemes along with the ability to control noise and modify the perturbation strength.

  • 38.
    Hillerton, Thomas
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Zhivkoplias, Erik K.
    Stockholm University, Faculty of Science, Stockholm Resilience Centre.
    Garbulowski, Mateusz
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sonnhammer, Erik L. L.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    GeneSNAKE: a Python package for benchmarking and simulation of gene regulatory networks and expression data. Manuscript (preprint) (Other academic)
    Abstract [en]

    Understanding how genes interact with and regulate each other is a key challenge in systems biology. One of the primary methods to study this is through gene regulatory networks (GRNs). The field of GRN inference however faces many challenges, such as the complexity of gene regulation and high noise levels, which necessitates effective tools for evaluating inference methods. For this purpose, data that corresponds to a known GRN, from various conditions and experimental setups is necessary, which is only possible to attain via simulation.  Existing tools for simulating data for GRN inference have limitations either in the way networks are constructed or data is produced, and are often not flexible for adjusting the algorithm or parameters. 

    To overcome these issues we present GeneSNAKE, a Python package designed to allow users to generate biologically realistic GRNs, and from a GRN simulate expression data for benchmarking purposes. GeneSNAKE allows the user to control a wide range of network and data properties. GeneSNAKE improves on previous work in the field by adding a perturbation model that allows for a greater range of perturbation schemes along with the ability to control noise and modify the perturbation strength. 

    For benchmarking, GeneSNAKE offers a number of functions both for comparing a true GRN to an inferred GRN, and to study properties in data and GRN models. These functions can in addition be used to study properties of biological data to produce simulated data with more realistic properties.  GeneSNAKE is an open-source, comprehensive simulation and benchmarking package with powerful capabilities that are not combined in any other single package, and thanks to the Python implementation it is simple to extend and modify by a user.

  • 39.
    Hosseini Ashtiani, Saman
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Omics Data Analysis of Complex Diseases and Traits. 2022. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Following the advent of the high-throughput techniques for producing massive omics data, new possibilities and challenges have also emerged in different fields of biology and medicine. Dealing with such data on different scales with different scopes such as genomics, transcriptomics, proteomics and metabolomics, demands appropriate data collection, preprocessing, statistical analysis, interpretation and visualization. The overall goal of this thesis was to conceive omics-related questions in the context of four research titles and to apply a rational choice of the mentioned methods to conduct the study plans to answer them. 

    Paper I asks whether we could propose potentially implicated genes in psoriasis; and tries to answer it using microarray transcriptomics data of psoriasis. Initially, quality control was performed on the microarray dataset and then the Differentially Expressed Genes (DEGs) were chosen for mapping to a protein-protein interaction (PPI) database to create a subnetwork of the respective PPI. Using network analysis, genes with higher scores were proposed as potentially relevant to psoriasis and finally, we evaluated the results concerning a gene-disease association database. 

    Paper II asks whether the knockout of two genes followed by a transformation in E. coli could lead to an increase in bacterial growth in two different media; and deals with it through in vitro experiments followed by an in silico analysis of E. coli RNA-seq data. Here, we calculated the pairwise correlations between each target (knockout) gene and the rest of the genes in the RNA-seq dataset. Then, the significantly anti-correlated genes were shown to mainly belong to protein biosynthesis pathways compared to all other background pathways, which might indicate an increase in protein biosynthesis-related genes' transcription levels when there is an absolute decrease (knockout) in each of the target genes. 

    Paper III asks if an anti-bone-resorption drug called Denosumab significantly affects the abundance of the metabolites extracted from blood samples during a two-year longitudinal placebo-controlled clinical trial study; and tries to address this through running statistical hypothesis testing for each metabolite in the quantification data from Liquid Chromatography-Mass Spectrometry (LC-MS). Afterwards, the patterns of metabolites' variations concerning Denosumab administration and visit times were studied using Principal Component Analysis (PCA), association studies and Hierarchical clustering. The results of this study proposed some identified metabolites for further clinical investigations. Based on our analyses, the patterns of abundance variations in some of the identified metabolites could be considered for improving the corresponding clinical studies and treatment with Denosumab. 

    Paper IV proposes potentially relevant genes in lung adenocarcinoma by constructing a genome-scale co-expression network followed by clustering. The genes in each cluster were studied using the literature knowledge. One of the most frequently reported genes in lung adenocarcinoma was EGFR. We reported all the first-neighborhood genes connected to EGFR in its corresponding module as potentially relevant to lung adenocarcinoma.

    The repertoire of the above choices, workflows and evaluations could be applicable for further follow-up studies at different levels including omics data integration, personalized omics data analysis, studies on different scales such as cellular or tissue, using other methodologies for the same questions and running benchmarks. Although four different omics-related questions were posed in this thesis, they all involved the selection or preparation of the respective omics data, choosing preprocessing strategies, choosing statistical analyses and hypothesis testing methods and finally, performing the evaluation of the results and interpretations.

  • 40.
    Hosseini Ashtiani, Saman
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Razavipour, Roya
    Akhavan Sepahi, Abbas
    Modarresi, Mohammad Hossein
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Bambai, Bijan
    FDH knockout and TsFDH transformation led to enhanced growth rate of Escherichia coli. Manuscript (preprint) (Other academic)
    Abstract [en]

    Background: 

    The increase in atmospheric CO2 to over 400 ppm has prompted global climate irregularities. Reducing the CO2 released from biotechnological processes could remediate these phenomena. In this study, we sought to reduce the CO2 released into the atmosphere from bacterial growth by reducing formic acid conversion into CO2. Since E. coli is the biotechnological workhorse and its higher growth rate is desirable, another goal was to monitor the bacterial biomass after the metabolic engineering.

    Results: 

    The biochemical conversion of formic acid to CO2 is a key reaction. Therefore, we compared the growth of control strains K12 and BL21, alongside two strains (in which two different genes coding two formate dehydrogenase (FDH) subunits were deleted) in complex and simple media. Our observations demonstrated that the knockout bacteria significantly grew more efficiently than the controls in both media. TsFDH, an FDH with moderately more catalytic efficiency, in contrast to other known FDHs for converting CO2 to formate, increased the growth of both knockouts compared with the controls and the knockouts without TsFDH. This difference was more accentuated in M9+Glycerol. Through a transcriptomics-level in silico analysis of the knockout genes, RNA-seq-based correlation outcome revealed that the genes negatively correlated with the target genes (knockout genes) belong to tRNA-related pathways. 

    Conclusion: 

    Observing higher cell biomass for the knockout and transformed strains at equal concentrations of carbon source in both media indicates possible underlying mechanisms leading to reduced carbon leakage and increased carbon assimilation, which need more detailed investigations. These results may also provide a phenotypic-level clue for the inconsistency of predictions in previous metabolic models that declared glycerol as a suitable carbon source for the growth of E. coli but failed to achieve it in practice. Gene expression correlations and pathway analysis outcomes suggested possible over-expression of the genes involved in tRNA processing and charging pathways. 

  • 41. Hu, Rui-Si
    et al.
    Zhang, Xiao-Xuan
    Ma, Qiao-Ni
    Elsheikha, Hany M.
    Ehsan, Muhammad
    Zhao, Quan
    Fromm, Bastian
    Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    Zhu, Xing-Quan
    Differential expression of microRNAs and tRNA fragments mediate the adaptation of the liver fluke Fasciola gigantica to its intermediate snail and definitive mammalian hosts. 2021. In: International Journal for Parasitology, ISSN 0020-7519, E-ISSN 1879-0135, Vol. 51, no 5, p. 405-414. Article in journal (Refereed)
    Abstract [en]

    The tropical liver fluke Fasciola gigantica affects livestock and humans in many Asian countries, large parts of Africa, and parts of Europe. Despite the public health and economic impacts of F. gigantica, its biology and how its complex lifecycle is transcriptionally regulated remain poorly understood. Here, we tested the hypothesis that the regulatory small non-coding RNAs (sncRNAs), microRNAs (miRNAs) and tRNA-derived fragments (tRFs) play roles in the adaptation of F. gigantica to its intermediate and definitive hosts. We sequenced sncRNAs of eight lifecycle stages of F. gigantica. In total, 56 miRNAs from 33 conserved families and four Fasciola-specific miRNAs were identified. Expression analysis of miRNAs suggested clear stage-related patterns. By leveraging the existing transcriptomic data, we predicted a miRNA-based regulation of metabolism, transport, growth and developmental processes. Also, by comparing the miRNA complement of F. gigantica with that of Fasciola hepatica, we detected a high level of conservation and identified differences in some miRNAs, which can be used to distinguish the two species. Moreover, we found that tRFs at each lifecycle stage were predominantly derived from tRNA-Lys and tRNA-Gly at 5′ half sites, but relatively high expression was related to the buffalo-infecting stages. Taken together, we provided a comprehensive overview of the dynamic transcriptional changes of small RNAs that occur during the developmental stages of F. gigantica. This global analysis of F. gigantica lifecycle stages revealed new roles of miRNAs and tRFs in parasite development and will facilitate future research into understanding of fasciolosis pathobiology.

  • 42. Höjer, Pontus
    et al.
    Frick, Tobias
    Siga, Humam
    Pourbozorgi, Parham
    Aghelpasand, Hooman
    Martin, Marcel
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Ahmadian, Afshin
    BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies. 2023. In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 51, no 22, article id e114. Article in journal (Refereed)
    Abstract [en]

    Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10× Genomics, TELL-seq and stLFR. Running BLR on DBS linked-reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when considering switch errors. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. When compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch-error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.
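    Phase block N50 is the contiguity measure the abstract uses when comparing BLR with Long Ranger. For readers unfamiliar with the statistic, the sketch below shows the usual N50 definition applied to a list of block lengths. It is an illustration only, not code from the BLR pipeline, and the function name and the example lengths are made up.

```python
def n50(lengths):
    """Return the N50 of a collection of block lengths.

    N50 is the largest length L such that blocks of length >= L
    together cover at least half of the summed length.
    """
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:   # reached half of the total length
            return length
    return 0                       # empty input


# Hypothetical phase block lengths (arbitrary units).
example_blocks = [2, 3, 4, 5, 6]
example_n50 = n50(example_blocks)
```

    Because N50 rewards long blocks regardless of their correctness, it is normally read together with the switch error rate, as the abstract does.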

  • 43.
    Kang, Wenjing
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    microRNAs: from biogenesis to organismal tracing. 2020. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    MicroRNAs (miRNAs) are short noncoding RNAs of around 22 nucleotides in length, which help to shape the expression of most mRNAs. Perturbation of miRNA expression has revealed a variety of defects in development, cell specification, physiology and behavior. This thesis focuses on two topics of miRNA: identification of structural features that influence miRNA biogenesis (Paper I) and application of taxonomical marker miRNAs to resolve organismal origin of samples (Paper II and III).

    The current model of miRNA hairpin biogenesis has limited information content and appears to be incomplete. In paper I, we apply a novel high-throughput screening method to profile the optimal structure of miRNA hairpins for efficient and precise miRNA biogenesis. The optimal structure consists of tight and loose local structures across the hairpin, which reflects the constraints of biogenesis proteins. We find that miRNA hairpins with stable lower basal stem are more efficiently processed and have a higher expression level in tissues of 20 animal species. We address that the structural features - which have been largely neglected in the current model - are in fact as important as the well-known sequence motifs.

    New miRNAs are continuously added over evolutionary time and are rarely secondarily lost, making them ideal taxonomical markers. In paper II, we demonstrate as a proof-of-principle that miRNAs can be used to trace biological sample back to the lineage or even species of origin. Based on the marker miRNAs, we develop miRTrace, the first software to accurately trace miRNA sequences back to their taxonomical origin. The method can sensitively identify the origin of single cells and detect parasitic nematode RNA in mammalian host blood sample. In paper III, we apply miRNA tracing to address a controversial question about the origin of the exogenous plant miRNAs (xenomiRs) found in human samples, and which have been proposed to regulate human gene expression. Our computational and experimental results provide evidence that xenomiRs are derived from technical artifacts rather than dietary intake.

  • 44. Kang, Yanlei
    et al.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Jiang, Yunliang
    Huang, Weihong
    Yu, Minzhe
    Li, Zhong
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Huzhou University, China; Zhejiang Sci-Tech University, China.
    AFTGAN: prediction of multi-type PPI based on attention free transformer and graph attention network. 2023. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 39, no 2, article id btad052. Article in journal (Refereed)
    Abstract [en]

    Motivation: Protein–protein interaction (PPI) networks and transcriptional regulatory networks are critical in regulating cells and their signaling. A thorough understanding of PPIs can provide more insights into cellular physiology at normal and disease states. Although numerous methods have been proposed to predict PPIs, it is still challenging for interaction prediction between unknown proteins. In this study, a novel neural network named AFTGAN was constructed to predict multi-type PPIs. Regarding feature input, ESM-1b embedding containing much biological information for proteins was added as a protein sequence feature besides amino acid co-occurrence similarity and one-hot coding. An ensemble network was also constructed based on a transformer encoder containing an AFT module (performing the weight operation on vital protein sequence feature information) and graph attention network (extracting the relational features of protein pairs) for the part of the network framework.

    Results: The experimental results showed that the Micro-F1 of the AFTGAN based on three partitioning schemes (BFS, DFS and the random mode) on the SHS27K and SHS148K datasets was 0.685, 0.711 and 0.867, as well as 0.745, 0.819 and 0.920, respectively, all higher than that of other popular methods. In addition, the experimental comparisons confirmed the performance superiority of the proposed model for predicting PPIs of unknown proteins on the STRING dataset.
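    The results above are reported as Micro-F1, which pools true positives, false positives and false negatives over all protein pairs and interaction types before computing a single F1 value. The following minimal sketch shows that calculation for multi-label predictions. It is not taken from the AFTGAN source code, and the function name and the interaction-type labels in the example are hypothetical.

```python
def micro_f1(true_label_sets, predicted_label_sets):
    """Micro-averaged F1 for multi-label predictions.

    Each argument is a sequence of label sets, one set per protein pair.
    Counts are pooled over all pairs and labels before F1 is computed.
    """
    tp = fp = fn = 0
    for true_labels, predicted_labels in zip(true_label_sets, predicted_label_sets):
        true_set, predicted_set = set(true_labels), set(predicted_labels)
        tp += len(true_set & predicted_set)   # labels correctly predicted
        fp += len(predicted_set - true_set)   # labels predicted but not true
        fn += len(true_set - predicted_set)   # true labels that were missed
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Two hypothetical protein pairs with made-up interaction-type labels.
y_true = [{"binding"}, {"activation", "catalysis"}]
y_pred = [{"binding", "inhibition"}, {"activation"}]
score = micro_f1(y_true, y_pred)   # tp=2, fp=1, fn=1
```

    Pooling the counts means frequent interaction types dominate the score, which is worth keeping in mind when comparing values across datasets with different label distributions.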

    Availability and implementation: The source code is publicly available at https://github.com/1075793472/AFTGAN.

    Supplementary information: Supplementary data are available at Bioinformatics online.

  • 45. Ke, Rongqin
    et al.
    Mignardi, Marco
    Hauling, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nilsson, Mats
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Fourth Generation of Next-Generation Sequencing Technologies: Promise and Consequences. 2016. In: Human Mutation, ISSN 1059-7794, E-ISSN 1098-1004, Vol. 37, no 12, p. 1363-1367. Article, review/survey (Refereed)
    Abstract [en]

    In this review, we discuss the emergence of the fourth-generation sequencing technologies that preserve the spatial coordinates of RNA and DNA sequences with up to subcellular resolution, thus enabling back mapping of sequencing reads to the original histological context. This information is used, for example, in two current large-scale projects that aim to unravel the function of the brain. In cancer research, too, fourth-generation sequencing has the potential to revolutionize the field. Cancer Research UK has named "Mapping the molecular and cellular tumor microenvironment in order to define new targets for therapy and prognosis" one of the grand challenges in tumor biology. We discuss the advantages of sequencing nucleic acids directly in fixed cells over traditional next-generation sequencing (NGS) methods, the limitations and challenges that these new methods have to face to become broadly applicable, and the impact that the information generated by the combination of in situ sequencing and NGS methods will have in research and diagnostics.

  • 46. Kenah, Eben
    et al.
    Britton, Tom
    Stockholm University, Faculty of Science, Department of Mathematics.
    Halloran, M. Elizabeth
    Longini, Ira M.
    Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees. 2016. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 12, no 4. Article in journal (Refereed)
    Abstract [en]

    Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show these can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which is often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology.

  • 47.
    Kim, Sea-Yong
    Stockholm University, Faculty of Science, Department of Ecology, Environment and Plant Sciences.
    The neurotoxin β-N-methylamino-L-alanine (BMAA) and 2,4-diaminobutyric acid (DAB): possible risk of human exposure, and the effect and function in diatoms. 2022. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The toxic secondary metabolites β-N-methylamino-L-alanine (BMAA) and 2,4-diaminobutyric acid (DAB) produced by phytoplankton groups such as cyanobacteria, diatoms and dinoflagellates are known to cause neurotoxicity in vertebrates. BMAA has been linked to development of the neurodegenerative diseases amyotrophic lateral sclerosis/Parkinsonism dementia complex (ALS/PDC) and Alzheimer's disease. Despite these risks, previous studies have focused mostly on food webs in aquatic ecosystems as a possible source of human exposure to BMAA and DAB. Moreover, most studies in regard to the producer of BMAA and DAB are biased towards cyanobacteria.

    The first aim of this thesis was to investigate the possible risk of human exposure to BMAA via the agro-aqua cycle that artificially interconnects agriculture and aquaculture. Two groups of commercial chickens, fed on either standard feed or standard feed mixed with blue mussel meat, were investigated. The results show that BMAA can be transferred to and accumulated in the chickens through the mixed fodder. It has been suggested that the consumption of chicken may cause a risk of human exposure to BMAA if the chickens are fed with the fodder mixed with mussel meat (Paper I).

    The second aim was to assess the effect of biotic stresses (i.e. predation, competition) as possible causative factors regulating the production of BMAA and/or DAB in diatoms, and to assess the toxic effect of BMAA and/or DAB on predator and competitor (if specific production patterns occur for either toxin). The production of DAB was specifically regulated only in the diatom T. pseudonana, in response to predation and competition. The toxic effect of DAB was significant on the population growth of the copepod Tigriopus sp. as predator, and on the growth of cell numbers in T. pseudonana as competitor. However, given the environmental relevance of the DAB effect, the results suggest that DAB may play an important role in the defense mechanisms of the diatom T. pseudonana (Paper II and III).

    The last aim was to study the effect and function of BMAA in the diatom Phaeodactylum tricornutum. P. tricornutum was exposed to different concentrations of BMAA. The results showed concentration-dependent responses to BMAA. The following were observed when the growth (i.e. cell number) of P. tricornutum was arrested due to exogenous BMAA: oxidative stress, reduced carbon fixation, an increase in intracellular Chl a, alterations in GS-GOGAT, and a suppressed urea cycle. The results suggest that BMAA represents a toxic secondary metabolite capable of controlling the growth of P. tricornutum via oxidative stress and alterations in the activity of photosynthesis and nitrogen metabolism (Paper IV).

  • 48. Kurrikoff, Kaido
    et al.
    Veiman, Kadi-Liis
    Künnapuu, Kadri
    Peets, Elin Madli
    Lehto, Tõnis
    Stockholm University, Faculty of Science, Department of Neurochemistry.
    Pärnaste, Ly
    Arukuusk, Piret
    Langel, Ülo
    Stockholm University, Faculty of Science, Department of Neurochemistry. University of Tartu, Estonia.
    Effective in vivo gene delivery with reduced toxicity, achieved by charge and fatty acid -modified cell penetrating peptide. 2017. In: Scientific Reports, E-ISSN 2045-2322, Vol. 7, article id 17056. Article in journal (Refereed)
    Abstract [en]

    Non-viral gene delivery systems have gained considerable attention as a promising alternative to viral delivery to treat diseases associated with aberrant gene expression. However, despite extensive research, little is known about the parameters that underlie in vivo use of the nanoparticle-based delivery vectors. The modest efficacy and low safety of non-viral delivery are the two central issues that need to be addressed. We have previously characterized an efficient cell penetrating peptide, PF14, for in vivo applications. In the current work, we first develop an optimized formulation of PF14/pDNA nanocomplexes, which allows removal of the side effects without compromising the bioefficacy in vivo. Secondly, based on the physicochemical complex formation studies and biological efficacy assessments, we develop a series of PF14 modifications with altered charge and fatty acid content. We show that with an optimal combination of overall charge and hydrophobicity in the peptide backbone, in vivo gene delivery can be augmented. When further combined with the safe formulation, systemic gene delivery lacking any side effects can be achieved.

  • 49.
    Larsson, Per
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Prediction, modeling, and refinement of protein structure. 2010. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Accurate predictions of protein structure are important for understanding many processes in cells. The interactions that govern protein folding and structure are complex, and still far from completely understood. However, progress is being made in many areas. Here, efforts to improve the overall quality of protein structure models are described. From a pure evolutionary perspective, in which proteins are viewed in the light of gradually accumulated mutations on the sequence level, it is shown how information from multiple sources helps to create more accurate models. A very simple but surprisingly accurate method for assigning confidence measures for protein structures is also tested. In contrast to models based on evolution, physics based methods view protein structures as the result of physical interactions between atoms. Newly implemented methods are described that both increase the time-scales accessible for molecular dynamics simulations almost 10-fold, and that to some extent might be able to refine protein structures. Finally, I compare the efficiency and properties of different techniques for protein structure refinement.

  • 50.
    Larsson, Per
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Skwark, Marcin J.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Wallner, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Assessment of global and local model quality in CASP8 using Pcons and ProQ. 2009. In: Proteins: Structure, Function, and Bioinformatics, ISSN 0887-3585, E-ISSN 1097-0134, Vol. 77, no 9, p. 167-172. Article in journal (Refereed)
    Abstract [en]

    Model Quality Assessment Programs (MQAPs) are programs developed to rank protein models. These methods can be trained to predict the overall global quality of a model or which local regions in a model are likely to be incorrect. In CASP8, we participated with two predictors that predict both global and local quality using either consensus information, Pcons, or purely structural information, ProQ. Consistent with results in previous CASPs, the best performance in CASP8 was obtained using the Pcons method. Furthermore, the results show that the modification introduced into Pcons for CASP8 improved the predictions against GDT_TS, and a correlation coefficient above 0.9 is now achieved, whereas the correlation for ProQ is about 0.7. The correlation is better for the easier than for the harder targets, but it is not below 0.5 for a single target and below 0.7 only for three targets. The correlation coefficient for the best local quality MQAP is 0.68, showing that there is still clear room for improvement within this area. We also find that Pcons is still not always able to identify the best model. However, we show that using a linear combination of Pcons and ProQ it is possible to select models that are better than the models from the best single server. In particular, the average quality over the hard targets increases by about 6% compared with using Pcons alone.
