  • 1. Aktürk, Şevva
    et al.
    Mapelli, Igor
    Güler, Merve N.
    Gürün, Kanat
    Katırcıoğlu, Büşra
    Vural, Kıvılcım Başak
    Sağlıcan, Ekin
    Çetin, Mehmet
    Yaka, Reyhan
    Stockholm University, Faculty of Humanities, Department of Archaeology and Classical Studies. Middle East Technical University, Turkey; Centre for Palaeogenetics, Sweden.
    Sürer, Elif
    Atağ, Gözde
    Çokoğlu, Sevim Seda
    Sevkar, Arda
    Altınışık, N. Ezgi
    Koptekin, Dilek
    Somel, Mehmet
    Benchmarking kinship estimation tools for ancient genomes using pedigree simulations. 2024. In: Molecular Ecology Resources, ISSN 1755-098X, E-ISSN 1755-0998. Article in journal (Refereed)
    Abstract [en]

    There is growing interest in uncovering genetic kinship patterns in past societies using low-coverage palaeogenomes. Here, we benchmark four tools for kinship estimation with such data: lcMLkin, NgsRelate, KIN, and READ, which differ in their input, IBD estimation methods, and statistical approaches. We used pedigree and ancient genome sequence simulations to evaluate these tools when only a limited number (1 to 50 K, with minor allele frequency ≥0.01) of shared SNPs are available. The performance of all four tools was comparable using ≥20 K SNPs. We found that first-degree related pairs can be accurately classified even with 1 K SNPs, with 85% F1 scores using READ and 96% using NgsRelate or lcMLkin. Distinguishing third-degree relatives from unrelated pairs or second-degree relatives was also possible with high accuracy (F1 > 90%) with 5 K SNPs using NgsRelate and lcMLkin, while READ and KIN showed lower success (69 and 79% respectively). Meanwhile, noise in population allele frequencies and inbreeding (first-cousin mating) led to deviations in kinship coefficients, with different sensitivities across tools. We conclude that using multiple tools in parallel might be an effective approach to achieve robust estimates on ultra-low-coverage genomes. 

  • 2. Ali, Raja Hashim
    et al.
    Muhammad, Sayyed Auwn
    Khan, Mehmood Alam
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 14, no Suppl 15, p. S12. Article in journal (Refereed)
    Abstract [en]

    Background

    Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential.

    Results

    Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationship and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner as compared to similarity based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

    Conclusions

    The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 3. Allison, Timothy M.
    et al.
    Degiacomi, Matteo T.
    Marklund, Erik G.
    Jovine, Luca
    Elofsson, Arne
    Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Benesch, Justin L. P.
    Landreh, Michael
    Complementing machine learning-based structure predictions with native mass spectrometry. 2022. In: Protein Science, ISSN 0961-8368, E-ISSN 1469-896X, Vol. 31, no 6, article id e4333. Article in journal (Refereed)
    Abstract [en]

    The advent of machine learning-based structure prediction algorithms such as AlphaFold2 (AF2) and RoseTTAFold has moved the generation of accurate structural models for the entire cellular protein machinery into the reach of the scientific community. However, structure predictions of protein complexes are based on user-provided input and may require experimental validation. Mass spectrometry (MS) is a versatile, time-effective tool that provides information on post-translational modifications, ligand interactions, conformational changes, and higher-order oligomerization. Using three protein systems, we show that native MS experiments can uncover structural features of ligand interactions, homology models, and point mutations that are undetectable by AF2 alone. We conclude that machine learning can be complemented with MS to yield more accurate structural models on a small and large scale.

  • 4. Almagro Armenteros, Jose Juan
    et al.
    Salvatore, Marco
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Emanuelsson, Olof
    Winther, Ole
    von Heijne, Gunnar
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Nielsen, Henrik
    Detecting Novel Sequence Signals in Targeting Peptides Using Deep Learning. Manuscript (preprint) (Other academic)
  • 5.
    Arvestad, Lars
    Stockholm University, Faculty of Science, Department of Mathematics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-science Research Centre, Sweden.
    alv: a console-based viewer for molecular sequence alignments. 2018. In: Journal of Open Source Software, E-ISSN 2475-9066, Vol. 3, no 31, article id 955. Article in journal (Refereed)
  • 6.
    Basile, Walter
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sachenkova, Oxana
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Light, Sara
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Linköping University, Sweden.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Kungliga Tekniska Högskolan, Sweden.
    High GC content causes orphan proteins to be intrinsically disordered. 2017. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 13, no 3, article id e1005375. Article in journal (Refereed)
    Abstract [en]

    De novo creation of protein coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should for instance not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila, young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge no valid explanation for this difference has been proposed. To solve this riddle we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account we noted that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead these proteins largely resemble random proteins given a particular GC level. During evolution the properties of a protein change faster than the GC level causing the relationship between disorder and GC to gradually weaken.

  • 7.
    Basile, Walter
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Salvatore, Marco
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Bassot, Claudio
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center (SeRC), Sweden.
    Why do eukaryotic proteins contain more intrinsically disordered regions? 2019. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 15, no 7, article id e1007186. Article in journal (Refereed)
    Abstract [en]

    Intrinsic disorder is more abundant in eukaryotic than prokaryotic proteins. Methods predicting intrinsic disorder are based on the amino acid sequence of a protein. Therefore, there must exist an underlying difference in the sequences between eukaryotic and prokaryotic proteins causing the (predicted) difference in intrinsic disorder. By comparing proteins from complete eukaryotic and prokaryotic proteomes, we show that the difference in intrinsic disorder emerges from the linker regions connecting Pfam domains. Eukaryotic proteins have more extended linker regions, and in addition, the eukaryotic linkers are significantly more disordered, 38% vs. 12-16% disordered residues. Next, we examined the underlying reason for the increase in disorder in eukaryotic linkers, and we found that the changes in abundance of only three amino acids cause the increase. Eukaryotic proteins contain 8.6% serine, while prokaryotic proteins have 6.5%; eukaryotic proteins also contain 5.4% proline and 5.3% isoleucine, compared with 4.0% proline and ≈7.5% isoleucine in the prokaryotes. All these three differences contribute to the increased disorder in eukaryotic proteins. It is tempting to speculate that the increase in serine frequencies in eukaryotes is related to regulation by kinases, but direct evidence for this is lacking. The differences are observed in all phyla, protein families, structural regions and type of protein but are most pronounced in disordered and linker regions. The observation that differences in the abundance of three amino acids cause the difference in disorder between eukaryotic and prokaryotic proteins raises the question: Are amino acid frequencies different in eukaryotic linkers because the linkers are more disordered or do the differences cause the increased disorder?
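
    The serine, proline and isoleucine percentages quoted above are plain residue frequencies. As a minimal sketch of how such a composition could be computed, assuming sequences are given as one-letter amino acid strings (the function name `aa_percentages` and the two toy sequences are hypothetical and not taken from the paper):

```python
from collections import Counter


def aa_percentages(sequences):
    """Return the percentage of each residue over all given sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(seq.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to count")
    return {aa: 100.0 * n / total for aa, n in counts.items()}


# Hypothetical toy sequences, for illustration only.
toy = ["MSSPSPI", "MKSPIIL"]
pct = aa_percentages(toy)
for aa in "SPI":
    print(f"{aa}: {pct[aa]:.1f}%")
```

    Running the same function separately on linker regions and on domain regions would give the kind of per-region comparison the abstract describes; how those regions are delimited is outside this sketch.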

  • 8.
    Bernsel, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sequence-based predictions of membrane-protein topology, homology and insertion. 2008. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Membrane proteins comprise around 20-30% of a typical proteome and play crucial roles in a wide variety of biochemical pathways. Apart from their general biological significance, membrane proteins are of particular interest to the pharmaceutical industry, being targets for more than half of all available drugs. This thesis focuses on prediction methods for membrane proteins that ultimately rely on their amino acid sequence only.

    By identifying soluble protein domains in membrane protein sequences, we were able to constrain and improve prediction of membrane protein topology, i.e. what parts of the sequence span the membrane and what parts are located on the cytoplasmic and extra-cytoplasmic sides. Using predicted topology as input to a profile-profile based alignment protocol, we managed to increase sensitivity to detect distant membrane protein homologs.

    Finally, experimental measurements of the level of membrane integration of systematically designed transmembrane helices in vitro were used to derive a scale of position-specific contributions to helix insertion efficiency for all 20 naturally occurring amino acids. Notably, position within the helix was found to be an important factor for the contribution to helix insertion efficiency for polar and charged amino acids, reflecting the highly anisotropic environment of the membrane. Using the scale to predict natural transmembrane helices in protein sequences revealed that, whereas helices in single-spanning proteins are typically hydrophobic enough to insert by themselves, a large part of the helices in multi-spanning proteins seem to require stabilizing helix-helix interactions for proper membrane integration. Implementing the scale to predict full transmembrane topologies yielded results comparable to the best statistics-based topology prediction methods.

  • 9.
    Berthet, Pierre
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Computational Modeling of the Basal Ganglia: Functional Pathways and Reinforcement Learning. 2015. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    We perceive the environment via sensor arrays and interact with it through motor outputs. The work of this thesis concerns how the brain selects actions given the information about the perceived state of the world and how it learns and adapts these selections to changes in this environment. Reinforcement learning theories suggest that an action will be more or less likely to be selected if the outcome has been better or worse than expected. A group of subcortical structures, the basal ganglia (BG), is critically involved in both the selection and the reward prediction.

    We developed and investigated a computational model of the BG. We implemented a Bayesian-Hebbian learning rule, which computes the weights between two units based on the probability of their activations. We were able to test how various configurations of the represented pathways impacted the performance in several reinforcement learning and conditioning tasks. Then, following the development of a more biologically plausible version with spiking neurons, we simulated lesions in the different pathways and assessed how they affected learning and selection.

    We observed that the evolution of the weights and the performance of the models qualitatively resembled experimental data. The absence of a unique best way to configure the model over all the learning paradigms tested indicates that an agent could dynamically configure its action selection mode, mainly by including or not the reward prediction values in the selection process. We present hypotheses on possible biological substrates for the reward prediction pathway. We base these on the functional requirements for successful learning and on an analysis of the experimental data. We further simulate a loss of dopaminergic neurons similar to that reported in Parkinson’s disease. We suggest that the associated motor symptoms are mostly caused by an impairment of the pathway promoting actions, while the pathway suppressing them seems to remain functional.

  • 10.
    Berthet, Pierre
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Lindahl, Mikael
    Tully, Philip
    Hellgren-Kotaleski, Jeanette
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Functional relevance of different basal ganglia pathways investigated in a spiking model with reward dependent plasticity. Manuscript (preprint) (Other academic)
  • 11.
    Bjelkmar, Pär
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Modeling of voltage-gated ion channels. 2011. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The recent determination of several crystal structures of voltage-gated ion channels has catalyzed computational efforts of studying these remarkable molecular machines that are able to conduct ions across biological membranes at extremely high rates without compromising the ion selectivity.

    Starting from the open crystal structures, we have studied the gating mechanism of these channels by molecular modeling techniques. Firstly, by applying a membrane potential, initial stages of the closing of the channel were captured, manifested in a secondary-structure change in the voltage-sensor. In a follow-up study, we found that the energetic cost of translocating this 3₁₀-helix conformation was significantly lower than in the original conformation. Thirdly, collaborators of ours identified new molecular constraints for different states along the gating pathway. We used those to build new protein models that were evaluated by simulations. All these results point to a gating mechanism where the S4 helix undergoes a secondary structure transformation during gating.

    These simulations also provide information about how the protein interacts with the surrounding membrane. In particular, we found that lipid molecules close to the protein diffuse together with it, forming a large dynamic lipid-protein cluster. This has important consequences for the understanding of protein-membrane interactions and for the theories of lateral diffusion of membrane proteins.

    Further, simulations of the simple ion channel antiamoebin were performed where different molecular models of the channel were evaluated by calculating ion conduction rates, which were compared to experimentally measured values. One of the models had a conductance consistent with the experimental data and was proposed to represent the biological active state of the channel.

    Finally, the underlying methods for simulating molecular systems were probed by implementing the CHARMM force field into the GROMACS simulation package. The implementation was verified and GROMACS-specific features were combined with CHARMM and evaluated on long timescales. The CHARMM interaction potential was found to sample relevant protein conformations independently of the solvent model used.

  • 12.
    Björkholm, Patrik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Protein Interactions from the Molecular to the Domain Level. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The basic unit of life is the cell, from single-cell bacteria to the largest creatures on the planet. All cells have DNA, which contains the blueprint for proteins. This information is transported in the form of messenger RNA from the genome to ribosomes where proteins are produced. Proteins are the main functional constituents of the cell; they usually have one or several functions and are the main actors in almost all essential biological processes. Proteins are what make the cell alive. Proteins are found as solitary units or as part of large complexes. Proteins can be found in all parts of the cell, the most common place being the cytoplasm, a central space in all cells. They are also commonly found integrated into or attached to various membranes.

    Membranes define the cell architecture. Proteins integrated into the membrane have a wide number of responsibilities: they are the gatekeepers of the cell, they secrete cellular waste products, and many of them are receptors and enzymes.

    The main focus of this thesis is the study of protein interactions, from the molecular level up to the protein domain level.

    In paper I, recurring local protein structures are used to try to predict which sections of a protein interact with another part, using only sequence information. In papers II and III we use a randomization approach on a membrane protein motif that we know interacts with a sphingomyelin lipid to find other candidate proteins that interact with sphingolipids. These are then experimentally verified as sphingolipid-binding. In the last paper, paper IV, we look at how protein domain interaction networks overlap and can be evaluated.

  • 13.
    Björkholm, Patrik
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Ernst, Andreas
    Hacke, Moritz
    Wieland, Felix
    Brügger, Britta
    von Heijne, Gunnar
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Identification of novel sphingolipid-binding motifs in mammalian membrane proteins. Manuscript (preprint) (Other academic)
    Abstract [en]

    Specific interactions between transmembrane proteins and sphingolipids are a poorly understood phenomenon, and only a couple of instances have been identified. The best characterized example is the sphingolipid-binding motif VXXTLXXIY found in the transmembrane helix of the vesicular transport protein p24. Here, we have used a simple motif-probability algorithm (MOPRO) to identify proteins that contain putative sphingolipid-binding motifs in a dataset comprising full proteomes from mammalian organisms. Four selected candidate proteins all tested positive for sphingolipid binding in a photoaffinity assay. The putative sphingolipid-binding motifs are noticeably enriched in the 7TM family of G-protein coupled receptors, predominantly in transmembrane helix 6.
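
    To make the motif notation concrete, the sketch below scans a sequence for exact occurrences of VXXTLXXIY, reading X as any residue. This is only a literal pattern search and is not the MOPRO algorithm, whose details are not given in the abstract; the function name `find_motif` and the toy sequence are hypothetical.

```python
import re

# VXXTLXXIY with X = any residue. The lookahead lets overlapping hits be reported.
MOTIF = re.compile(r"(?=(V[A-Z]{2}TL[A-Z]{2}IY))")


def find_motif(sequence):
    """Return (0-based start, matched segment) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence.upper())]


# Hypothetical toy sequence, for illustration only.
toy_seq = "MAAVGLTLFAIYKK"
hits = find_motif(toy_seq)
print(hits)
```

    A realistic screen would restrict the search to predicted transmembrane helices and weigh matches against a background model, which is presumably what a motif-probability approach adds.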

  • 14.
    Caster, Ola
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Uppsala Monitoring Centre, Sweden.
    Norén, G. Niklas
    Stockholm University, Faculty of Science, Department of Mathematics. Uppsala Monitoring Centre, Sweden.
    Edwards, I. Ralph
    Computing limits on medicine risks based on collections of individual case reports. 2014. In: Theoretical Biology and Medical Modelling, E-ISSN 1742-4682, Vol. 11, article id 15. Article in journal (Refereed)
    Abstract [en]

    Background: Quantifying a medicine's risks for adverse effects is crucial in assessing its value as a therapeutic agent. Rare adverse effects are often not detected until after the medicine is marketed and used in large and heterogeneous patient populations, and risk quantification is even more difficult. While individual case reports of suspected harm from medicines are instrumental in the detection of previously unknown adverse effects, they are currently not used for risk quantification. The aim of this article is to demonstrate how and when limits on medicine risks can be computed from collections of individual case reports.

    Methods: We propose a model where drug exposures in the real world may be followed by adverse episodes, each containing one or several adverse effects. Any adverse episode can be reported at most once, and each report corresponds to a single adverse episode. Based on this model, we derive upper and lower limits for the per-exposure risk of an adverse effect for a given drug.

    Results: An upper limit for the per-exposure risk of the adverse effect Y for a given drug X is provided by the reporting ratio of X together with Y relative to all reports on X, under two assumptions: (i) the average number of adverse episodes following exposure to X is one or less; and (ii) adverse episodes that follow X and contain Y are more frequently reported than adverse episodes in general that follow X. Further, a lower risk limit is provided by dividing the number of reports on X together with Y by the total number of exposures to X, under the assumption that exposures to X that are followed by Y generate on average at most one report on X together with Y. Using real data, limits for the narcolepsy risk following Pandemrix vaccination and the risk of coeliac disease following antihypertensive treatment were computed and found to conform to reference risk values from epidemiological studies.

    Conclusions: Our framework enables quantification of medicine risks in situations where this is otherwise difficult or impossible. It has wide applicability, but should be particularly useful in structured benefit-risk assessments that include rare adverse effects.
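
    The two limits described in the results reduce to simple ratios of counts. The following is a minimal sketch of that arithmetic, valid only under the assumptions stated in the abstract; the function name `risk_limits` and all numbers are hypothetical and are not the paper's data.

```python
def risk_limits(reports_xy, reports_x, exposures_x):
    """Return (lower, upper) limits for the per-exposure risk of Y given X.

    lower = reports on X with Y / total exposures to X
    upper = reports on X with Y / all reports on X
    Both hold only under the reporting assumptions listed in the abstract.
    """
    if reports_x <= 0 or exposures_x <= 0:
        raise ValueError("reports_x and exposures_x must be positive")
    if not 0 <= reports_xy <= reports_x:
        raise ValueError("reports_xy must lie between 0 and reports_x")
    lower = reports_xy / exposures_x
    upper = reports_xy / reports_x
    return lower, upper


# Hypothetical counts, for illustration only.
lower, upper = risk_limits(reports_xy=20, reports_x=400, exposures_x=1_000_000)
print(f"per-exposure risk between {lower:.6f} and {upper:.6f}")
```

    The gap between the two limits can be wide, since the lower limit depends on how completely episodes are reported and the upper limit on how Y is represented among all reports on X.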

  • 15.
    Caster, Ola
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Uppsala Monitoring Centre, WHO Collaborating Centre for International Drug Monitoring, Sweden.
    Sandberg, Lovisa
    Bergvall, Tomas
    Watson, Sarah
    Noren, G. Niklas
    vigiRank for statistical signal detection in pharmacovigilance: First results from prospective real-world use. 2017. In: Pharmacoepidemiology and Drug Safety, ISSN 1053-8569, E-ISSN 1099-1557, Vol. 26, no 8, p. 1006-1010. Article in journal (Refereed)
    Abstract [en]

    Purpose: vigiRank is a data-driven predictive model for emerging safety signals. In addition to disproportionate reporting patterns, it also accounts for the completeness, recency, and geographic spread of individual case reporting, as well as the availability of case narratives. Previous retrospective analysis suggested that vigiRank performed better than disproportionality analysis alone. The purpose of the present analysis was to evaluate its prospective performance. Methods: The evaluation of vigiRank was based on real-world signal detection in VigiBase. In May 2014, vigiRank scores were computed for pairs of new drugs and WHO Adverse Reaction Terminology critical terms with at most 30 reports from at least 2 countries. Initial manual assessments were performed in order of descending score, selecting a subset of drug-adverse drug reaction pairs for in-depth expert assessment. The primary performance metric was the proportion of initial assessments that were decided signals during in-depth assessment. As comparator, the historical performance for disproportionality-guided signal detection in VigiBase was computed from a corresponding cohort of drug-adverse drug reaction pairs assessed between 2009 and 2013. During this period, the requirement for initial manual assessment was a positive lower endpoint of the 95% credibility interval of the Information Component measure of disproportionality, observed for the first time. Results: 194 initial assessments suggested by vigiRank's ordering eventually resulted in 6 (3.1%) signals. Disproportionality analysis yielded 19 signals from 1592 initial assessments (1.2%; P <.05). Conclusions: Combining multiple strength-of-evidence aspects as in vigiRank significantly outperformed disproportionality analysis alone in real-world pharmacovigilance signal detection, for VigiBase.
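
    The primary performance metric above is a proportion of initial assessments that became signals. As a quick worked check of the reported percentages, using only the counts given in the abstract (the function name `signal_yield` is hypothetical, and no significance test is reproduced here because the abstract does not say which test was used):

```python
def signal_yield(signals, assessments):
    """Proportion of initial assessments that ended up as signals."""
    if assessments <= 0 or not 0 <= signals <= assessments:
        raise ValueError("need 0 <= signals <= assessments and assessments > 0")
    return signals / assessments


vigirank = signal_yield(6, 194)             # counts from the abstract
disproportionality = signal_yield(19, 1592)  # counts from the abstract

print(f"vigiRank: {100 * vigirank:.1f}%")
print(f"disproportionality: {100 * disproportionality:.1f}%")
print(f"ratio: {vigirank / disproportionality:.1f}x")
```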

  • 16. Centler, Florian
    et al.
    Guennigmann, Sarah
    Fetzer, Ingo
    Stockholm University, Faculty of Science, Stockholm Resilience Centre.
    Wendeberg, Annelie
    Keystone Species and Modularity in Microbial Hydrocarbon Degradation Uncovered by Network Analysis and Association Rule Mining. 2020. In: Microorganisms, E-ISSN 2076-2607, Vol. 8, no 2, article id 190. Article in journal (Refereed)
    Abstract [en]

    Natural microbial communities in soils are highly diverse, allowing for rich networks of microbial interactions to unfold. Identifying key players in these networks is difficult as the distribution of microbial diversity at the local scale is typically non-uniform, and is the outcome of both abiotic environmental factors and microbial interactions. Here, using spatially resolved microbial presence-absence data along an aquifer transect contaminated with hydrocarbons, we combined co-occurrence analysis with association rule mining to identify potential keystone species along the hydrocarbon degradation process. Derived co-occurrence networks were found to be of a modular structure, with modules being associated with specific spatial locations and metabolic activity along the contamination plume. Association rules identify species that never occur without another, hence identifying potential one-sided cross-feeding relationships. We find that hub nodes in the rule network appearing in many rules as targets qualify as potential keystone species that catalyze critical transformation steps and are able to interact with varying partners. By contrasting analysis based on data derived from bulk samples and individual soil particles, we highlight the importance of spatial sample resolution. While individual inferred interactions are hypothetical in nature, requiring experimental verification, the observed global network patterns provide a unique first glimpse at the complex interaction networks at work in the microbial world.
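
    A rule of the form "species A never occurs without species B" can be read directly off presence-absence data. The sketch below is a simplified illustration of that idea, assuming each sample is a set of species labels; the function name `never_without`, the `min_support` threshold and the toy samples are hypothetical, and the paper's actual mining procedure may differ.

```python
from collections import Counter
from itertools import permutations


def never_without(samples, min_support=2):
    """Return rules (a, b): every sample containing a also contains b.

    min_support is the minimum number of samples in which a must occur,
    so that rules are not drawn from a single observation.
    """
    species = sorted(set().union(*samples))
    rules = []
    for a, b in permutations(species, 2):
        with_a = [s for s in samples if a in s]
        if len(with_a) >= min_support and all(b in s for s in with_a):
            rules.append((a, b))
    return rules


# Hypothetical presence-absence data: one set of species labels per sample.
samples = [{"A", "B"}, {"A", "B", "C"}, {"B", "C"}, {"B"}]
rules = never_without(samples)
targets = Counter(b for _, b in rules)
print(rules)
print(targets.most_common(1))
```

    In this toy data, species B is the target of every rule, which mirrors the abstract's notion of hub nodes that appear in many rules as targets.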

  • 17. Cáceres, Manuel
    et al.
    Mumey, Brendan
    Husić, Edin
    Rizzi, Romeo
    Cairo, Massimo
    Sahlin, Kristoffer
    Stockholm University, Faculty of Science, Department of Mathematics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tomescu, Alexandru I.
    Safety in Multi-Assembly via Paths Appearing in All Path Covers of a DAG. 2022. In: IEEE/ACM Transactions on Computational Biology & Bioinformatics, ISSN 1545-5963, E-ISSN 1557-9964, Vol. 19, no 6, p. 3673-3684. Article in journal (Refereed)
    Abstract [en]

    A multi-assembly problem asks to reconstruct multiple genomic sequences from mixed reads sequenced from all of them. Standard formulations of such problems model a solution as a path cover in a directed acyclic graph, namely a set of paths that together cover all vertices of the graph. Since multi-assembly problems admit multiple solutions in practice, we consider an approach commonly used in standard genome assembly: output only partial solutions (contigs, or safe paths) that appear in all path cover solutions. We study constrained path covers, a restriction on the path cover solution that incorporates practical constraints arising in multi-assembly problems. We give efficient algorithms finding all maximal safe paths for constrained path covers. We compute the safe paths of splicing graphs constructed from transcript annotations of different species. Our algorithms run in less than 15 seconds per species and report RNA contigs that are over 99% precise and are up to 8 times longer than unitigs. Moreover, RNA contigs cover over 70% of the transcripts and their coding sequences in most cases. With their increased length relative to unitigs, high precision, and fast construction time, maximal safe paths can provide a better base set of sequences for transcript assembly programs.
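
    The definition of a path cover given above can be checked mechanically. The sketch below only tests whether a given set of paths is a path cover of a small DAG; it does not compute safe paths and ignores the additional constraints studied in the paper. The function name `is_path_cover` and the toy graph are hypothetical.

```python
def is_path_cover(vertices, edges, paths):
    """Return True if every path follows graph edges and all vertices are covered."""
    edge_set = set(edges)
    for path in paths:
        if not path:
            return False
        # Each consecutive pair of vertices on a path must be an edge of the graph.
        if any((u, v) not in edge_set for u, v in zip(path, path[1:])):
            return False
    covered = {v for path in paths for v in path}
    return covered == set(vertices)


# Hypothetical toy DAG: a -> b -> d and a -> c -> d.
vertices = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]

print(is_path_cover(vertices, edges, [["a", "b", "d"], ["a", "c", "d"]]))
print(is_path_cover(vertices, edges, [["a", "b", "d"]]))
```

    A path is safe, in the sense of the abstract, if it is a subpath of some path in every valid cover; enumerating all covers to test this is feasible only for very small graphs, which is why efficient algorithms are needed.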

  • 18.
    Djurfeldt, Mikael
    et al.
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Lundqvist, Mikael
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Johansson, Christopher
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Rehn, Martin
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Ekeberg, Örjan
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Lansner, Anders
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Brain-scale simulation of the neocortex on the IBM Blue Gene/L supercomputer. 2008. In: IBM Journal of Research and Development, ISSN 0018-8646, E-ISSN 2151-8556, Vol. 52, no 1-2, p. 31-41. Article in journal (Refereed)
    Abstract [en]

    Biologically detailed large-scale models of the brain can now be simulated thanks to increasingly powerful massively parallel supercomputers. We present an overview, for the general technical reader, of a neuronal network model of layers II/III of the neocortex built with biophysical model neurons. These simulations, carried out on an IBM Blue Gene/L supercomputer, comprise up to 22 million neurons and 11 billion synapses, which makes them the largest simulations of this type ever performed. Such model sizes correspond to the cortex of a small mammal. The SPLIT library, used for these simulations, runs on single-processor as well as massively parallel machines. Performance measurements show good scaling behavior on the Blue Gene/L supercomputer up to 8,192 processors. Several key phenomena seen in the living brain appear as emergent phenomena in the simulations. We discuss the role of this kind of model in neuroscience and note that full-scale models may be necessary to preserve natural dynamics. We also discuss the need for software tools for the specification of models as well as for analysis and visualization of output data. Combining models that range from abstract connectionist type to biophysically detailed will help us unravel the basic principles underlying neocortical function.

  • 19. Duart, Gerard
    et al.
    Lamb, John
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory, Sweden.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory, Sweden.
    Mingarro, Ismael
    Intra-helical salt bridge contribution to membrane protein insertion. Manuscript (preprint) (Other academic)
    Abstract [en]

    Salt bridges between negatively (D, E) and positively charged (K, R, H) amino acids play an important role in protein stabilization. This has a more prevalent effect in membrane proteins where polar amino acids are exposed to a very hydrophobic environment. In transmembrane (TM) helices the presence of charged residues can hinder the insertion of the helices into the membrane. This can sometimes be avoided by TM region rearrangements after insertion, but it is also possible that the formation of salt bridges could decrease the cost of membrane integration. However, the presence of intra-helical salt bridges in TM domains and their effect on insertion have not yet been properly studied. In this work, we use an analytical pipeline to study the prevalence of charged pairs of amino acid residues in TM α-helices, which shows that potentially salt-bridge-forming pairs are statistically over-represented. We then selected some candidates to experimentally determine the contribution of these electrostatic interactions to the translocon-assisted membrane insertion process. Using both in vitro and in vivo systems, we confirm the presence of intra-helical salt bridges in TM segments during biogenesis and determine that they contribute between 0.5 and 0.7 kcal/mol to the apparent free energy of membrane insertion (ΔGapp). Our observations suggest that salt bridge interactions can be stabilized during translocon-mediated insertion and thus could be relevant to consider for the future development of membrane protein prediction software.

  • 20.
    Ekman, Diana
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain rearrangement and creation in protein evolution. 2008. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Proteins are composed of domains, recurrent protein fragments with distinct structure, function and evolutionary history. Some domains exist only as single-domain proteins; however, a majority of them are also combined with other domains. Domain rearrangements are important in the evolution of new proteins as new functionalities can arise in a single evolutionary event. In addition, the domain repertoire can be expanded through mutations of existing domains and de novo creation. The processes of domain rearrangement and creation have been the focus of this thesis.

    According to our estimates, about 65% of the eukaryotic and 40% of the prokaryotic proteins are of multidomain type. We found that insertion of a single domain at the N- or C-terminus was the most common event in the creation of novel multidomain architectures. However, domain repeats deviate from this pattern and are often expanded through duplications of several domains. Next, by mapping domain combinations onto an evolutionary tree we estimated that roughly one domain architecture has been created per million years, with the highest rates in metazoa. Much of this so-called explosion of new architectures in metazoa seems to be explained by a set of domains amenable to exon shuffling. In contrast to domain architectures, most known domain families evolved early. However, many proteins have incomplete domain coverage, and could hence contain de novo created domains. In Saccharomyces cerevisiae, however, species-specific sequences constitute only a minor fraction of the proteome, and are often short, disordered sequences located at the protein termini.

  • 21.
    Eldfjell, Yrin
    Stockholm University, Faculty of Science, Department of Mathematics.
    Identifying Mitochondrial Genomes in Draft Whole-Genome Shotgun Assemblies of Six Gymnosperm Species. 2018. Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis
    Abstract [en]

    Sequencing efforts for gymnosperm genomes typically focus on nuclear and chloroplast DNA, with only three complete mitochondrial genomes published as of 2017. The availability of additional mitochondrial genomes would aid biological and evolutionary understanding of gymnosperms. Identifying mtDNA from existing whole genome sequencing (WGS) data (i.e. contigs) negates the need for additional experimental work, but previous classification methods show limitations in sensitivity or accuracy, particularly in difficult cases. In this thesis I present a classification pipeline based on (1) k-mer probability scoring and (2) SVM classification applied to the available contigs. Using this pipeline the mitochondrial genomes of six gymnosperm species were obtained: Abies sibirica, Gnetum gnemon, Juniperus communis, Picea abies, Pinus sylvestris and Taxus baccata. Cross-validation experiments showed a satisfactory and, for some species, excellent degree of accuracy.
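
One plausible reading of "k-mer probability scoring" is a log-odds score between two k-mer frequency profiles. The sketch below is an assumption made for illustration, not the thesis pipeline: the function names, the pseudocount, the choice of k, and the toy sequences are all hypothetical, and the SVM step is omitted.

```python
import math
from collections import Counter


def kmer_counts(seqs, k):
    """Count all overlapping k-mers in a list of sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    return counts


def log_odds_score(contig, pos, neg, k, pseudo=1.0):
    """Sum of log(P(kmer | pos) / P(kmer | neg)) over the contig's k-mers.

    A pseudocount keeps unseen k-mers from producing log(0).
    """
    n_kmers = 4 ** k  # number of possible DNA k-mers
    pos_total = sum(pos.values()) + pseudo * n_kmers
    neg_total = sum(neg.values()) + pseudo * n_kmers
    score = 0.0
    for i in range(len(contig) - k + 1):
        kmer = contig[i:i + k]
        p = (pos[kmer] + pseudo) / pos_total
        q = (neg[kmer] + pseudo) / neg_total
        score += math.log(p / q)
    return score


# Hypothetical training sequences standing in for the two classes.
k = 2
mito_profile = kmer_counts(["ATATATATAT"], k)
nuclear_profile = kmer_counts(["GCGCGCGCGC"], k)
score_mito_like = log_odds_score("ATAT", mito_profile, nuclear_profile, k)
score_nuclear_like = log_odds_score("GCGC", mito_profile, nuclear_profile, k)
```

A positive score means the contig's k-mers are more typical of the first profile. Such a score could serve as one input feature for a downstream classifier.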

  • 22.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Progress at protein structure prediction, as seen in CASP15. 2023. In: Current opinion in structural biology, ISSN 0959-440X, E-ISSN 1879-033X, Vol. 80, article id 102594. Article, review/survey (Refereed)
    Abstract [en]

    In Dec 2020, the results of AlphaFold version 2 were presented at CASP14, sparking a revolution in the field of protein structure predictions. For the first time, a purely computational method could challenge experimental accuracy for structure prediction of single protein domains. The code of AlphaFold v2 was released in the summer of 2021, and since then, it has been shown that it can be used to accurately predict the structure of most ordered proteins and many protein–protein interactions. It has also sparked an explosion of development in the field, improving AI-based methods to predict protein complexes, disordered regions, and protein design. Here I will review some of the inventions sparked by the release of AlphaFold.

  • 23. Emanuelsson, Olof
    et al.
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Käll, Lukas
    Engagera och aktivera studenter med inspiration från konferenser: examination genom poster-presentation. 2014. In: Proceedings 2014: 8:e Pedagogiska inspirationskonferensen 17 december 2014, Lund: Lund University, 2014. Conference paper (Refereed)
    Abstract [translated from sv]

    In a research-oriented 7.5-credit master's-level course in bioinformatics at KTH, just over half of the course consists of a project carried out in groups of three students. Each project has its own assignment, with no or only marginal overlap with the assignments of other groups. The projects are almost exclusively based on current research questions from the teaching team's own research groups or their close collaborators. The project is reported partly through a poster presentation and partly through an individual web-based project diary. All course participants attend the poster session, which spans three hours at the end of the examination period. As far as possible, we try to mimic the situation in which an authentic research result is presented at a real conference. Each participant (student) is therefore expected to study every other group's poster, as happens at most scientific conferences. We carry out a simple peer assessment at the poster level, in which each student gives a short, confidential comment on each of the other posters. The course teachers also assess the posters. One of the difficulties is assigning individual grades. Here we use the individual project diaries, which give guidance on each individual's contribution to the project. We have tried this in four course rounds with at most seven projects. This form of examination is fun and motivating for both students and teachers.

  • 24. Eriksson, Johan
    et al.
    Vogel, Edward K.
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). KTH Royal Institute of Technology, Sweden.
    Bergström, Fredrik
    Nyberg, Lars
    Neurocognitive Architecture of Working Memory. 2015. In: Neuron, ISSN 0896-6273, E-ISSN 1097-4199, Vol. 88, no 1, p. 33-46. Article, review/survey (Refereed)
    Abstract [en]

    A crucial role for working memory in temporary information processing and guidance of complex behavior has been recognized for many decades. There is emerging consensus that working-memory maintenance results from the interactions among long-term memory representations and basic processes, including attention, that are instantiated as reentrant loops between frontal and posterior cortical areas, as well as sub-cortical structures. The nature of such interactions can account for capacity limitations, lifespan changes, and restricted transfer after working-memory training. Recent data and models indicate that working memory may also be based on synaptic plasticity and that working memory can operate on non-consciously perceived information.

  • 25. Fiebig, Florian
    et al.
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    A Spiking Working Memory Model Based on Hebbian Short-Term Potentiation. 2017. In: Journal of Neuroscience, ISSN 0270-6474, E-ISSN 1529-2401, Vol. 37, no 1, p. 83-96. Article in journal (Refereed)
    Abstract [en]

    A dominant theory of working memory (WM), referred to as the persistent activity hypothesis, holds that recurrently connected neural networks, presumably located in the prefrontal cortex, encode and maintain WM memory items through sustained elevated activity. Reexamination of experimental data has shown that prefrontal cortex activity in single units during delay periods is much more variable than predicted by such a theory and associated computational models. Alternative models of WM maintenance based on synaptic plasticity, such as short-term nonassociative (non-Hebbian) synaptic facilitation, have been suggested but cannot account for encoding of novel associations. Here we test the hypothesis that a recently identified fast-expressing form of Hebbian synaptic plasticity (associative short-term potentiation) is a possible mechanism for WM encoding and maintenance. Our simulations using a spiking neural network model of cortex reproduce a range of cognitive memory effects in the classical multi-item WM task of encoding and immediate free recall of word lists. Memory reactivation in the model occurs in discrete oscillatory bursts rather than as sustained activity. We relate dynamic network activity as well as key synaptic characteristics to electrophysiological measurements. Our findings support the hypothesis that fast Hebbian short-term potentiation is a key WM mechanism.

  • 26. Forreryd, Andy
    et al.
    Norinder, Ulf
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institutet, Sweden.
    Lindberg, Tim
    Lindstedt, Malin
    Predicting skin sensitizers with confidence - Using conformal prediction to determine applicability domain of GARD. 2018. In: Toxicology in Vitro, ISSN 0887-2333, E-ISSN 1879-3177, Vol. 48, p. 179-187. Article in journal (Refereed)
    Abstract [en]

    GARD (Genomic Allergen Rapid Detection) is a cell-based alternative to animal testing for identification of skin sensitizers. The assay is based on a biomarker signature comprising 200 genes measured in an in vitro model of dendritic cells following chemical stimulations, and consistently reports predictive performances of approximately 90% for classification of external test sets. Within the field of in vitro skin sensitization testing, definition of applicability domain is often neglected by test developers, and assays are often considered applicable across the entire chemical space. This study complements previous assessments of model performance with an estimate of confidence in individual classifications, as well as a statistically valid determination of the applicability domain for the GARD assay. Conformal prediction was implemented into current GARD protocols, and a large external test dataset (n = 70) was classified at a confidence level of 85%, to generate a valid model with a balanced accuracy of 88%, with none of the tested chemical reactivity domains identified as outside the applicability domain of the assay. In conclusion, results presented in this study complement previously reported predictive performances of GARD with a statistically valid assessment of uncertainty in each individual prediction, thus allowing for classification of skin sensitizers with confidence.
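
How a confidence level such as 85% turns into per-sample predictions can be sketched with class-conditional (Mondrian) inductive conformal prediction. This is a generic illustration, not the GARD implementation: the nonconformity scores, class labels, and function names are hypothetical, and a higher score is assumed to mean "less typical of that class".

```python
def conformal_p_value(calibration_scores, test_score):
    """Fraction of calibration scores at least as nonconforming as the
    test score, with the usual +1 correction for the test sample itself."""
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_by_class, test_score_by_class, confidence):
    """Labels whose p-value exceeds the significance level 1 - confidence."""
    epsilon = 1.0 - confidence
    return {
        label
        for label, scores in calibration_by_class.items()
        if conformal_p_value(scores, test_score_by_class[label]) > epsilon
    }


# Hypothetical calibration nonconformity scores for each class.
calibration = {
    "sensitizer": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
    "non-sensitizer": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
}
# Hypothetical scores of one test chemical under each candidate label.
test_scores = {"sensitizer": 0.25, "non-sensitizer": 0.95}
labels = prediction_set(calibration, test_scores, confidence=0.85)
```

A single-label set is a confident classification. An empty set or a set with both labels signals that the sample cannot be classified at the chosen confidence, which is one common way conformal prediction is used to flag samples near or outside an applicability domain.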

  • 27. Forte, Castela
    et al.
    Voinea, Andrei
    Chichirau, Malina
    Yeshmagambetova, Galiya
    Albrecht, Lea M.
    Erfurt, Chiara
    Freundt, Liliane A.
    Oliveira e Carmo, Luisa
    Henning, Robert H.
    van der Horst, Iwan C. C.
    Sundelin, Tina
    Stockholm University, Faculty of Social Sciences, Department of Psychology, Stress Research Institute. Karolinska Institutet, Sweden.
    Wiering, Marco A.
    Axelsson, John
    Stockholm University, Faculty of Social Sciences, Department of Psychology, Stress Research Institute. Karolinska Institutet, Sweden.
    Epema, Anne H.
    Deep Learning for Identification of Acute Illness and Facial Cues of Illness. 2021. In: Frontiers in Medicine, E-ISSN 2296-858X, Vol. 8, article id 661309. Article in journal (Refereed)
    Abstract [en]

    Background: The inclusion of facial and bodily cues (clinical gestalt) in machine learning (ML) models improves the assessment of patients' health status, as shown in genetic syndromes and acute coronary syndrome. It is unknown if the inclusion of clinical gestalt improves ML-based classification of acutely ill patients. As in previous research in ML analysis of medical images, simulated or augmented data may be used to assess the usability of clinical gestalt.

    Objective: To assess whether a deep learning algorithm trained on a dataset of simulated and augmented facial photographs reflecting acutely ill patients can distinguish between healthy and LPS-infused, acutely ill individuals.

    Methods: Photographs from twenty-six volunteers whose facial features were manipulated to resemble a state of acute illness were used to extract features of illness and generate a synthetic dataset of acutely ill photographs, using a neural transfer convolutional neural network (NT-CNN) for data augmentation. Then, four distinct CNNs were trained on different parts of the facial photographs and concatenated into one final, stacked CNN which classified individuals as healthy or acutely ill. Finally, the stacked CNN was validated in an external dataset of volunteers injected with lipopolysaccharide (LPS).

    Results: In the external validation set, the four individual feature models distinguished acutely ill patients with sensitivities ranging from 10.5% (95% CI, 1.3-33.1%) for the skin model to 89.4% (66.9-98.7%) for the nose model. Specificity ranged from 42.1% (20.3-66.5%) for the nose model to 94.7% (73.9-99.9%) for the skin model. The stacked model combining all four facial features achieved an area under the receiver operating characteristic curve (AUROC) of 0.67 (0.62-0.71) and distinguished acutely ill patients with a sensitivity of 100% (82.35-100.00%) and specificity of 42.11% (20.25-66.50%).

    Conclusion: A deep learning algorithm trained on a synthetic, augmented dataset of facial photographs distinguished between healthy and simulated acutely ill individuals, demonstrating that synthetically generated data can be used to develop algorithms for health conditions in which large datasets are difficult to obtain. These results support the potential of facial feature analysis algorithms to support the diagnosis of acute illness.
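
The sensitivity and specificity figures reported above follow from simple ratios over a confusion matrix. The counts below are hypothetical: the abstract does not state group sizes, and 19 individuals per group is assumed here only because it reproduces the reported percentages for the stacked model.

```python
def sensitivity(true_pos, false_neg):
    """Share of ill individuals that the model flags as ill."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of healthy individuals that the model flags as healthy."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumed, not reported): 19 ill, 19 healthy.
stacked_sensitivity = sensitivity(true_pos=19, false_neg=0)
stacked_specificity = specificity(true_neg=8, false_pos=11)
sens_percent = round(100 * stacked_sensitivity, 2)
spec_percent = round(100 * stacked_specificity, 2)
```

Under these assumed counts the model flags every ill individual but also more than half of the healthy ones, which is consistent with the modest AUROC despite the perfect sensitivity.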

  • 28.
    Friedrich, Stefanie
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Computational Analysis of Tumour Heterogeneity. 2020. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Every tumour is unique and characterised by its genetic, epigenetic, phenotypic, and morphological signature. The diversity observed between and within tumours, and over time, is termed tumour heterogeneity. An increased heterogeneity within a tumour correlates with cancer progression, higher resistance rates, and poorer outcome. Heterogeneity between tumours explains aspects of a treatment’s ineffectiveness. Depending on a tumour’s unique signature, common processes like unhindered cell proliferation, invasiveness, or treatment resistance characterise tumour progression. Studying tumour heterogeneity aims to understand cancer causes and evolution, and eventually to improve cancer treatment outcomes. 

    This thesis presents application and development of computational methods to study tumour heterogeneity. Papers I and II concern the in-depth investigation of clinical tissue samples taken from prostate cancer patients. The findings range from spatial expansion of gene expression patterns based on high-resolution data to a gene expression signature of non-responding cancer cells revealed by spatio-temporal analysis. These cells underwent a transition from an epithelial to a mesenchymal phenotype pre-treatment. Papers III and IV present tools to detect fusion transcripts and copy number variations, respectively. Both tools, applicable to high-resolution data, enable the in-depth study of mutations, which are the driving force behind tumour heterogeneity.

    The results in this thesis demonstrate how the beneficial combination of high-resolution data and computational methods leads to novel insights of tumour heterogeneity. 

  • 29.
    Friedrich, Stefanie
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Fusion transcript detection using spatial transcriptomics. Manuscript (preprint) (Other academic)
    Abstract [en]

    Fusion transcripts are involved in tumourigenesis and play a crucial role in tumour heterogeneity, tumour evolution and cancer treatment resistance. However, fusion transcripts have not been studied at high spatial resolution in tissue sections due to the lack of full-length transcripts with spatial information. New high-throughput technologies like spatial transcriptomics measure the transcriptome of tissue sections on almost single-cell level. While this technique does not allow for direct detection of fusion transcripts, we show that they can be inferred using the relative poly(A) tail abundance of the involved parental genes.

    We present a new method, STfusion, which uses spatial transcriptomics to infer the presence and absence of poly(A) tails. A fusion transcript lacks a poly(A) tail for the 5´ gene and has an elevated number of poly(A) tails for the 3´ gene. Its expression level is defined by the upstream promoter of the 5´ gene. STfusion measures the difference between the observed and expected number of poly(A) tails with a novel C-score.

    We verified the ability of STfusion to predict fusion transcripts on HeLa cells with known fusions. STfusion and the C-score applied to clinical prostate cancer data revealed the spatial distribution of the cis-SAGe SLC45A3-ELK4 in 12 tissue sections with almost single-cell resolution. The cis-SAGe occurred in the centre or periphery of inflamed, prostatic intraepithelial neoplastic, or cancerous areas, and occasionally in normal glands.

  • 30.
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Network and gene expression analyses for understanding protein function. 2013. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Biological function is the result of a complex network of functional associations between genes or their products. Modeling the dynamics underlying biological networks is one of the big challenges in bioinformatics. A first step towards solving this problem is to predict and study the networks of functional associations underlying various conditions.

    An improved version of the FunCoup network inference method that features networks for three new species and updated versions of the existing networks is presented. Network clustering, i.e. partitioning networks into highly connected components is an important tool for network analysis. We developed MGclus, a clustering method for biological networks that scores shared network neighbors. We found MGclus to perform favorably compared to other methods popular in the field. Studying sets of experimentally derived genes in the context of biological networks is a common strategy to shed light on their underlying biology. The CrossTalkZ method presented in this work assesses the statistical significance of crosstalk enrichment, i.e. the extent of connectivity between or within groups of functionally coupled genes or proteins in biological networks. We further demonstrate that CrossTalkZ is a valuable method to functionally annotate experimentally derived gene sets.

    Males and females differ in the expression of an extensive number of genes. The methods developed in the first part of this work were applied to study sex-biased genes in chicken and several network properties related to the molecular mechanisms of sex-biased gene regulation in chicken were deduced. Cancer studies have shown that tumor progression is strongly determined by the tumor microenvironment. We derived a gene expression signature of PDGF-activated fibroblasts that shows a strong prognostic significance in breast cancer in univariate and multivariate survival analyses when compared to established markers for prognosis.

  • 31.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Augsten, Martin
    Tobin, Nicholas P.
    Carlson, Joseph
    Paulsson, Janna
    Pena, Cristina
    Olsson, Eleonor
    Veerla, Sunny
    Bergh, Jonas
    Östman, Arne
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Prognostic significance in breast cancer of a gene signature capturing stromal PDGF signaling. In: American Journal of Pathology, ISSN 0002-9440, E-ISSN 1525-2191. Article in journal (Refereed)
  • 32.
    Govindarajan, Sudha
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory, Sweden.
    Bassot, Claudio
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory, Sweden.
    Lamb, John
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory, Sweden.
    Shu, Nanjiang
    Huang, Yan
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Science for Life Laboratory, Sweden.
    The evolutionary history of topological variations in the CPA/AT superfamily. Manuscript (preprint) (Other academic)
    Abstract [en]

    CPA/AT transporters consist of two structurally and evolutionarily related inverted repeat units, each of them with one core and one scaffold subdomain. During evolution, these families have undergone substantial changes in structure, topology and function. Central to the function of the transporters is the existence of two noncanonical helices that are involved in the transport process. In different families, two different types of these helices have been identified, reentrant and broken. Here, we use an integrated topology annotation method to identify novel topologies in the families. It combines topology prediction, similarity to families with known structure, and the difference in positively charged residues present in inside and outside loops in alternative topological models. We identified families with diverse topologies containing broken or reentrant helices. We classified all families into 3 distinct evolutionary groups, newly termed "Fold-types", each of which shares a structurally similar C-terminal repeat unit. Using the evolutionary relationship between families, we propose topological transitions including a transition between broken and reentrant helices, a complete change of orientation, changes in the number of scaffold helices and even, in some rare cases, losses of core helices. The evolutionary history of the repeat units shows that gene duplication and repeat shuffling events resulted in these extensive topology variations. The novel structure-based classification, together with supporting structural models and other information, is presented in a searchable database, CPAfold (cpafold.bioinfo.se). Our comprehensive study of topology variations within the CPA superfamily provides better insight into their structure and evolution.

  • 33.
    Granholm, Viktor
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    The accuracy of statistical confidence estimates in shotgun proteomics. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    High-throughput techniques are currently some of the most promising methods to study molecular biology, with the potential to improve medicine and enable new biological applications. In proteomics, the large scale study of proteins, the leading method is mass spectrometry. At present researchers can routinely identify and quantify thousands of proteins in a single experiment with the technique called shotgun proteomics.

    A challenge of these experiments is the computational analysis and the interpretation of the mass spectra. A shotgun proteomics experiment easily generates tens of thousands of spectra, each thought to represent a peptide from a protein. Due to the immense biological and technical complexity, however, our computational tools often misinterpret these spectra and derive incorrect peptides. As a consequence, the biological interpretation of the experiment relies heavily on the statistical confidence that we estimate for the identifications.

    In this thesis, I have included four articles from my research on the accuracy of the statistical confidence estimates in shotgun proteomics, and on how to achieve and evaluate it. In the first two papers, a new method that uses pre-characterized protein samples to evaluate this accuracy is presented. The third paper deals with how to avoid statistical inaccuracies when using machine learning techniques to analyze the data. In the fourth paper, we present a new tool for analyzing shotgun proteomics results, and evaluate the accuracy of its statistical estimates using the method from the first papers.

    The work I have included here can facilitate the development of new and accurate computational tools in mass spectrometry-based proteomics. Such tools will help making the interpretation of the spectra and the downstream biological conclusions more reliable.

  • 34.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kim, Sangtae
    Navarro, José C. F.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Smith, Richard D.
    Käll, Lukas
    Fast and Accurate Database Searches with MS-GF+Percolator. 2014. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 13, no 2, p. 890-897. Article in journal (Refereed)
    Abstract [en]

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.
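
The q values mentioned above are commonly derived from a target-decoy competition. The following sketch shows one simple estimator (decoys divided by targets above a score threshold, made monotone from the low-scoring end). It is a generic illustration and not Percolator's actual procedure; the function name and the scores are hypothetical.

```python
def target_decoy_q_values(psms):
    """psms: list of (score, is_decoy) pairs, higher score = better match.

    Returns (score, q_value) for each target PSM, best score first.
    The FDR at a threshold is estimated as #decoys / #targets above it,
    and the q value is the smallest FDR at which the PSM is still accepted.
    """
    ordered = sorted(psms, key=lambda item: -item[0])
    targets = decoys = 0
    fdr_estimates = []
    for score, is_decoy in ordered:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
            fdr_estimates.append((score, decoys / targets))

    # Enforce monotonicity by sweeping from the worst score upwards.
    q_values = []
    running_min = 1.0
    for score, fdr in reversed(fdr_estimates):
        running_min = min(running_min, fdr)
        q_values.append((score, running_min))
    q_values.reverse()
    return q_values


# Hypothetical search results: True marks a decoy match.
psms = [(9.0, False), (8.0, False), (7.0, True), (6.0, False),
        (5.0, False), (4.0, True), (3.0, False)]
q_values = target_decoy_q_values(psms)
```

In this toy list the two best target matches receive a q value of 0, and accepting everything down to the score 5.0 corresponds to an estimated false discovery rate of 25%.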

  • 35.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Navarro, Jose Fernandez
    Noble, William Stafford
    Käll, Lukas
    Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics. 2013. In: Journal of Proteomics, ISSN 1874-3919, E-ISSN 1876-7737, Vol. 80, p. 123-131. Article in journal (Refereed)
    Abstract [en]

    The analysis of a shotgun proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statistical significance of the resulting peptide identifications. Here, we use a previously described calibration protocol to evaluate the accuracy of three different peptide-level statistical confidence estimation procedures: the classical Fisher's method, and two complementary procedures that estimate significance, respectively, before and after selecting the top-scoring PSM for each spectrum. Our experiments show that the latter method, which is employed by MaxQuant and Percolator, produces the most accurate, well-calibrated results.
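
The abstract above names Fisher's method as one of the three peptide-level procedures evaluated. As a rough illustration only, and not the authors' implementation, the sketch below combines the PSM-level p values of one peptide with Fisher's method, assuming the p values are independent. The function name and the input values are hypothetical.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p value")
    if any(p <= 0.0 or p > 1.0 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2

    term = 1.0   # (X/2)^i / i! for i = 0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Hypothetical PSM-level p values for a single peptide.
combined = fisher_combined_p([0.05, 0.05])
```

With a single p value the function returns that value unchanged, which is a quick sanity check on the closed form.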

  • 36.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Noble, William Stafford
    Käll, Lukas
    A cross-validation scheme for machine learning algorithms in shotgun proteomics. 2012. In: BMC Bioinformatics, E-ISSN 1471-2105, Vol. 13, p. S3. Article in journal (Refereed)
    Abstract [en]

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
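
The target-decoy idea mentioned in the abstract above can be sketched in a few lines. This is a minimal illustration assuming separate target and decoy searches of equal size and the simplest estimator (decoy hits divided by target hits above a score threshold); it is not the cross-validation scheme of the paper. The function name and all scores are made up.

```python
def estimate_fdr(target_scores, decoy_scores, threshold):
    """Estimate the false discovery rate among target matches scoring
    at or above `threshold`, using decoy matches as a proxy for
    incorrect target matches."""
    n_targets = sum(1 for s in target_scores if s >= threshold)
    n_decoys = sum(1 for s in decoy_scores if s >= threshold)
    if n_targets == 0:
        return 0.0  # nothing accepted, so nothing can be falsely accepted
    return min(1.0, n_decoys / n_targets)


# Hypothetical match scores (higher is better).
targets = [9.1, 8.4, 7.7, 6.0, 5.2, 4.9, 3.3, 2.8]
decoys = [5.0, 4.1, 3.9, 3.0, 2.5, 2.2, 1.8, 1.1]

fdr_at_5 = estimate_fdr(targets, decoys, 5.0)  # 1 decoy vs 5 targets
```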

  • 37.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Noble, William Stafford
    Käll, Lukas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    On Using Samples of Known Protein Content to Assess the Statistical Calibration of Scores Assigned to Peptide-Spectrum Matches in Shotgun Proteomics. 2011. In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 10, no 5, p. 2671-2678. Article in journal (Refereed)
    Abstract [en]

    In shotgun proteomics, the quality of a hypothesized match between an observed spectrum and a peptide sequence is quantified by a score function. Because the score function lies at the heart of any peptide identification pipeline, this function greatly affects the final results of a proteomics assay. Consequently, valid statistical methods for assessing the quality of a given score function are extremely important. Previously, several research groups have used samples of known protein composition to assess the quality of a given score function. We demonstrate that this approach is problematic, because the outcome can depend on factors other than the score function itself. We then propose an alternative use of the same type of data to validate a score function. The central idea of our approach is that database matches that are not explained by any protein in the purified sample comprise a robust representation of incorrect matches. We apply our alternative assessment scheme to several commonly used score functions, and we show that our approach generates a reproducible measure of the calibration of a given peptide identification method. Furthermore, we show how our quality test can be useful in the development of novel score functions.

  • 38.
    Hao, Chengcheng
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Statistics.
    von Rosen, Dietrich
    von Rosen, Tatjana
    Stockholm University, Faculty of Social Sciences, Department of Statistics.
    Influence diagnostics for count data under AB-BA crossover trials. 2017. In: Statistical Methods in Medical Research, ISSN 0962-2802, E-ISSN 1477-0334, Vol. 26, no 6, p. 2938-2950. Article in journal (Refereed)
    Abstract [en]

    This paper aims to develop diagnostic measures to assess the influence of data perturbations on estimates in AB-BA crossover studies with a Poisson distributed response. Generalised mixed linear models with normally distributed random effects are utilised. We show that in this special case, the model can be decomposed into two independent sub-models, which allows us to derive closed-form expressions to evaluate the changes in the maximum likelihood estimates under several perturbation schemes. The performance of the new influence measures is illustrated by simulation studies and the analysis of a real dataset.

  • 39. Hayat, Sikander
    et al.
    Peters, Christoph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tsirigos, Konstantinos D.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Inclusion of dyad-repeat pattern improves topology prediction of transmembrane beta-barrel proteins. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 10, p. 1571-1573. Article in journal (Refereed)
    Abstract [en]

    Accurate topology prediction of transmembrane beta-barrels is still an open question. Here, we present BOCTOPUS2, an improved topology prediction method for transmembrane beta-barrels that can also identify the barrel domain, predict the topology and identify the orientation of residues in transmembrane beta-strands. The major novelty of BOCTOPUS2 is the use of the dyad-repeat pattern of lipid and pore facing residues observed in transmembrane beta-barrels. In a cross-validation test on a benchmark set of 42 proteins, BOCTOPUS2 predicts the correct topology in 69% of the proteins, an improvement of more than 10% over the best earlier method (BOCTOPUS) and in addition, it produces significantly fewer erroneous predictions on non-transmembrane beta-barrel proteins.

  • 40. Hee, Siew Wan
    et al.
    Hamborg, Thomas
    Day, Simon
    Madan, Jason
    Miller, Frank
    Stockholm University, Faculty of Social Sciences, Department of Statistics.
    Posch, Martin
    Zohar, Sarah
    Stallard, Nigel
    Decision-theoretic designs for small trials and pilot studies: A review. 2016. In: Statistical Methods in Medical Research, ISSN 0962-2802, E-ISSN 1477-0334, Vol. 25, no 3, p. 1022-1038. Article, review/survey (Refereed)
    Abstract [en]

    Pilot studies and other small clinical trials are often conducted but serve a variety of purposes and there is little consensus on their design. One paradigm that has been suggested for the design of such studies is Bayesian decision theory. In this article, we review the literature with the aim of summarizing current methodological developments in this area. We find that decision-theoretic methods have been applied to the design of small clinical trials in a number of areas. We divide our discussion of published methods into those for trials conducted in a single stage, those for multi-stage trials in which decisions are made through the course of the trial at a number of interim analyses, and those that attempt to design a series of clinical trials or a drug development programme. In all three cases, a number of methods have been proposed, depending on the decision maker’s perspective being considered and the details of utility functions that are used to construct the optimal design.

  • 41.
    Hellmuth, Marc
    et al.
    Stockholm University, Faculty of Science, Department of Mathematics.
    Michel, Mira
    Nøjgaard, Nikolai N.
    Schaller, David
    Stadler, Peter F.
    Combining Orthology and Xenology Data in a Common Phylogenetic Tree. 2021. In: Advances in Bioinformatics and Computational Biology: 14th Brazilian Symposium on Bioinformatics, BSB 2021, Virtual Event, November 22–26, 2021, Proceedings / [ed] Peter F. Stadler; Maria Emilia M. T. Walter; Maribel Hernandez-Rosales; Marcelo M. Brigido, Cham: Springer, 2021, p. 53-64. Conference paper (Refereed)
    Abstract [en]

    In mathematical phylogenetics, types of events in a gene tree T are formalized by vertex labels t(v) and set-valued edge labels λ(e). The orthology and paralogy relations between genes are a special case of a map δ on the pairs of leaves of T defined by δ(x,y)=q if the last common ancestor lca(x,y) of x and y is labeled by an event type q, e.g., speciation or duplication. Similarly, a map ε with m∈ε(x,y) if m∈λ(e) for at least one edge e along the path from lca(x,y) to y generalizes xenology, i.e., horizontal gene transfer. We show that a pair of maps (δ,ε) derives from a tree (T,t,λ) in this manner if and only if there exists a common refinement of the (unique) least-resolved vertex labeled tree (T_δ,t_δ) that explains δ and the (unique) least-resolved edge labeled tree (T_ε,λ_ε) that explains ε (provided both trees exist). This result remains true if certain combinations of labels at incident vertices and edges are forbidden.
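
The map δ defined in the abstract above, δ(x,y) = t(lca(x,y)), can be illustrated on a toy tree. The sketch below is only an illustration of that definition, not code from the paper. The tree, its labels, and all function names are hypothetical, and the tree is stored as a simple child-to-parent dictionary.

```python
def ancestors(parent, v):
    """Return the path from v up to the root, with v itself first."""
    path = [v]
    while v in parent:
        v = parent[v]
        path.append(v)
    return path


def lca(parent, x, y):
    """Last common ancestor of x and y in a rooted tree."""
    seen = set(ancestors(parent, x))
    for v in ancestors(parent, y):  # walks upward from y, so the first hit is the lowest
        if v in seen:
            return v
    raise ValueError("x and y are not in the same tree")


def delta(parent, label, x, y):
    """delta(x, y) = t(lca(x, y)): the event label at the last common ancestor."""
    return label[lca(parent, x, y)]


# Hypothetical gene tree: root r has children u and c; u has children a and b.
parent = {"a": "u", "b": "u", "u": "r", "c": "r"}
label = {"u": "duplication", "r": "speciation"}  # vertex labels t(v)

relation_ab = delta(parent, label, "a", "b")  # genes a and b meet at u
relation_ac = delta(parent, label, "a", "c")  # genes a and c meet at r
```

In this toy tree, a and b are related through a duplication event and a and c through a speciation event, which matches the usual reading of paralogs and orthologs.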

  • 42.
    Hellmuth, Marc
    et al.
    Stockholm University, Faculty of Science, Department of Mathematics.
    Schaller, David
    Stadler, Peter F.
    Clustering systems of phylogenetic networks. 2023. In: Theory in biosciences, ISSN 1431-7613, E-ISSN 1611-7530, no 142, p. 301-358. Article in journal (Refereed)
    Abstract [en]

    Rooted acyclic graphs appear naturally when the phylogenetic relationship of a set X of taxa involves not only speciations but also recombination, horizontal transfer, or hybridization that cannot be captured by trees. A variety of classes of such networks have been discussed in the literature, including phylogenetic, level-1, tree-child, tree-based, galled tree, regular, or normal networks as models of different types of evolutionary processes. Clusters arise in models of phylogeny as the sets C(v) of descendant taxa of a vertex v. The clustering system C_N comprising the clusters of a network N conveys key information on N itself. In the special case of rooted phylogenetic trees, T is uniquely determined by its clustering system C_T. Although this is no longer true for networks in general, it is of interest to relate properties of N and C_N. Here, we systematically investigate the relationships of several well-studied classes of networks and their clustering systems. The main results are correspondences of classes of networks and clustering systems of the following form: If N is a network of type X, then C_N satisfies Y, and conversely if C is a clustering system satisfying Y, then there is a network N of type X such that C ⊆ C_N. This, in turn, allows us to investigate the mutual dependencies between the distinct types of networks in much detail.

  • 43.
    Hellmuth, Marc
    et al.
    Stockholm University, Faculty of Science, Department of Mathematics.
    Stadler, Peter F.
    Thekkumpadan Puthiyaveedu, Sandhya
    Fitch Graph Completion. 2023. In: Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349, p. 225-237. Article in journal (Refereed)
  • 44.
    Hildebrandt, Franziska
    Stockholm University, Faculty of Science, Department of Molecular Biosciences, The Wenner-Gren Institute.
    Host-parasite interactions in space and time. 2023. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Unicellular parasites of the apicomplexan phylum have a considerable effect on global health and agriculture. Two prominent examples of this phylum include malaria causing parasites of the Plasmodium genus and the widely prevalent parasite Toxoplasma gondii. While sharing a common ancestor, these parasites occupy unique biological niches, follow distinct life cycles, and result in different courses and outcomes of disease. In response to the parasite, the mammalian host has developed efficient and effective defense strategies. However, both Plasmodium and Toxoplasma have evolved strategies to evade the host’s defense response. Plasmodium parasites infect distinct tissues and cell types whereas T. gondii parasites are highly promiscuous and infect all nucleated cells. The identification of key factors involved in the interaction between the host and parasite is crucial for disease intervention, prevention, and eventually eradication efforts.

    Next-generation sequencing technologies have proven effective tools to investigate the response in a tissue or cell population of an infected organism. Novel genomics methods such as single-cell RNA-seq and spatial transcriptomics have enabled the investigation of heterogeneous transcriptional responses of individual cells in a population as well as heterogeneous expression profiles at spatially distinct tissue positions across entire tissue sections. This thesis pioneers the exploration of these methods in discerning the enormous complexity underlying host-parasite interplay.

    In Paper I, we determine spatial components of naive mouse liver in its true tissue context. We define gene expression gradients of pericentral and periportal zones in the liver and predict vein types with ambiguous annotations, based on in situ transcriptional profiles. We further identify novel spatial structures with distinct transcriptional profiles, associated with tissue integrity and integrate cell type proportions across the tissue.

    In Paper II we investigate host-pathogen interactions in P. berghei infected liver sections with spatiotemporal resolution. We establish spatial gene expression gradients from infection sites exhibiting upregulation of lipid metabolism associated genes 38 hours post-infection, suggesting a potential role of these pathways in immune evasion. We further show that local and systemic inflammation are delayed but not ablated in salivary gland lysate challenged control livers and propose that local inflammatory hotspots may represent an important spatial component for parasite development in the liver.

    In Paper III we use dual scRNA-seq to investigate heterogeneous transcription of mouse bone marrow-derived dendritic cells (BMDCs) infected with two distinct genotypes of T. gondii parasites. We show differential responses towards the two T. gondii genotypes in two distinct subpopulations of BMDCs over multiple time points post infection. Moreover, we generate co-expression networks that define host and parasite genes, which are likely involved in the modulation of host immunity.

    In summary, this thesis aims to characterize host-pathogen interactions of two major apicomplexan genera in two distinct cell niches of the murine host with spatiotemporal or single cell resolution. In detail, this encompasses the study of spatial structures of the host in the liver environment and the spatiotemporal consequences of an infection with P. berghei. Furthermore, the aims include deciphering heterogeneous interactions between two distinct T. gondii strains and infected BMDCs.

  • 45.
    Hildebrandt, Franziska
    et al.
    Stockholm University.
    Urrutia-Iturritza, Miren
    Stockholm University.
    Zwicker, Christian
    Vanneste, Bavo
    Van Hul, Noémi
    Semle, Elisa
    Stockholm University.
    Pascini, Tales
    Saarenpää, Sami
    He, Mengxiao
    Andersson, Emma R.
    Scott, Charlotte L.
    Vega-Rodriguez, Joel
    Lundeberg, Joakim
    Ankarklev, Johan
    Stockholm University.
    Spatiotemporal analysis of the malaria-infected liver indicates a crucial role for lipid metabolism and hotspots of inflammatory cell infiltration. Manuscript (preprint) (Other academic)
  • 46.
    Hosseini Ashtiani, Saman
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Emami Khoonsari, Payam
    Carlsson, Henrik
    Herman, Stephanie
    Freyhult, Eva
    Mallmin, Hans
    Kultima, Kim
    P Hailer, Nils
    Studying the metabolomics-level effects of Denosumab after Uncemented Total Hip Arthroplasty: Based on a Randomized Placebo-Controlled Clinical Trial. Manuscript (preprint) (Other academic)
    Abstract [en]

    Purpose: To evaluate if metabolomic methods using high-resolution mass spectrometry (HRMS) can increase our understanding of the rebound effect with rapid loss of bone-mineral-density (BMD) seen after discontinuation of denosumab treatment after cementless total hip arthroplasty (THA) in patients with osteoarthritis of the hip.

    Methods: Sixty-four patients operated with cementless THA were randomized to two doses of 60-mg denosumab or placebo 1-3 days and six months postoperatively. Serum samples were analyzed using untargeted HRMS coupled to liquid chromatography (LC). Bone turnover markers were assessed. Data were analyzed using linear mixed effect models and machine learning.

    Results: Global metabolic differences were found after surgery, affecting denosumab and placebo treated patients differently. Eighty-three features displayed significant (p<0.0001) changes in concentrations after surgery, including a significant decrease in the dipeptides Di-L-phenylalanine, Phenylalanylleucine and Alpha-Asp-Phe in the placebo group. However, twenty-four months after surgery, these concentrations were significantly higher in denosumab treated patients compared to placebo. Further, fibrinopeptide A and related peptides were increased in concentration in placebo compared to denosumab, starting six months after surgery. In the denosumab group, concentrations of bone turnover markers (P1NP/CTX) were substantially reduced after three months, remained suppressed after six and twelve months, but increased above baseline and placebo 24 months after surgery. The peptides AP(Ox)GDRGEP(Ox)GPP(Ox)GP, derived from the protein Collagen type I alpha 1 chain (COL1A1), were tightly correlated to P1NP (P = 4.4 × 10^-83) and the tripeptide DL-alpha-aspartyl-DL-valyl-DL-proline (DVP) was tightly correlated with CTX (P = 1.1 × 10^-222).

    Conclusion: Global metabolic differences were found after surgery, affecting denosumab and placebo treated patients differently. Significantly increased levels of certain dipeptides may be of importance for the rebound effect with rapid loss of BMD seen after discontinuation of denosumab treatment. Fibrinopeptide A and related peptides may serve a protective role. The peptides AP(Ox)GDRGEP(Ox)GPP(Ox)GP and DVP represent novel markers for bone turnover that can be easily measured using LC-HRMS.

  • 47.
    Höhna, Sebastian
    Stockholm University, Faculty of Science, Department of Mathematics.
    Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes. 2013. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 29, no 11, p. 1367-1374. Article in journal (Refereed)
    Abstract [en]

    Motivation: Diversification rates and patterns may be inferred from reconstructed phylogenies. Both the time-dependent and the diversity-dependent birth-death process can produce the same observed patterns of diversity over time. To develop and test new models describing the macro-evolutionary process of diversification, generic and fast algorithms to simulate under these models are necessary. Simulations are not only important for testing and developing models but play an influential role in the assessment of model fit.

    Results: In the present article, I consider as the model a global time-dependent birth-death process where each species has the same rates but rates may vary over time. For this model, I derive the likelihood of the speciation times from a reconstructed phylogenetic tree and show that each speciation event is independent and identically distributed. This fact can be used to simulate efficiently reconstructed phylogenetic trees when conditioning on the number of species, the time of the process or both. I show the usability of the simulation by approximating the posterior predictive distribution of a birth-death process with decreasing diversification rates applied on a published bird phylogeny (family Cettiidae).

    Availability: The methods described in this manuscript are implemented in the R package TESS, available from the repository CRAN (http://cran.r-project.org/web/packages/TESS/).

  • 48. Jansson, Jesper
    et al.
    Mampentzidis, Konstantinos
    Thekkumpadan Puthiyaveedu, Sandhya
    Stockholm University, Faculty of Science, Department of Mathematics. The Hong Kong Polytechnic University, Hong Kong.
    Building a small and informative phylogenetic supertree. 2023. In: Information and Computation, ISSN 0890-5401, E-ISSN 1090-2651, Vol. 294, article id 105082. Article in journal (Refereed)
    Abstract [en]

    We combine two fundamental optimization problems related to the construction of phylogenetic trees called maximum rooted triplets consistency and minimally resolved supertree into a new problem, which we call q-maximum rooted triplets consistency (q-MAXRTC). It takes as input a set R of rooted, binary phylogenetic trees with three leaves each and asks for a phylogenetic tree with exactly q internal nodes that contains the largest possible number of trees from R. We prove that q-MAXRTC is NP-hard to approximate within a constant, develop polynomial-time approximation algorithms for different values of q, and show experimentally that representing a phylogenetic tree by one having much fewer nodes typically does not destroy too much branching information. To demonstrate the algorithmic advantage of using trees with few internal nodes, we also propose a new algorithm for computing the rooted triplet distance that is faster than the existing algorithms when restricted to such trees.

  • 49. Jonasdottir, Gudrun
    et al.
    Humphreys, Keith
    Palmgren, Juni
    Stockholm University, Faculty of Science, Department of Mathematics.
    Testing association in the presence of linkage. 2007. In: Genetic Epidemiology, Vol. 31, p. 528-540. Article in journal (Refereed)
  • 50.
    Kaduk, Mateusz
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Functional Inference from Orthology and Domain Architecture. 2018. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Proteins are the basic building blocks of all living organisms. They play a central role in determining the structure of living beings and are required for essential chemical reactions. One of the main challenges in bioinformatics is to characterize the function of all proteins. The problem of understanding protein function can be approached by understanding their evolutionary history. Orthology analysis plays an important role in studying the evolutionary relation of proteins. Proteins are termed orthologs if they derive from a single gene in the species' last common ancestor, i.e. if they were separated by a speciation event. Orthologs are useful because they retain their function more often than other homologs. 

    Inference of a complete set of orthologs for many species is computationally intensive. Currently, the fastest algorithms rely on graph-based approaches, which compare all-vs-all sequences and then cluster top hits into groups of orthologs. The initial step of performing all-vs-all comparisons is usually the primary computational challenge as it scales quadratically with the number of species. 

    A new, more scalable and less computationally demanding method was developed to solve this problem without sacrificing accuracy. The Hieranoid 2 algorithm reduces computational complexity to almost linear by overcoming the necessity to perform all-vs-all similarity searches. The algorithm progresses along a known species tree, from leaves to root. Starting at the leaves, ortholog groups are predicted conventionally and then summarized at internal nodes to form pseudo-species. These pseudo-species are then re-used to search against other (pseudo-)species higher in the tree. This way the algorithm aggregates new ortholog groups hierarchically. The hierarchy is a natural structure to store and view large multi-species ortholog groups, and provides a complete picture of inferred evolutionary events. 

    To facilitate explorative analysis of hierarchical groups of orthologs, a new online tool was created. The HieranoiDB website provides precomputed hierarchical groups of orthologs for a set of 66 species. It allows the user to search for orthology assignments using protein description, protein sequence, or species. Evolutionary events and meta information are added to the hierarchical groups of orthologs, which are shown graphically as interactive trees. This representation allows exploring, searching, and easier visual inspection of multi-species ortholog groups.

    The majority of orthology prediction methods focus on treating the whole protein sequence as a single evolutionary unit. However, proteins are often composed of individual units, called protein domains, that can have different evolutionary histories. To extend the full sequence based methodology to a domain-aware method, a new approach called Domainoid is proposed. Here, domains are extracted from full-length sequences and subjected to orthology inference. This allows Domainoid to find orthology that would be missed by a full sequence approach.

    Networks are a convenient graphical representation for showing a large number of functional associations between genes or proteins. They allow various analyses of graph properties, and can help visualize complex relationships. A framework for inferring comprehensive functional association networks was developed, called FunCoup. A major difference compared to other networks is FunCoup's extensive use of orthology relationships between species, which significantly boosts its coverage. Using naïve Bayesian classifiers to integrate 10 different evidence types and orthology transfer, FunCoup captures functional associations of many types, and provides comprehensive networks for 17 species across five gold-standards.
