  • 1. Ali, Raja Hashim
    et al.
    Muhammad, Sayyed Auwn
    Khan, Mehmood Alam
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab). Swedish e-Science Research Center, Sweden.
    Quantitative synteny scoring improves homology inference and partitioning of gene families. 2013. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 14, no Suppl 15, S12. Article in journal (Refereed)
    Abstract [en]

    Background

    Clustering sequences into families has long been an important step in characterization of genes and proteins. There are many algorithms developed for this purpose, most of which are based on either direct similarity between gene pairs or some sort of network structure, where weights on edges of constructed graphs are based on similarity. However, conserved synteny is an important signal that can help distinguish homology and it has not been utilized to its fullest potential.

    Results

    Here, we present GenFamClust, a pipeline that combines the network properties of sequence similarity and synteny to assess homology relationships and merge known homologs into groups of gene families. GenFamClust identifies homologs in a more informed and accurate manner than similarity-based approaches. We tested our method against the Neighborhood Correlation method on two diverse datasets consisting of fully sequenced genomes of eukaryotes and synthetic data.

    Conclusions

    The results obtained from both datasets confirm that synteny helps determine homology and that GenFamClust improves on the Neighborhood Correlation method. The accuracy as well as the definition of synteny scores is the most valuable contribution of GenFamClust.

  • 2.
    Basile, Walter
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sachenkova, Oxana
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Light, Sara
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Linköping University, Sweden.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab). Kungliga Tekniska Högskolan, Sweden.
    High GC content causes orphan proteins to be intrinsically disordered. 2017. In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 13, no 3, e1005375. Article in journal (Refereed)
    Abstract [en]

    De novo creation of protein coding genes involves the formation of short ORFs from noncoding regions; some of these ORFs might then become fixed in the population. These orphan proteins need to, at the bare minimum, not cause serious harm to the organism, meaning that they should for instance not aggregate. Therefore, although the creation of short ORFs could be truly random, the fixation should be subjected to some selective pressure. The selective forces acting on orphan proteins have been elusive, and contradictory results have been reported. In Drosophila young proteins are more disordered than ancient ones, while the opposite trend is present in yeast. To the best of our knowledge no valid explanation for this difference has been proposed. To solve this riddle we studied structural properties and age of proteins in 187 eukaryotic organisms. We find that, with the exception of length, there are only small differences in the properties between proteins of different ages. However, when we take the GC content into account we note that it could explain the opposite trends observed for orphans in yeast (low GC) and Drosophila (high GC). GC content is correlated with codons coding for disorder-promoting amino acids. This leads us to propose that intrinsic disorder is not a strong determining factor for fixation of orphan proteins. Instead these proteins largely resemble random proteins given a particular GC level. During evolution the properties of a protein change faster than the GC level, causing the relationship between disorder and GC to gradually weaken.

  • 3.
    Bernsel, Andreas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Sequence-based predictions of membrane-protein topology, homology and insertion. 2008. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Membrane proteins comprise around 20-30% of a typical proteome and play crucial roles in a wide variety of biochemical pathways. Apart from their general biological significance, membrane proteins are of particular interest to the pharmaceutical industry, being targets for more than half of all available drugs. This thesis focuses on prediction methods for membrane proteins that ultimately rely on their amino acid sequence only.

    By identifying soluble protein domains in membrane protein sequences, we were able to constrain and improve prediction of membrane protein topology, i.e. what parts of the sequence span the membrane and what parts are located on the cytoplasmic and extra-cytoplasmic sides. Using predicted topology as input to a profile-profile based alignment protocol, we managed to increase sensitivity to detect distant membrane protein homologs.

    Finally, experimental measurements of the level of membrane integration of systematically designed transmembrane helices in vitro were used to derive a scale of position-specific contributions to helix insertion efficiency for all 20 naturally occurring amino acids. Notably, position within the helix was found to be an important factor for the contribution to helix insertion efficiency for polar and charged amino acids, reflecting the highly anisotropic environment of the membrane. Using the scale to predict natural transmembrane helices in protein sequences revealed that, whereas helices in single-spanning proteins are typically hydrophobic enough to insert by themselves, a large part of the helices in multi-spanning proteins seem to require stabilizing helix-helix interactions for proper membrane integration. Implementing the scale to predict full transmembrane topologies yielded results comparable to the best statistics-based topology prediction methods.

  • 4.
    Berthet, Pierre
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Computational Modeling of the Basal Ganglia: Functional Pathways and Reinforcement Learning. 2015. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    We perceive the environment via sensor arrays and interact with it through motor outputs. The work of this thesis concerns how the brain selects actions given the information about the perceived state of the world and how it learns and adapts these selections to changes in this environment. Reinforcement learning theories suggest that an action will be more or less likely to be selected if the outcome has been better or worse than expected. A group of subcortical structures, the basal ganglia (BG), is critically involved in both the selection and the reward prediction.

    We developed and investigated a computational model of the BG. We implemented a Bayesian-Hebbian learning rule, which computes the weights between two units based on the probability of their activations. We were able to test how various configurations of the represented pathways impacted the performance in several reinforcement learning and conditioning tasks. Then, following the development of a more biologically plausible version with spiking neurons, we simulated lesions in the different pathways and assessed how they affected learning and selection.

    We observed that the evolution of the weights and the performance of the models qualitatively resembled experimental data. The absence of a unique best way to configure the model over all the learning paradigms tested indicates that an agent could dynamically configure its action selection mode, mainly by including or excluding the reward prediction values in the selection process. We present hypotheses on possible biological substrates for the reward prediction pathway. We base these on the functional requirements for successful learning and on an analysis of the experimental data. We further simulate a loss of dopaminergic neurons similar to that reported in Parkinson’s disease. We suggest that the associated motor symptoms are mostly caused by an impairment of the pathway promoting actions, while the pathway suppressing them seems to remain functional.

  • 5.
    Berthet, Pierre
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Lindahl, Mikael
    Tully, Philip
    Hellgren-Kotaleski, Jeanette
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Functional relevance of different basal ganglia pathways investigated in a spiking model with reward dependent plasticity. Manuscript (preprint) (Other academic)
  • 6.
    Bjelkmar, Pär
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Modeling of voltage-gated ion channels. 2011. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The recent determination of several crystal structures of voltage-gated ion channels has catalyzed computational efforts of studying these remarkable molecular machines that are able to conduct ions across biological membranes at extremely high rates without compromising the ion selectivity.

    Starting from the open crystal structures, we have studied the gating mechanism of these channels by molecular modeling techniques. Firstly, by applying a membrane potential, initial stages of the closing of the channel were captured, manifested in a secondary-structure change in the voltage-sensor. In a follow-up study, we found that the energetic cost of translocating this 3₁₀-helix conformation was significantly lower than in the original conformation. Thirdly, collaborators of ours identified new molecular constraints for different states along the gating pathway. We used those to build new protein models that were evaluated by simulations. All these results point to a gating mechanism where the S4 helix undergoes a secondary structure transformation during gating.

    These simulations also provide information about how the protein interacts with the surrounding membrane. In particular, we found that lipid molecules close to the protein diffuse together with it, forming a large dynamic lipid-protein cluster. This has important consequences for the understanding of protein-membrane interactions and for the theories of lateral diffusion of membrane proteins.

    Further, simulations of the simple ion channel antiamoebin were performed where different molecular models of the channel were evaluated by calculating ion conduction rates, which were compared to experimentally measured values. One of the models had a conductance consistent with the experimental data and was proposed to represent the biologically active state of the channel.

    Finally, the underlying methods for simulating molecular systems were probed by implementing the CHARMM force field into the GROMACS simulation package. The implementation was verified and GROMACS-specific features were combined with CHARMM and evaluated on long timescales. The CHARMM interaction potential was found to sample relevant protein conformations regardless of the solvent model used.

  • 7.
    Björkholm, Patrik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Protein Interactions from the Molecular to the Domain Level. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The basic unit of life is the cell, from single-cell bacteria to the largest creatures on the planet. All cells have DNA, which contains the blueprint for proteins. This information is transported in the form of messenger RNA from the genome to ribosomes where proteins are produced. Proteins are the main functional constituents of the cell; they usually have one or several functions and are the main actors in almost all essential biological processes. Proteins are what make the cell alive. Proteins are found as solitary units or as part of large complexes. Proteins can be found in all parts of the cell, the most common place being the cytoplasm, a central space in all cells. They are also commonly found integrated into or attached to various membranes.

    Membranes define the cell architecture. Proteins integrated into the membrane have a wide number of responsibilities: they are the gatekeepers of the cell, they secrete cellular waste products, and many of them are receptors and enzymes.

    The main focus of this thesis is the study of protein interactions, from the molecular level up to the protein domain level.

    In paper I, we use recurring local protein structures to try to predict which sections of a protein interact with another part, using only sequence information. In papers II and III we use a randomization approach on a membrane protein motif that we know interacts with a sphingomyelin lipid to find other candidate proteins that interact with sphingolipids. These are then experimentally verified as sphingolipid-binding. In the last paper, paper IV, we look at how protein domain interaction networks overlap and can be evaluated.

  • 8.
    Björkholm, Patrik
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Ernst, Andreas
    Hacke, Moritz
    Wieland, Felix
    Brügger, Britta
    von Heijne, Gunnar
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Identification of novel sphingolipid-binding motifs in mammalian membrane proteins. Manuscript (preprint) (Other academic)
    Abstract [en]

    Specific interactions between transmembrane proteins and sphingolipids are a poorly understood phenomenon, and only a couple of instances have been identified. The best characterized example is the sphingolipid-binding motif VXXTLXXIY found in the transmembrane helix of the vesicular transport protein p24. Here, we have used a simple motif-probability algorithm (MOPRO) to identify proteins that contain putative sphingolipid-binding motifs in a dataset comprising full proteomes from mammalian organisms. Four selected candidate proteins all tested positive for sphingolipid binding in a photoaffinity assay. The putative sphingolipid-binding motifs are noticeably enriched in the 7TM family of G-protein coupled receptors, predominantly in transmembrane helix 6.

  • 9.
    Caster, Ola
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Uppsala Monitoring Centre, Sweden.
    Norén, G. Niklas
    Stockholm University, Faculty of Science, Department of Mathematics. Uppsala Monitoring Centre, Sweden.
    Edwards, I. Ralph
    Computing limits on medicine risks based on collections of individual case reports. 2014. In: Theoretical Biology and Medical Modelling, ISSN 1742-4682, E-ISSN 1742-4682, Vol. 11, 15. Article in journal (Refereed)
    Abstract [en]

    Background: Quantifying a medicine's risks for adverse effects is crucial in assessing its value as a therapeutic agent. Rare adverse effects are often not detected until after the medicine is marketed and used in large and heterogeneous patient populations, and risk quantification is even more difficult. While individual case reports of suspected harm from medicines are instrumental in the detection of previously unknown adverse effects, they are currently not used for risk quantification. The aim of this article is to demonstrate how and when limits on medicine risks can be computed from collections of individual case reports.

    Methods: We propose a model where drug exposures in the real world may be followed by adverse episodes, each containing one or several adverse effects. Any adverse episode can be reported at most once, and each report corresponds to a single adverse episode. Based on this model, we derive upper and lower limits for the per-exposure risk of an adverse effect for a given drug.

    Results: An upper limit for the per-exposure risk of the adverse effect Y for a given drug X is provided by the reporting ratio of X together with Y relative to all reports on X, under two assumptions: (i) the average number of adverse episodes following exposure to X is one or less; and (ii) adverse episodes that follow X and contain Y are more frequently reported than adverse episodes in general that follow X. Further, a lower risk limit is provided by dividing the number of reports on X together with Y by the total number of exposures to X, under the assumption that exposures to X that are followed by Y generate on average at most one report on X together with Y. Using real data, limits for the narcolepsy risk following Pandemrix vaccination and the risk of coeliac disease following antihypertensive treatment were computed and found to conform to reference risk values from epidemiological studies.

    Conclusions: Our framework enables quantification of medicine risks in situations where this is otherwise difficult or impossible. It has wide applicability, but should be particularly useful in structured benefit-risk assessments that include rare adverse effects.
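
    The two limits described in the Results of this abstract amount to simple ratios. The following is a minimal sketch of that arithmetic only, not the authors' implementation; the function name `risk_limits`, its parameter names, and the example counts are hypothetical and chosen for illustration. The limits are meaningful only under the assumptions stated in the abstract.

```python
import math


def risk_limits(reports_x_and_y: int, reports_x: int, exposures_x: int) -> tuple[float, float]:
    """Return (lower, upper) limits on the per-exposure risk of effect Y for drug X.

    lower = reports on X together with Y / total exposures to X
    upper = reports on X together with Y / all reports on X
    Both limits hold only under the assumptions listed in the abstract.
    """
    if reports_x <= 0 or exposures_x <= 0:
        raise ValueError("reports_x and exposures_x must be positive")
    if not 0 <= reports_x_and_y <= reports_x:
        raise ValueError("reports_x_and_y must lie between 0 and reports_x")
    lower = reports_x_and_y / exposures_x
    upper = reports_x_and_y / reports_x
    return lower, upper


# Hypothetical counts, for illustration only (not data from the article).
lower, upper = risk_limits(reports_x_and_y=40, reports_x=2_000, exposures_x=1_000_000)
print(f"per-exposure risk between {lower:.2e} and {upper:.2e}")
```

    The lower limit needs an external estimate of the number of exposures to X, whereas the upper limit uses only the report collection itself.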

  • 10.
    Caster, Ola
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Uppsala Monitoring Centre, WHO Collaborating Centre for International Drug Monitoring, Sweden.
    Sandberg, Lovisa
    Bergvall, Tomas
    Watson, Sarah
    Norén, G. Niklas
    vigiRank for statistical signal detection in pharmacovigilance: First results from prospective real-world use. 2017. In: Pharmacoepidemiology and Drug Safety, ISSN 1053-8569, E-ISSN 1099-1557, Vol. 26, no 8, 1006-1010 p. Article in journal (Refereed)
    Abstract [en]

    Purpose: vigiRank is a data-driven predictive model for emerging safety signals. In addition to disproportionate reporting patterns, it also accounts for the completeness, recency, and geographic spread of individual case reporting, as well as the availability of case narratives. Previous retrospective analysis suggested that vigiRank performed better than disproportionality analysis alone. The purpose of the present analysis was to evaluate its prospective performance.

    Methods: The evaluation of vigiRank was based on real-world signal detection in VigiBase. In May 2014, vigiRank scores were computed for pairs of new drugs and WHO Adverse Reaction Terminology critical terms with at most 30 reports from at least 2 countries. Initial manual assessments were performed in order of descending score, selecting a subset of drug-adverse drug reaction pairs for in-depth expert assessment. The primary performance metric was the proportion of initial assessments that were decided signals during in-depth assessment. As comparator, the historical performance for disproportionality-guided signal detection in VigiBase was computed from a corresponding cohort of drug-adverse drug reaction pairs assessed between 2009 and 2013. During this period, the requirement for initial manual assessment was a positive lower endpoint of the 95% credibility interval of the Information Component measure of disproportionality, observed for the first time.

    Results: 194 initial assessments suggested by vigiRank's ordering eventually resulted in 6 (3.1%) signals. Disproportionality analysis yielded 19 signals from 1592 initial assessments (1.2%; P < .05).

    Conclusions: Combining multiple strength-of-evidence aspects as in vigiRank significantly outperformed disproportionality analysis alone in real-world pharmacovigilance signal detection, for VigiBase.

  • 11.
    Djurfeldt, Mikael
    et al.
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Lundqvist, Mikael
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Johansson, Christopher
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Rehn, Martin
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Ekeberg, Örjan
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Lansner, Anders
    Royal Institute of Technology, Computational Biology and Neurocomputing Group.
    Brain-scale simulation of the neocortex on the IBM Blue Gene/L supercomputer. 2008. In: IBM Journal of Research and Development, ISSN 0018-8646, Vol. 52, no 1-2, 31-41 p. Article in journal (Refereed)
    Abstract [en]

    Biologically detailed large-scale models of the brain can now be simulated thanks to increasingly powerful massively parallel supercomputers. We present an overview, for the general technical reader, of a neuronal network model of layers II/III of the neocortex built with biophysical model neurons. These simulations, carried out on an IBM Blue Gene/L supercomputer, comprise up to 22 million neurons and 11 billion synapses, which makes them the largest simulations of this type ever performed. Such model sizes correspond to the cortex of a small mammal. The SPLIT library, used for these simulations, runs on single-processor as well as massively parallel machines. Performance measurements show good scaling behavior on the Blue Gene/L supercomputer up to 8,192 processors. Several key phenomena seen in the living brain appear as emergent phenomena in the simulations. We discuss the role of this kind of model in neuroscience and note that full-scale models may be necessary to preserve natural dynamics. We also discuss the need for software tools for the specification of models as well as for analysis and visualization of output data. Combining models that range from abstract connectionist type to biophysically detailed will help us unravel the basic principles underlying neocortical function.

  • 12.
    Ekman, Diana
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Domain rearrangement and creation in protein evolution. 2008. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Proteins are composed of domains, recurrent protein fragments with distinct structure, function and evolutionary history. Some domains exist only as single-domain proteins; however, a majority of them are also combined with other domains. Domain rearrangements are important in the evolution of new proteins as new functionalities can arise in a single evolutionary event. In addition, the domain repertoire can be expanded through mutations of existing domains and de novo creation. The processes of domain rearrangement and creation have been the focus of this thesis.

    According to our estimates, about 65% of the eukaryotic and 40% of the prokaryotic proteins are of multidomain type. We found that insertion of a single domain at the N- or C-terminus was the most common event in the creation of novel multidomain architectures. However, domain repeats deviate from this pattern and are often expanded through duplications of several domains. Next, by mapping domain combinations onto an evolutionary tree we estimated that roughly one domain architecture has been created per million years, with the highest rates in metazoa. Much of this so-called explosion of new architectures in metazoa seems to be explained by a set of domains amenable to exon shuffling. In contrast to domain architectures, most known domain families evolved early. However, many proteins have incomplete domain coverage, and could hence contain de novo created domains. In Saccharomyces cerevisiae, however, species-specific sequences constitute only a minor fraction of the proteome, and are often short, disordered sequences located at the protein termini.

  • 13. Eriksson, Johan
    et al.
    Vogel, Edward K.
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). KTH Royal Institute of Technology, Sweden.
    Bergström, Fredrik
    Nyberg, Lars
    Neurocognitive Architecture of Working Memory. 2015. In: Neuron, ISSN 0896-6273, E-ISSN 1097-4199, Vol. 88, no 1, 33-46 p. Article, review/survey (Refereed)
    Abstract [en]

    A crucial role for working memory in temporary information processing and guidance of complex behavior has been recognized for many decades. There is emerging consensus that working-memory maintenance results from the interactions among long-term memory representations and basic processes, including attention, that are instantiated as reentrant loops between frontal and posterior cortical areas, as well as sub-cortical structures. The nature of such interactions can account for capacity limitations, lifespan changes, and restricted transfer after working-memory training. Recent data and models indicate that working memory may also be based on synaptic plasticity and that working memory can operate on non-consciously perceived information.

  • 14. Fiebig, Florian
    et al.
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    A Spiking Working Memory Model Based on Hebbian Short-Term Potentiation. 2017. In: Journal of Neuroscience, ISSN 0270-6474, E-ISSN 1529-2401, Vol. 37, no 1, 83-96 p. Article in journal (Refereed)
    Abstract [en]

    A dominant theory of working memory (WM), referred to as the persistent activity hypothesis, holds that recurrently connected neural networks, presumably located in the prefrontal cortex, encode and maintain WM memory items through sustained elevated activity. Reexamination of experimental data has shown that prefrontal cortex activity in single units during delay periods is much more variable than predicted by such a theory and associated computational models. Alternative models of WM maintenance based on synaptic plasticity, such as short-term nonassociative (non-Hebbian) synaptic facilitation, have been suggested but cannot account for encoding of novel associations. Here we test the hypothesis that a recently identified fast-expressing form of Hebbian synaptic plasticity (associative short-term potentiation) is a possible mechanism for WM encoding and maintenance. Our simulations using a spiking neural network model of cortex reproduce a range of cognitive memory effects in the classical multi-item WM task of encoding and immediate free recall of word lists. Memory reactivation in the model occurs in discrete oscillatory bursts rather than as sustained activity. We relate dynamic network activity as well as key synaptic characteristics to electrophysiological measurements. Our findings support the hypothesis that fast Hebbian short-term potentiation is a key WM mechanism.

  • 15.
    Frings, Oliver
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Network and gene expression analyses for understanding protein function. 2013. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Biological function is the result of a complex network of functional associations between genes or their products. Modeling the dynamics underlying biological networks is one of the big challenges in bioinformatics. A first step towards solving this problem is to predict and study the networks of functional associations underlying various conditions.

    An improved version of the FunCoup network inference method that features networks for three new species and updated versions of the existing networks is presented. Network clustering, i.e. partitioning networks into highly connected components, is an important tool for network analysis. We developed MGclus, a clustering method for biological networks that scores shared network neighbors. We found MGclus to perform favorably compared to other methods popular in the field. Studying sets of experimentally derived genes in the context of biological networks is a common strategy to shed light on their underlying biology. The CrossTalkZ method presented in this work assesses the statistical significance of crosstalk enrichment, i.e. the extent of connectivity between or within groups of functionally coupled genes or proteins in biological networks. We further demonstrate that CrossTalkZ is a valuable method to functionally annotate experimentally derived gene sets.

    Males and females differ in the expression of an extensive number of genes. The methods developed in the first part of this work were applied to study sex-biased genes in chicken and several network properties related to the molecular mechanisms of sex-biased gene regulation in chicken were deduced. Cancer studies have shown that tumor progression is strongly determined by the tumor microenvironment. We derived a gene expression signature of PDGF-activated fibroblasts that shows a strong prognostic significance in breast cancer in univariate and multivariate survival analyses when compared to established markers for prognosis.

  • 16.
    Frings, Oliver
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Augsten, Martin
    Tobin, Nicholas P.
    Carlson, Joseph
    Paulsson, Janna
    Pena, Cristina
    Olsson, Eleonor
    Veerla, Sunny
    Bergh, Jonas
    Östman, Arne
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Prognostic significance in breast cancer of a gene signature capturing stromal PDGF signaling. In: American Journal of Pathology, ISSN 0002-9440, E-ISSN 1525-2191. Article in journal (Refereed)
  • 17.
    Granholm, Viktor
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    The accuracy of statistical confidence estimates in shotgun proteomics. 2014. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    High-throughput techniques are currently some of the most promising methods to study molecular biology, with the potential to improve medicine and enable new biological applications. In proteomics, the large scale study of proteins, the leading method is mass spectrometry. At present researchers can routinely identify and quantify thousands of proteins in a single experiment with the technique called shotgun proteomics.

    A challenge of these experiments is the computational analysis and the interpretation of the mass spectra. A shotgun proteomics experiment easily generates tens of thousands of spectra, each thought to represent a peptide from a protein. Due to the immense biological and technical complexity, however, our computational tools often misinterpret these spectra and derive incorrect peptides. As a consequence, the biological interpretation of the experiment relies heavily on the statistical confidence that we estimate for the identifications.

    In this thesis, I have included four articles from my research on the accuracy of statistical confidence estimates in shotgun proteomics, and on how to achieve and evaluate that accuracy. In the first two papers a new method to use pre-characterized protein samples to evaluate this accuracy is presented. The third paper deals with how to avoid statistical inaccuracies when using machine learning techniques to analyze the data. In the fourth paper, we present a new tool for analyzing shotgun proteomics results, and evaluate the accuracy of its statistical estimates using the method from the first papers.

    The work I have included here can facilitate the development of new and accurate computational tools in mass spectrometry-based proteomics. Such tools will help make the interpretation of the spectra and the downstream biological conclusions more reliable.

  • 18.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kim, Sangtae
    Navarro, José C. F.
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Smith, Richard D.
    Käll, Lukas
    Fast and Accurate Database Searches with MS-GF+Percolator2014In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 13, no 2, 890-897 p.Article in journal (Refereed)
    Abstract [en]

    One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results, due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates about the peptide-spectrum matches, hence limiting the biological interpretation. Here, we enabled Percolator processing for MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for peptide-spectrum matches, peptides, and proteins, functions that are useful for the whole proteomics community.

  • 19.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Navarro, Jose Fernandez
    Noble, William Stafford
    Käll, Lukas
    Determining the calibration of confidence estimation procedures for unique peptides in shotgun proteomics2013In: Journal of Proteomics, ISSN 1874-3919, Vol. 80, 123-131 p.Article in journal (Refereed)
    Abstract [en]

    The analysis of a shotgun proteomics experiment results in a list of peptide-spectrum matches (PSMs) in which each fragmentation spectrum has been matched to a peptide in a database. Subsequently, most protein inference algorithms rank peptides according to the best-scoring PSM for each peptide. However, there is disagreement in the scientific literature on the best method to assess the statistical significance of the resulting peptide identifications. Here, we use a previously described calibration protocol to evaluate the accuracy of three different peptide-level statistical confidence estimation procedures: the classical Fisher's method, and two complementary procedures that estimate significance, respectively, before and after selecting the top-scoring PSM for each spectrum. Our experiments show that the latter method, which is employed by MaxQuant and Percolator, produces the most accurate, well-calibrated results.
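
    As an illustration of the Fisher's method mentioned above, the following minimal sketch combines independent p values into a single p value. The function name and the use of the closed-form chi-square tail for even degrees of freedom are assumptions made for this example; it is not the calibration protocol or the code used in the article.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    Assumes every p value lies in (0, 1].
    """
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total
```

    With a single p value the function returns that p value unchanged, which is a quick sanity check on the formula.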

  • 20.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Noble, William Stafford
    Käll, Lukas
    A cross-validation scheme for machine learning algorithms in shotgun proteomics2012In: BMC Bioinformatics, ISSN 1471-2105, Vol. 13, S3- p.Article in journal (Refereed)
    Abstract [en]

    Peptides are routinely identified from mass spectrometry-based proteomics experiments by matching observed spectra to peptides derived from protein databases. The error rates of these identifications can be estimated by target-decoy analysis, which involves matching spectra to shuffled or reversed peptides. Besides estimating error rates, decoy searches can be used by semi-supervised machine learning algorithms to increase the number of confidently identified peptides. As for all machine learning algorithms, however, the results must be validated to avoid issues such as overfitting or biased learning, which would produce unreliable peptide identifications. Here, we discuss how the target-decoy method is employed in machine learning for shotgun proteomics, focusing on how the results can be validated by cross-validation, a frequently used validation scheme in machine learning. We also use simulated data to demonstrate the proposed cross-validation scheme's ability to detect overfitting.
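
    The target-decoy error estimate described above can be sketched as follows. This is a simplified illustration under stated assumptions (one score per match, higher scores are better, decoys and targets searched together, the decoy/target ratio used as the false discovery rate estimate); the function name is hypothetical and the code is not taken from the article or from any particular tool.

```python
def target_decoy_qvalues(scores, is_decoy):
    """Estimate a q value for every match from target and decoy scores.

    At each score threshold the false discovery rate is estimated as
    (#decoys above threshold) / (#targets above threshold), capped at 1.
    The q value of a match is the smallest estimated FDR among all
    thresholds that still accept that match.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    targets = 0
    decoys = 0
    fdrs = []
    for i in order:
        if is_decoy[i]:
            decoys += 1
        else:
            targets += 1
        fdrs.append(min(1.0, decoys / max(targets, 1)))

    # Walk from the worst score to the best, keeping a running minimum.
    qvalues = [0.0] * len(scores)
    running_min = 1.0
    for rank in range(len(order) - 1, -1, -1):
        running_min = min(running_min, fdrs[rank])
        qvalues[order[rank]] = running_min
    return qvalues
```

    In a cross-validated setting, a learned score would be applied only to matches that were held out during training before being passed to a function like this one.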

  • 21.
    Granholm, Viktor
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Noble, William Stafford
    Käll, Lukas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    On Using Samples of Known Protein Content to Assess the Statistical Calibration of Scores Assigned to Peptide-Spectrum Matches in Shotgun Proteomics2011In: Journal of Proteome Research, ISSN 1535-3893, E-ISSN 1535-3907, Vol. 10, no 5, 2671-2678 p.Article in journal (Refereed)
    Abstract [en]

    In shotgun proteomics, the quality of a hypothesized match between an observed spectrum and a peptide sequence is quantified by a score function. Because the score function lies at the heart of any peptide identification pipeline, this function greatly affects the final results of a proteomics assay. Consequently, valid statistical methods for assessing the quality of a given score function are extremely important. Previously, several research groups have used samples of known protein composition to assess the quality of a given score function. We demonstrate that this approach is problematic, because the outcome can depend on factors other than the score function itself. We then propose an alternative use of the same type of data to validate a score function. The central idea of our approach is that database matches that are not explained by any protein in the purified sample comprise a robust representation of incorrect matches. We apply our alternative assessment scheme to several commonly used score functions, and we show that our approach generates a reproducible measure of the calibration of a given peptide identification method. Furthermore, we show how our quality test can be useful in the development of novel score functions.

  • 22. Hayat, Sikander
    et al.
    Peters, Christoph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tsirigos, Konstantinos D.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Inclusion of dyad-repeat pattern improves topology prediction of transmembrane beta-barrel proteins2016In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 10, 1571-1573 p.Article in journal (Refereed)
    Abstract [en]

    Accurate topology prediction of transmembrane beta-barrels is still an open question. Here, we present BOCTOPUS2, an improved topology prediction method for transmembrane beta-barrels that can also identify the barrel domain, predict the topology and identify the orientation of residues in transmembrane beta-strands. The major novelty of BOCTOPUS2 is the use of the dyad-repeat pattern of lipid and pore facing residues observed in transmembrane beta-barrels. In a cross-validation test on a benchmark set of 42 proteins, BOCTOPUS2 predicts the correct topology in 69% of the proteins, an improvement of more than 10% over the best earlier method (BOCTOPUS) and in addition, it produces significantly fewer erroneous predictions on non-transmembrane beta-barrel proteins.

  • 23. Hee, Siew Wan
    et al.
    Hamborg, Thomas
    Day, Simon
    Madan, Jason
    Miller, Frank
    Stockholm University, Faculty of Social Sciences, Department of Statistics.
    Posch, Martin
    Zohar, Sarah
    Stallard, Nigel
    Decision-theoretic designs for small trials and pilot studies: A review2016In: Statistical Methods in Medical Research, ISSN 0962-2802, E-ISSN 1477-0334, Vol. 25, no 3, 1022-1038 p.Article, review/survey (Refereed)
    Abstract [en]

    Pilot studies and other small clinical trials are often conducted but serve a variety of purposes and there is little consensus on their design. One paradigm that has been suggested for the design of such studies is Bayesian decision theory. In this article, we review the literature with the aim of summarizing current methodological developments in this area. We find that decision-theoretic methods have been applied to the design of small clinical trials in a number of areas. We divide our discussion of published methods into those for trials conducted in a single stage, those for multi-stage trials in which decisions are made through the course of the trial at a number of interim analyses, and those that attempt to design a series of clinical trials or a drug development programme. In all three cases, a number of methods have been proposed, depending on the decision maker’s perspective being considered and the details of utility functions that are used to construct the optimal design.

  • 24.
    Höhna, Sebastian
    Stockholm University, Faculty of Science, Department of Mathematics.
    Fast simulation of reconstructed phylogenies under global time-dependent birth-death processes2013In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 29, no 11, 1367-1374 p.Article in journal (Refereed)
    Abstract [en]

    Motivation: Diversification rates and patterns may be inferred from reconstructed phylogenies. Both the time-dependent and the diversity-dependent birth-death process can produce the same observed patterns of diversity over time. To develop and test new models describing the macro-evolutionary process of diversification, generic and fast algorithms to simulate under these models are necessary. Simulations are not only important for testing and developing models but play an influential role in the assessment of model fit.

    Results: In the present article, I consider as the model a global time-dependent birth-death process where each species has the same rates but rates may vary over time. For this model, I derive the likelihood of the speciation times from a reconstructed phylogenetic tree and show that each speciation event is independent and identically distributed. This fact can be used to simulate efficiently reconstructed phylogenetic trees when conditioning on the number of species, the time of the process or both. I show the usability of the simulation by approximating the posterior predictive distribution of a birth-death process with decreasing diversification rates applied on a published bird phylogeny (family Cettiidae).

    Availability: The methods described in this manuscript are implemented in the R package TESS, available from the repository CRAN (http://cran.r-project.org/web/packages/TESS/).

  • 25. Jonasdottir, Gudrun
    et al.
    Humphreys, Keith
    Palmgren, Juni
    Stockholm University, Faculty of Science, Department of Mathematics.
    Testing association in the presence of linkage2007In: Genetic Epidemiology, Vol. 31, 528-540 p.Article in journal (Refereed)
  • 26.
    Kahles, André
    et al.
    Kungliga Tekniska Högskolan.
    Sarqume, Fahad
    Kungliga Tekniska Högskolan.
    Savolainen, Peter
    Kungliga Tekniska Högskolan.
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Excap: maximization of haplotypic diversity of linked markers.2013In: PLoS One, ISSN 1932-6203, Vol. 8, no 11, e79012- p.Article in journal (Refereed)
    Abstract [en]

    Genetic markers, defined as variable regions of DNA, can be utilized for distinguishing individuals or populations. As long as markers are independent, it is easy to combine the information they provide. For nonrecombinant sequences like mtDNA, choosing the right set of markers for forensic applications can be difficult and requires careful consideration. In particular, one wants to maximize the utility of the markers. Until now, this has mainly been done by hand. We propose an algorithm that finds the most informative subset of a set of markers. The algorithm uses a depth first search combined with a branch-and-bound approach. Since the worst case complexity is exponential, we also propose some data-reduction techniques and a heuristic. We implemented the algorithm and applied it to two forensic caseworks using mitochondrial DNA, which resulted in marker sets with significantly improved haplotypic diversity compared to previous suggestions. Additionally, we evaluated the quality of the estimation with an artificial dataset of mtDNA. The heuristic is shown to provide extensive speedup at little cost in accuracy.
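
    The combination of depth-first search and branch-and-bound described above can be illustrated with a small sketch. Everything here is an assumption made for the example: haplotypes are equal-length strings, diversity is taken as one minus the sum of squared haplotype frequencies, the goal is the best subset of at most k marker positions, and the function names are hypothetical. It is not the Excap implementation and omits the data-reduction techniques and the heuristic.

```python
from collections import Counter


def diversity(haplotypes, markers):
    """Return 1 - sum(p_i^2) over haplotypes restricted to the given markers."""
    n = len(haplotypes)
    counts = Counter(tuple(h[m] for m in markers) for h in haplotypes)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_marker_subset(haplotypes, k):
    """Depth-first branch-and-bound search for at most k markers."""
    n_markers = len(haplotypes[0])
    best = {"score": -1.0, "markers": ()}

    def search(start, chosen):
        score = diversity(haplotypes, chosen)
        if score > best["score"]:
            best["score"], best["markers"] = score, tuple(chosen)
        if len(chosen) == k or start == n_markers:
            return
        # Bound: adding markers can only split groups further, so using
        # every remaining marker gives an optimistic upper bound.
        bound = diversity(haplotypes, chosen + list(range(start, n_markers)))
        if bound <= best["score"]:
            return  # this branch cannot beat the best subset found so far
        for m in range(start, n_markers):
            search(m + 1, chosen + [m])

    search(0, [])
    return best["markers"], best["score"]
```

    The bound is valid because restricting to fewer markers merges haplotype groups, which can never increase this diversity measure; the worst case remains exponential in the number of markers.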

  • 27. Kaplan, Bernhard A.
    et al.
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Masson, Guillaume S.
    Perrinet, Laurent U.
    Anisotropic connectivity implements motion-based prediction in a spiking neural network2013In: Frontiers in Computational Neuroscience, ISSN 1662-5188, Vol. 7, UNSP 112- p.Article in journal (Refereed)
    Abstract [en]

    Predictive coding hypothesizes that the brain explicitly infers upcoming sensory input to establish a coherent representation of the world. Although it is becoming generally accepted, it is not clear on which level spiking neural networks may implement predictive coding and what function their connectivity may have. We present a network model of conductance-based integrate-and-fire neurons inspired by the architecture of retinotopic cortical areas that assumes predictive coding is implemented through network connectivity, namely in the connection delays and in selectiveness for the tuning properties of source and target cells. We show that the applied connection pattern leads to motion-based prediction in an experiment tracking a moving dot. In contrast to our proposed model, a network with random or isotropic connectivity fails to predict the path when the moving dot disappears. Furthermore, we show that a simple linear decoding approach is sufficient to transform neuronal spiking activity into a probabilistic estimate for reading out the target trajectory.

  • 28. Ke, Rongqin
    et al.
    Mignardi, Marco
    Hauling, Thomas
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nilsson, Mats
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Fourth Generation of Next-Generation Sequencing Technologies: Promise and Consequences2016In: Human Mutation, ISSN 1059-7794, E-ISSN 1098-1004, Vol. 37, no 12, 1363-1367 p.Article, review/survey (Refereed)
    Abstract [en]

    In this review, we discuss the emergence of the fourth-generation sequencing technologies that preserve the spatial coordinates of RNA and DNA sequences with up to subcellular resolution, thus enabling back mapping of sequencing reads to the original histological context. This information is used, for example, in two current large-scale projects that aim to unravel the function of the brain. Also in cancer research, fourth-generation sequencing has the potential to revolutionize the field. Cancer Research UK has named "Mapping the molecular and cellular tumor microenvironment in order to define new targets for therapy and prognosis" one of the grand challenges in tumor biology. We discuss the advantages of sequencing nucleic acids directly in fixed cells over traditional next-generation sequencing (NGS) methods, the limitations and challenges that these new methods have to face to become broadly applicable, and the impact that the information generated by the combination of in situ sequencing and NGS methods will have in research and diagnostics.

  • 29. Kenah, Eben
    et al.
    Britton, Tom
    Stockholm University, Faculty of Science, Department of Mathematics.
    Halloran, M. Elizabeth
    Longini, Ira M.
    Molecular Infectious Disease Epidemiology: Survival Analysis and Algorithms Linking Phylogenies to Transmission Trees2016In: PLoS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 12, no 4Article in journal (Refereed)
    Abstract [en]

    Recent work has attempted to use whole-genome sequence data from pathogens to reconstruct the transmission trees linking infectors and infectees in outbreaks. However, transmission trees from one outbreak do not generalize to future outbreaks. Reconstruction of transmission trees is most useful to public health if it leads to generalizable scientific insights about disease transmission. In a survival analysis framework, estimation of transmission parameters is based on sums or averages over the possible transmission trees. A phylogeny can increase the precision of these estimates by providing partial information about who infected whom. The leaves of the phylogeny represent sampled pathogens, which have known hosts. The interior nodes represent common ancestors of sampled pathogens, which have unknown hosts. Starting from assumptions about disease biology and epidemiologic study design, we prove that there is a one-to-one correspondence between the possible assignments of interior node hosts and the transmission trees simultaneously consistent with the phylogeny and the epidemiologic data on person, place, and time. We develop algorithms to enumerate these transmission trees and show these can be used to calculate likelihoods that incorporate both epidemiologic data and a phylogeny. A simulation study confirms that this leads to more efficient estimates of hazard ratios for infectiousness and baseline hazards of infectious contact, and we use these methods to analyze data from a foot-and-mouth disease virus outbreak in the United Kingdom in 2001. These results demonstrate the importance of data on individuals who escape infection, which is often overlooked. The combination of survival analysis and algorithms linking phylogenies to transmission trees is a rigorous but flexible statistical foundation for molecular infectious disease epidemiology.

  • 30. Khan, Mehmood Alam
    et al.
    Elias, Isaac
    Sjölund, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nylander, Kristina
    Guimera, Roman Valls
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Schobesberger, Richard
    Schmitzberger, Peter
    Lagergren, Jens
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Fastphylo: Fast tools for phylogenetics2013In: BMC Bioinformatics, ISSN 1471-2105, Vol. 14, 334- p.Article in journal (Refereed)
    Abstract [en]

    BACKGROUND: Distance methods are ubiquitous tools in phylogenetics. Their primary purpose may be to reconstruct evolutionary history, but they are also used as components in bioinformatic pipelines. However, poor computational efficiency has been a constraint on the applicability of distance methods on very large problem instances.

    RESULTS: We present fastphylo, a software package containing implementations of efficient algorithms for two common problems in phylogenetics: estimating DNA/protein sequence distances and reconstructing a phylogeny from a distance matrix. We compare fastphylo with other neighbor joining based methods and report the results in terms of speed and memory efficiency.

    CONCLUSIONS: Fastphylo is a fast, memory efficient, and easy to use software suite. Due to its modular architecture, fastphylo is a flexible tool for many phylogenetic studies.
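
    To make the first of the two problems concrete, here is a minimal sketch of a pairwise DNA distance under the Jukes-Cantor model, d = -(3/4) ln(1 - (4/3) p), where p is the fraction of differing sites. The choice of model, the function name, and the assumption of pre-aligned, gap-free sequences are illustrative only; the abstract does not state which distance models fastphylo implements.

```python
import math


def jukes_cantor_distance(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned, gap-free DNA sequences.

    Returns infinity when the observed difference reaches the saturation
    point p >= 3/4, where the correction is undefined.
    """
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    mismatches = sum(1 for a, b in zip(seq_a, seq_b) if a != b)
    p = mismatches / len(seq_a)
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)
```

    A matrix of such pairwise distances is the typical input to a distance-based tree reconstruction method such as neighbor joining.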

  • 31.
    Krishnamurthy, Pradeep
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Silberberg, Gilad
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    A cortical attractor network with martinotti cells driven by facilitating synapses2012In: PLoS ONE, ISSN 1932-6203, E-ISSN 1932-6203, Vol. 7, no 4, e30752Article in journal (Refereed)
    Abstract [en]

    The population of pyramidal cells significantly outnumbers the inhibitory interneurons in the neocortex, while at the same time the diversity of interneuron types is much more pronounced. One acknowledged key role of inhibition is to control the rate and patterning of pyramidal cell firing via negative feedback, but most likely the diversity of inhibitory pathways is matched by a corresponding diversity of functional roles. An important distinguishing feature of cortical interneurons is the variability of the short-term plasticity properties of synapses received from pyramidal cells. The Martinotti cell type has recently come under scrutiny due to the distinctly facilitating nature of the synapses they receive from pyramidal cells. This distinguishes these neurons from basket cells and other inhibitory interneurons typically targeted by depressing synapses. A key aspect of the work reported here has been to pinpoint the role of this variability. We first set out to reproduce quantitatively based on in vitro data the di-synaptic inhibitory microcircuit connecting two pyramidal cells via one or a few Martinotti cells. In a second step, we embedded this microcircuit in a previously developed attractor memory network model of neocortical layers 2/3. This model network demonstrated that basket cells with their characteristic depressing synapses are the first to discharge when the network enters an attractor state and that Martinotti cells respond with a delay, thereby shifting the excitation-inhibition balance and acting to terminate the attractor state. A parameter sensitivity analysis suggested that Martinotti cells might, in fact, play a dominant role in setting the attractor dwell time and thus cortical speed of processing, with cellular adaptation and synaptic depression having a less prominent role than previously thought.

  • 32.
    Larsson, Per
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Prediction, modeling, and refinement of protein structure2010Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Accurate predictions of protein structure are important for understanding many processes in cells. The interactions that govern protein folding and structure are complex, and still far from completely understood. However, progress is being made in many areas. Here, efforts to improve the overall quality of protein structure models are described. From a pure evolutionary perspective, in which proteins are viewed in the light of gradually accumulated mutations on the sequence level, it is shown how information from multiple sources helps to create more accurate models. A very simple but surprisingly accurate method for assigning confidence measures for protein structures is also tested. In contrast to models based on evolution, physics based methods view protein structures as the result of physical interactions between atoms. Newly implemented methods are described that both increase the time-scales accessible for molecular dynamics simulations almost 10-fold, and that to some extent might be able to refine protein structures. Finally, I compare the efficiency and properties of different techniques for protein structure refinement.

  • 33.
    Light, Sara
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sagit, Rauan
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Ekman, Diana
    Karolinska Institute, Sweden.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Long indels are disordered: A study of disorder and indels in homologous eukaryotic proteins2013In: Biochimica et Biophysica Acta - Proteins and Proteomics, ISSN 1570-9639, E-ISSN 1878-1454, Vol. 1834, no 5, 890-897 p.Article in journal (Refereed)
    Abstract [en]

    Proteins evolve through point mutations as well as by insertions and deletions (indels). During the last decade it has become apparent that protein regions that do not fold into three-dimensional structures, i.e. intrinsically disordered regions, are quite common. Here, we have studied the relationship between protein disorder and indels using HMM-HMM pairwise alignments in two sets of orthologous eukaryotic protein pairs. First, we show that disordered residues are much more frequent among indel residues than among aligned residues and, also are more prevalent among indels than in coils. Second, we observed that disordered residues are particularly common in longer indels. Disordered indels of short-to-medium size are prevalent in the non-terminal regions of proteins while the longest indels, ordered and disordered alike, occur toward the termini of the proteins where new structural units are comparatively well tolerated. Finally, while disordered regions often evolve faster than ordered regions and disorder is common in indels, there are some previously recognized protein families where the disordered region is more conserved than the ordered region. We find that these rare proteins are often involved in information processes, such as RNA processing and translation. This article is part of a Special Issue entitled: The emerging dynamic view of proteins: Protein plasticity in allostery, evolution and self-assembly.

  • 34. Lindh, Martin
    et al.
    Karlen, Anders
    Norinder, Ulf
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Karolinska Institutet, Sweden.
    Predicting the Rate of Skin Penetration Using an Aggregated Conformal Prediction Framework2017In: Molecular Pharmaceutics, ISSN 1543-8384, E-ISSN 1543-8392, Vol. 14, no 5, 1571-1576 p.Article in journal (Refereed)
    Abstract [en]

    Skin serves as a drug administration route, and skin permeability of chemicals is of significant interest in the pharmaceutical and cosmetic industries. An aggregated conformal prediction (ACP) framework was used to build models for predicting the permeation rate (log Kp) of chemical compounds through human skin. The conformal prediction method gives as an output the prediction range at a given level of confidence for each compound, which enables the user to make a more informed decision when, for example, suggesting the next compound to prepare. Predictive models were built using both the random forest and the support vector machine methods and were based on experimentally derived permeability data on 211 diverse compounds. The derived models were of similar predictive quality as compared to earlier published models but have the extra advantage of not only presenting a single predicted value for each compound but also a reliable, individually assigned prediction range. The models use calculated descriptors and can quickly predict the skin permeation rate of new compounds.
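
    How a conformal predictor turns a point prediction into a prediction range can be sketched as below. This is a simplified inductive (split) conformal regression step with absolute residuals as the nonconformity score; the function names and the default confidence level are assumptions for illustration. It does not reproduce the aggregated framework of the article, which combines several such predictors, and it gives every compound the same width rather than an individually scaled one.

```python
import math


def conformal_halfwidth(calibration_residuals, confidence):
    """Half-width of the prediction range at the requested confidence.

    Uses the ceil((n + 1) * confidence)-th smallest absolute residual from
    a calibration set that was not used to fit the underlying model.
    Returns infinity if the calibration set is too small for that level.
    """
    n = len(calibration_residuals)
    rank = math.ceil((n + 1) * confidence)
    if rank > n:
        return math.inf
    return sorted(abs(r) for r in calibration_residuals)[rank - 1]


def prediction_range(point_prediction, calibration_residuals, confidence=0.8):
    """Return (lower, upper) bounds around a point prediction."""
    half = conformal_halfwidth(calibration_residuals, confidence)
    return point_prediction - half, point_prediction + half
```

    The point prediction itself would come from any regression model, for example a random forest or a support vector machine as in the abstract.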

  • 35.
    Lundqvist, Mikael
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
    Oscillations and spike statistics in biophysical attractor networks2013Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The work of this thesis concerns how cortical memories are stored and retrieved. In particular, large-scale simulations are used to investigate the extent to which associative attractor theory is compliant with known physiology and in vivo dynamics.

    The first question we ask is whether dynamical attractors can be stored in a network with realistic connectivity and activity levels. Using estimates of biological connectivity we demonstrated that attractor memories can be stored and retrieved in biologically realistic networks, operating on psychophysical timescales and displaying firing rate patterns similar to in vivo layer 2/3 cells. This was achieved in the presence of additional complexity such as synaptic depression and cellular adaptation.

    Fast transitions into attractor memory states were related to the self-balancing inhibitory and excitatory currents in the network. In order to obtain realistic firing rates in the network, strong feedback inhibition was used, dynamically maintaining balance for a wide range of excitation levels. The balanced currents also led to high spike train variability commonly observed in vivo. The feedback inhibition in addition resulted in emergent gamma oscillations associated with attractor retrieval. This is congruent with the view of gamma as accompanying active cortical processing.

    While dynamics during retrieval of attractor memories did not depend on the size of the simulated network, above a certain size the model displayed the presence of an emergent attractor state, not coding for any memory but active as a default state of the network. This default state was accompanied by oscillations in the alpha frequency band. Such alpha oscillations are correlated with idling and cortical inhibition in vivo and have similar functional correlates in the model. Both the inhibitory and excitatory effects and the phase effects of ongoing alpha observed in vivo were reproduced in the model in a simulated threshold-stimulus detection task.

  • 36.
    Lundqvist, Mikael
    et al.
    Royal Institute of Technology, Department of Computational Biology.
    Herman, Pawel
    Royal Institute of Technology, Department of Computational Biology.
    Lansner, Anders
    Royal Institute of Technology, Department of Computational Biology.
    Variability of spike firing during theta-coupled replay of memories in a simulated attractor network. 2012. In: Brain Research, ISSN 0006-8993, E-ISSN 1872-6240, Vol. 1434, 152-161 p. Article in journal (Refereed)
    Abstract [en]

    Simulation work has recently shown that attractor networks can reproduce Poisson-like variability of single cell spiking, with coefficient of variation (Cv(2)) around unity, consistent with cortical data. However, the use of local variability (Lv) measures has revealed area- and layer-specific deviations from Poisson-like firing. In order to test these findings in silico we used a biophysically detailed attractor network model. We show that Lv well above 1, specifically found in superficial cortical layers and prefrontal areas, can indeed be reproduced in such networks and is consistent with periodic replay rather than persistent firing. The memory replay at the theta time scale provides a framework for a multi-item memory storage in the model. This article is part of a Special Issue entitled Neural Coding.
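    The two variability measures named in this abstract can be illustrated with a minimal sketch. It assumes the commonly cited definitions: Cv2 as the mean of 2|I(i+1) - I(i)| / (I(i+1) + I(i)) over consecutive inter-spike intervals, and Lv as 3/(n-1) times the sum of ((I(i) - I(i+1)) / (I(i) + I(i+1)))^2. The function names and the Poisson test generator are illustrative and are not taken from the paper.

```python
import random


def inter_spike_intervals(spike_times):
    """Return the differences between consecutive spike times."""
    return [b - a for a, b in zip(spike_times, spike_times[1:])]


def cv2(intervals):
    """Mean Cv2 over consecutive interval pairs (about 1 for a Poisson train)."""
    pairs = list(zip(intervals, intervals[1:]))
    if not pairs:
        raise ValueError("need at least two intervals")
    return sum(2.0 * abs(b - a) / (a + b) for a, b in pairs) / len(pairs)


def local_variation(intervals):
    """Local variation Lv: 0 for a regular train, about 1 for a Poisson train."""
    n = len(intervals)
    if n < 2:
        raise ValueError("need at least two intervals")
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(intervals, intervals[1:]))
    return 3.0 * total / (n - 1)


def poisson_spike_times(rate_hz, n_spikes, seed=0):
    """Hypothetical helper: homogeneous Poisson spike train for testing."""
    rng = random.Random(seed)
    t, times = 0.0, []
    for _ in range(n_spikes):
        t += rng.expovariate(rate_hz)
        times.append(t)
    return times
```

    Under these definitions, values of Lv well above 1 indicate firing that is more irregular than Poisson, for example bursts separated by long pauses.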

  • 37.
    Lundqvist, Mikael
    et al.
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Herman, Pawel
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Palva, M.
    Palva, S.
    Silverstein, David
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Lansner, Anders
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Royal Institute of Technology, Sweden.
    Stimulus detection rate and latency, firing rates and 1-40Hz oscillatory power are modulated by infra-slow fluctuations in a bistable attractor network model. 2013. In: NeuroImage, ISSN 1053-8119, E-ISSN 1095-9572, Vol. 83, 458-471 p. Article in journal (Refereed)
    Abstract [en]

    Recordings of membrane and field potentials, firing rates, and oscillation amplitude dynamics show that neuronal activity levels in cortical and subcortical structures exhibit infra-slow fluctuations (ISFs) on time scales from seconds to hundreds of seconds. Similar ISFs are salient also in blood-oxygenation-level dependent (BOLD) signals as well as in psychophysical time series. Functional consequences of ISFs are not fully understood. Here, they were investigated along with dynamical implications of ISFs in large-scale simulations of cortical network activity. For this purpose, a biophysically detailed hierarchical attractor network model displaying bistability and operating in an oscillatory regime was used. ISFs were imposed as slow fluctuations in either the amplitude or frequency of fast synaptic noise. We found that both mechanisms produced an ISF component in the synthetic local field potentials (LFPs) and modulated the power of 1-40 Hz oscillations. Crucially, in a simulated threshold-stimulus detection task (TSDT), these ISFs were strongly correlated with stimulus detection probabilities and latencies. The results thus show that several phenomena observed in many empirical studies emerge concurrently in the model dynamics, which yields mechanistic insight into how infra-slow excitability fluctuations in large-scale neuronal networks may modulate fast oscillations and perceptual processing. The model also makes several novel predictions that can be experimentally tested in future studies.

  • 38.
    Michel, Mirco
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    From Sequence to Structure: Using predicted residue contacts to facilitate template-free protein structure prediction. 2017. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Despite the fundamental role of experimental protein structure determination, computational methods are of essential importance to bridge the ever growing gap between available protein sequence and structure data. Common structure prediction methods rely on experimental data, which is not available for about half of the known protein families.

    Recent advancements in amino acid contact prediction have revolutionized the field of protein structure prediction. Contacts can be used to guide template-free structure predictions that do not rely on experimentally solved structures of homologous proteins. Such methods are now able to produce accurate models for a wide range of protein families.

    We developed PconsC2, an approach that improved existing contact prediction methods by recognizing intra-molecular contact patterns and noise reduction. An inherent problem of contact prediction based on maximum entropy models is that large alignments with over 1000 effective sequences are needed to infer contacts accurately. These are however not available for more than 80% of all protein families that do not have a representative structure in PDB. With PconsC3, we could extend the applicability of contact prediction to families as small as 100 effective sequences by combining global inference methods with machine learning based on local pairwise measures.

    By introducing PconsFold, a pipeline for contact-based structure prediction, we could show that improvements in contact prediction accuracy translate to more accurate models. Finally, we applied a similar technique to Pfam, a comprehensive database of known protein families. In addition to using a faster folding protocol we employed model quality assessment methods, crucial for estimating the confidence in the accuracy of predicted models. We propose models to be accurate for 558 families that do not have a representative known structure. Out of those, over 75% have not been reported before.

  • 39.
    Michel, Mirco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Hayat, Sikander
    Skwark, Marcin J.
    Sander, Chris
    Marks, Debora S.
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    PconsFold: improved contact predictions improve protein models. 2014. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 30, no 17, 1482-1488 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: Recently it has been shown that the quality of protein contact prediction from evolutionary information can be improved significantly if direct and indirect information is separated. Given sufficiently large protein families, the contact predictions contain sufficient information to predict the structure of many protein families. However, since the first studies, contact prediction methods have improved. Here, we ask how much the final models are improved if improved contact predictions are used.

    Results: In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models are improved by on average 33% using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that the quality is improved by 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved.

  • 40.
    Michel, Mirco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Hurtado, David M.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Uziela, Karolis
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Large-scale structure prediction by improved contact predictions and model quality assessment. 2017. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 33, no 14, 123-129 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: Accurate contact predictions can be used for predicting the structure of proteins. Until recently these methods were limited to very big protein families, decreasing their utility. However, recent progress by combining direct coupling analysis with machine learning methods has made it possible to predict accurate contact maps for smaller families. To what extent these predictions can be used to produce accurate models of the families is not known.

    Results: We present the PconsFold2 pipeline that uses contact predictions from PconsC3, the CONFOLD folding algorithm and model quality estimations to predict the structure of a protein. We show that the model quality estimation significantly increases the number of models that can be reliably identified. Finally, we apply PconsFold2 to 6379 Pfam families of unknown structure and find that PconsFold2 can, with an estimated 90% specificity, predict the structure of up to 558 Pfam families of unknown structure. Out of these, 415 have not been reported before.

    Availability: Datasets as well as models of all the 558 Pfam families are available at http://c3.pcons.net. All programs used here are freely available.

  • 41.
    Michel, Mirco
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Skwark, Marcin J.
    Hurtado, David M.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Ekeberg, Magnus
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Predicting accurate contacts in thousands of Pfam domain families using PconsC3. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811. Article in journal (Refereed)
    Abstract [en]

    Motivation: A few years ago it was shown that by using a maximum entropy approach to describe couplings between columns in a multiple sequence alignment it is possible to significantly increase the accuracy of residue contact predictions. For very large protein families with more than 1000 effective sequences the accuracy is sufficient to produce accurate models of proteins as well as complexes. Today, for about half of all Pfam domain families no structure is known, but unfortunately most of these families have at most a few hundred members, i.e. are too small for such contact prediction methods.

    Results: To extend accurate contact predictions to the thousands of smaller protein families we present PconsC3, a fast and improved method for protein contact predictions that can be used for families with even 100 effective sequence members. PconsC3 significantly outperforms direct coupling analysis (DCA) methods, independent of family size, secondary structure content, contact range, or the number of selected contacts.

    Availability: PconsC3 is available as a web server and downloadable version at http://c3.pcons.net. The downloadable version is free for all to use and licensed under the GNU General Public License, version 2. At this site contact predictions for most Pfam families are also available. We estimate that more than 4000 contact maps for Pfam families of unknown structure have more than 50% of the top-ranked contacts predicted correctly.

  • 42.
    Murail, Samuel
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Howard, Rebecca J.
    Broemstrup, Torben
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Bertaccini, Edward J.
    Harris, R. Adron
    Trudell, James R.
    Lindahl, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Molecular mechanism for the dual alcohol modulation of cys loop receptors. 2012. In: PloS Computational Biology, ISSN 1553-734X, E-ISSN 1553-7358, Vol. 8, no 10, e1002710- p. Article in journal (Refereed)
    Abstract [en]

    Cys-loop receptors constitute a superfamily of pentameric ligand-gated ion channels (pLGICs), including receptors for acetylcholine, serotonin, glycine and gamma-aminobutyric acid. Several bacterial homologues have been identified that are excellent models for understanding allosteric binding of alcohols and anesthetics in human Cys-loop receptors. Recently, we showed that a single point mutation on a prokaryotic homologue (GLIC) could transform it from a channel weakly potentiated by ethanol into a highly ethanol-sensitive channel. Here, we have employed molecular simulations to study ethanol binding to GLIC, and to elucidate the role of the ethanol-enhancing mutation in GLIC modulation. By performing 1-μs simulations with and without ethanol on wild-type and mutated GLIC, we observed spontaneous binding in both intra-subunit and inter-subunit transmembrane cavities. In contrast to the glycine receptor GlyR, in which we previously observed ethanol binding primarily in an inter-subunit cavity, ethanol primarily occupied an intra-subunit cavity in wild-type GLIC. However, the highly ethanol-sensitive GLIC mutation significantly enhanced ethanol binding in the inter-subunit cavity. These results demonstrate dramatic effects of the F(14')A mutation on the distribution of ligands, and are consistent with a two-site model of pLGIC inhibition and potentiation.

  • 43.
    Norinder, Ulf
    et al.
    Stockholm University, Faculty of Social Sciences, Department of Computer and Systems Sciences. Swedish Toxicology Sciences Research Center, Sweden.
    Rybacka, A.
    Andersson, P. L.
    Conformal prediction to define applicability domain - A case study on predicting ER and AR binding. 2016. In: SAR and QSAR in environmental research (Print), ISSN 1062-936X, E-ISSN 1029-046X, Vol. 27, no 4, 303-316 p. Article in journal (Refereed)
    Abstract [en]

    A fundamental element when deriving a robust and predictive in silico model is not only the statistical quality of the model in question but, equally important, the estimate of its predictive boundaries. This work presents a new method, conformal prediction, for applicability domain estimation in the field of endocrine disruptors. The method is applied to binders and non-binders related to the oestrogen and androgen receptors. Ensembles of decision trees are used as statistical method and three different sets (dragon, rdkit and signature fingerprints) are investigated as chemical descriptors. The conformal prediction method results in valid models where there is an excellent balance in quality between the internally validated training set and the corresponding external test set, both in terms of validity and with respect to sensitivity and specificity. With this method the level of confidence can be readily altered by the user and the consequences thereof immediately inspected. Furthermore, the predictive boundaries for the derived models are rigorously defined by using the conformal prediction framework, thus no ambiguity exists as to the level of similarity needed for new compounds to be in or out of the predictive boundaries of the derived models where reliable predictions can be expected.
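    How a user-chosen confidence level shapes the predictions can be sketched with a small inductive, class-conditional conformal classifier. This is an illustration under stated assumptions, not the authors' implementation: the nonconformity scores are assumed to come from some underlying model (for example one minus the predicted class probability), and the function names and label strings are hypothetical.

```python
def conformal_p_value(calibration_scores, test_score):
    """P-value of a test example for one label.

    calibration_scores: nonconformity scores of held-out calibration
    examples that truly carry this label. A higher score means the
    example looks less typical for the label.
    """
    at_least_as_strange = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least_as_strange + 1) / (len(calibration_scores) + 1)


def prediction_set(calibration_by_label, test_score_by_label, significance):
    """Return every label whose p-value exceeds the significance level.

    An empty set suggests the compound lies outside the region where the
    model can make a reliable call; a set with both labels means the
    model cannot separate them at this confidence level.
    """
    return {
        label
        for label, score in test_score_by_label.items()
        if conformal_p_value(calibration_by_label[label], score) > significance
    }
```

    Lowering the significance level (raising the confidence) can only enlarge the prediction sets, which is the trade-off a user would inspect when altering the confidence level.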

  • 44.
    Norén, G. Niklas
    et al.
    Stockholm University, Faculty of Science, Department of Mathematics.
    Hopstadius, Johan
    Bate, Andrew
    Shrinkage observed-to-expected ratios for robust and transparent large-scale pattern discovery. 2013. In: Statistical Methods in Medical Research, ISSN 0962-2802, E-ISSN 1477-0334, Vol. 22, no 1, 57-69 p. Article in journal (Refereed)
    Abstract [en]

    Large observational data sets are a great asset to better understand the effects of medicines in clinical practice and, ultimately, improve patient care. For an empirical pattern in observational data to be of practical relevance, it should represent a substantial deviation from the null model. For the purpose of identifying such deviations, statistical significance tests are inadequate, as they do not on their own distinguish the magnitude of an effect from its data support. The observed-to-expected (OE) ratio on the other hand directly measures strength of association and is an intuitive basis to identify a range of patterns related to event rates, including pairwise associations, higher order interactions and temporal associations between events over time. It is sensitive to random fluctuations for rare events with low expected counts but statistical shrinkage can protect against spurious associations. Shrinkage OE ratios provide a simple but powerful framework for large-scale pattern discovery. In this article, we outline a range of patterns that are naturally viewed in terms of OE ratios and propose a straightforward and effective statistical shrinkage transformation that can be applied to any such ratio. The proposed approach retains emphasis on the practical relevance and transparency of highlighted patterns, while protecting against spurious associations.
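    The effect of shrinkage on an OE ratio can be shown with a short sketch. The abstract does not state the transformation, so the form used here, adding the same pseudo-count to the observed and the expected count before taking the log ratio, and the value alpha = 0.5 are assumptions made for illustration only.

```python
import math


def shrunk_log_oe(observed, expected, alpha=0.5):
    """Log2 of a shrunk observed-to-expected ratio.

    alpha is an assumed pseudo-count added to both counts. It pulls the
    ratio towards 1 (log ratio towards 0) when the counts are small and
    has little effect when they are large.
    """
    if observed < 0 or expected < 0 or alpha <= 0:
        raise ValueError("counts must be non-negative and alpha positive")
    return math.log2((observed + alpha) / (expected + alpha))
```

    For a single observed event with an expected count of 0.01, the raw ratio is 100, whereas the shrunk ratio is 1.5 / 0.51, roughly 3, so a pattern with so little data support no longer stands out. With 100 observed against 1 expected, the shrunk ratio remains large.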

  • 45.
    Ogris, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Guala, Dimitri
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Kaduk, Mateusz
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Sonnhammer, Erik
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    FunCoup 4: New species, data, and visualization. Manuscript (preprint) (Other academic)
  • 46.
    Olsson, Fredrik
    et al.
    Stockholm University, Faculty of Science, Department of Mathematics.
    Hössjer, Ola
    Stockholm University, Faculty of Science, Department of Mathematics.
    Equilibrium distributions and simulation methods for age structured populations. 2015. In: Mathematical Biosciences, ISSN 0025-5564, E-ISSN 1879-3134, Vol. 268, 45-51 p. Article in journal (Refereed)
    Abstract [en]

    A simulation method is presented for the demographic and genetic variation of age structured haploid populations. First, we use matrix analytic methods to derive an equilibrium distribution for the age class sizes conditioned on the total population size. Knowledge of this distribution eliminates the need for a burn-in time in simulations. Next, we derive the distribution of the alleles at a polymorphic locus in various age classes given the allele frequencies in the total population and the age size composition. For the time dynamics, we start by simulating the dynamics for the total population. In order to generate the inheritance of the alleles, we derive their distribution conditionally on the simulated population sizes. This method enables a fast simulation procedure of multiple loci in linkage equilibrium.

  • 47.
    Peters, Christoph
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics.
    Topology Prediction of α-Helical Transmembrane Proteins. 2016. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Membrane proteins fulfil a number of tasks in cells, including signalling, cell-cell interaction, and the transportation of molecules. The prominence of these tasks makes membrane proteins an important target for clinical drugs. Because of the decreasing price of sequencing, the number of sequences known is increasing at such a rate that manual annotations cannot compete. Here, topology prediction is a way to provide additional information. It predicts the location and number of transmembrane helices in the protein and the orientation inside the membrane. An important factor to detect transmembrane helices is their hydrophobicity, which can be calculated using dedicated scales. In the first paper, we studied the difference between several hydrophobicity scales and evaluated their performance. We showed that while they appear to be similar, their performance for topology prediction differs significantly. The better performing scales appear to measure the probability of amino acids to be within a transmembrane helix, instead of just being located in a hydrophobic environment.

    Around 20% of the transmembrane helices are too hydrophilic to explain their insertion with hydrophobicity alone. These are referred to as marginally hydrophobic helices. In the second paper, we studied three of these helices experimentally and performed an analysis on membrane proteins. The experiments show that for all three helices positive charges on the N-terminal side of the subsequent helix are important to insert, but only two need the subsequent helix. Additionally, the analysis shows that not only the N-terminal helices are more hydrophobic, but also the C-terminal transmembrane helices.

    In Paper III, the finding from the second paper was used to improve the topology prediction. By extending our hidden Markov model with N- and C-terminal helix states, we were able to set stricter cut-offs. This improved the general topology prediction and in particular reduced misprediction in large N- and C-terminal domains, as well as the separation between transmembrane and non-transmembrane proteins.

    Lastly, we contribute several new features to our consensus topology predictor, TOPCONS. We added states for the detection of signal peptides to its hidden Markov model and thus reduce the over-prediction of transmembrane helices. With a new method for the generation of profile files, it is possible to increase the size of the database used to find homologous proteins and decrease the running time by 75%.
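    The idea of detecting candidate transmembrane helices from a hydrophobicity scale, described at the start of this abstract, can be sketched as a sliding-window average. The sketch assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a cut-off of 1.6; these are classic textbook choices rather than the scales or cut-offs evaluated in the thesis, and the predictors described there use hidden Markov models rather than this simple scan.

```python
# Kyte-Doolittle hydropathy values (assumed scale for this illustration).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(sequence, window=19):
    """Average hydropathy of every full window along the sequence."""
    scores = [KYTE_DOOLITTLE[aa] for aa in sequence]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start indices of windows hydrophobic enough to be helix candidates."""
    averages = window_hydropathy(sequence, window)
    return [i for i, avg in enumerate(averages) if avg >= threshold]
```

    A marginally hydrophobic helix, as discussed in the second paragraph of the abstract, is one whose windows fall below such a cut-off even though the helix is inserted in the membrane, which is why hydrophobicity alone is not sufficient.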

  • 48.
    Peters, Christoph
    et al.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Tsirigos, Konstantinos D.
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Shu, Nanjiang
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Elofsson, Arne
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Improved topology prediction using the terminal hydrophobic helices rule. 2016. In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 32, no 8, 1158-1162 p. Article in journal (Refereed)
    Abstract [en]

    Motivation: The translocon recognizes sufficiently hydrophobic regions of a protein and inserts them into the membrane. Computational methods try to determine what hydrophobic regions are recognized by the translocon. Although these predictions are quite accurate, many methods still fail to distinguish marginally hydrophobic transmembrane (TM) helices and equally hydrophobic regions in soluble protein domains. In vivo, this problem is most likely avoided by targeting of the TM-proteins, so that non-TM proteins never see the translocon. Proteins are targeted to the translocon by an N-terminal signal peptide. The targeting is also aided by the fact that the N-terminal helix is more hydrophobic than other TM-helices. In addition, we also recently found that the C-terminal helix is more hydrophobic than central helices. This information has not been used in earlier topology predictors.

    Results: Here, we use the fact that the N- and C-terminal helices are more hydrophobic to develop a new version of the first-principle-based topology predictor, SCAMPI. The new predictor has two main advantages; first, it can be used to efficiently separate membrane and non-membrane proteins directly without the use of an extra prefilter, and second it shows improved performance for predicting the topology of membrane proteins that contain large non-membrane domains.

    Availability and implementation: The predictor, a web server and all datasets are available at http://scampi.bioinfo.se/.

  • 49.
    Rinaldi, Fabio
    et al.
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Clematide, Simon
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Hafner, Simon
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Schneider, Gerold
    Institute of Computational Linguistics, University of Zurich, Switzerland.
    Grigonyte, Gintare
    Stockholm University, Faculty of Humanities, Department of Linguistics, Computational Linguistics.
    Romacker, Martin
    Novartis Pharma AG, NIBR-IT, Text Mining Services, Basel, Switzerland.
    Vachon, Therese
    Novartis Pharma AG, NIBR-IT, Text Mining Services, Basel, Switzerland.
    Using the OntoGene pipeline for the triage task of BioCreative 2012. 2013. In: Database: The Journal of Biological Databases and Curation, ISSN 1758-0463. Article in journal (Refereed)
    Abstract [en]

    In this article, we describe the architecture of the OntoGene Relation mining pipeline and its application in the triage task of BioCreative 2012. The aim of the task is to support the triage of abstracts relevant to the process of curation of the Comparative Toxicogenomics Database. We use a conventional information retrieval system (Lucene) to provide a baseline ranking, which we then combine with information provided by our relation mining system, in order to achieve an optimized ranking. Our approach additionally delivers domain entities mentioned in each input document as well as candidate relationships, both ranked according to a confidence score computed by the system. This information is presented to the user through an advanced interface aimed at supporting the process of interactive curation. Thanks, in particular, to the high-quality entity recognition, the OntoGene system achieved the best overall results in the task.

  • 50.
    Sahlin, Kristoffer
    et al.
    Stockholm University, Science for Life Laboratory (SciLifeLab). KTH.
    Vezzi, Francesco
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Nystedt, Björn
    Stockholm University, Faculty of Science, Department of Biochemistry and Biophysics. Stockholm University, Science for Life Laboratory (SciLifeLab).
    Lundeberg, Joakim
    KTH.
    Arvestad, Lars
    Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA). Stockholm University, Science for Life Laboratory (SciLifeLab).
    BESST - Efficient scaffolding of large fragmented assemblies. 2014. In: BMC Bioinformatics, ISSN 1471-2105, Vol. 15, no 1, 281- p. Article in journal (Refereed)
    Abstract [en]

    Background

    The use of short reads from High Throughput Sequencing (HTS) techniques is now commonplace in de novo assembly. Yet, obtaining contiguous assemblies from short reads is challenging, thus making scaffolding an important step in the assembly pipeline. Different algorithms have been proposed but many of them use the number of read pairs supporting a linking of two contigs as an indicator of reliability. This reasoning is intuitive, but fails to account for variation in link count due to contig features.

    We have also noted that published scaffolders are only evaluated on small datasets using output from only one assembler. Two issues arise from this. Firstly, some of the available tools are not well suited for complex genomes. Secondly, these evaluations provide little support for inferring a software’s general performance. 

    Results

    We propose a new algorithm, implemented in a tool called BESST, which can scaffold genomes of all sizes and complexities and was used to scaffold the genome of P. abies (20 Gbp). We performed a comprehensive comparison of BESST against the most popular stand-alone scaffolders on a large variety of datasets. Our results confirm that some of the popular scaffolders are not practical to run on complex datasets. Furthermore, no single stand-alone scaffolder outperforms the others on all datasets. However, BESST compares favorably with the other tested scaffolders on GAGE datasets and, moreover, outperforms the other methods when library insert size distribution is wide.

    Conclusion

    We conclude from our results that information sources other than the quantity of links, as is commonly used, can provide useful information about genome structure when scaffolding. 
