Background: Xenacoelomorpha is a marine clade of microscopic worms that is an important model system for understanding the evolution of key bilaterian novelties, such as the excretory system. Nevertheless, xenacoelomorph genomics has been restricted to a few species that either can be cultured in the lab or are centimetres long. Thus far, no genomes are available for Nemertodermatida, one of the group's main clades, whose origin has been dated to more than 400 million years ago. Methods: DNA was extracted from a single specimen and sequenced with PacBio HiFi following the PacBio Ultra-Low DNA Input protocol. After genome assembly, decontamination, and annotation, the genome quality was benchmarked against two acoel genomes and one Illumina-based genome as references. The gene content of three cnidarians, three acoelomorphs, four deuterostomes, and eight protostomes was clustered into orthogroups to infer gene content evolution. Finally, we focused on the genes related to the ultrafiltration excretory system to compare patterns of presence/absence and gene architecture among these clades. Results: We present the first nemertodermatid genome, sequenced from a single specimen of Nemertoderma westbladi. Although genome contiguity remains challenging (N50: 60 kb), the assembly is highly complete (BUSCO: 80.2%, Metazoa; 88.6%, Eukaryota), and the quality of the annotation allows fine-grained analyses of genome evolution. Acoelomorph genomes seem to be relatively conserved in terms of the percentage of repeats, number of genes, number of exons per gene, and intron size. In addition, a high fraction of genes present in both protostomes and deuterostomes are absent in Acoelomorpha.
Interestingly, we show that all genes related to the excretory system are present in Xenacoelomorpha except Osr, a key element in the development of these organs whose acquisition appears to be linked to the origin of the specialised excretory system. Conclusion: Overall, these analyses highlight the potential of the Ultra-Low DNA Input protocol and HiFi sequencing to generate high-quality genomes from single animals, even for relatively large genomes. This makes the approach a feasible option for sequencing challenging taxa, and the resulting genomes will be an exciting resource for comparative genomics analyses.
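Contiguity in this abstract is summarised by the N50 (60 kb). As a reminder of what that number means, here is a minimal sketch of the conventional N50 calculation from a list of contig lengths. It is not the authors' assembly pipeline, and the toy lengths are hypothetical.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig (or scaffold) lengths in bp.

    N50 is the length L such that contigs of length >= L together contain
    at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("no contigs given")
    total = sum(contig_lengths)
    covered = 0
    # Walk from the longest contig down until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= total:
            return length


# Hypothetical toy assembly (lengths in bp).
print(n50([100, 80, 60, 40, 20]))
```

For the toy assembly (300 bp in total), the two longest contigs already cover 180 bp, which is more than half, so the N50 is 80 bp.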
Functional analysis of gene sets derived from experiments is typically done by pathway annotation. Although many algorithms exist for analyzing the association between a gene set and a pathway, an issue that is generally ignored is that gene sets often represent multiple pathways. In such cases, the association with any one pathway is weakened by the presence of genes associated with other pathways. One way to counteract this is to cluster the gene set into more homogeneous parts and then perform pathway analysis on each module. We explored whether network-based pre-clustering of a query gene set can improve pathway analysis. The methods MCL, Infomap, and MGclus were used to cluster the gene set projected onto the FunCoup network. We characterized how well these methods detect individual pathways in multi-pathway gene sets, and applied each clustering method in combination with four pathway analysis methods: Gene Enrichment Analysis (GEA), BinoX, NEAT, and ANUBIX. Using benchmarks constructed from the KEGG pathway database, we found that clustering can be beneficial by increasing the sensitivity of pathway analysis methods and by providing deeper insights into the biological mechanisms related to the phenotype under study. However, maintaining high specificity is a challenge. For ANUBIX, clustering caused a minor loss of specificity, while for BinoX and NEAT it caused an unacceptable loss of specificity. GEA had very low sensitivity both before and after clustering. The choice of clustering method had only a minor effect on the results. We show examples of this approach and conclude that clustering can improve overall pathway annotation performance, but should only be used if the enrichment method has a low false-positive rate.
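The cluster-then-test idea can be illustrated with a small sketch. Two simplifications are assumptions, not the methods evaluated in the study: modules are plain connected components of the query genes (a stand-in for MCL, Infomap, or MGclus), and each module is tested with a one-sided hypergeometric test (a basic overlap test, not BinoX, NEAT, or ANUBIX). The gene names, pathways, and genome size are hypothetical, and no multiple-testing correction is applied.

```python
from math import comb


def hypergeom_sf(k, n_genome, n_pathway, n_drawn):
    """Upper-tail hypergeometric probability P(X >= k).

    X counts pathway genes among n_drawn genes sampled without replacement
    from a genome of n_genome genes, n_pathway of which are in the pathway.
    """
    upper = min(n_pathway, n_drawn)
    tail = sum(comb(n_pathway, i) * comb(n_genome - n_pathway, n_drawn - i)
               for i in range(k, upper + 1))
    return tail / comb(n_genome, n_drawn)


def connected_modules(query, edges):
    """Split the query set into modules using only links among its members.

    Simplified stand-in for MCL/Infomap/MGclus: a module is a connected component.
    """
    query = set(query)
    adj = {g: set() for g in query}
    for a, b in edges:
        if a in query and b in query:
            adj[a].add(b)
            adj[b].add(a)
    seen, modules = set(), []
    for start in sorted(query):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # iterative depth-first search
            gene = stack.pop()
            if gene not in component:
                component.add(gene)
                stack.extend(adj[gene] - component)
        seen |= component
        modules.append(component)
    return modules


def module_enrichment(query, edges, pathways, n_genome):
    """Test every (module, pathway) pair that shares at least one gene."""
    results = []
    for idx, module in enumerate(connected_modules(query, edges)):
        for name, members in pathways.items():
            overlap = len(module & members)
            if overlap:
                p = hypergeom_sf(overlap, n_genome, len(members), len(module))
                results.append((idx, name, overlap, p))
    return results


# Hypothetical toy data: one query set that mixes two pathways.
query = {"A", "B", "C", "D", "E", "F"}
edges = [("A", "B"), ("B", "C"), ("D", "E"), ("E", "F")]
pathways = {"P1": {"A", "B", "C", "X"}, "P2": {"D", "E", "F", "Y"}}

for idx, name, overlap, p in module_enrichment(query, edges, pathways, n_genome=100):
    print(f"module {idx}: {name} overlap={overlap} p={p:.2e}")
print(f"unclustered: P1 p={hypergeom_sf(3, 100, 4, len(query)):.2e}")
```

In this toy case, the P1 p-value for its module (about 2.5e-05) is roughly 20-fold smaller than for the unclustered six-gene set (about 4.8e-04), because the P2 genes no longer dilute the overlap. The same mechanism can also inflate false positives, which is the specificity trade-off described above.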
Methylmercury (MeHg) is a developmental neurotoxicant, and one potential mechanism of MeHg toxicity is epigenetic dysregulation. A recent meta-analysis of epigenome-wide association studies (EWAS) identified associations between prenatal MeHg exposure and DNA methylation at several genomic sites in blood from newborns and children. While EWASs reveal human-relevant associations, experimental studies are required to validate the relationship between exposure and DNA methylation changes, and to assess whether such changes have implications for gene expression. Herein, we studied DNA methylation and gene expression of five of the top genes identified in the EWAS meta-analysis (MED31, MRPL19, GGH, GRK1, and LYSMD3) in human SH-SY5Y cells exposed to 8 or 40 nM MeHg during differentiation, using bisulfite pyrosequencing and qPCR, respectively. The concentrations were selected to cover the range of MeHg concentrations in cord blood (2–8.5 μg/L) observed in the cohorts included in the EWAS. Exposure to MeHg increased DNA methylation at MED31, a transcriptional regulator essential for fetal development. This result agrees with the epidemiological findings, in which higher MED31 methylation was associated with higher MeHg concentrations. Additionally, we found a non-significant decrease in DNA methylation at GGH, which matches the direction of change observed in the EWAS, and a significant correlation between GGH methylation and GGH expression. In conclusion, this study corroborates some of the EWAS findings and puts forward candidate genes involved in MeHg’s effects on the developing brain, highlighting the value of experimental validation of epidemiological association studies.
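The link between the in vitro doses (nM) and the cord-blood range (μg/L) is a unit conversion through molar mass. The sketch below assumes that concentrations refer to the methylmercury cation CH3Hg+ and uses standard atomic weights. If the cohort values were instead reported as mercury, the molar mass of Hg alone (200.59 g/mol) would apply, which gives slightly lower values.

```python
# Assumption: concentrations refer to the methylmercury cation CH3Hg+; atomic
# weights are standard values (Hg 200.59, C 12.011, H 1.008 g/mol).
MOLAR_MASS_MEHG = 200.59 + 12.011 + 3 * 1.008  # about 215.6 g/mol


def nanomolar_to_ug_per_l(conc_nm, molar_mass=MOLAR_MASS_MEHG):
    """Convert nmol/L to µg/L: nmol/L x g/mol = ng/L, then divide by 1000."""
    return conc_nm * molar_mass / 1000.0


for conc in (8, 40):
    print(f"{conc} nM ~ {nanomolar_to_ug_per_l(conc):.1f} µg/L")
```

Under this assumption, 8 and 40 nM correspond to roughly 1.7 and 8.6 µg/L, which approximately bracket the cord-blood range of 2–8.5 μg/L.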
The need for systematic drug repurposing has grown steadily over the past decade, and repurposing may be particularly valuable for responding quickly to unexpected pandemics. The abundance of functional interaction data has allowed substantial parts of the human interactome to be mapped and modeled as functional association networks, favoring network-based drug repurposing. Network crosstalk-based approaches had never been tested for drug repurposing, despite their success in the related and more mature field of pathway enrichment analysis. We therefore evaluated the top-performing crosstalk-based approaches for drug repurposing. Additionally, the volume of new interaction data, as well as more sophisticated network integration approaches, compelled us to construct a new benchmark for assessing the performance of network-based drug repurposing tools, which we used to compare network crosstalk-based methods with a state-of-the-art technique. We find that network crosstalk-based drug repurposing can rival the state-of-the-art method and in some cases outperform it.
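The core quantity in crosstalk-based approaches is the number of network links between two gene sets, here a drug's targets and a disease's genes, compared with what would be expected by chance. The sketch below shows that idea under a deliberately simple null model (random node sets of the same sizes), which is an assumption. Published crosstalk methods such as BinoX use degree-aware network randomization and analytic statistics instead. The network and gene names are hypothetical.

```python
import random


def crosstalk(edges, set_a, set_b):
    """Count network links with one endpoint in set_a and the other in set_b."""
    return sum(1 for u, v in edges
               if (u in set_a and v in set_b) or (u in set_b and v in set_a))


def crosstalk_test(edges, drug_targets, disease_genes, n_perm=1000, seed=0):
    """Observed crosstalk and an empirical p-value from random node sets.

    Null model (an assumption): gene sets of the same size drawn uniformly from
    the network nodes. This ignores node degree, unlike published methods.
    """
    nodes = sorted({n for edge in edges for n in edge})
    node_set = set(nodes)
    a = set(drug_targets) & node_set
    b = set(disease_genes) & node_set
    observed = crosstalk(edges, a, b)
    rng = random.Random(seed)
    hits = sum(
        crosstalk(edges, set(rng.sample(nodes, len(a))),
                  set(rng.sample(nodes, len(b)))) >= observed
        for _ in range(n_perm)
    )
    # Add-one correction keeps the empirical p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy network: a background ring of 10 genes plus a drug/disease module.
ring = [(f"G{i}", f"G{(i + 1) % 10}") for i in range(10)]
module = [("T1", "D1"), ("T1", "D2"), ("T2", "D3"), ("T2", "D1")]
obs, p = crosstalk_test(ring + module, {"T1", "T2"}, {"D1", "D2", "D3"})
print(obs, p)
```

Here the two hypothetical drug targets share four links with the three disease genes, a level of crosstalk that random gene sets of the same sizes almost never reach in this network.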
Heterozygotes for major chromosomal rearrangements such as fusions and fissions are expected to display a high level of sterility due to problems during meiosis. However, some species, especially plants and animals with holocentric chromosomes, are known to tolerate chromosomal heterozygosity even for multiple rearrangements. Here, we studied male meiotic chromosome behavior in four hybrid generations (F1-F4) between two chromosomal races of the Wood White butterfly Leptidea sinapis differentiated by at least 24 chromosomal fusions/fissions. Previous work showed that these hybrids were fertile, although their fertility was reduced compared with crosses within chromosomal races. We demonstrate that (i) F1 hybrids are highly heterozygous, with nearly all chromosomes participating in the formation of trivalents at the first meiotic division, and (ii) from F1 to F4 the number of trivalents decreases and the number of bivalents increases. We argue that the observed process of chromosome sorting would, if continued, result in a new homozygous chromosomal race, i.e., in a new karyotype with an intermediate chromosome number and, possibly, in a new incipient homoploid hybrid species. We also discuss the segregational model of karyotype evolution and the chromosomal model of homoploid hybrid speciation.
Multiple sclerosis (MS) is an autoimmune neurological disease that commonly presents in a relapsing-remitting form (RRMS) and later converts to a secondary progressive stage (SPMS). Early treatment slows disease progression; hence, accurate and early diagnosis is crucial. Recent advances in large-scale data processing and analysis have accelerated molecular biomarker development. Here, we focus on small RNA data derived from cell-free cerebrospinal fluid (CSF), CSF cells, plasma, and peripheral blood mononuclear cells, as well as CSF cell methylome data, from people with RRMS (n = 20), clinically/radiologically isolated syndrome (CIS/RIS, n = 2), and neurological disease controls (n = 14). We applied multiple co-inertia analysis (MCIA), an unsupervised multivariate method for simultaneous data integration that does not use the diagnostic labels, and found that the top latent variable classifies RRMS status with an area under the receiver operating characteristic curve (AUROC) of 0.82. Variable selection based on Lasso regression reduced the features to 44, derived from small RNAs in plasma (20), CSF cells (8), and cell-free CSF (16), with only a marginal reduction in AUROC to 0.79. Samples from SPMS patients (n = 6) were subsequently projected onto the latent space and differed significantly from RRMS and controls. In contrast, we found no differences between relapse and remission or between inflammatory and non-inflammatory disease controls, suggesting that the latent variable does not merely reflect inflammatory signals but could be MS-specific. Hence, we show that integration of small RNAs from plasma and CSF can be used to distinguish RRMS from SPMS and neurological disease controls.
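An AUROC of 0.82 for a single latent variable has a direct interpretation: it is the probability that a randomly chosen case scores higher on that variable than a randomly chosen control. The sketch below computes AUROC in this Mann-Whitney form, with ties counted as one half. The scores are hypothetical, and the sketch does not reproduce MCIA or the Lasso step.

```python
def auroc(pos_scores, neg_scores):
    """AUROC as the Mann-Whitney probability P(score_pos > score_neg), ties counting 1/2."""
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical latent-variable scores for cases (e.g. RRMS) and controls.
cases = [0.9, 0.8, 0.4]
controls = [0.7, 0.3]
print(auroc(cases, controls))  # 5 of the 6 case/control pairs are ordered correctly
```

The pairwise loop is quadratic, which is fine for cohort sizes like those above. For large samples, the equivalent rank-sum formulation avoids comparing every pair.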
In eukaryotic cells, gene expression is regulated at many levels. Nascent RNA molecules are assembled into ribonucleoprotein complexes that are released into the nucleoplasm and transferred to the nuclear pore complex for export. RNAs are then either translated or transported to the cellular periphery. Emerging evidence indicates that RNA-binding proteins play an essential role throughout RNA biogenesis, from the gene to polyribosomes. However, the sorting mechanisms that determine whether an RNA molecule is immediately translated or sent to specialized locations for translation are unclear. This question is highly relevant during development and differentiation, when cells acquire a specific identity. Here, we focus on the RNA-binding properties of heterogeneous nuclear ribonucleoproteins (hnRNPs) and the mechanisms by which they are believed to play an essential role in RNA trafficking in polarized cells. Further, focusing on the hnRNP protein CBF-A/hnRNPab and its naturally occurring isoforms, we propose a model of how hnRNP proteins can regulate gene expression both spatially and temporally throughout the RNA biogenesis pathway, in both healthy and diseased cells.
Accurate inference of gene regulatory networks (GRNs) is important for unraveling unknown regulatory mechanisms and processes, which can lead to the identification of treatment targets for genetic diseases. A variety of GRN inference methods have been proposed that, under suitable data conditions, perform well in benchmarks that consider the entire spectrum of false positives and false negatives. However, it is very challenging to predict which single network sparsity gives the most accurate GRN. Lacking criteria for sparsity selection, a simplistic solution is to pick the GRN with a number of links per gene that is guessed to be reasonable. However, this does not guarantee finding the GRN with the correct sparsity or the most accurate one. In this study, we provide a general approach for identifying the most accurate and sparsity-wise relevant GRN within the entire space of possible GRNs. The algorithm, called SPA, applies a “GRN information criterion” (GRNIC) that is inspired by two commonly used model selection criteria, the Akaike and Bayesian information criteria (AIC and BIC), but adapted to GRN inference. The results show that the approach can, in most cases, find a GRN whose sparsity is close to the true sparsity and whose accuracy is close to the best achievable with the given GRN inference method and data. The datasets and source code can be found at https://bitbucket.org/sonnhammergrni/spa/.
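To show how an information criterion can pick one sparsity level, here is a sketch that uses the textbook AIC and BIC for Gaussian residuals. It is not GRNIC, whose GRN-specific adaptation is not described here. Each candidate GRN is summarised by a hypothetical number of links (treated as the parameter count) and the residual sum of squares of its fit. The criterion rewards fit but penalises every extra link.

```python
import math


def aic(rss, n_obs, n_links):
    """Akaike information criterion for a Gaussian model with n_links parameters."""
    return n_obs * math.log(rss / n_obs) + 2 * n_links


def bic(rss, n_obs, n_links):
    """Bayesian information criterion: a heavier, log(n) penalty per link."""
    return n_obs * math.log(rss / n_obs) + n_links * math.log(n_obs)


def select_grn(candidates, n_obs, criterion=bic):
    """Return the (n_links, rss) candidate that minimises the criterion."""
    return min(candidates, key=lambda c: criterion(c[1], n_obs, c[0]))


# Hypothetical fits of GRNs at increasing density: (number of links, residual sum of squares).
candidates = [(10, 40.0), (20, 12.0), (30, 11.5), (40, 11.2)]
print(select_grn(candidates, n_obs=50))
print(select_grn(candidates, n_obs=50, criterion=aic))
```

Both criteria select the 20-link network in this toy example. The fit barely improves beyond 20 links, so the penalty for the additional links outweighs the small drop in residual error.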
The regulatory relationships between genes and proteins in a cell form a gene regulatory network (GRN) that controls the cellular response to changes in the environment. A number of inference methods for reverse-engineering the original GRN from large-scale expression data have recently been developed. However, because ground-truth GRNs are generally unavailable for evaluating performance, realistic simulations of GRNs are necessary. One aspect of realism is network motif composition: local network motif analysis of real GRNs indicates that the feed-forward loop (FFL) is significantly enriched. To simulate this properly, we developed a novel motif-based preferential attachment algorithm, FFLatt, which outperformed the popular GeneNetWeaver network generation tool in reproducing the FFL motif occurrence observed in literature-based biological GRNs. It also preserves important topological properties such as scale-free topology, sparsity, and average in/out-degree per node. We conclude that FFLatt is well suited as a network generation module for a benchmarking framework that aims to provide fair and robust performance evaluation of GRN inference methods.
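Comparing simulated and literature-based GRNs for FFL content requires counting FFLs in a directed network. The sketch below shows that counting step only. It is not FFLatt itself, and the toy edge list is hypothetical. An FFL is counted as an ordered triple X→Y, Y→Z, X→Z with three distinct nodes.

```python
def count_ffl(edges):
    """Count feed-forward loops: X->Y, Y->Z and X->Z with X, Y, Z distinct."""
    successors = {}
    for src, dst in edges:
        if src != dst:  # ignore self-regulation
            successors.setdefault(src, set()).add(dst)
    count = 0
    for x, x_targets in successors.items():
        for y in x_targets:
            for z in successors.get(y, set()):
                # z != y is guaranteed because self-loops were dropped
                if z != x and z in x_targets:
                    count += 1
    return count


# Hypothetical toy network with two overlapping FFLs: (A, B, C) and (B, C, D).
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D"), ("B", "D")]
print(count_ffl(edges))
```

Enrichment is then typically judged by comparing this count with the counts in randomised networks that preserve each node's in- and out-degree.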
Thermal tolerance range, based on temperatures that result in incapacitating effects, influences species' distributions and has been used to predict species' responses to increasing temperature. Reproductive performance may also be negatively affected at less extreme temperatures, but such sublethal heat-induced sterility has received relatively little attention in studies addressing the potential effects of predicted climate warming and the ability of species to respond to it. The few studies examining the link between increased temperature and reproductive performance typically focus on adults, although effects can vary between life history stages. Here we assessed how sublethal heat stress during development affected subsequent adult fertility and its plasticity, both of which can provide the raw material for evolutionary responses to increased temperature. We quantified phenotypic and genetic variation in fertility in a set of Drosophila melanogaster lines from the Drosophila Genetic Reference Panel (DGRP), reared at standardized densities at three temperatures (25, 27, and 29 °C). We found little phenotypic variation at the two lower temperatures, but more variation at the highest temperature and in plasticity. Males were more affected than females. Despite reasonably large broad-sense heritabilities, a genome-wide association study found little evidence for additive genetic variance, and no genetic variants were robustly linked with reproductive performance at specific temperatures or with phenotypic plasticity. We compared results on heat-induced male sterility with other DGRP results on relevant fitness traits measured after abiotic stress and found an association between male susceptibility to sterility and male lifespan reduction following oxidative stress.
Our results suggest that sublethal stress during development has profound negative consequences for adult male reproduction and that, despite phenotypic variation in this response within a population, there is limited evolutionary potential, either through adaptation to a specific developmental temperature or through plasticity in response to developmental heat-induced sterility.
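Broad-sense heritability in a panel of inbred lines is often estimated by partitioning phenotypic variance into among-line and within-line components. The sketch below uses a balanced one-way ANOVA with expected mean squares and treats the among-line component as genetic variance, which is a simplifying assumption. Published DGRP analyses usually fit mixed models with additional factors such as sex and temperature, which this sketch omits. The fertility values are hypothetical.

```python
from statistics import mean


def broad_sense_h2(lines):
    """H^2 = var_line / (var_line + var_within) from a balanced one-way design.

    `lines` maps each line ID to its replicate phenotypes (same count per line).
    Among-line variance is treated as genetic variance, which is a simplification.
    """
    groups = list(lines.values())
    a, n = len(groups), len(groups[0])
    if a < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("need >= 2 lines with the same number (>= 2) of replicates")
    grand = mean(x for g in groups for x in g)
    ss_among = n * sum((mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (a * (n - 1))
    # Expected mean squares: E[MS_among] = var_within + n * var_line.
    var_line = max((ms_among - ms_within) / n, 0.0)
    total = var_line + ms_within
    if total == 0:
        raise ValueError("no phenotypic variance")
    return var_line / total


# Hypothetical fertility counts for three lines, two replicates each.
print(broad_sense_h2({"L1": [10, 12], "L2": [20, 22], "L3": [30, 32]}))
```

A high value here only says that lines differ consistently. As the abstract notes, a large broad-sense heritability does not by itself imply substantial additive genetic variance.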