Publications (10 of 13)
Höjer, P., Frick, T., Siga, H., Pourbozorgi, P., Aghelpasand, H., Martin, M. & Ahmadian, A. (2023). BLR: a flexible pipeline for haplotype analysis of multiple linked-read technologies. Nucleic Acids Research, 51(22), Article ID e114.
2023 (English). In: Nucleic Acids Research, ISSN 0305-1048, E-ISSN 1362-4962, Vol. 51, no 22, article id e114. Article in journal (Refereed). Published.
Abstract [en]

Linked-read sequencing promises a one-method approach for genome-wide insights including single nucleotide variants (SNVs), structural variants, and haplotyping. We introduce Barcode Linked Reads (BLR), an open-source haplotyping pipeline capable of handling millions of barcodes and data from multiple linked-read technologies including DBS, 10× Genomics, TELL-seq and stLFR. Running BLR on DBS linked-reads yielded megabase-scale phasing with low (<0.2%) switch error rates. Of 13616 protein-coding genes phased in the GIAB benchmark set (v4.2.1), 98.6% matched the BLR phasing. In addition, large structural variants showed concordance with HPRC-HG002 reference assembly calls. Compared to diploid assembly with PacBio HiFi reads, BLR phasing was more continuous when considering switch errors. We further show that integrating long reads at low coverage (∼10×) can improve phasing contiguity and reduce switch errors in tandem repeats. When compared to Long Ranger on 10× Genomics data, BLR showed an increase in phase block N50 with low switch-error rates. For TELL-Seq and stLFR linked reads, BLR generated longer or similar phase block lengths and low switch error rates compared to results presented in the original publications. In conclusion, BLR provides a flexible workflow for comprehensive haplotype analysis of linked reads from multiple platforms.

National Category
Bioinformatics and Systems Biology; Genetics
Identifiers
urn:nbn:se:su:diva-225099 (URN); 10.1093/nar/gkad1010 (DOI); 001101836300001 (ISI); 37941142 (PubMedID); 2-s2.0-85180312128 (Scopus ID)
Available from: 2024-01-08. Created: 2024-01-08. Last updated: 2024-01-11. Bibliographically approved.
Ratz, M., von Berlin, L., Larsson, L., Martin, M., Orzechowski Westholm, J., La Manno, G., . . . Frisén, J. (2022). Clonal relations in the mouse brain revealed by single-cell and spatial transcriptomics. Nature Neuroscience, 25(3), 285-294
2022 (English). In: Nature Neuroscience, ISSN 1097-6256, E-ISSN 1546-1726, Vol. 25, no 3, p. 285-294. Article in journal (Refereed). Published.
Abstract [en]

Ratz et al. present an easy-to-use method to barcode progenitor cells, enabling profiling of cell phenotypes and clonal relations using single-cell and spatial transcriptomics, providing an integrated approach for understanding brain architecture. The mammalian brain contains many specialized cells that develop from a thin sheet of neuroepithelial progenitor cells. Single-cell transcriptomics revealed hundreds of molecularly diverse cell types in the nervous system, but the lineage relationships between mature cell types and progenitor cells are not well understood. Here we show in vivo barcoding of early progenitors to simultaneously profile cell phenotypes and clonal relations in the mouse brain using single-cell and spatial transcriptomics. By reconstructing thousands of clones, we discovered fate-restricted progenitor cells in the mouse hippocampal neuroepithelium and show that microglia are derived from few primitive myeloid precursors that massively expand to generate widely dispersed progeny. We combined spatial transcriptomics with clonal barcoding and disentangled migration patterns of clonally related cells in densely labeled tissue sections. Our approach enables high-throughput dense reconstruction of cell phenotypes and clonal relations at the single-cell and tissue level in individual animals and provides an integrated approach for understanding tissue architecture.

National Category
Cell and Molecular Biology
Identifiers
urn:nbn:se:su:diva-203213 (URN); 10.1038/s41593-022-01011-x (DOI); 000761885700001 (ISI); 35210624 (PubMedID); 2-s2.0-85125392438 (Scopus ID)
Available from: 2022-03-28. Created: 2022-03-28. Last updated: 2022-03-28. Bibliographically approved.
Phad, G. E., Pushparaj, P., Tran, K., Dubrovskaya, V., Adori, M., Martinez-Murillo, P., . . . Hedestam, G. B. K. (2020). Extensive dissemination and intraclonal maturation of HIV Env vaccine-induced B cell responses. Journal of Experimental Medicine, 217(2)
2020 (English). In: Journal of Experimental Medicine, ISSN 0022-1007, E-ISSN 1540-9538, Vol. 217, no 2. Article in journal (Refereed). Published.
Abstract [en]

Well-ordered HIV-1 envelope glycoprotein (Env) trimers are prioritized for clinical evaluation, and there is a need for an improved understanding about how elicited B cell responses evolve following immunization. To accomplish this, we prime-boosted rhesus macaques with clade C NFL trimers and identified 180 unique Ab lineages from ~1,000 single-sorted Env-specific memory B cells. We traced all lineages in high-throughput heavy chain (HC) repertoire (Rep-seq) data generated from multiple immune compartments and time points and expressed several as monoclonal Abs (mAbs). Our results revealed broad dissemination and high levels of somatic hypermutation (SHM) of most lineages, including tier 2 virus neutralizing lineages, following boosting. SHM was highest in the Ab complementarity determining regions (CDRs) but also surprisingly high in the framework regions (FRs), especially FR3. Our results demonstrate the capacity of the immune system to affinity-mature large numbers of Env-specific B cell lineages simultaneously, supporting the use of regimens consisting of repeated boosts to improve each Ab, even those belonging to less expanded lineages.

National Category
Microbiology in the medical area; Biological Sciences
Identifiers
urn:nbn:se:su:diva-181099 (URN); 10.1084/jem.20191155 (DOI); 000523657100016 (ISI); 31704807 (PubMedID)
Available from: 2020-04-29. Created: 2020-04-29. Last updated: 2022-03-23. Bibliographically approved.
Lindstrand, A., Eisfeldt, J., Pettersson, M., Carvalho, C. M. B., Kvarnung, M., Grigelioniene, G., . . . Nilsson, D. (2019). From cytogenetics to cytogenomics: whole-genome sequencing as a first-line test comprehensively captures the diverse spectrum of disease-causing genetic variation underlying intellectual disability. Genome Medicine, 11(1), Article ID 68.
2019 (English). In: Genome Medicine, E-ISSN 1756-994X, Vol. 11, no 1, article id 68. Article in journal (Refereed). Published.
Abstract [en]

Background: Since different types of genetic variants, from single nucleotide variants (SNVs) to large chromosomal rearrangements, underlie intellectual disability, we evaluated the use of whole-genome sequencing (WGS) rather than chromosomal microarray analysis (CMA) as a first-line genetic diagnostic test.

Methods: We analyzed three cohorts with short-read WGS: (i) a retrospective cohort with validated copy number variants (CNVs) (cohort 1, n=68), (ii) individuals referred for monogenic multi-gene panels (cohort 2, n=156), and (iii) 100 prospective, consecutive cases referred to our center for CMA (cohort 3). Bioinformatic tools developed include FindSV, SVDB, Rhocall, Rhoviz, and vcf2cytosure.

Results: First, we validated our structural variant (SV)-calling pipeline on cohort 1, consisting of three trisomies and 79 deletions and duplications with a median size of 850kb (min 500bp, max 155Mb). All variants were detected. Second, we utilized the same pipeline in cohort 2 and analyzed with monogenic WGS panels, increasing the diagnostic yield to 8%. Next, cohort 3 was analyzed by both CMA and WGS. The WGS data was processed for large (>10kb) SVs genome-wide and for exonic SVs and SNVs in a panel of 887 genes linked to intellectual disability as well as genes matched to patient-specific Human Phenotype Ontology (HPO) phenotypes. This yielded a total of 25 pathogenic variants (SNVs or SVs), of which 12 were detected by CMA as well. We also applied short tandem repeat (STR) expansion detection and discovered one pathologic expansion in ATXN7. Finally, a case of Prader-Willi syndrome with uniparental disomy (UPD) was validated in the WGS data. Important positional information was obtained in all cohorts. Remarkably, 7% of the analyzed cases harbored complex structural variants, as exemplified by a ring chromosome and two duplications found to be an insertional translocation and part of a cryptic unbalanced translocation, respectively.

Conclusion: The overall diagnostic rate of 27% was more than doubled compared to clinical microarray (12%). Using WGS, we detected a wide range of SVs with high accuracy. Since the WGS data also allowed for analysis of SNVs, UPD, and STRs, it represents a powerful comprehensive genetic test in a clinical diagnostic laboratory setting.

Keywords
Whole-genome sequencing, Intellectual disability, Monogenic disease, Copy number variation, Structural variation, Single nucleotide variant, Uniparental disomy, Repeat expansion
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-176549 (URN); 10.1186/s13073-019-0675-1 (DOI); 000495667900001 (ISI); 31694722 (PubMedID); 2-s2.0-85074626429 (Scopus ID)
Available from: 2019-12-18. Created: 2019-12-18. Last updated: 2024-07-04. Bibliographically approved.
Bernet, N. V., Corcoran, M., Hardt, U., Kaduk, M., Phad, G. E., Martin, M. & Hedestam, G. B. K. (2019). High-Quality Library Preparation for NGS-Based Immunoglobulin Germline Gene Inference and Repertoire Expression Analysis. Frontiers in Immunology, 10, Article ID 660.
2019 (English). In: Frontiers in Immunology, E-ISSN 1664-3224, Vol. 10, article id 660. Article in journal (Refereed). Published.
Abstract [en]

Next generation sequencing (NGS) of immunoglobulin (Ig) repertoires (Rep-seq) enables examination of the adaptive immune system at an unprecedented level. Applications include studies of expressed repertoires, gene usage, somatic hypermutation levels, Ig lineage tracing and identification of genetic variation within the Ig loci through inference methods. All these applications require starting libraries that allow the generation of sequence data with low error rate and optimal representation of the expressed repertoire. Here, we provide detailed protocols for the production of libraries suitable for human Ig germline gene inference and Ig repertoire studies. Various parameters used in the process were tested in order to demonstrate factors that are critical to obtain high quality libraries. We demonstrate an improved 5'RACE technique that reduces the length constraints of Illumina MiSeq based Rep-seq analysis but allows for the acquisition of sequences upstream of Ig V genes, useful for primer design. We then describe a 5' multiplex method for library preparation, which yields full length V(D)J sequences suitable for genotype identification and novel gene inference. We provide comprehensive sets of primers targeting IGHV, IGKV, and IGLV genes. Using the optimized protocol, we produced IgM, IgG, IgK, and IgL libraries and analyzed them using the germline inference tool IgDiscover to identify expressed germline V alleles. This process additionally uncovered three IGHV, one IGKV, and six IGLV novel alleles in a single individual, which are absent from the IMGT reference database, highlighting the need for further study of Ig genetic variation. The library generation protocols presented here enable a robust means of analyzing expressed Ig repertoires, identifying novel alleles and producing individualized germline gene databases from humans.

Keywords
next generation sequencing, immunoglobulin, antibody, repertoire, library, germline gene, inference, database
National Category
Biological Sciences; Microbiology in the medical area
Identifiers
urn:nbn:se:su:diva-168620 (URN); 10.3389/fimmu.2019.00660 (DOI); 000463564400001 (ISI)
Available from: 2019-05-10. Created: 2019-05-10. Last updated: 2024-01-17. Bibliographically approved.
Marschall, T., Marz, M., Abeel, T., Dijkstra, L., Dutilh, B. E., Ghaffaari, A., . . . Schonhuth, A. (2018). Computational pan-genomics: status, promises and challenges. Briefings in Bioinformatics, 19(1), 118-135
2018 (English). In: Briefings in Bioinformatics, ISSN 1467-5463, E-ISSN 1477-4054, Vol. 19, no 1, p. 118-135. Article in journal (Refereed). Published.
Abstract [en]

Many disciplines, from human genetics and oncology to plant breeding, microbiology and virology, commonly face the challenge of analyzing rapidly increasing numbers of genomes. In case of Homo sapiens, the number of sequenced genomes will approach hundreds of thousands in the next few years. Simply scaling up established bioinformatics pipelines will not be sufficient for leveraging the full potential of such rich genomic data sets. Instead, novel, qualitatively different computational methods and paradigms are needed. We will witness the rapid extension of computational pan-genomics, a new sub-area of research in computational biology. In this article, we generalize existing definitions and understand a pan-genome as any collection of genomic sequences to be analyzed jointly or to be used as a reference. We examine already available approaches to construct and use pan-genomes, discuss the potential benefits of future technologies and methodologies and review open challenges from the vantage point of the above-mentioned biological disciplines. As a prominent example for a computational paradigm shift, we particularly highlight the transition from the representation of reference genomes as strings to representations as graphs. We outline how this and other challenges from different application domains translate into common computational problems, point out relevant bioinformatics techniques and identify open problems in computer science. With this review, we aim to increase awareness that a joint approach to computational pan-genomics can help address many of the problems currently faced in various domains.

Keywords
pan-genome, sequence graph, read mapping, haplotypes, data structures
National Category
Bioinformatics (Computational Biology); Biological Sciences
Identifiers
urn:nbn:se:su:diva-153888 (URN); 10.1093/bib/bbw089 (DOI); 000423311000011 (ISI); 27769991 (PubMedID)
Available from: 2018-03-07. Created: 2018-03-07. Last updated: 2022-03-23. Bibliographically approved.
Ameur, A., Che, H., Martin, M., Bunikis, I., Dahlberg, J., Höijer, I., . . . Gyllensten, U. (2018). De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data. Genes, 9(10), Article ID 486.
2018 (English). In: Genes, E-ISSN 2073-4425, Vol. 9, no 10, article id 486. Article in journal (Refereed). Published.
Abstract [en]

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.

Keywords
de novo assembly, SMRT sequencing, GRCh38, human reference genome, human whole-genome sequencing, population sequencing, Swedish population
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-162937 (URN); 10.3390/genes9100486 (DOI); 000448656700024 (ISI); 30304863 (PubMedID)
Available from: 2018-12-17. Created: 2018-12-17. Last updated: 2024-07-04. Bibliographically approved.
Didion, J. P., Martin, M. & Collins, F. S. (2017). Atropos: specific, sensitive, and speedy trimming of sequencing reads. PeerJ, 5, Article ID e3720.
2017 (English). In: PeerJ, E-ISSN 2167-8359, Vol. 5, article id e3720. Article in journal (Refereed). Published.
Abstract [en]

A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos makes it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.

Keywords
NGS, Sequencing, Read, Trimming, Preprocessing, Adapter, Cutadapt, Illumina
National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:su:diva-148906 (URN); 10.7717/peerj.3720 (DOI); 000411955500001 (ISI); 28875074 (PubMedID); 2-s2.0-85028608023 (Scopus ID)
Available from: 2017-11-15. Created: 2017-11-15. Last updated: 2023-08-28. Bibliographically approved.
Ameur, A., Dahlberg, J., Olason, P., Vezzi, F., Karlsson, R., Martin, M., . . . Gyllensten, U. (2017). SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population. European Journal of Human Genetics, 25(11), 1253-1260
2017 (English). In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 25, no 11, p. 1253-1260. Article in journal (Refereed). Published.
Abstract [en]

Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed with an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-148972 (URN); 10.1038/ejhg.2017.130 (DOI); 000412823800012 (ISI); 28832569 (PubMedID)
Available from: 2017-12-05. Created: 2017-12-05. Last updated: 2022-03-23. Bibliographically approved.
Reischauer, S., Stone, O. A., Villasenor, A., Chi, N., Jin, S.-W., Martin, M., . . . Stainier, D. Y. R. (2016). Cloche is a bHLH-PAS transcription factor that drives haemato-vascular specification. Nature, 535(7611), 294-298
2016 (English). In: Nature, ISSN 0028-0836, E-ISSN 1476-4687, Vol. 535, no 7611, p. 294-298. Article in journal (Refereed). Published.
Abstract [en]

Vascular and haematopoietic cells organize into specialized tissues during early embryogenesis to supply essential nutrients to all organs and thus play critical roles in development and disease. At the top of the haemato-vascular specification cascade lies cloche, a gene that when mutated in zebrafish leads to the striking phenotype of loss of most endothelial and haematopoietic cells(1-4) and a significant increase in cardiomyocyte numbers(5). Although this mutant has been analysed extensively to investigate mesoderm diversification and differentiation(1-7) and continues to be broadly used as a unique avascular model, the isolation of the cloche gene has been challenging due to its telomeric location. Here we used a deletion allele of cloche to identify several new cloche candidate genes within this genomic region, and systematically genome-edited each candidate. Through this comprehensive interrogation, we succeeded in isolating the cloche gene and discovered that it encodes a PAS-domain-containing bHLH transcription factor, and that it is expressed in a highly specific spatiotemporal pattern starting during late gastrulation. Gain-of-function experiments show that it can potently induce endothelial gene expression. Epistasis experiments reveal that it functions upstream of etv2 and tal1, the earliest expressed endothelial and haematopoietic transcription factor genes identified to date. A mammalian cloche orthologue can also rescue blood vessel formation in zebrafish cloche mutants, indicating a highly conserved role in vertebrate vasculogenesis and haematopoiesis. The identification of this master regulator of endothelial and haematopoietic fate enhances our understanding of early mesoderm diversification and may lead to improved protocols for the generation of endothelial and haematopoietic cells in vivo and in vitro.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-132941 (URN); 10.1038/nature18614 (DOI); 000379912600059 (ISI); 27411634 (PubMedID)
Available from: 2016-09-01. Created: 2016-08-26. Last updated: 2022-02-23. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-0680-200x