Vezzi, Francesco
Publications (10 of 11)
Ameur, A., Che, H., Martin, M., Bunikis, I., Dahlberg, J., Höijer, I., . . . Gyllensten, U. (2018). De Novo Assembly of Two Swedish Genomes Reveals Missing Segments from the Human GRCh38 Reference and Improves Variant Calling of Population-Scale Sequencing Data. Genes, 9(10), Article ID 486.
2018 (English). In: Genes, E-ISSN 2073-4425, Vol. 9, no 10, article id 486. Article in journal (Refereed). Published.
Abstract [en]

The current human reference sequence (GRCh38) is a foundation for large-scale sequencing projects. However, recent studies have suggested that GRCh38 may be incomplete and give a suboptimal representation of specific population groups. Here, we performed a de novo assembly of two Swedish genomes that revealed over 10 Mb of sequences absent from the human GRCh38 reference in each individual. Around 6 Mb of these novel sequences (NS) are shared with a Chinese personal genome. The NS are highly repetitive, have an elevated GC-content, and are primarily located in centromeric or telomeric regions. Up to 1 Mb of NS can be assigned to chromosome Y, and large segments are also missing from GRCh38 at chromosomes 14, 17, and 21. Inclusion of NS into the GRCh38 reference radically improves the alignment and variant calling from short-read whole-genome sequencing data at several genomic loci. A re-analysis of a Swedish population-scale sequencing project yields > 75,000 putative novel single nucleotide variants (SNVs) and removes > 10,000 false positive SNV calls per individual, some of which are located in protein coding regions. Our results highlight that the GRCh38 reference is not yet complete and demonstrate that personal genome assemblies from local populations can improve the analysis of short-read whole-genome sequencing data.

Keywords
de novo assembly, SMRT sequencing, GRCh38, human reference genome, human whole-genome sequencing, population sequencing, Swedish population
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-162937 (URN), 10.3390/genes9100486 (DOI), 000448656700024 (ISI), 30304863 (PubMedID)
Available from: 2018-12-17. Created: 2018-12-17. Last updated: 2024-07-04. Bibliographically approved.
Anh, N., Taylan, F., Zachariadis, V., Ivanov Öfverholm, I., Lindstrand, A., Vezzi, F., . . . Barbany, G. (2018). High-resolution detection of chromosomal rearrangements in leukemias through mate pair whole genome sequencing. PLOS ONE, 13(3), Article ID e0193928.
2018 (English). In: PLOS ONE, E-ISSN 1932-6203, Vol. 13, no 3, article id e0193928. Article in journal (Refereed). Published.
Abstract [en]

The detection of recurrent somatic chromosomal rearrangements is standard of care for most leukemia types. Even though karyotype analysis, a low-resolution genome-wide chromosome analysis, is still the gold standard, it often needs to be complemented with other methods to increase resolution. To evaluate the feasibility and applicability of mate pair whole genome sequencing (MP-WGS) to detect structural chromosomal rearrangements in the diagnostic setting, we sequenced ten bone marrow samples from leukemia patients with recurrent rearrangements. Samples were selected based on cytogenetic and FISH results at leukemia diagnosis to include common rearrangements of prognostic relevance. Using MP-WGS and in-house bioinformatic analysis, all sought rearrangements were successfully detected. In addition, unexpected complexity or additional, previously undetected rearrangements were unraveled in three samples. Finally, the MP-WGS analysis pinpointed the location of chromosome junctions at high resolution and we were able to identify the exact exons involved in the resulting fusion genes in all samples and the specific junction at the nucleotide level in half of the samples. The results show that our approach combines the screening character from karyotype analysis with the specificity and resolution of cytogenetic and molecular methods. As a result of the straightforward analysis and high-resolution detection of clinically relevant rearrangements, we conclude that MP-WGS is a feasible method for routine leukemia diagnostics of structural chromosomal rearrangements.

National Category
Biological Sciences Medical Genetics
Identifiers
urn:nbn:se:su:diva-156109 (URN), 10.1371/journal.pone.0193928 (DOI), 000427189300034 (ISI), 29529047 (PubMedID)
Available from: 2018-05-11. Created: 2018-05-11. Last updated: 2022-03-23. Bibliographically approved.
Ameur, A., Dahlberg, J., Olason, P., Vezzi, F., Karlsson, R., Martin, M., . . . Gyllensten, U. (2017). SweGen: a whole-genome data resource of genetic variability in a cross-section of the Swedish population. European Journal of Human Genetics, 25(11), 1253-1260
2017 (English). In: European Journal of Human Genetics, ISSN 1018-4813, E-ISSN 1476-5438, Vol. 25, no 11, p. 1253-1260. Article in journal (Refereed). Published.
Abstract [en]

Here we describe the SweGen data set, a comprehensive map of genetic variation in the Swedish population. These data represent a basic resource for clinical genetics laboratories as well as for sequencing-based association studies by providing information on genetic variant frequencies in a cohort that is well matched to national patient cohorts. To select samples for this study, we first examined the genetic structure of the Swedish population using high-density SNP-array data from a nation-wide cohort of over 10 000 Swedish-born individuals included in the Swedish Twin Registry. A total of 1000 individuals, reflecting a cross-section of the population and capturing the main genetic structure, were selected for whole-genome sequencing. Analysis pipelines were developed for automated alignment, variant calling and quality control of the sequencing data. This resulted in a genome-wide collection of aggregated variant frequencies in the Swedish population that we have made available to the scientific community through the website https://swefreq.nbis.se. A total of 29.2 million single-nucleotide variants and 3.8 million indels were detected in the 1000 samples, with 9.9 million of these variants not present in current databases. Each sample contributed with an average of 7199 individual-specific variants. In addition, an average of 8645 larger structural variants (SVs) were detected per individual, and we demonstrate that the population frequencies of these SVs can be used for efficient filtering analyses. Finally, our results show that the genetic diversity within Sweden is substantial compared with the diversity among continental European populations, underscoring the relevance of establishing a local reference data set.

National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-148972 (URN), 10.1038/ejhg.2017.130 (DOI), 000412823800012 (ISI), 28832569 (PubMedID)
Available from: 2017-12-05. Created: 2017-12-05. Last updated: 2022-03-23. Bibliographically approved.
Engström, K., Wojdacz, T. K., Marabita, F., Ewels, P., Kaller, M., Vezzi, F., . . . Broberg, K. (2017). Transcriptomics and methylomics of CD4-positive T cells in arsenic-exposed women. Archives of Toxicology, 91(5), 2067-2078
2017 (English). In: Archives of Toxicology, ISSN 0340-5761, E-ISSN 1432-0738, Vol. 91, no 5, p. 2067-2078. Article in journal (Refereed). Published.
Abstract [en]

Arsenic, a carcinogen with immunotoxic effects, is a common contaminant of drinking water and certain food worldwide. We hypothesized that chronic arsenic exposure alters gene expression, potentially by altering DNA methylation of genes encoding central components of the immune system. We therefore analyzed the transcriptomes (by RNA sequencing) and methylomes (by target-enrichment next-generation sequencing) of primary CD4-positive T cells from matched groups of four women each in the Argentinean Andes, with fivefold differences in urinary arsenic concentrations (median concentrations of urinary arsenic in the lower- and high-arsenic groups: 65 and 276 µg/l, respectively). Arsenic exposure was associated with genome-wide alterations of gene expression; principal component analysis indicated that the exposure explained 53% of the variance in gene expression among the top variable genes and 19% of 28,351 genes were differentially expressed (false discovery rate < 0.05) between the exposure groups. Key genes regulating the immune system, such as tumor necrosis factor alpha and interferon gamma, as well as genes related to the NF-kappa-beta complex, were significantly downregulated in the high-arsenic group. Arsenic exposure was associated with genome-wide DNA methylation; the high-arsenic group had 3 percentage points higher genome-wide full methylation (> 80% methylation) than the lower-arsenic group. Differentially methylated regions that were hyper-methylated in the high-arsenic group showed enrichment for immune-related gene ontologies that constitute the basic functions of CD4-positive T cells, such as isotype switching and lymphocyte activation and differentiation. In conclusion, chronic arsenic exposure from drinking water was related to changes in the transcriptome and methylome of CD4-positive T cells, both genome wide and in specific genes, supporting the hypothesis that arsenic causes immunotoxicity by interfering with gene expression and regulation.

Keywords
Arsenic, Transcriptomics, Methylomics, CD4 cells, Immune system, Immunotoxic
National Category
Pharmacology and Toxicology Biological Sciences
Identifiers
urn:nbn:se:su:diva-143389 (URN), 10.1007/s00204-016-1879-4 (DOI), 000399875300003 (ISI), 27838757 (PubMedID)
Available from: 2017-06-01. Created: 2017-06-01. Last updated: 2022-03-23. Bibliographically approved.
Nilsson, D., Pettersson, M., Gustavsson, P., Förster, A., Hofmeister, W., Wincent, J., . . . Lindstrand, A. (2017). Whole-Genome Sequencing of Cytogenetically Balanced Chromosome Translocations Identifies Potentially Pathological Gene Disruptions and Highlights the Importance of Microhomology in the Mechanism of Formation. Human Mutation, 38(2), 180-192
2017 (English). In: Human Mutation, ISSN 1059-7794, E-ISSN 1098-1004, Vol. 38, no 2, p. 180-192. Article in journal (Refereed). Published.
Abstract [en]

Most balanced translocations are thought to result mechanistically from nonhomologous end joining or, in rare cases of recurrent events, by nonallelic homologous recombination. Here, we use low-coverage mate pair whole-genome sequencing to fine map rearrangement breakpoint junctions in both phenotypically normal and affected translocation carriers. In total, 46 junctions from 22 carriers of balanced translocations were characterized. Genes were disrupted in 48% of the breakpoints; recessive genes in four normal carriers and known dominant intellectual disability genes in three affected carriers. Finally, seven candidate disease genes were disrupted in five carriers with neurocognitive disabilities (SVOPL, SUSD1, TOX, NCALD, SLC4A10) and one XX-male carrier with Tourette syndrome (LYPD6, GPC5). Breakpoint junction analyses revealed microhomology and small templated insertions in a substantive fraction of the analyzed translocations (17.4%; n = 4); an observation that was substantiated by reanalysis of 37 previously published translocation junctions. Microhomology associated with templated insertions is a characteristic seen in the breakpoint junctions of rearrangements mediated by error-prone replication-based repair mechanisms. Our data suggest that a mechanism involving template switching might contribute to the formation of at least 15% of the interchromosomal translocation events.

Keywords
balanced chromosomal aberration, reciprocal translocation, whole-genome sequencing, microhomology, nonhomologous end joining, replication-based repair mechanisms
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-141372 (URN), 10.1002/humu.23146 (DOI), 000393687800007 (ISI), 27862604 (PubMedID)
Available from: 2017-04-28. Created: 2017-04-28. Last updated: 2022-02-28. Bibliographically approved.
Spjuth, O., Bongcam-Rudloff, E., Dahlberg, J., Dahlö, M., Kallio, A., Pireddu, L., . . . Korpelainen, E. (2016). Recommendations on e-infrastructures for next-generation sequencing. GigaScience, 5, Article ID 26.
2016 (English). In: GigaScience, E-ISSN 2047-217X, Vol. 5, article id 26. Article, review/survey (Refereed). Published.
Abstract [en]

With ever-increasing amounts of data being produced by next-generation sequencing (NGS) experiments, the requirements placed on supporting e-infrastructures have grown. In this work, we provide recommendations based on the collective experiences from participants in the EU COST Action SeqAhead for the tasks of data preprocessing, upstream processing, data delivery, and downstream analysis, as well as long-term storage and archiving. We cover demands on computational and storage resources, networks, software stacks, automation of analysis, education, and also discuss emerging trends in the field. E-infrastructures for NGS require substantial effort to set up and maintain over time, and with sequencing technologies and best practices for data analysis evolving rapidly it is important to prioritize both processing capacity and e-infrastructure flexibility when making strategic decisions to support the data analysis demands of tomorrow. Due to increasingly demanding technical requirements we recommend that e-infrastructure development and maintenance be handled by a professional service unit, be it internal or external to the organization, and emphasis should be placed on collaboration between researchers and IT professionals.

Keywords
E-infrastructure, Next-generation sequencing, High-performance computing, Cloud computing
National Category
Biological Sciences
Identifiers
urn:nbn:se:su:diva-131918 (URN), 10.1186/s13742-016-0132-7 (DOI), 000377153700001 (ISI), 27267963 (PubMedID)
Available from: 2016-07-06. Created: 2016-07-04. Last updated: 2023-02-06. Bibliographically approved.
Grunewald, J., Kaiser, Y., Ostadkarampour, M., Rivera, N. V., Vezzi, F., Lötstedt, B., . . . Eklund, A. (2016). T-cell receptor-HLA-DRB1 associations suggest specific antigens in pulmonary sarcoidosis. European Respiratory Journal, 47(3), 898-909
2016 (English). In: European Respiratory Journal, ISSN 0903-1936, E-ISSN 1399-3003, Vol. 47, no 3, p. 898-909. Article in journal (Refereed). Published.
Abstract [en]

In pulmonary sarcoidosis, CD4+ T-cells expressing T-cell receptor Vα2.3 accumulate in the lungs of HLA-DRB1*03+ patients. To investigate T-cell receptor-HLA-DRB1*03 interactions underlying recognition of hitherto unknown antigens, we performed detailed analyses of T-cell receptor expression on bronchoalveolar lavage fluid CD4+ T-cells from sarcoidosis patients. Pulmonary sarcoidosis patients (n=43) underwent bronchoscopy with bronchoalveolar lavage. T-cell receptor α and β chains of CD4+ T-cells were analysed by flow cytometry, DNA-sequenced, and three-dimensional molecular models of T-cell receptor-HLA-DRB1*03 complexes generated. Simultaneous expression of Vα2.3 with the Vβ22 chain was identified in the lungs of all HLA-DRB1*03+ patients. Accumulated Vα2.3/Vβ22-expressing T-cells were highly clonal, with identical or near-identical Vα2.3 chain sequences and inter-patient similarities in Vβ22 chain amino acid distribution. Molecular modelling revealed specific T-cell receptor-HLA-DRB1*03-peptide interactions, with a previously identified, sarcoidosis-associated vimentin peptide, (Vim)429-443 DSLPLVDTHSKRTLL, matching both the HLA peptide-binding cleft and distinct T-cell receptor features perfectly. We demonstrate, for the first time, the accumulation of large clonal populations of specific Vα2.3/Vβ22 T-cell receptor-expressing CD4+ T-cells in the lungs of HLA-DRB1*03+ sarcoidosis patients. Several distinct contact points between Vα2.3/Vβ22 receptors and HLA-DRB1*03 molecules suggest presentation of prototypic vimentin-derived peptides.

National Category
Biological Sciences Immunology in the medical area
Identifiers
urn:nbn:se:su:diva-136225 (URN), 10.1183/13993003.01209-2015 (DOI), 000385950700026 (ISI), 26585430 (PubMedID)
Available from: 2016-12-15. Created: 2016-12-01. Last updated: 2022-02-28. Bibliographically approved.
Hofmeister, W., Nilsson, D., Topa, A., Anderlid, B.-M., Darki, F., Matsson, H., . . . Lindstrand, A. (2015). CTNND2-a candidate gene for reading problems and mild intellectual disability. Journal of Medical Genetics, 52(2), 111-122
2015 (English). In: Journal of Medical Genetics, ISSN 0022-2593, Vol. 52, no 2, p. 111-122. Article in journal (Refereed). Published.
Abstract [en]

Background: Cytogenetically visible chromosomal translocations are highly informative as they can pinpoint strong effect genes even in complex genetic disorders. Methods and results: Here, we report a mother and daughter, both with borderline intelligence and learning problems within the dyslexia spectrum, and two apparently balanced reciprocal translocations: t(1;8)(p22;q24) and t(5;18)(p15;q11). By low coverage mate-pair whole-genome sequencing, we were able to pinpoint the genomic breakpoints to 2 kb intervals. By direct sequencing, we then located the chromosome 5p breakpoint to intron 9 of CTNND2. An additional case with a 163 kb microdeletion exclusively involving CTNND2 was identified with genome-wide array comparative genomic hybridisation. This microdeletion at 5p15.2 is also present in mosaic state in the patient's mother but absent from the healthy siblings. We then investigated the effect of CTNND2 polymorphisms on normal variability and identified a polymorphism (rs2561622) with significant effect on phonological ability and white matter volume in the left frontal lobe, close to cortical regions previously associated with phonological processing. Finally, given the potential role of CTNND2 in neuron motility, we used morpholino knockdown in zebrafish embryos to assess its effects on neuronal migration in vivo. Analysis of the zebrafish forebrain revealed a subpopulation of neurons misplaced between the diencephalon and telencephalon. Conclusions: Taken together, our human genetic and in vivo data suggest that defective migration of subpopulations of neuronal cells due to haploinsufficiency of CTNND2 contributes to the cognitive dysfunction in our patients.

National Category
Medical Genetics
Identifiers
urn:nbn:se:su:diva-114229 (URN), 10.1136/jmedgenet-2014-102757 (DOI), 000348203900006 (ISI), 25473103 (PubMedID)
Note

AuthorCount:15;

Available from: 2015-03-20. Created: 2015-02-25. Last updated: 2022-02-23. Bibliographically approved.
Olsen, R.-A., Bunikis, I., Tiukova, I., Holmberg, K., Lötstedt, B., Pettersson, O. V., . . . Vezzi, F. (2015). De novo assembly of Dekkera bruxellensis: a multi technology approach using short and long-read sequencing and optical mapping. GigaScience, 4, Article ID 56.
2015 (English). In: GigaScience, E-ISSN 2047-217X, Vol. 4, article id 56. Article in journal (Refereed). Published.
Abstract [en]

Background: It remains a challenge to perform de novo assembly using next-generation sequencing (NGS). Despite the availability of multiple sequencing technologies and tools (e.g., assemblers) it is still difficult to assemble new genomes at chromosome resolution (i.e., one sequence per chromosome). Obtaining high quality draft assemblies is extremely important in the case of yeast genomes to better characterise major events in their evolutionary history. The aim of this work is two-fold: on the one hand we want to show how combining different and somewhat complementary technologies is key to improving assembly quality and correctness, and on the other hand we present a de novo assembly pipeline we believe to be beneficial to core facility bioinformaticians. To demonstrate both the effectiveness of combining technologies and the simplicity of the pipeline, here we present the results obtained using the Dekkera bruxellensis genome. Methods: In this work we used short-read Illumina data and long-read PacBio data combined with the extreme long-range information from OpGen optical maps in the task of de novo genome assembly and finishing. Moreover, we developed NouGAT, a semi-automated pipeline for read-preprocessing, de novo assembly and assembly evaluation, which was instrumental for this work. Results: We obtained a high quality draft assembly of a yeast genome, resolved on a chromosomal level. Furthermore, this assembly was corrected for mis-assembly errors as demonstrated by resolving a large collapsed repeat and by receiving higher scores by assembly evaluation tools. With the inclusion of PacBio data we were able to fill about 5 % of the optical mapped genome not covered by the Illumina data.

National Category
Bioinformatics and Systems Biology
Identifiers
urn:nbn:se:su:diva-124734 (URN), 10.1186/s13742-015-0094-1 (DOI), 000365669400002 (ISI), 26617983 (PubMedID), 2-s2.0-85006210402 (Scopus ID)
Available from: 2016-01-12. Created: 2016-01-04. Last updated: 2023-02-06. Bibliographically approved.
Alexeyenko, A., Nystedt, B., Vezzi, F., Sherwood, E., Ye, R., Knudsen, B., . . . Lundeberg, J. (2014). Efficient de novo assembly of large and complex genomes by massively parallel sequencing of Fosmid pools. BMC Genomics, 15, 439
2014 (English). In: BMC Genomics, E-ISSN 1471-2164, Vol. 15, p. 439. Article in journal (Refereed). Published.
Abstract [en]

Background: Sampling genomes with Fosmid vectors and sequencing of pooled Fosmid libraries on the Illumina platform for massive parallel sequencing is a novel and promising approach to optimizing the trade-off between sequencing costs and assembly quality. Results: In order to sequence the genome of Norway spruce, which is of great size and complexity, we developed and applied a new technology based on the massive production, sequencing, and assembly of Fosmid pools (FP). The spruce chromosomes were sampled with ~40,000 bp Fosmid inserts to obtain around two-fold genome coverage, in parallel with traditional whole genome shotgun sequencing (WGS) of haploid and diploid genomes. Compared to the WGS results, the contiguity and quality of the FP assemblies were high, and they allowed us to fill WGS gaps resulting from repeats, low coverage, and allelic differences. The FP contig sets were further merged with WGS data using a novel software package GAM-NGS. Conclusions: By exploiting FP technology, the first published assembly of a conifer genome was sequenced entirely with massively parallel sequencing. Here we provide a comprehensive report on the different features of the approach and the optimization of the process. We have made public the input data (FASTQ format) for the set of pools used in this study: ftp://congenie.org/congenie/Nystedt_2013/Assembly/ProcessedData/FosmidPools/ (alternatively accessible via http://congenie.org/downloads). The software used for running the assembly process is available at http://research.scilifelab.se/andrej_alexeyenko/downloads/fpools/.

National Category
Biological Sciences Chemical Sciences
Identifiers
urn:nbn:se:su:diva-106343 (URN), 10.1186/1471-2164-15-439 (DOI), 000338258700001 (ISI)
Note

AuthorCount:11;

Available from: 2014-08-08. Created: 2014-08-04. Last updated: 2024-01-17. Bibliographically approved.