Efficient mapping of accurate long reads in minimizer space with mapquik
Stockholm University, Faculty of Science, Department of Mathematics. Stockholm University, Science for Life Laboratory (SciLifeLab). ORCID iD: 0000-0001-7378-2320
2023 (English). In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 33, no. 7, pp. 1188-1197. Article in journal (Refereed). Published
Abstract [en]

DNA sequencing data continue to progress toward longer reads with increasingly lower sequencing error rates. We focus on the critical problem of mapping, or aligning, low-divergence sequences from long reads (e.g., Pacific Biosciences [PacBio] HiFi) to a reference genome, which poses challenges in terms of accuracy and computational resources when using cutting-edge read mapping approaches that are designed for all types of alignments. A natural idea would be to optimize efficiency with longer seeds to reduce the probability of extraneous matches; however, contiguous exact seeds quickly reach a sensitivity limit. We introduce mapquik, a novel strategy that creates accurate longer seeds by anchoring alignments through matches of k consecutively sampled minimizers (k-min-mers) and only indexing k-min-mers that occur once in the reference genome, thereby unlocking ultrafast mapping while retaining high sensitivity. We show that mapquik significantly accelerates the seeding and chaining steps, fundamental bottlenecks to read mapping, for both the human and maize genomes with >96% sensitivity and near-perfect specificity. On the human genome, for both real and simulated reads, mapquik achieves a 37× speedup over the state-of-the-art tool minimap2, and on the maize genome, mapquik achieves a 410× speedup over minimap2, making mapquik the fastest mapper to date. These accelerations are enabled from not only minimizer-space seeding but also a novel heuristic O(n) pseudochaining algorithm, which improves upon the long-standing O(n log n) bound. Minimizer-space computation builds the foundation for achieving real-time analysis of long-read sequencing data.
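
The seeding idea described in the abstract (k consecutively sampled minimizers forming one seed, with only the seeds that occur once in the reference being indexed) can be illustrated with a small sketch. This is not mapquik's implementation. The function names, the hash-threshold sampling scheme, and the parameters `k`, `l`, and `density` are all assumptions chosen for illustration, and the sketch ignores the reverse strand and sequencing errors.

```python
import hashlib
from collections import defaultdict


def sample_minimizers(seq, l=5, density=0.5):
    """Return (position, l-mer) pairs whose hash falls below a threshold.

    Hypothetical sampling scheme: an l-mer is kept when its 32-bit hash is
    below density * 2**32, so roughly a `density` fraction is sampled.
    """
    threshold = int(density * 2**32)
    picked = []
    for i in range(len(seq) - l + 1):
        lmer = seq[i:i + l]
        digest = hashlib.blake2b(lmer.encode(), digest_size=4).digest()
        if int.from_bytes(digest, "big") < threshold:
            picked.append((i, lmer))
    return picked


def k_min_mers(seq, k=3, l=5, density=0.5):
    """Return (start position, tuple of k consecutive sampled minimizers)."""
    mins = sample_minimizers(seq, l, density)
    return [
        (mins[i][0], tuple(m for _, m in mins[i:i + k]))
        for i in range(len(mins) - k + 1)
    ]


def build_unique_index(reference, k=3, l=5, density=0.5):
    """Index only the k-min-mers that occur exactly once in the reference."""
    positions = defaultdict(list)
    for pos, kmm in k_min_mers(reference, k, l, density):
        positions[kmm].append(pos)
    return {kmm: ps[0] for kmm, ps in positions.items() if len(ps) == 1}


def seed_hits(read, index, k=3, l=5, density=0.5):
    """Return (read position, reference position) pairs for matching seeds."""
    return [
        (rpos, index[kmm])
        for rpos, kmm in k_min_mers(read, k, l, density)
        if kmm in index
    ]


if __name__ == "__main__":
    # density=1.0 keeps every l-mer, which makes the toy example easy to follow.
    index = build_unique_index("ACGACGACT", k=2, l=3, density=1.0)
    print(index)
    print(seed_hits("GACT", index, k=2, l=3, density=1.0))
```

In the toy reference, the repeated seeds `("ACG", "CGA")` and `("CGA", "GAC")` each occur twice and are therefore left out of the index, so a read can only be anchored through the seeds that identify a single reference position.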

Place, publisher, year, edition, pages
2023. Vol. 33, no. 7, pp. 1188-1197
Identifiers
URN: urn:nbn:se:su:diva-221725
DOI: 10.1101/gr.277679.123
ISI: 001059942600001
PubMed ID: 37399256
Scopus ID: 2-s2.0-85167896090
OAI: oai:DiVA.org:su-221725
DiVA, id: diva2:1800912
Available from: 2023-09-28. Created: 2023-09-28. Last updated: 2023-09-28. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text, PubMed, Scopus

Person

Sahlin, Kristoffer

Search in DiVA

By author/editor
Ekim, Baris; Sahlin, Kristoffer
By organisation
In the same journal
Genome Research
