Entropy predicts sensitivity of pseudorandom seeds
Stockholm University, Faculty of Science, Department of Mathematics. ORCID iD: 0000-0001-8442-0536
Stockholm University, Faculty of Science, Department of Mathematics. ORCID iD: 0000-0001-7378-2320
2023 (English). In: Genome Research, ISSN 1088-9051, E-ISSN 1549-5469, Vol. 33, no. 7, pp. 1162-1174. Journal article (peer-reviewed). Published
Abstract [en]

Seed design is important for sequence similarity search applications such as read mapping and average nucleotide identity (ANI) estimation. Although k-mers and spaced k-mers are likely the best-known and most widely used seeds, their sensitivity suffers at high error rates, particularly when indels are present. Recently, we developed a pseudorandom seeding construct, strobemers, which was empirically shown to have high sensitivity even at high indel rates. However, that study did not explain why. In this study, we propose a model to estimate the entropy of a seed and find that seeds with high entropy, according to our model, in most cases have high match sensitivity. Our discovered seed randomness–sensitivity relationship explains why some seeds perform better than others, and the relationship provides a framework for designing even more sensitive seeds. We also present three new strobemer seed constructs: mixedstrobes, altstrobes, and multistrobes. We use both simulated and biological data to show that our new seed constructs improve sequence-matching sensitivity over other strobemers. We show that the three new seed constructs are useful for read mapping and ANI estimation. For read mapping, we implement strobemers in minimap2 and observe 30% faster alignment time and 0.2% higher accuracy than using k-mers when mapping reads at high error rates. As for ANI estimation, we find that higher entropy seeds have a higher rank correlation between estimated and true ANI.
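The abstract's point that exact k-mer seeds lose sensitivity around errors can be illustrated with a toy sketch. This is not the entropy model or any strobemer construct from the paper; the sequence, the error position, the value of k, and the helper names `kmer_set` and `matched_fraction` are all assumptions chosen for illustration. It only shows that a single substitution or deletion removes matches for every k-mer spanning the affected position.

```python
def kmer_set(seq, k):
    """Return the set of all contiguous substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def matched_fraction(ref, query, k):
    """Fraction of distinct reference k-mers that also occur in the query."""
    ref_kmers = kmer_set(ref, k)
    return len(ref_kmers & kmer_set(query, k)) / len(ref_kmers)


# Hypothetical 30-nt reference and error position, chosen only for illustration.
ref = "ACGTTGCATGTCGCATGATGCATGAGAGCT"
pos = 15
k = 5

# One substitution: replace the base at `pos` with a different base.
new_base = "A" if ref[pos] != "A" else "C"
substituted = ref[:pos] + new_base + ref[pos + 1:]

# One deletion: drop the base at `pos`.
deleted = ref[:pos] + ref[pos + 1:]

for label, query in [("identical", ref),
                     ("substitution", substituted),
                     ("deletion", deleted)]:
    print(label, round(matched_fraction(ref, query, k), 3))
```

Because every reference k-mer that overlaps the edited position can fail to match, a single error costs at most k of the distinct reference k-mers here, and the loss grows with k and with the error rate.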

Place, publisher, year, edition, pages
2023. Vol. 33, no. 7, pp. 1162-1174
National subject category
Bioinformatics (Computational Biology)
Identifiers
URN: urn:nbn:se:su:diva-225341
DOI: 10.1101/gr.277645.123
PubMed ID: 37217253
Scopus ID: 2-s2.0-85168804709
OAI: oai:DiVA.org:su-225341
DiVA, id: diva2:1827718
Funders
Vetenskapsrådet (Swedish Research Council), 2018-05973; Vetenskapsrådet; Vetenskapsrådet, 2021-04000
Available from: 2024-01-15. Created: 2024-01-15. Last updated: 2024-02-09. Bibliographically approved

Authors

Maier, Benjamin Dominik; Sahlin, Kristoffer