Designing efficient randstrobes for sequence similarity analyses
Stockholm University, Faculty of Science, Department of Mathematics. Stockholm University, Science for Life Laboratory (SciLifeLab).
Stockholm University, Science for Life Laboratory (SciLifeLab). Stockholm University, Faculty of Science, Department of Mathematics.
Stockholm Univ, Dept Biochem & Biophys, Sci Life Lab, Natl Bioinformat Infrastruct Sweden, SE-17121 Solna, Sweden.
Number of authors: 10
2024 (English). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 40, no. 4, article id btae187. Article in journal (Refereed). Published
Abstract [en]

Motivation: Substrings of length k, commonly referred to as k-mers, play a vital role in sequence analysis. However, k-mers are limited to exact matches between sequences, which has motivated alternative constructs. We recently introduced a class of new constructs, strobemers, that can match across substitutions and smaller insertions and deletions. Randstrobes, the most sensitive strobemer proposed in Sahlin (Effective sequence similarity detection with strobemers. Genome Res 2021a;31:2080–94. https://doi.org/10.1101/gr.275648.121), have been used in several bioinformatics applications such as read classification, short-read mapping, and read overlap detection. Recently, we showed that the more pseudo-random the behavior of the construction (measured in entropy), the more efficient the seeds for sequence similarity analysis. The level of pseudo-randomness depends on the construction operators, but no study has investigated their efficacy.
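
To make the idea of a randstrobe concrete, the following is a minimal sketch of an order-2 construction: the first strobe is the k-mer at position i, and the second strobe is picked pseudo-randomly from a downstream window by minimizing a hash-based link value. The hash function (BLAKE2b), the XOR link operator, the truncation of the window at the sequence end, and the names `kmer_hash`, `randstrobes`, `w_min`, and `w_max` are assumptions made for illustration. This is a naive per-window scan, not the Binary Search Tree-based method or any other construction evaluated in the paper.

```python
import hashlib


def kmer_hash(kmer: str) -> int:
    """Map a k-mer to a 64-bit integer (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(kmer.encode("ascii"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def randstrobes(seq: str, k: int, w_min: int, w_max: int) -> list[tuple[int, int]]:
    """Return (i, j) start positions of the two strobes of each order-2 seed.

    Strobe 1 starts at i. Strobe 2 starts at the position j in the window
    [i + w_min, i + w_max] that minimizes hash(strobe 1) XOR hash(strobe 2).
    XOR is only one possible link operator (an assumption in this sketch).
    """
    # Hash every k-mer once so each window scan only compares integers.
    hashes = [kmer_hash(seq[i:i + k]) for i in range(len(seq) - k + 1)]
    seeds = []
    for i, h1 in enumerate(hashes):
        lo = i + w_min
        hi = min(i + w_max, len(hashes) - 1)  # truncate the window at the end
        if lo > hi:
            break  # no room left for a second strobe
        # Naive scan of the window; costs O(w_max - w_min) per seed.
        j = min(range(lo, hi + 1), key=lambda p: h1 ^ hashes[p])
        seeds.append((i, j))
    return seeds
```

Calling `randstrobes("ACGTACGGTCATGCATTGCAAGCTTAGCCA", k=3, w_min=4, w_max=8)` returns one `(i, j)` pair per valid start position, with the gap `j - i` varying between seeds. That varying gap is what lets a seed span a mismatch or small indel that falls between its two strobes.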

Results: In this study, we introduce novel construction methods, including a Binary Search Tree-based approach that improves time complexity over previous methods. To our knowledge, we are also the first to address biases in construction and design three metrics for measuring bias. Our evaluation shows that our methods have favorable speed and sampling uniformity compared to existing approaches. Lastly, guided by our results, we change the seed construction in strobealign, a short-read mapper, and find that the results change substantially. We suggest combining the two results to improve strobealign’s accuracy for the shortest reads in our evaluated datasets. Our evaluation highlights sampling biases that can occur and provides guidance on which operators to use when implementing randstrobes.

Availability and implementation: All methods and evaluation benchmarks are available in a public GitHub repository at https://github.com/Moein-Karami/RandStrobes. The scripts for running the strobealign analysis are found at https://github.com/NBISweden/strobealign-evaluation.

Place, publisher, year, edition, pages
2024. Vol. 40, no. 4, article id btae187
National subject category
Bioinformatics (Computational Biology); Construction Management
Identifiers
URN: urn:nbn:se:su:diva-229044
DOI: 10.1093/bioinformatics/btae187
ISI: 001206629000004
PubMedID: 38579261
Scopus ID: 2-s2.0-85191199242
OAI: oai:DiVA.org:su-229044
DiVA, id: diva2:1858955
Available from: 2024-05-20. Created: 2024-05-20. Last updated: 2026-03-12. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Person

Sahlin, Kristoffer

Search further in DiVA

By author/editor
Shen, Wei; Xu, Mengyang; Patro, Rob; Sahlin, Kristoffer