Improved gap size estimation for scaffolding algorithms
Stockholm University, Faculty of Science, Numerical Analysis and Computer Science (NADA).
2012 (English)
In: Bioinformatics, ISSN 1367-4803, E-ISSN 1460-2059, Vol. 28, no 17, 2215-2222 p.
Article in journal (Refereed) Published
Abstract [en]

Motivation: One of the important steps of genome assembly is scaffolding, in which contigs are linked using information from read-pairs. Scaffolding provides estimates about the order, relative orientation and distance between contigs. We have found that contig distance estimates are generally strongly biased and based on false assumptions. Since erroneous distance estimates can mislead in subsequent analysis, it is important to provide unbiased estimation of contig distance.

Results: In this article, we show that state-of-the-art programs for scaffolding are using an incorrect model of gap size estimation. We discuss why current maximum likelihood estimators are biased and describe the different cases of bias that arise. Furthermore, we provide a model for the distribution of reads that span a gap and derive the maximum likelihood equation for the gap length. We motivate why this estimate is sound and show empirically that it outperforms gap estimators in popular scaffolding programs. Our results have consequences for scaffolding software, for structural variation detection and for library insert-size estimation as is commonly performed by read aligners.

Place, publisher, year, edition, pages
2012. Vol. 28, no 17, 2215-2222 p.
National Category
Bioinformatics and Systems Biology
Research subject
Computer Science
URN: urn:nbn:se:su:diva-79067
DOI: 10.1093/bioinformatics/bts441
ISI: 000308019200001
OAI: diva2:546929
Swedish Research Council, 2010-4634
Available from: 2012-12-20
Created: 2012-08-25
Last updated: 2012-12-20
Bibliographically approved

By author/editor
Arvestad, Lars
By organisation
Numerical Analysis and Computer Science (NADA)