Detecting Mutations by eBWT / Prezza, Nicola; Pisanti, Nadia; Sciortino, Marinella; Rosone, Giovanna. - Proceedings of 18th Conference on Algorithms in Bioinformatics (WABI), 2018, (2018), pp. 1-15. (18th Conference on Algorithms in Bioinformatics (WABI 2018), Helsinki, Finland, August 20-24, 2018). [10.4230/LIPIcs.WABI.2018.3].

Detecting Mutations by eBWT

Nicola Prezza; Nadia Pisanti; Marinella Sciortino; Giovanna Rosone
2018

Abstract

We develop a theory describing how the extended Burrows-Wheeler Transform (EBWT) of a collection of DNA fragments tends to cluster together the copies of nucleotides sequenced from a genome G. The theory accurately predicts how many copies of any nucleotide are expected inside each such cluster, and shows how an elegant and precise procedure based on the LCP array can locate these clusters in the EBWT. These findings are general and apply to a wide range of problems. Here we consider alignment-free and reference-free SNP discovery in multiple collections of reads. In accordance with the theoretical results, SNPs are clustered in the EBWT of the read collection, and we develop a tool that finds SNPs with a simple scan of the EBWT and LCP arrays. Preliminary results show that the method requires much less coverage than state-of-the-art tools while drastically improving precision and sensitivity.
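To make the clustering idea concrete, the following is a minimal sketch and not the tool described in the paper. It assumes the common EBWT variant in which each read gets its own end marker that sorts before every nucleotide, builds the EBWT and LCP array naively by sorting all suffixes, and treats any maximal run of suffixes sharing a prefix of length at least `k` as a cluster. The function names, the threshold `k`, and the rule "a cluster holding two or more distinct nucleotides is a candidate SNP" are illustrative assumptions; the paper's actual cluster definition and statistical thresholds are not reproduced here.

```python
from collections import Counter


def ebwt_and_lcp(reads):
    """Naive EBWT and LCP array of a read collection (illustrative only)."""
    suffixes = []
    for i, read in enumerate(reads):
        # Assumption: read i ends with its own marker (0, i), which sorts
        # before every nucleotide (1, ord(c)) and differs from all other markers.
        encoded = [(1, ord(c)) for c in read] + [(0, i)]
        for j in range(len(encoded)):
            # EBWT symbol = character preceding the suffix ("$" at a read start).
            prev = read[j - 1] if j > 0 else "$"
            suffixes.append((encoded[j:], prev))
    suffixes.sort(key=lambda s: s[0])
    ebwt = [prev for _, prev in suffixes]
    lcp = [0] * len(suffixes)
    for r in range(1, len(suffixes)):
        a, b = suffixes[r - 1][0], suffixes[r][0]
        n = 0
        while n < min(len(a), len(b)) and a[n] == b[n]:
            n += 1
        lcp[r] = n
    return ebwt, lcp


def clusters(lcp, k):
    """Maximal half-open ranges [start, end) of adjacent suffixes with LCP >= k."""
    out, start = [], 0
    for r in range(1, len(lcp) + 1):
        if r == len(lcp) or lcp[r] < k:
            if r - start > 1:
                out.append((start, r))
            start = r
    return out


def candidate_snp_clusters(reads, k):
    """Nucleotide counts of clusters containing at least two distinct nucleotides."""
    ebwt, lcp = ebwt_and_lcp(reads)
    found = []
    for start, end in clusters(lcp, k):
        counts = Counter(c for c in ebwt[start:end] if c != "$")
        if len(counts) >= 2:
            found.append(dict(counts))
    return found


if __name__ == "__main__":
    # Hypothetical reads: the context "ACGTA" is preceded by G in some reads
    # and by T in others, mimicking a single-nucleotide variation.
    reads = ["GACGTA", "TACGTA", "GACGTA", "TACGTA"]
    print(candidate_snp_clusters(reads, k=5))
```

The quadratic suffix sort is only suitable for toy inputs. It is used here so that the link between shared right contexts, high LCP values, and adjacent EBWT symbols stays visible.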
ISBN: 978-3-95977-082-8
Keywords: BWT, LCP Array, SNPs, Reference-free, Assembly-free
Files in this item:
Wabi2018-ebwt.pdf (Adobe PDF, 554.16 kB)
Access: Open Access
Type: Publisher's version
License: Creative Commons

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/194107