The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.

Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform / Policriti, Alberto; Gigante, Nicola; Prezza, Nicola. - Language and Automata Theory and Applications: 9th International Conference, LATA 2015, Nice, France, March 2-6, 2015, Proceedings, (2015), pp. 587-598. (9th International Conference, LATA 2015, Nice, France, March 2-6, 2015). [10.1007/978-3-319-15579-1_46].

Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform

PREZZA, Nicola
2015

Abstract

The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.
2015
978-3-319-15578-4
BWT; compressed space; dynamic bitvector
File in questo prodotto:
File Dimensione Formato  
lata2015.pdf

Solo gestori archivio

Tipologia: Documento in Pre-print
Licenza: DRM (Digital rights management) non definiti
Dimensione 320.96 kB
Formato Adobe PDF
320.96 kB Adobe PDF   Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11385/194113
Citazioni
  • Scopus 5
  • ???jsp.display-item.citation.isi??? ND
social impact