The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.

Policriti, Alberto; Gigante, Nicola; Prezza, Nicola. (2015). Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform. In Language and Automata Theory and Applications: 9th International Conference, LATA 2015, Nice, France, March 2-6, 2015, Proceedings (pp. 587- 598). Springer. Isbn: 978-3-319-15578-4. Doi: 10.1007/978-3-319-15579-1_46. http://link.springer.com/chapter/10.1007/978-3-319-15579-1_46.

Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform

PREZZA, Nicola
2015

Abstract

The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.
2015
978-3-319-15578-4
BWT; compressed space; dynamic bitvector
Policriti, Alberto; Gigante, Nicola; Prezza, Nicola. (2015). Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform. In Language and Automata Theory and Applications: 9th International Conference, LATA 2015, Nice, France, March 2-6, 2015, Proceedings (pp. 587- 598). Springer. Isbn: 978-3-319-15578-4. Doi: 10.1007/978-3-319-15579-1_46. http://link.springer.com/chapter/10.1007/978-3-319-15579-1_46.
File in questo prodotto:
File Dimensione Formato  
lata2015.pdf

Solo gestori archivio

Tipologia: Documento in Pre-print
Licenza: DRM (Digital rights management) non definiti
Dimensione 320.96 kB
Formato Adobe PDF
320.96 kB Adobe PDF   Visualizza/Apri
Pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11385/194113
Citazioni
  • Scopus 6
  • ???jsp.display-item.citation.isi??? ND
  • OpenAlex ND
social impact