The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.
Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform / Policriti, Alberto; Gigante, Nicola; Prezza, Nicola. - Language and Automata Theory and Applications: 9th International Conference, LATA 2015, Nice, France, March 2-6, 2015, Proceedings, (2015), pp. 587-598. (9th International Conference, LATA 2015, Nice, France, March 2-6, 2015). [10.1007/978-3-319-15579-1_46].
Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform
PREZZA, Nicola
2015
Abstract
The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.File | Dimensione | Formato | |
---|---|---|---|
lata2015.pdf
Solo gestori archivio
Tipologia:
Documento in Pre-print
Licenza:
DRM (Digital rights management) non definiti
Dimensione
320.96 kB
Formato
Adobe PDF
|
320.96 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.