The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.
Policriti, Alberto; Gigante, Nicola; Prezza, Nicola. (2015). Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform. In Language and Automata Theory and Applications: 9th International Conference, LATA 2015, Nice, France, March 2-6, 2015, Proceedings (pp. 587- 598). Springer. Isbn: 978-3-319-15578-4. Doi: 10.1007/978-3-319-15579-1_46. http://link.springer.com/chapter/10.1007/978-3-319-15579-1_46.
Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform
PREZZA, Nicola
2015
Abstract
The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.| File | Dimensione | Formato | |
|---|---|---|---|
|
lata2015.pdf
Solo gestori archivio
Tipologia:
Documento in Pre-print
Licenza:
DRM (Digital rights management) non definiti
Dimensione
320.96 kB
Formato
Adobe PDF
|
320.96 kB | Adobe PDF | Visualizza/Apri |
Pubblicazioni consigliate
I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.



