The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.

Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform / POLICRITI, Alberto; GIGANTE, Nicola; PREZZA, Nicola. - (2015), pp. 587-598. ((Intervento presentato al convegno 9th International Conference, LATA 2015 tenutosi a Nice, France nel March 2-6, 2015 [10.1007/978-3-319-15579-1_46].

Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform

PREZZA, Nicola
2015

Abstract

The Burrows-Wheeler Transform is a text permutation that has revolutionized the fields of pattern matching and text compression, bridging the gap existing between the two. In this paper we approach the BWT-construction problem generalizing a well-known algorithm—based on backward search and dynamic strings manipulation—to work in a context-wise fashion, using automata on words. Let n , σ , and Hk be the text length, the alphabet size, and the k -th order empirical entropy of the text, respectively. Moreover, let H∗k=minHk+1,⌈logσ⌉ . Under the word RAM model with word size w∈Θ(logn) , our algorithm builds the BWT in average O(nH∗k) time using nH∗k+o(nH∗k) bits of space, where k=log_σ(n/log2n)−1 . We experimentally show that our algorithm has very good performances (essentially linear time) on DNA sequences, using about 2.6 bits per input symbol in RAM.
978-3-319-15578-4
BWT; compressed space; dynamic bitvector
Average Linear Time and Compressed Space Construction of the Burrows-Wheeler Transform / POLICRITI, Alberto; GIGANTE, Nicola; PREZZA, Nicola. - (2015), pp. 587-598. ((Intervento presentato al convegno 9th International Conference, LATA 2015 tenutosi a Nice, France nel March 2-6, 2015 [10.1007/978-3-319-15579-1_46].
File in questo prodotto:
File Dimensione Formato  
lata2015.pdf

Solo gestori archivio

Tipologia: Documento in Pre-print
Licenza: DRM non definito
Dimensione 320.96 kB
Formato Adobe PDF
320.96 kB Adobe PDF   Visualizza/Apri
Pubblicazioni consigliate

Caricamento pubblicazioni consigliate

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: http://hdl.handle.net/11385/194113
Citazioni
  • Scopus 5
  • ???jsp.display-item.citation.isi??? ND
social impact