IRIS - Institutional Research Information System

Hashing and Indexing: Succinct Data Structures and Smoothed Analysis / Policriti, Alberto; Prezza, Nicola. - Algorithms and Computation: 25th International Symposium, ISAAC 2014, Jeonju, Korea, December 15-17, 2014, Proceedings, (2014), pp. 157-168. (25th International Symposium, ISAAC 2014, Jeonju, Korea, December 15-17, 2014). [10.1007/978-3-319-13075-0_13].

Hashing and Indexing: Succinct Data Structures and Smoothed Analysis

POLICRITI, Alberto; PREZZA, Nicola

2014

Abstract

We consider the problem of indexing a text $T$ (of length $n$) with a light data structure that supports efficient search of patterns $P$ (of length $m$) allowing errors under the Hamming distance. We propose a hash-based strategy that employs two classes of hash functions, dubbed Hamming-aware and de Bruijn, to drastically reduce the search space and the memory footprint of the index, respectively. We use our succinct hash data structure to solve the $k$-mismatch search problem in $2n \log \sigma + o(n \log \sigma)$ bits of space with a randomized algorithm having smoothed complexity $O\left((2\sigma)^k (\log n)^k (\log m + \xi) + (occ + 1) \cdot m\right)$, where $\sigma$ is the alphabet size, $occ$ is the number of occurrences, and $\xi$ is a term depending on $m$, $n$, and on the amplitude $\varepsilon$ of the noise perturbing text and pattern. Significantly, we obtain that for any $\varepsilon > 0$ and for $m$ large enough, $\xi \in O(\log m)$: our results improve upon previous linear-space solutions of the $k$-mismatch problem.
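To make the problem statement concrete, here is a minimal brute-force sketch of $k$-mismatch search in Python. It is an illustrative baseline only, not the hash-based index described in the paper: it scans every length-$m$ window of $T$ in $O(nm)$ time and reports the start positions whose Hamming distance to $P$ is at most $k$. The function names `hamming` and `k_mismatch_occurrences` are hypothetical.

```python
from __future__ import annotations


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires equal-length strings")
    return sum(1 for x, y in zip(a, b) if x != y)


def k_mismatch_occurrences(text: str, pattern: str, k: int) -> list[int]:
    """Return every start position i such that text[i:i+m] is within
    Hamming distance k of pattern, where m = len(pattern).

    Brute-force baseline (no index, O(n*m) time); it only illustrates the
    problem definition and is NOT the paper's hash-based method.
    """
    n, m = len(text), len(pattern)
    hits = []
    for i in range(n - m + 1):
        mismatches = 0
        for j in range(m):
            if text[i + j] != pattern[j]:
                mismatches += 1
                if mismatches > k:  # stop early once the error budget is exceeded
                    break
        if mismatches <= k:
            hits.append(i)
    return hits
```

For example, `k_mismatch_occurrences("ACGTACGTTA", "ACGA", 1)` returns `[0, 4]`, because both windows `ACGT` differ from `ACGA` in a single position. An index such as the one in the paper aims to answer the same query without scanning the whole text.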

Conference year: 2014
ISBN: 9783319130743
Keywords: Hash Function, Hash Table, Query Time, Alphabet Size, Reduce Search Space
Appears in item types: 04.1 - Paper in Proceedings

Files in this item:

File	Size	Format
chp:10.1007/978-3-319-13075-0_13.pdf (access: archive managers only; type: publisher's version; license: DRM (Digital rights management) not defined)	226.59 kB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11385/194111

Citations

6

4

N/A