On the reproducibility of experiments of indexing repetitive document collections / Fariña, Antonio; Martínez-Prieto, Miguel A.; Claude, Francisco; Navarro, Gonzalo; Lastra-Díaz, Juan J.; Prezza, Nicola; Seco, Diego. - In: INFORMATION SYSTEMS. - ISSN 0306-4379. - 83:(2019), pp. 181-194. [10.1016/j.is.2019.03.007]

On the reproducibility of experiments of indexing repetitive document collections

Prezza, Nicola;
2019

Abstract

This work introduces a companion reproducible paper whose aim is to allow the exact replication of the methods, experiments, and results discussed in a previous work (Claude et al., 2016). In that parent paper, we proposed a variety of techniques for compressing indexes that exploit the fact that highly repetitive collections consist mostly of documents that are near-copies of others. More concretely, we describe a replication framework, called uiHRDC (universal indexes for Highly Repetitive Document Collections), that allows our original experimental setup to be easily replicated using various document collections. The corresponding experimentation is carefully explained, with precise details about the parameters that can be tuned for each indexing solution. Finally, we also provide uiHRDC as a reproducibility package.
Keywords: Repetitive document collections, Inverted index, Self-index, Reproducibility
Use this identifier to cite or link to this document: https://hdl.handle.net/11385/192326